Semantic Relatedness: an interactive walkthrough

Introduction

The Semantic Relatedness page allows you to calculate the semantic relatedness of synsets in GermaNet.

You specify which synsets you want to compare, and which measures of semantic relatedness you want to use to compare them. Rover then performs the calculations and displays the results.

You can choose between two ways of working with the relatedness calculations:

On the Visualize Relatedness tab, you can choose two specific synsets, calculate their relatedness according to several measures, and visualize the structure in GermaNet on which these calculations are based.

On the Batch Processing tab, you can calculate the relatedness of up to 200 word pairs by uploading a file and downloading the calculation results.

We'll begin with Visualize Relatedness, and return to Batch Processing below.

Searching for and selecting synsets

The first step in calculating semantic relatedness is selecting the two synsets to compare.

To select the two synsets you want to compare, search for them using the search forms at the top of the page.

The term you type into the search box will be matched against all the synsets in GermaNet. Any synset containing at least one matching word will be returned.

Click "Show search options" to see how to refine your search. You can search with regular expressions or by edit distance, and you can narrow your search by grammatical category or semantic class.
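To make the search refinements more concrete, here is a minimal sketch of matching a search term against synsets exactly, as a regular expression, or within a maximum edit distance. The synsets, IDs and function names are hypothetical. The sketch also assumes that a regular expression must match a whole word and that edit distance means Levenshtein distance. Rover's actual matching rules may differ.

```python
import re

# Hypothetical toy synsets: ID -> words in the synset (not real GermaNet IDs).
SYNSETS = {
    "s1": {"Gitarre"},
    "s2": {"Geige", "Violine", "Fiedel"},
    "s3": {"Trompete"},
}


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def search(term, synsets, mode="exact", max_distance=1):
    """Return the IDs of all synsets containing at least one matching word."""
    def matches(word):
        if mode == "regex":
            # Assumption: the pattern has to match the whole word.
            return re.fullmatch(term, word) is not None
        if mode == "edit":
            return levenshtein(term, word) <= max_distance
        return word == term

    return sorted(sid for sid, words in synsets.items()
                  if any(matches(w) for w in words))


print(search("Gitare", SYNSETS, mode="edit"))   # ['s1']
print(search(".*ine", SYNSETS, mode="regex"))   # ['s2']
```

An edit-distance search like this tolerates typos such as "Gitare". A regular expression instead lets you match families of words, such as all words ending in "-ine".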

Once you have performed a search, buttons representing your search history appear underneath the search form.

Clicking one of these buttons will run the corresponding search again using all the same search options. Hover over a button to see how many results that search returned.

The synsets found for your search are displayed as a list below the search form:

1 result
  • Artefakt
    Gitarre

    n. ein populäres Zupfinstrument mit sechs oder zwölf Saiten

    • Component meronyms (2)
      Schlagbrett; Kopfplatte
    • Hypernyms (1)
      Zupfinstrument
    • Hyponyms (13)
      Jazzgitarre; Brahmsgitarre; Stahlsaitengitarre; Flamencogitarre ...
    • Related to (1)
      Kapodaster

The list displays a summary for each synset. At the top of the summary, you see the words in the synset and their definitions, as well as the semantic class the synset belongs to (here, Artefakt). Below that, the synset's conceptual relations are summarized by type. This information helps you distinguish between different synsets that contain the same word.

You can select the first synset that you want to compare from this list by clicking on its summary.

To select the second synset, repeat this procedure with the second search form on the page.

Relatedness measures

Once you have selected two different synsets, they will automatically be compared using a variety of measures.

You can select exactly which measures to use by toggling the checkboxes below the search forms.

Some semantic relatedness measures (Resnik, Lin, Jiang and Conrath) integrate information about word frequency from the COW project. If you use any measure that relies on the frequency lists, please cite COW14 and COW16 and report your publication to the COW project.
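The frequency counts matter because the Resnik, Lin, and Jiang and Conrath measures rest on information content. The sketch below follows the standard textbook formulation on a toy hierarchy. A concept's probability is the summed frequency of all words it subsumes divided by the corpus total, and its information content is IC(c) = -log p(c). The three measures then combine the ICs of the two synsets and of their most informative common subsumer. The hierarchy, the counts, and the use of the natural logarithm are illustrative assumptions. This is neither Rover's code nor COW data.

```python
import math
from collections import deque

# Toy hypernym hierarchy (child -> hypernyms); illustrative only,
# not the actual GermaNet structure.
HYPERNYMS = {
    "Gitarre": ["Zupfinstrument"],
    "Zupfinstrument": ["Saiteninstrument"],
    "Geige": ["Streichinstrument"],
    "Streichinstrument": ["Saiteninstrument"],
    "Saiteninstrument": ["Musikinstrument"],
    "Trompete": ["Blechblasinstrument"],
    "Blechblasinstrument": ["Musikinstrument"],
    "Musikinstrument": [],
}

# Hypothetical corpus counts for the words of each synset (not COW figures).
WORD_COUNTS = {
    "Gitarre": 40, "Geige": 30, "Trompete": 20,
    "Zupfinstrument": 5, "Streichinstrument": 3, "Blechblasinstrument": 2,
    "Saiteninstrument": 0, "Musikinstrument": 0,
}


def ancestors(synset):
    """The synset itself plus everything reachable via hypernym links."""
    seen = {synset}
    queue = deque([synset])
    while queue:
        for parent in HYPERNYMS[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# A concept's frequency includes the counts of every synset it subsumes.
CUMULATIVE = {s: 0 for s in HYPERNYMS}
for synset, count in WORD_COUNTS.items():
    for ancestor in ancestors(synset):
        CUMULATIVE[ancestor] += count
TOTAL = sum(WORD_COUNTS.values())


def information_content(synset):
    # A concept with cumulative count 0 would need smoothing; none occurs here.
    return -math.log(CUMULATIVE[synset] / TOTAL)


def most_informative_subsumer(s1, s2):
    return max(ancestors(s1) & ancestors(s2), key=information_content)


def resnik(s1, s2):
    return information_content(most_informative_subsumer(s1, s2))


def lin(s1, s2):
    # Undefined (division by zero) if both synsets have IC 0, e.g. the root.
    return 2 * resnik(s1, s2) / (information_content(s1) + information_content(s2))


def jiang_conrath_distance(s1, s2):
    # A distance: 0 for identical synsets, larger for less related ones.
    return information_content(s1) + information_content(s2) - 2 * resnik(s1, s2)
```

Note that Jiang and Conrath define a distance rather than a similarity, so a tool reporting it as relatedness has to invert or rescale it in some way. How Rover does this is not covered here.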

The names of these measures refer to the research articles where they are described:

Wu and Palmer
Wu, Zhibiao, and Martha Palmer. 1994. ‘Verb Semantics and Lexical Selection’. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, 133–138. ACL ’94. Las Cruces, New Mexico: Association for Computational Linguistics.
Leacock and Chodorow
Leacock, Claudia, and Martin Chodorow. 1998. ‘Combining Local Context and WordNet Similarity for Word Sense Identification’. In WordNet: An Electronic Lexical Database. MIT Press.
Resnik
Resnik, Philip. 1999. ‘Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language’. Journal of Artificial Intelligence Research 11 (July): 95–130.
Lin
Lin, Dekang. 1998. ‘An Information-Theoretic Definition of Similarity’. In Proceedings of the Fifteenth International Conference on Machine Learning, 296–304. San Francisco, CA, USA: Morgan Kaufmann Publishers.
Jiang and Conrath
Jiang, Jay J., and David W. Conrath. 1997. ‘Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy’. In Proceedings of the 10th Research on Computational Linguistics International Conference, 19–33. Taipei, Taiwan: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).
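For orientation, the two purely path-based measures can be sketched as follows. Wu and Palmer compute 2·N3 / (N1 + N2 + 2·N3). Here N1 and N2 are the distances of the two synsets from their least common subsumer, and N3 is the depth of that subsumer. Leacock and Chodorow take -log(path / (2·D)), where D is the maximum depth of the taxonomy. In this sketch, depth is counted in nodes (the root has depth 1) and path length in nodes. This is one of several conventions in the literature. The toy hierarchy is invented, and Rover's exact conventions may differ.

```python
import math
from collections import deque

# Toy hypernym hierarchy (child -> hypernyms); invented for illustration.
HYPERNYMS = {
    "Gitarre": ["Zupfinstrument"],
    "Zupfinstrument": ["Saiteninstrument"],
    "Geige": ["Streichinstrument"],
    "Streichinstrument": ["Saiteninstrument"],
    "Saiteninstrument": ["Musikinstrument"],
    "Trompete": ["Blechblasinstrument"],
    "Blechblasinstrument": ["Musikinstrument"],
    "Musikinstrument": [],
}
ROOT = "Musikinstrument"


def hypernym_distances(synset):
    """Edge distance from synset to each of its ancestors (itself at 0)."""
    dist = {synset: 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def depth(synset):
    # Counted in nodes, so the root has depth 1.
    return hypernym_distances(synset)[ROOT] + 1


MAX_DEPTH = max(depth(s) for s in HYPERNYMS)


def lcs_and_distances(s1, s2):
    """Common ancestor with the shortest combined distance, plus N1 and N2."""
    d1, d2 = hypernym_distances(s1), hypernym_distances(s2)
    lcs = min(d1.keys() & d2.keys(), key=lambda a: d1[a] + d2[a])
    return lcs, d1[lcs], d2[lcs]


def wu_palmer(s1, s2):
    lcs, n1, n2 = lcs_and_distances(s1, s2)
    n3 = depth(lcs)
    return 2 * n3 / (n1 + n2 + 2 * n3)


def leacock_chodorow(s1, s2):
    _, n1, n2 = lcs_and_distances(s1, s2)
    path_nodes = n1 + n2 + 1  # nodes on the shortest path, both ends included
    return -math.log(path_nodes / (2 * MAX_DEPTH))


print(wu_palmer("Gitarre", "Geige"))  # 0.5
```

Both measures reward pairs whose common subsumer lies deep in the hierarchy or close to the two synsets. Unlike the information-content measures, they need no corpus frequencies.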

In addition, the simple path measure is based on the length of the shortest path between the two synsets via the hypernymy relation. This length is taken relative to the length of the longest such path in GermaNet between two synsets of the same grammatical category as those selected.
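Below is one possible reading of this description, sketched under explicit assumptions. First, take the shortest hypernymy path between the two synsets, up to a common subsumer and back down. Next, find the longest such shortest path over all pairs in the same category. Finally, report (longest - path) / longest, so that identical synsets score 1 and the most distant pair scores 0. The inversion is our assumption, chosen so that higher means more related, as with the other measures. The toy hierarchy is invented.

```python
from collections import deque
from itertools import combinations

# Toy hypernym hierarchy (child -> hypernyms); invented for illustration.
HYPERNYMS = {
    "Gitarre": ["Zupfinstrument"],
    "Zupfinstrument": ["Saiteninstrument"],
    "Geige": ["Streichinstrument"],
    "Streichinstrument": ["Saiteninstrument"],
    "Saiteninstrument": ["Musikinstrument"],
    "Trompete": ["Blechblasinstrument"],
    "Blechblasinstrument": ["Musikinstrument"],
    "Musikinstrument": [],
}


def hypernym_distances(synset):
    """Edge distance from synset to each of its ancestors (itself at 0)."""
    dist = {synset: 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in HYPERNYMS[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def path_length(s1, s2):
    """Fewest hypernymy edges between s1 and s2 via a common subsumer."""
    d1, d2 = hypernym_distances(s1), hypernym_distances(s2)
    return min(d1[a] + d2[a] for a in d1.keys() & d2.keys())


# Longest shortest path between any two synsets (all toy synsets are nouns).
LONGEST = max(path_length(a, b) for a, b in combinations(HYPERNYMS, 2))


def simple_path(s1, s2):
    # Assumption: inverted so that 1.0 = same synset, 0.0 = most distant pair.
    return (LONGEST - path_length(s1, s2)) / LONGEST
```

Because the longest path is computed per grammatical category, the same raw path length can yield different scores for noun pairs and verb pairs.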

Results format

Alongside the checkboxes for selecting the measures, you can also set two options that control how results are displayed: the normalization maximum and the precision.

The results of the measure calculations are normalized to a common interval so that the different measures can be more easily compared. The normalization maximum determines the upper bound of this interval. For example, setting this field to 100 normalizes results to the interval [0, 100]. You can set the normalization maximum to 0 to turn off normalization and see the raw values for each measure.

You can also determine how many decimal places you would like displayed in the results by setting the precision.
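The two display options can be pictured as a rescaling step followed by rounding. How Rover determines the raw range of each measure is not described here. The sketch therefore assumes a known lower and upper bound per measure and maps that range linearly onto [0, maximum]. The function names and bounds are hypothetical.

```python
def normalize(raw, lower, upper, maximum):
    """Map raw from [lower, upper] onto [0, maximum]; 0 means no normalization."""
    if maximum == 0:
        return raw  # show the raw value unchanged
    return (raw - lower) / (upper - lower) * maximum


def format_score(value, precision):
    """Render value with the requested number of decimal places."""
    return f"{value:.{precision}f}"


print(format_score(normalize(1.5, 0.0, 3.0, 100), 2))  # 50.00
```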

Results

The results of the measure calculations for the two selected synsets are displayed in a table below the search forms. The table looks like this:

Measure               Score
Simple Path           0.82857
Wu and Palmer         0.70000
Leacock and Chodorow  0.45698
Resnik                0.33696
Lin                   0.68581
Jiang and Conrath     0.84563

The table updates automatically whenever you select a different synset or different measures, so you can quickly explore how the measures behave. Try changing the selected measures, or selecting a different synset from the results list below, to see how the Gitarre synset compares to the Geige, Violine, Fiedel synset or to the Trompete synset according to the different measures.

2 results
  • Artefakt
    Geige, Violine, Fiedel

    n. Musik: Aus der Viola da braccio hervorgegangenes Streichinstrument mit flachem Korpus und 4 Saiten, welche in G-D-A-E gestimmt sind; Violine, aus der Viola da braccio hervorgegangenes Streichinstrument mit flachem Korpus und 4 Saiten, welche in G-D-A-E gestimmt sind

    • Component meronyms (1)
      Saite
    • Hypernyms (1)
      Streichinstrument
    • Hyponyms (3)
      Stradivari; Barockgeige, Barockvioline; Solovioline
    • Related to (2)
      Konzertmeisterin, Konzertmeister; Geigenkasten
  • Artefakt
    Trompete

    n. Musik: Hohes Blechblasinstrument mit einem Kesselmundstück

    • Hypernyms (1)
      Blechblasinstrument
    • Hyponyms (4)
      Jazztrompete; Naturtrompete; Barocktrompete; Fanfare

Visualizing paths between the synsets

The Visualize Relatedness tab also displays a graph to help you visualize how the two synsets are related.

In the graph, each node is a synset, and each arrow points from a synset to one of its hypernyms. Like the results table, the graph updates automatically when you select a different pair of synsets.

The relatedness measures are based on the idea of a least common subsumer for two synsets. This is the closest synset that can be reached from both synsets by repeatedly following hypernym links upward. In the visualization, the least common subsumers of the two selected synsets are displayed in yellow at the top, and the two selected synsets appear in green at the bottom.

The visualization shows all paths between the two synsets via their least common subsumers. For these synsets, there is only one least common subsumer and only one path. In general, however, there can be several, because a synset in GermaNet can have more than one hypernym.
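The least common subsumers, and the paths through them, can be found with a breadth-first search over hypernym links. The sketch below runs on an invented toy hierarchy. It returns every common ancestor with the smallest combined distance, so ties yield several subsumers, and then lists all shortest paths through each of them. It illustrates the idea only. It is not Rover's implementation, and the function names are hypothetical.

```python
from collections import deque

# Toy hierarchy (child -> hypernyms); invented for illustration.
TOY_HYPERNYMS = {
    "Gitarre": ["Zupfinstrument"],
    "Zupfinstrument": ["Saiteninstrument"],
    "Geige": ["Streichinstrument"],
    "Streichinstrument": ["Saiteninstrument"],
    "Saiteninstrument": ["Musikinstrument"],
    "Trompete": ["Blechblasinstrument"],
    "Blechblasinstrument": ["Musikinstrument"],
    "Musikinstrument": [],
}


def hypernym_distances(synset, hypernyms):
    """Breadth-first search upward: edge distance to every ancestor."""
    dist = {synset: 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in hypernyms.get(current, []):
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def least_common_subsumers(s1, s2, hypernyms):
    """All common ancestors with the smallest combined distance (ties kept)."""
    d1 = hypernym_distances(s1, hypernyms)
    d2 = hypernym_distances(s2, hypernyms)
    common = d1.keys() & d2.keys()
    if not common:
        return []
    best = min(d1[a] + d2[a] for a in common)
    return sorted(a for a in common if d1[a] + d2[a] == best)


def upward_paths(start, target, hypernyms, length):
    """All hypernym chains of exactly `length` edges from start to target."""
    if start == target:
        return [[start]] if length == 0 else []
    if length == 0:
        return []
    return [[start] + rest
            for parent in hypernyms.get(start, [])
            for rest in upward_paths(parent, target, hypernyms, length - 1)]


def paths_via_lcs(s1, s2, hypernyms):
    """Shortest paths s1 -> LCS <- s2, one list of synsets per path."""
    d1 = hypernym_distances(s1, hypernyms)
    d2 = hypernym_distances(s2, hypernyms)
    paths = []
    for lcs in least_common_subsumers(s1, s2, hypernyms):
        for up in upward_paths(s1, lcs, hypernyms, d1[lcs]):
            for down in upward_paths(s2, lcs, hypernyms, d2[lcs]):
                # Join the two halves at the LCS, listing it only once.
                paths.append(up + down[::-1][1:])
    return paths


print(paths_via_lcs("Gitarre", "Geige", TOY_HYPERNYMS))
# [['Gitarre', 'Zupfinstrument', 'Saiteninstrument', 'Streichinstrument', 'Geige']]
```

If two synsets both had two hypernyms shared between them, `least_common_subsumers` would return both. `paths_via_lcs` would then return one path through each, which matches the situation where the graph shows several paths.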

You can zoom in or out by scrolling and reposition the graph by dragging it, or use the navigation buttons on the graph to do the same. Use the reset button to restore the zoom level and re-center the graph.

Batch processing

Let's now have a look at the Batch Processing interface.

Using this interface, you can calculate the semantic relatedness of many synsets at once by uploading a file of word pairs. For each pair, the batch processor searches for the synsets containing each word. It then compares every synset containing the first word with every synset containing the second word. (Thus, a single word pair can generate multiple synset comparisons.) Results are provided as a file you can download for further processing.
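The expansion from word pairs to synset comparisons is essentially a Cartesian product, as this minimal sketch shows. The lexicon and synset IDs are invented. In this sketch, a word with no synsets simply produces no comparisons. How Rover reports such words is not covered here.

```python
from itertools import product

# Hypothetical lexicon: word -> IDs of the synsets containing it.
# "Bank" has two senses (bench, financial institution); IDs are invented.
LEXICON = {
    "Bank": ["bank_sitz", "bank_geld"],
    "Geld": ["geld"],
}


def expand_pairs(word_pairs, lexicon):
    """Turn word pairs into (word1, synset1, word2, synset2) comparisons."""
    comparisons = []
    for word1, word2 in word_pairs:
        # Every synset of word1 is compared with every synset of word2.
        for synset1, synset2 in product(lexicon.get(word1, []),
                                        lexicon.get(word2, [])):
            comparisons.append((word1, synset1, word2, synset2))
    return comparisons


for row in expand_pairs([("Bank", "Geld")], LEXICON):
    print(row)
```

A pair of words with m and n senses therefore yields m × n result rows. This is why a modest input file can produce a much longer results file.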

Configuring batch processing

As on the Visualize Relatedness tab, there are several options available to configure the relatedness calculations. You can set constraints for the searches, select the individual measures, and set the normalization interval and precision for the calculations. You can specify these options in the config: section of the file you upload to the batch processor.

The config options are described in detail in the sample batch processing input file. Click the Sample Batch File link to open the sample file.

To prepare an input file, you can save a copy of the sample file and edit it to include the data and configuration options you want. Make sure you edit the file with a text editor, not a word processor. Save it as plain text, not as rich text or in a binary format.

If you do not provide a configuration section in the input file, and only provide word pairs, your file will be processed with the default configuration.

Uploading your file

Once you have prepared an input file, you can upload it to the batch processor by clicking the upload button or by dragging the file onto it.

As soon as you have selected the file, it will be automatically uploaded and processed by Rover.

Results of batch processing

Once the batch processor has finished the calculations for your input, an alert is displayed and you are prompted to download the results file.

Results are returned as tab-separated values and can easily be further processed with custom scripts or a spreadsheet program. The columns in the results file are:

  1. First word
  2. ID of a synset containing the first word
  3. Grammatical category of this synset
  4. Semantic class of this synset
  5. Hypernyms of this synset
  6. Second word
  7. ID of a synset containing the second word
  8. Grammatical category of this synset
  9. Semantic class of this synset
  10. Hypernyms of this synset
  11. One column for each selected semantic relatedness measure
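As an example of custom post-processing, here is a minimal sketch that parses the results by the column layout listed above and keeps, for each word pair, the highest score of one measure over all synset comparisons. Taking the maximum over senses is a common convention for word-level relatedness, but it is our choice, not Rover's. We have not verified whether the file begins with a header row, so the sketch lets you skip one. It also keeps the hypernym columns as raw strings. The class and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Comparison:
    word1: str
    synset1: str
    category1: str
    class1: str
    hypernyms1: str
    word2: str
    synset2: str
    category2: str
    class2: str
    hypernyms2: str
    scores: list[float]  # one value per selected measure, in column order


def parse_results(text, skip_header=False):
    """Parse the tab-separated results; columns 11 onward are measure scores."""
    rows = csv.reader(io.StringIO(text), delimiter="\t",
                      quoting=csv.QUOTE_NONE)
    if skip_header:
        next(rows, None)
    return [Comparison(*row[:10], scores=[float(v) for v in row[10:]])
            for row in rows if row]  # skip blank lines


def best_per_word_pair(comparisons, measure_index=0):
    """Highest score of one measure over all synset pairs of each word pair."""
    best = {}
    for c in comparisons:
        key = (c.word1, c.word2)
        score = c.scores[measure_index]
        if key not in best or score > best[key]:
            best[key] = score
    return best
```

To use it, read the downloaded file (assumed here to be UTF-8) into a string and pass it to `parse_results`. Then call `best_per_word_pair` with the index of the measure you are interested in, counting from 0 among the measure columns.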