The Semantic Relatedness page allows you to calculate the semantic relatedness of synsets in GermaNet.
You specify which synsets you want to compare, and which measures of semantic relatedness you want to use to compare them. Rover then performs the calculations and displays the results.
You can choose between two ways of working with the relatedness calculations:
On the Visualize Relatedness tab, you can choose two specific synsets, calculate their relatedness according to several measures, and visualize the structure in GermaNet on which these calculations are based.
On the Batch Processing tab, you can calculate the relatedness of up to 200 word pairs by uploading a file and downloading the calculation results.
We'll begin with Visualize Relatedness, and return to Batch Processing below.
The first step in calculating semantic relatedness is selecting the two synsets to compare.
To do so, search for each synset using the search forms at the top of the page. Each search form looks like this:
The term you type into the search box will be matched against all the synsets in GermaNet. Any synset containing at least one matching word will be returned.
Click "Show search options" to see how to refine your search. You can search with regular expressions or by edit distance, and you can narrow your search by grammatical category or semantic class.
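As a rough illustration of these search options, the following Python sketch matches a search term against a toy list of synsets, either as a regular expression or by edit distance. The synset data, the function names, and the matching details (whole-word regex match, case-sensitive comparison, Levenshtein distance) are assumptions made for this example; they are not taken from Rover's implementation.

```python
import re

# Hypothetical toy data: each synset is represented only by its words.
SYNSETS = [
    ["Gitarre"],
    ["Geige", "Violine", "Fiedel"],
    ["Trompete"],
]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or keep
        prev = curr
    return prev[-1]


def search(term, synsets, regex=False, max_distance=0):
    """Return every synset that contains at least one matching word."""
    if regex:
        pattern = re.compile(term)

        def matches(word):
            return pattern.fullmatch(word) is not None
    else:
        def matches(word):
            return edit_distance(term, word) <= max_distance

    return [s for s in synsets if any(matches(w) for w in s)]
```

With this toy data, `search("G.*", SYNSETS, regex=True)` returns both the Gitarre and the Geige synsets, while `search("Gitare", SYNSETS, max_distance=1)` tolerates the misspelling and still finds Gitarre.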
Once you have performed a search, you will see some buttons underneath the search form representing your search history. They look like this:
Clicking one of these buttons will run the corresponding search again using all the same search options. Hover over a button to see how many results that search returned.
The synsets found for your search are displayed as a list below the search form:
n. ein populäres Zupfinstrument mit sechs oder zwölf Saiten
The list displays a summary for each synset. At the top of the summary, you see the words in the synset and definitions for these words, as well as the semantic class the synset belongs to (here, Artefakt). The boxes below summarize the conceptual relations of the synset by their types. This information helps you distinguish different synsets containing the same word.
You can select the first synset that you want to compare from this list by clicking on its summary.
To select the second synset, repeat this procedure with the second search form on the page.
Once you have selected two different synsets, they will automatically be compared using a variety of measures.
You can select exactly which measures you want to use by toggling these checkboxes below the search forms:
Most of these measures are named after the authors of the research articles in which they are described: Wu and Palmer, Leacock and Chodorow, Resnik, Lin, and Jiang and Conrath.
In addition, the simple path measure gives the length of the path between the two synsets via the hypernymy relation, relative to the length of the longest such path in GermaNet between two synsets of the same grammatical category as those selected.
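One plausible reading of this description is that the score is the share of the longest path that is left over once the actual path length is subtracted, so that closer synsets score higher. The sketch below implements that reading. The formula, the function name, and the example lengths are assumptions for illustration, not a confirmed description of how Rover computes the measure.

```python
def simple_path_score(path_length: int, longest_path_length: int) -> float:
    """Assumed formula: 1.0 for identical synsets, 0.0 for the most distant pair."""
    if longest_path_length <= 0:
        raise ValueError("longest_path_length must be positive")
    if not 0 <= path_length <= longest_path_length:
        raise ValueError("path_length must lie between 0 and longest_path_length")
    return (longest_path_length - path_length) / longest_path_length


# Hypothetical lengths: a path of 6 edges, with a longest path of 35 edges.
print(round(simple_path_score(6, 35), 5))
```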
Alongside the checkboxes for selecting the measures, you can also set two options that control how results are displayed: the normalization maximum and the precision. You can set them via these text fields:
The results of the measure calculations are normalized to a common interval so that the different measures can be more easily compared. The normalization maximum determines the upper bound of this interval. For example, setting this field to 100 normalizes results to the interval [0, 100]. You can set the normalization maximum to 0 to turn off normalization and see the raw values for each measure.
You can also determine how many decimal places you would like displayed in the results by setting the precision.
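The effect of these two options can be sketched as a linear rescaling followed by rounding. The example assumes that each measure has a known raw range, here called `raw_min` and `raw_max`. These names, the default values, and the linear form of the rescaling are assumptions for illustration; the source only states that results are normalized to an interval ending at the normalization maximum and that 0 turns normalization off.

```python
def format_score(raw: float, raw_min: float, raw_max: float,
                 normalization_max: float = 1.0, precision: int = 5) -> str:
    """Rescale raw from [raw_min, raw_max] to [0, normalization_max], then round.

    A normalization_max of 0 skips the rescaling and formats the raw value.
    """
    if normalization_max == 0:
        value = raw
    else:
        if raw_max <= raw_min:
            raise ValueError("raw_max must be greater than raw_min")
        value = (raw - raw_min) / (raw_max - raw_min) * normalization_max
    return f"{value:.{precision}f}"


# A hypothetical raw value of 2.5 on a raw scale from 0 to 5:
print(format_score(2.5, 0.0, 5.0, normalization_max=100, precision=2))  # 50.00
print(format_score(2.5, 0.0, 5.0, normalization_max=0, precision=1))    # 2.5
```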
The results of the measure calculations for the two selected synsets are displayed in a table below the search forms. The table looks like this:
Measure | Score |
---|---|
Simple Path | 0.82857 |
Wu and Palmer | 0.70000 |
Leacock and Chodorow | 0.45698 |
Resnik | 0.33696 |
Lin | 0.68581 |
Jiang and Conrath | 0.84563 |
The table updates automatically whenever you select a different synset or different measures. This allows you to quickly explore how the measures behave. Try changing the checkboxes above, or selecting a different synset in the box below, to see how the synset for Gitarre compares to the Geige, Violine, Fiedel synset or the Trompete synset according to the different measures.
n. Musik: Aus der Viola da braccio hervorgegangenes Streichinstrument mit flachem Korpus und 4 Saiten, welche in G-D-A-E gestimmt sind; Violine, aus der Viola da braccio hervorgegangenes Streichinstrument mit flachem Korpus und 4 Saiten, welche in G-D-A-E gestimmt sind
n. Musik: Hohes Blechblasinstrument mit einem Kesselmundstück
The Visualize Relatedness tab also displays a graph to help you visualize how the two synsets are related. The graph looks like this:
In the graph, each node is a synset, and each arrow points from a synset to a hypernym synset. Like the measures results table, the graph automatically updates when you select a different pair of synsets.
The relatedness measures are based on the idea of a least common subsumer for two synsets: the closest synset that can be reached from both of them by following hypernym relations upward, possibly over several steps. In the visualization, the least common subsumers of the two selected synsets are displayed in yellow, at the top. The two selected synsets appear in green, at the bottom.
The visualization shows all paths between the two synsets via their least common subsumers. For these synsets, there is only one least common subsumer, and only one path, but there can be several.
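The idea of a least common subsumer can be sketched on a small hypernym graph. In the example below, the graph is invented for illustration and does not reproduce GermaNet's actual structure, and "closest" is taken to mean the shared ancestor with the smallest combined distance to both synsets, which is an assumption about the definition.

```python
from collections import deque

# Hypothetical hypernym graph: each synset maps to its direct hypernyms.
HYPERNYMS = {
    "Gitarre": ["Zupfinstrument"],
    "Geige": ["Streichinstrument"],
    "Trompete": ["Blechblasinstrument"],
    "Zupfinstrument": ["Saiteninstrument"],
    "Streichinstrument": ["Saiteninstrument"],
    "Blechblasinstrument": ["Blasinstrument"],
    "Saiteninstrument": ["Musikinstrument"],
    "Blasinstrument": ["Musikinstrument"],
    "Musikinstrument": [],
}


def ancestor_distances(synset, hypernyms):
    """Breadth-first search upward: map each reachable synset to its distance."""
    distances = {synset: 0}
    queue = deque([synset])
    while queue:
        current = queue.popleft()
        for parent in hypernyms.get(current, []):
            if parent not in distances:
                distances[parent] = distances[current] + 1
                queue.append(parent)
    return distances


def least_common_subsumers(a, b, hypernyms):
    """Return (subsumers, path_length) for the closest shared ancestors."""
    dist_a = ancestor_distances(a, hypernyms)
    dist_b = ancestor_distances(b, hypernyms)
    common = set(dist_a) & set(dist_b)
    if not common:
        return [], None
    best = min(dist_a[s] + dist_b[s] for s in common)
    closest = sorted(s for s in common if dist_a[s] + dist_b[s] == best)
    return closest, best


print(least_common_subsumers("Gitarre", "Geige", HYPERNYMS))
print(least_common_subsumers("Gitarre", "Trompete", HYPERNYMS))
```

The function returns a list because, as noted above, two synsets can have more than one least common subsumer when several shared ancestors are equally close.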
You can zoom in or out by scrolling and reposition the graph by dragging it, or you can use the navigation buttons on the graph to do the same things. A separate button on the graph resets the zoom and re-centers the graph.
Let's now have a look at the Batch Processing interface:
Using this interface, you can calculate the semantic relatedness of many synsets at once by uploading a file of word pairs. For each pair, the batch processor searches for the synsets containing each word, and then compares every synset containing the first word with every synset containing the second. (Thus, a single word pair can generate multiple synset comparisons.) Results are provided as a file you can download for further processing.
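To see why one word pair can lead to several comparisons, consider the sketch below, which expands a word pair into all combinations of matching synsets. The word-to-synset index and the synset identifiers are invented for this example, and the function name is hypothetical.

```python
from itertools import product

# Hypothetical index from a word to the identifiers of synsets containing it.
WORD_TO_SYNSETS = {
    "Gitarre": ["s1"],
    "Geige": ["s2"],
    "Bank": ["s3", "s4"],      # an ambiguous word with two synsets
    "Schloss": ["s5", "s6"],   # another ambiguous word with two synsets
}


def expand_pair(word1, word2, index):
    """Pair every synset containing word1 with every synset containing word2."""
    return list(product(index.get(word1, []), index.get(word2, [])))


print(expand_pair("Gitarre", "Geige", WORD_TO_SYNSETS))       # one comparison
print(len(expand_pair("Bank", "Schloss", WORD_TO_SYNSETS)))   # 2 x 2 = 4 comparisons
```

A word that matches no synset produces an empty list in this sketch, so the pair contributes no comparisons at all.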
As on the Visualize Relatedness tab, there are several options available to configure the relatedness calculations. You can set constraints for the searches, select the individual measures, and set the normalization maximum and precision for the results. You specify these options in the config: section of the file you upload to the batch processor.
The config options are described in detail in the sample batch processing input file. Click the Sample Batch File link to open the sample file.
To prepare an input file, you can save a copy of the sample file and edit it to include the data and configuration options you want. Make sure you edit the file with a text editor, not a word processor. Save it as plain text, not as rich text or in a binary format.
If you do not provide a configuration section in the input file, and only provide word pairs, your file will be processed with the default configuration.
Once you have prepared an input file, you can upload it to the batch processor by clicking this button, or dragging the file onto the button:
As soon as you have selected the file, it will be automatically uploaded and processed by Rover.
Once the batch processor is finished with the calculations for your input, an alert will be displayed and you will be prompted to download the results file:
Results are returned as tab-separated values and can easily be further processed with custom scripts or a spreadsheet program. The columns in the results file are: