Using TIEWER for TCF 0.3 tutorial
How to use TIEWER, a pilot Java desktop application for viewing and partially editing linguistic data in TCF 0.3
WebLicht services use an XML-based format called TCF to produce, consume, and exchange texts with linguistic annotations. After a WebLicht tool-chain finishes processing the data, you will want to view the results, and possibly edit the annotations in the document. Although one can read the raw TCF XML and make some sense of it, the format is meant to be machine-readable, not necessarily human-readable. Therefore, we offer tools with a user-friendly graphical interface for viewing the linguistic annotations contained in TCF documents:
- One of the tools is a visual web application. It is integrated into the WebLicht web application, where the user can view the data online after the WebLicht tool-chain has processed it.
- Another tool, which you may prefer if you have large TCF files, want to edit the document, or want to work with the document off-line, is a Java desktop application called TIEWER. This tutorial explains how to use TIEWER and what functionality it provides.
TIEWER lets you view and edit, in off-line mode, the following linguistic annotations in a TCF 0.3 document:
- text (optional), view only
- sentences (obligatory), view only
- tokens (obligatory), view only
- lemmas (optional), view and edit
- part-of-speech tags (optional), view and edit
- named entity annotations (optional), view and edit
- constituent parsing (optional), view only
Since TIEWER is a Java program, you need to have Java installed to run it. You can check whether a Java Virtual Machine is installed on your computer by typing the following in the Terminal (or the command prompt in Windows):
java -version
If it is installed, the output will look something like this:
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
If you get a message that the command is not recognized, or if your Java version is older than Java 5, you need to install a JRE (1.5 or 1.6).
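The version check can also be scripted. The following Python sketch extracts the Java release number from `java -version` output and can be compared against the minimum named in this tutorial (Java 5). The function names are illustrative, and the sketch assumes the launcher prints a line of the form `java version "1.6.0_22"`; other Java distributions may format the banner differently.

```python
import re
import subprocess


def parse_java_version(output):
    """Return the Java release number (5 for "1.5.0_22", 6 for "1.6.0_22")
    found in `java -version` output, or None if no version string is found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', output)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    # Older releases are reported as 1.x (1.5, 1.6); newer ones as x.y.z.
    return minor if major == 1 else major


def installed_java_version():
    """Run `java -version` and parse the result; None if java is not on the PATH."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    # The launcher writes the version banner to stderr, not stdout.
    return parse_java_version(result.stderr + result.stdout)


# Sample output taken from this tutorial.
sample = 'java version "1.6.0_22"\nJava(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)'
print(parse_java_version(sample))  # 6
```

Calling `installed_java_version()` on your own machine returns `None` when no `java` command is found; a result of `None` or a number below 5 means a JRE needs to be installed.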
After you have made sure that Java 1.5 or Java 1.6 is installed on your machine, download and unzip the TIEWER application. To run TIEWER from the command line, go to the tiewer-dist folder and type the following:
java -jar "tiewer.jar"
Alternatively, you may just click on the tiewer.jar file in your file manager to launch the program.
With the default Java VM memory settings, TIEWER cannot open large files that require a lot of memory. To view big files, allocate more memory to the JVM. For example, to view TCF files of 100 MB to 500 MB on a machine with more than 2 GB of RAM, you may use the following command from the command line instead:
java -Xmx1536m -XX:PermSize=512m -XX:MaxPermSize=512m -jar "tiewer.jar"
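Note that the java launcher passes everything placed after the jar name to the application as program arguments, so memory options only take effect when they come before `-jar`. A small Python sketch that assembles the command in that order; the helper name and its parameters are illustrative, and the values are the ones suggested above.

```python
def tiewer_command(heap_mb, perm_mb=None, jar="tiewer.jar"):
    """Build the launch command with all JVM options placed before -jar."""
    command = ["java", "-Xmx%dm" % heap_mb]
    if perm_mb is not None:
        # Permanent-generation options are only meaningful on older VMs such as
        # the Java 5/6 VMs this tutorial targets; leave perm_mb unset otherwise.
        command += ["-XX:PermSize=%dm" % perm_mb, "-XX:MaxPermSize=%dm" % perm_mb]
    command += ["-jar", jar]
    return command


command = tiewer_command(1536, perm_mb=512)
print(" ".join(command))
# java -Xmx1536m -XX:PermSize=512m -XX:MaxPermSize=512m -jar tiewer.jar

# To start TIEWER from the tiewer-dist folder, one could then run:
# import subprocess; subprocess.call(command)
```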
In the tiewer-dist/data folder you will find a sample file, geiger-tcf0_3.xml. It was produced by a WebLicht tool-chain, and we will use it to explore the functionality of TIEWER.
After you have used any of the above methods to launch the program, the TIEWER application window should open. Initially it is empty and looks like this:
Go to the File menu on the top left, then to the Open sub-menu, and load a TCF file from your disk. The file should be in TCF 0.3 format (you can check inside the file that the top-level D-Spin element has a version attribute with the value 0.3). You may use the sample file geiger-tcf0_3.xml found in the tiewer-dist/data folder:
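If you prefer not to open a large file in an editor just to check its version, the check can be scripted. The Python sketch below reads only the start tag of the top-level element and returns its version attribute. The function name is illustrative, and the inline sample document is a made-up minimal stand-in, not an excerpt from a real TCF file.

```python
import io
import xml.etree.ElementTree as ET


def tcf_version(stream):
    """Return the version attribute of the top-level D-Spin element.

    `stream` is a file object opened in binary mode.
    """
    # iterparse reports the start tag of the root element first, so the
    # (possibly very large) rest of the file is never loaded.
    for _event, element in ET.iterparse(stream, events=("start",)):
        local_name = element.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix, if any
        if local_name != "D-Spin":
            raise ValueError("top-level element is %r, not D-Spin" % local_name)
        return element.get("version")


# Made-up minimal document, for demonstration only.
sample = io.BytesIO(b'<D-Spin version="0.3"></D-Spin>')
print(tcf_version(sample))  # 0.3
```

For the sample file this would be used as `with open("data/geiger-tcf0_3.xml", "rb") as stream: print(tcf_version(stream))`, run from the tiewer-dist folder.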
After the TCF file has been loaded, the interface organizes the linguistic data into several views, represented by tabs. Navigate between the views by clicking on their tabs.
The first view allows you to view the text.
The next view allows you to view the sentences. Sentences are organized in a one-column table, where each sentence corresponds to a table row.
Inline annotations View
The next view is also organized as a table, displaying tokens, lemmas, part-of-speech tags and named entity annotations. A separate table corresponds to each sentence in the data, and you can navigate to a particular sentence using the previous and next buttons, or the "go to sentence" input field found on the top right of the view.
At the bottom you can find basic search functionality. You can search for a token, a lemma, a part-of-speech tag, or a named entity class. While a search is applied, the navigation buttons take you not to the previous/next sentence, but to the previous/next sentence in which the search term is found. Using the "go to sentence" field, or applying a search with an empty input field, cancels the search.
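The navigation behaviour while a search is active can be pictured with a short Python sketch. It is only an illustration of the behaviour described above, not TIEWER's actual code: the data layout, the field names, and the exact-match comparison are all assumptions, and the sentences are made up.

```python
def next_match(sentences, current, field, term, step=1):
    """Index of the nearest sentence after (step=1) or before (step=-1)
    `current` in which some token has `term` in `field`; None if there is none."""
    index = current + step
    while 0 <= index < len(sentences):
        if any(token.get(field) == term for token in sentences[index]):
            return index
        index += step
    return None


# Made-up data: one list of token records per sentence.
sentences = [
    [{"token": "Anna", "lemma": "Anna", "ne": "PER"}, {"token": "schreibt", "lemma": "schreiben"}],
    [{"token": "Es", "lemma": "es"}, {"token": "regnet", "lemma": "regnen"}],
    [{"token": "Otto", "lemma": "Otto", "ne": "PER"}, {"token": "liest", "lemma": "lesen"}],
]

print(next_match(sentences, 0, "ne", "PER"))      # 2 (sentence 1 has no PER and is skipped)
print(next_match(sentences, 2, "ne", "PER", -1))  # 0
```

Without an active search, the buttons simply move to `current + 1` or `current - 1`.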
The last view displays the constituent parsing. Each parsed sentence is represented by a tree on a separate page. You can navigate to a particular parsed sentence using the previous and next buttons, or the "go to parse" input field found on the top right of the view.
TIEWER provides basic editing functionality. In the current version of TIEWER, editing is enabled for lemmas, part-of-speech tags and named entity type annotations.
We will demonstrate the editing functionality by correcting some of the named entity annotations in the geiger-tcf0_3.xml file provided as a sample in the TIEWER distribution.
Go to the inline annotations view by clicking on the "inline annotations" tab. In the "search sentences with" input field (located at the bottom of the view window), select the "named entity type" option, enter the type "PER" to search for person annotations, and click the "apply search" button. You will be taken to the first sentence in which a PER named entity annotation is found. This is sentence 1:
The PER annotation in this sentence seems to be correct, so we can click on the next button on the top left of the view window to navigate to the next PER annotation. It is found in sentence 3. Here the PER annotation also seems to be correct. Click the next button until you reach sentence 18. Here the boundaries of the person annotation need to be corrected:
Delete the PER annotation from the token "Gründnis-Heraugeber" and click on another row to let the view register the change. Note that the named entity annotation of the previous token will automatically change from PER (continued) to PER (end).
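Why the previous token changes can be seen from a small Python sketch that derives position labels from per-token entity classes. This is a guess at the logic, not TIEWER's implementation. Only the "(continued)" and "(end)" labels appear in this tutorial; the "(start)" label, the bare class name for single-token entities, and the rule that adjacent tokens of the same class form one entity are assumptions made for the illustration.

```python
def position_labels(classes):
    """classes: one entity class per token, or None for unannotated tokens.
    Assumption: adjacent tokens with the same class belong to one entity."""
    labels = []
    for i, cls in enumerate(classes):
        if cls is None:
            labels.append("")
            continue
        has_prev = i > 0 and classes[i - 1] == cls
        has_next = i + 1 < len(classes) and classes[i + 1] == cls
        if has_prev and has_next:
            labels.append(cls + " (continued)")
        elif has_prev:
            labels.append(cls + " (end)")
        elif has_next:
            labels.append(cls + " (start)")  # placeholder wording, not from the tutorial
        else:
            labels.append(cls)               # placeholder wording, not from the tutorial
    return labels


before = ["PER", "PER", "PER", "PER"]  # last token wrongly included in the entity
after = ["PER", "PER", "PER", None]    # annotation deleted from the last token

print(position_labels(before))
# ['PER (start)', 'PER (continued)', 'PER (continued)', 'PER (end)']
print(position_labels(after))
# ['PER (start)', 'PER (continued)', 'PER (end)', '']
```

Because the labels are derived from the neighbouring tokens, removing the class from the last token makes the token before it the new end of the entity.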
In the same sentence we can also edit the ORG annotation, namely annotate "Zeitschrift für Physik" as ORG. Insert the following annotations:
Now we want to save the changes. Go to the File menu, then to the Save sub-menu, and save the changed data in a file with a different name. TIEWER will reload the data (which may take some time for big files). The new file will contain the annotation changes.
In the same way you may edit lemmas and part-of-speech annotations. Don't forget to save the data. If you want to exit the program without saving any changes, close the window, or use Exit in the File menu.
If you have questions or problems, have found a bug, or have suggestions, don't hesitate to contact us.