DCA - Dynamic Corpus Analyzer

From WebLichtWiki

Revision as of 08:44, 8 October 2012 by Thomas Zastrow

The Dynamic Corpus Analyzer (DCA) is a web application for analyzing and visualizing files in the Text Corpus Format (TCF). TCF is the processing format of WebLicht, which can be used to linguistically annotate texts. Once a text corpus has been uploaded to DCA, a wide range of analyses and visualizations can be applied to it. These cover the token level (token, POS, lemma), parse trees (constituency and dependency), and statistical analyses and text laws for the corpus as a whole.

DCA distinguishes two levels of authentication and authorization (to obtain an account, please contact Thomas Zastrow):

  • With a guest account, it is possible to make use of all the integrated corpora
  • For uploading your own corpora, you will need a manager account

The following documentation explains the individual functions of DCA. In general, you select a function from the menu at the top. On the left-hand side you see its options, some of which are mandatory, and a button for executing the function on the chosen corpus.

After logging in to DCA, you are greeted by the following welcome screen:

The Welcome Screen

All functionality in DCA is available via the menus at the top. Important: for every function, you must first choose a corpus from the drop-down list on the left before executing it. Once chosen, a corpus stays active until you close the browser:

Sc dca 25.png


Corpus Übersicht (Corpus Overview)

Sc dca 2.png

The "Corpus Übersicht" shows you which corpora are in the system and which linguistic annotations they contain.


Sc dca 3.png

If you have the necessary rights (a manager account), you can upload new corpora in TCF 0.3 format to DCA here.


Sc dca 4.png

The editor is a simple tool for editing the token-level information of a corpus. You can walk through the tokens of a corpus, edit the token, lemma, and POS information, and save the edited information back to the system.



Text anzeigen (Show Text)

Sc dca 5.png

This view displays the full text of the chosen corpus. Click "POS laden" (load POS) on the left and a list of all POS tags used in the corpus appears. Select one or more of them, and their occurrences are highlighted in different colors. "Start" and "Ende" (end) let you specify the range of tokens to be displayed.
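The POS highlighting can be pictured with the following minimal Python sketch, which is not taken from DCA. It assumes the text is a list of (token, POS) pairs, that "Ende" is exclusive like a Python slice end (DCA's actual convention is not documented here), and that colors are assigned from a fixed palette in sorted tag order. The names `load_pos` and `highlight` and the `[token|color]` markup are hypothetical; the example tags follow the STTS tagset.

```python
from itertools import cycle


def load_pos(tagged):
    """Return the sorted list of distinct POS tags (the idea behind "POS laden")."""
    return sorted({pos for _, pos in tagged})


def highlight(tagged, selected, start, end,
              palette=("red", "green", "blue", "orange")):
    """Render tokens start..end-1, marking tokens whose POS tag is selected.

    Assumption: each selected tag gets one color from the palette in sorted
    tag order; colors repeat if more tags are selected than colors exist.
    """
    colors = dict(zip(sorted(selected), cycle(palette)))
    out = []
    for token, pos in tagged[start:end]:
        if pos in colors:
            out.append(f"[{token}|{colors[pos]}]")  # highlighted occurrence
        else:
            out.append(token)  # tag not selected: plain token
    return " ".join(out), colors


tagged = [("Der", "ART"), ("Hund", "NN"), ("bellt", "VVFIN"), (".", "$.")]
print(load_pos(tagged))
print(highlight(tagged, {"NN", "VVFIN"}, 0, 3)[0])
```

With the selection `{"NN", "VVFIN"}` and the range 0 to 3, the sentence-final period falls outside the displayed range, and the noun and the finite verb are marked in two different colors.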

Sätze anzeigen (Show Sentences)

Sc dca 6.png

View the text of the corpus sentence by sentence.


Sc dca 7.png

Here you can display concordances for the corpus. Select a word in "Wort" (word), set the left and right context ("Content nach links / rechts") and the number of hits you would like to get ("Anzahl Treffer"). After you run the search via "Abschicken" (submit), the table on the right displays the concordances with the search word highlighted in the middle. In addition, a table below lists all words in the current concordance together with their frequencies.
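The concordance search described above is a keyword-in-context (KWIC) lookup. The following Python sketch illustrates the idea; it is not DCA's implementation. It assumes the corpus is a plain list of token strings, that matching is exact and case-sensitive, that context sizes are counted in tokens, and that the frequency table counts every token in the returned lines, including the search word. The function `concordance` and its parameters are hypothetical stand-ins for "Wort", "Content nach links / rechts" and "Anzahl Treffer".

```python
from collections import Counter


def concordance(tokens, word, left=5, right=5, max_hits=10):
    """Return up to max_hits KWIC lines for `word` and the word frequencies in them.

    Each line is a tuple (left_context, keyword, right_context); contexts are
    token lists, cut short at the start or end of the corpus.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok != word:  # assumption: exact, case-sensitive match
            continue
        lines.append((tokens[max(0, i - left):i], tok, tokens[i + 1:i + 1 + right]))
        if len(lines) >= max_hits:  # stop once the requested number of hits is reached
            break

    # Frequency table over all tokens shown in the concordance, keyword included.
    freq = Counter()
    for left_ctx, kw, right_ctx in lines:
        freq.update(left_ctx)
        freq.update([kw])
        freq.update(right_ctx)
    return lines, freq


tokens = "der Hund sieht den Hund und der Hund bellt".split()
lines, freq = concordance(tokens, "Hund", left=1, right=1, max_hits=2)
for left_ctx, kw, right_ctx in lines:
    # Right-align the left context so the keyword sits in the middle column.
    print(f"{' '.join(left_ctx):>10} [{kw}] {' '.join(right_ctx)}")
print(freq.most_common())
```

In this example the search stops after two of the three occurrences of `Hund`, and the frequency table counts `Hund` twice because only the two returned lines are considered.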

Frequenzen visualisieren (Visualize Frequencies)

Sc dca 8.png


Sc dca 9.png


Sc dca 10.png


Allgemeine Kennzahlen (General Metrics)

Sc dca 11.png


Sc dca 12.png

POS Statistik (POS Statistics)

Sc dca 13.png

Lemma Statistik (Lemma Statistics)

Sc dca 14.png



Sc dca 15.png


Sc dca 16.png

POS eines Wortes (POS of a Word)

Sc dca 17.png

Phrase suchen (Search Phrase)

Sc dca 18.png


Grammatik erstellen (Create Grammar)

Sc dca 19.png

Konstrukt suchen (Search Construction)

Sc dca 20.png



Sc dca 21.png

Type-Token Relationen (Type-Token Ratios)

Sc dca 22.png


XPath anwenden (Apply XPath)

Sc dca 23.png


Semantik Übersicht (Semantics Overview)

Sc dca 24.png


The Dynamic Corpus Analyzer was developed by Thomas Zastrow.