Work field 5: Adaptation and integration of resources and tools

Participating partners: University of Tübingen, MPI Nijmegen, Institute for the German Language Mannheim, Berlin-Brandenburgische Akademie der Wissenschaften Berlin, University of Leipzig, University of Frankfurt, DFKI Saarbrücken, University of Stuttgart

The CLARIN project focuses on collecting and classifying language resources at the European level and on interoperability in general. This project, by contrast, is mainly concerned with standardizing and integrating German resources of different kinds. The conversion of individual formats into standard formats will be tested and carried out, at least for the resources of the participating partners. The converted resources will be made accessible via websites. In the course of this work, general procedural guidelines will be derived.

The documentation tasks include developing and documenting a vocabulary of linguistic categories for describing language units in various resources and at different levels of linguistic description. To this end, internationally developed category repositories need to be checked and, where necessary, adapted or extended for German.

Selected resources need to be provided with a standardized set of metadata, in agreement with CLARIN, and prepared for integration into resource repositories and for transformation into web services. Reference guidelines for accessing the different resource types need to be established.

In close cooperation with work field 3, the necessary interfaces between data and tools will be specified. CLARIN's interoperability guidelines will be adopted for German resources.

In coordination with work field 2, the web services required for integrating and connecting resources will be defined and implemented.

The tasks in this work field will be divided by resource. For each resource, the partner in charge coordinates the work and monitors the individual stages of development. All resources will be a) converted to a standard-conformant format; b) supplied with standardized interfaces for accessing and exchanging data; and c) made available via web services.

MPI Nijmegen (partner 1) provides the DOBES archive as a resource.

The University of Tübingen (partner 2) provides the German wordnet (GermaNet), annotated corpora and treebanks, as well as smaller resources.

The Institute for the German Language (partner 3) provides its corpora.

The Berlin-Brandenburgische Akademie der Wissenschaften (partner 4) provides its corpora and lexical resources.

The University of Leipzig (partner 5) provides its "Deutscher Wortschatz" collection and the lexical resources derived from it.

The University of Frankfurt (partner 6) provides its collection of historical source material and typological data.

The DFKI (partner 7) and the University of Stuttgart (partner 8) provide their language technology tools. In addition, partner 7 contributes to translating existing standards into guidelines for processing resources.