The chaining algorithm
Each WebLicht web service is described in the CLARIN-D repositories by a piece of metadata. This contains information about the service creators, a brief description, the service registration date, the URL, etc. In addition to this human-oriented metadata there is also the orchestration metadata. The purpose of the orchestration metadata is to describe the profile of the data that a web service accepts and the profile of its output. This I/O data specification is designed to be as straightforward and generic as possible. It is domain neutral and independent of the actual data format used by the services, but at the same time simple to understand, use and generate.
The orchestration metadata is a list consisting primarily of data features, each feature containing zero or more values. The input description specifies what properties input data must have in order to be correctly processed by the service. The output description specifies properties of the output data that is generated by the service.
For example, consider the simple case of a tool that converts a Microsoft Word document to the XML-based TCF file format, which is more convenient for linguistic processing. The data specification would assert that the input must be a Microsoft Word document and that the output will be a TCF file with a specific version, a specific language and a text layer (represented in the XML file as a special node). Using the WebLicht orchestration data language, the specification looks like this:
<input>
  <feature name="type">
    <value name="application/msword"/>
  </feature>
</input>
<output new="true">
  <feature name="type">
    <value name="text/tcf+xml"/>
  </feature>
  <feature name="version">
    <value name="0.3"/>
  </feature>
  <feature name="lang">
    <value name="en"/>
  </feature>
  <feature name="text"/>
</output>
The semantics of feature and value names are not formally specified. However, feature names must be used consistently for web services within the same domain to be interoperable. The input specification in the example asserts that the "type" feature of the input data must be "application/msword". The "type" feature in this case designates the MIME type of the data.
The output specification declares that the web service output format is TCF (the value of its "type" feature is "text/tcf+xml"); that version 0.3 of this file format is used; that this TCF file will contain a text layer (the notion of layers is specific to TCF); and that the declared language of this text layer is English.
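As an illustration of how such a specification could be read programmatically, the following minimal sketch parses the example with Python's standard xml.etree.ElementTree module. The wrapper element <orchestration> and the function names are assumptions made for this sketch; the text shows only the <input> and <output> fragments and does not describe WebLicht's actual parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element <orchestration>: the source shows only the
# <input> and <output> fragments, not the document that encloses them.
SPEC = """
<orchestration>
  <input>
    <feature name="type"><value name="application/msword"/></feature>
  </input>
  <output new="true">
    <feature name="type"><value name="text/tcf+xml"/></feature>
    <feature name="version"><value name="0.3"/></feature>
    <feature name="lang"><value name="en"/></feature>
    <feature name="text"/>
  </output>
</orchestration>
"""


def read_features(element):
    """Map each feature name to the set of its value names (possibly empty)."""
    return {
        feature.get("name"): {value.get("name") for value in feature.findall("value")}
        for feature in element.findall("feature")
    }


def parse_spec(xml_text):
    """Return the input features, output features and output mode of a spec."""
    root = ET.fromstring(xml_text)
    output = root.find("output")
    return {
        "input": read_features(root.find("input")),
        "output": read_features(output),
        # new="true" selects replacement mode; anything else means addition.
        "replace": output.get("new") == "true",
    }


spec = parse_spec(SPEC)
print(spec["input"])           # {'type': {'application/msword'}}
print(spec["output"]["text"])  # set()
print(spec["replace"])         # True
```

A feature without values, such as "text", is represented here as an empty set, which keeps "feature present with no values" distinct from "feature absent".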
The output specification can operate in two different modes: addition or replacement. The default mode is addition, meaning that the declared output features of the web service are added to those the input data already has. This mode is used when a web service simply augments the original input data.
In the example above, the replacement mode is used, as declared by the attribute new="true". This indicates that the web service completely transforms the input data, so that the profile of the output data is determined solely by the declared output features of the service.
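The two modes can be sketched as a single function over feature profiles. This is a minimal illustration, not WebLicht's implementation: the function name output_profile, the "tokens" feature and the augmenting service are hypothetical, and the text does not say what happens in addition mode when a feature is present in both the input and the declared output. The sketch assumes the declared output value wins.

```python
def output_profile(input_profile, declared_output, replace=False):
    """Return the feature profile of the data a service produces.

    Profiles map a feature name to a set of value names.
    """
    if replace:
        # Replacement mode (new="true"): only the declared features survive.
        return {name: set(values) for name, values in declared_output.items()}
    # Addition mode (default): keep the input features and add the declared ones.
    profile = {name: set(values) for name, values in input_profile.items()}
    for name, values in declared_output.items():
        # Assumption: a declared output feature overrides a same-named input feature.
        profile[name] = set(values)
    return profile


word_doc = {"type": {"application/msword"}}
converter_output = {
    "type": {"text/tcf+xml"},
    "version": {"0.3"},
    "lang": {"en"},
    "text": set(),
}

# The converter from the example uses replacement mode.
tcf = output_profile(word_doc, converter_output, replace=True)

# Hypothetical augmenting service that adds a "tokens" feature in addition mode.
tokenized = output_profile(tcf, {"tokens": set()})
print(sorted(tokenized))  # ['lang', 'text', 'tokens', 'type', 'version']
```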
This system of data-describing assertions makes chaining possible, even before any services have actually been invoked. Chaining is the process of automatically finding appropriate web services when the characteristics of the data to be used as input are known. Each time a service is selected, the characteristics of the resulting data are recomputed and used to generate a new set of suitable services. The set of output features in the present example is a description of the resulting piece of data: a version 0.3 TCF file with an English text layer. Automatically finding new services to be applied in the chain is now just a matter of filtering the list of available services for those whose input specification is satisfied by this description.
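The filtering step can be sketched as follows. The matching rule is an assumption, since the text does not define it precisely: a service is taken to accept a profile if every feature it requires is present and, where it lists values, at least one of them matches. The service names, the "tokens" and "postags" features, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    requires: dict         # feature name -> set of acceptable values
    produces: dict         # feature name -> set of values
    replace: bool = False  # True corresponds to new="true"


def accepts(service, profile):
    """Assumed rule: each required feature is present with a matching value."""
    for name, allowed in service.requires.items():
        if name not in profile:
            return False
        if allowed and not (profile[name] & allowed):
            return False
    return True


def apply(service, profile):
    """Recompute the data profile after a service is selected."""
    result = {} if service.replace else dict(profile)
    result.update(service.produces)
    return result


def candidates(services, profile):
    """Names of the services whose input specification the profile satisfies."""
    return [s.name for s in services if accepts(s, profile)]


TCF = {"type": {"text/tcf+xml"}}
converter = Service(
    "word-to-tcf",
    requires={"type": {"application/msword"}},
    produces={**TCF, "version": {"0.3"}, "lang": {"en"}, "text": set()},
    replace=True,
)
tokenizer = Service("tokenizer", {**TCF, "text": set()}, {"tokens": set()})
tagger = Service("pos-tagger", {**TCF, "lang": {"en"}, "tokens": set()}, {"postags": set()})
services = [converter, tokenizer, tagger]

profile = {"type": {"application/msword"}}
step1 = candidates(services, profile)   # ['word-to-tcf']
profile = apply(converter, profile)
step2 = candidates(services, profile)   # ['tokenizer']
profile = apply(tokenizer, profile)
step3 = candidates(services, profile)   # ['tokenizer', 'pos-tagger']
print(step1, step2, step3)
```

No service is invoked at any point; the chain is built purely from the declared profiles. Under this simple rule the tokenizer still matches after it has been applied, so a practical system would likely need an additional criterion to rule out redundant steps.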