WebLicht services with JAX-RS tutorial

- how to create a WebLicht service using the Java EE API for RESTful Web Services (JAX-RS)


Introduction
Prerequisites
About the sample service
   Deploying the sample service
   Testing the sample service
Implementing the sample service
   Create a RESTful Web Service project in NetBeans IDE
   Create a WebLicht service that recognizes named entities
   Test the service
   Prepare the service for deployment
Adding your service to WebLicht
Recommended reading


Introduction

WebLicht services offer online natural language processing such as tokenization, part-of-speech tagging, and parsing. Together they form an infrastructure of interoperable language tools that can be run in sequence to provide a complete linguistic processing chain.

Web services for the WebLicht tool chain are implemented as RESTful web services. The input is sent to the service with the HTTP POST method, and the service's output is the response to that POST request:

webservice architecture
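As a rough illustration of this request/response exchange, the sketch below posts a TCF document to a service using the JDK's java.net.http client (Java 11 or later). The class name TcfPostClient and its method names are hypothetical, and the service URL is whatever address your service is deployed at; none of this is part of WebLicht itself.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/** Hypothetical client: sends a TCF document to a service via HTTP POST. */
class TcfPostClient {

    /** Builds a POST request whose body is the TCF document. */
    static HttpRequest buildRequest(String serviceUrl, String tcf) {
        return HttpRequest.newBuilder(URI.create(serviceUrl))
                .header("Content-Type", "text/xml")
                .POST(HttpRequest.BodyPublishers.ofString(tcf, StandardCharsets.UTF_8))
                .build();
    }

    /** Sends the request and returns the response body (the annotated TCF). */
    static String send(String serviceUrl, String tcf) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(serviceUrl, tcf), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            // Services report problems through HTTP error codes.
            throw new IOException("Service returned HTTP " + response.statusCode()
                    + ": " + response.body());
        }
        return response.body();
    }
}
```

Because each service returns a complete TCF document, the body returned by send could in principle be posted unchanged to the next service in a chain.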

To make the services interoperable, they exchange data in a common machine-readable XML format called TCF (Text Corpus Format). TCF stores linguistic annotations in layers, all of them in one file, so the file grows as it passes through the WebLicht service chain. A typical TCF document transformation during chain processing might look like this:

TCF transformation during chaining

The WebLicht tool-chaining architecture imposes a few restrictions on a service's TCF output. A WebLicht service is not permitted to change the linguistic annotations in the input TCF; it can only add new annotation layers to the document. A WebLicht service is also not allowed to add an annotation layer that already exists in the input document.
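One way a service could guard against adding a duplicate layer is to check the input before writing. The following is a minimal sketch using only the JDK's DOM parser. It assumes that annotation layers are direct child elements of a TextCorpus element; that layout, and the names LayerCheck and hasLayer, are assumptions made for illustration and should be checked against the TCF specification.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: checks whether a TCF document already has a given layer. */
class LayerCheck {

    /** Returns true if a direct child element of TextCorpus has the given local name. */
    static boolean hasLayer(String tcfXml, String layerName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() ignores prefixes
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(tcfXml)));
            // "*" matches the element in any namespace.
            NodeList corpora = doc.getElementsByTagNameNS("*", "TextCorpus");
            if (corpora.getLength() == 0) {
                return false;
            }
            NodeList children = corpora.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && layerName.equals(child.getLocalName())) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be parsed as XML", e);
        }
    }
}
```

A named entity service could call hasLayer(input, "namedEntities") and reject the request if it returns true, rather than producing a document with two such layers.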

This tutorial explains how to develop a WebLicht service in Java with the JAX-RS reference implementation Jersey.


Prerequisites

For running the WebLicht sample service:

For following the tutorial:


About the sample service

The sample service developed in this tutorial represents a named entity recognizer. So that the tutorial can focus on developing the service and handling its input and output, the named entity recognizer itself is kept extremely simple. It assigns named entity annotations by checking tokens against a short list of names, and therefore cannot be used as-is. However, this sample service can be used as a template for building real services.


Deploying the sample service

If you want to try out the sample service provided in the tutorial:


Testing the sample service


Implementing the sample service

We will now develop the WebLicht sample service step by step using the NetBeans IDE.


Create a RESTful Web Service project in NetBeans IDE

Create the project:

A new resource, RecognizeNamedEntities.java, is added to the project and appears in the Source pane. This file provides a template for creating a RESTful web service that we will modify to create a WebLicht service.


Create a WebLicht service that recognizes named entities

1. Add TCF0.3 binding libraries to the project:

2. Create a class that represents our simplified named entity recognizer:

The above class has a method that recognizes named entities (person named entities) in the TCF0.3 input, provided that the input contains a tokens layer. The TCF0.3 input is read from the input stream passed as the first method parameter. If the input does not contain a tokens layer, or does not comply with the TCF0.3 specification, an exception is thrown. The recognized named entities are added as a namedEntities layer to the output TCF0.3, which is written to the output stream passed as the second method parameter.
This code uses the binding library TCFXB0.3 for reading layers from and adding layers to TCF0.3 documents. You can read more about this library here. You are also free to use your own mechanism to read and write TCF data, e.g. a StAX parser.
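The matching step itself can be sketched independently of any TCF library. The example below is not the tutorial's recognizer class and does not use TCFXB0.3; it only illustrates the lookup against a fixed name list. The class name SimpleNameMatcher, the method name and the contents of the name list are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of list-based person name matching. */
class SimpleNameMatcher {

    // Assumed name list, for illustration only.
    private static final Set<String> PERSON_NAMES =
            new HashSet<>(Arrays.asList("Karin", "Peter", "Maria"));

    /**
     * Takes tokens keyed by token id and returns, for every token that
     * matches a known name, the token id mapped to the entity type "PER".
     */
    static Map<String, String> recognizePersons(Map<String, String> tokensById) {
        Map<String, String> entities = new LinkedHashMap<>();
        for (Map.Entry<String, String> token : tokensById.entrySet()) {
            if (PERSON_NAMES.contains(token.getValue())) {
                entities.put(token.getKey(), "PER");
            }
        }
        return entities;
    }
}
```

In a real service, the token ids and strings would come from the tokens layer of the input, and each returned entry would become one entity in the new namedEntities layer.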

3. Open RecognizeNamedEntities.java that was generated by NetBeans and modify it in the following way:

The modified RecognizeNamedEntities class, a RESTful web service (or resource, in RESTful terminology), now represents a WebLicht service that produces named entity annotations. Let's look at it in more detail.

The RecognizeNamedEntities class is generated automatically by NetBeans from the class name that we specified in the form when creating new RESTful Web Services from Patterns. The path value we entered in that form is used in the @Path annotation. The @Path annotation specifies the relative URI path at which the resource is hosted. In this case, the Java class is hosted at the relative URI path recognize/ne/tcf03, so it is available at the full URI http://localhost:8080/WebLichtServices/resources/recognize/ne/tcf03 (replace localhost:8080 with your server host name).

We annotated the addNamedEntitiesLayer method with the @POST annotation. This means that the method is called whenever a client makes an HTTP POST request to http://localhost:8080/WebLichtServices/resources/recognize/ne/tcf03

The @Consumes annotation means that our resource expects the client to send input, namely a TCF document. TCF is an XML format, so we list "text/xml" as the accepted media type. To read the input we use the java.io.File method parameter. java.io.File is just one of the Java types for which JAX-RS supports marshalling. Using File rather than String, InputStream or Reader allows us to receive and process large TCF files from client requests.

The @Produces annotation means that our resource, after processing the client request, sends output to the client. We output a TCF document with a new annotation layer added, so we again specify the "text/xml" media type. The output is the method's return value. We use String as the return type: the TCF output is returned as a String, which is written to the response output stream sent to the client. An alternative that makes it possible to output very large data is to implement StreamingOutput and write directly to the response output stream. More about the JAX-RS mapping of HTTP request and response entity bodies to specific Java types can be found in "RESTful Java with JAX-RS" by Bill Burke.

Inside the method we create a Reader for reading the input from the java.io.File object. This File object is created from the client request and injected by the JAX-RS runtime. We also create a StringWriter object to which the named entity recognizer writes its output.

Next, we obtain an instance of the Recognizer class and run it on our input to produce the output. If our recognizer were a real one, for example one based on a machine learning model, its initialization could be an expensive operation and its memory requirements could be high. In that case it is better to initialize the heavyweight objects only once, so that a single instance (or a limited pool of instances) serves all clients. That is why the sample initializes the recognizer only the first time the service is called by a client. We store it as an attribute of the servletContext, and all later calls use the existing Recognizer instance from the context.

If the named entity recognizer is not thread-safe, we also want to make sure that only one client has access to it at any given time. That is why we use it inside a synchronized block.

Note that if the objects needed for serving client requests are lightweight and you can afford to create separate instances for each client request, you don't need to use the servletContext and don't need to worry about synchronization.
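The create-once, use-under-lock pattern described above can be sketched in plain Java. This is not the tutorial's resource code: a ConcurrentHashMap stands in for the servletContext attributes, and the Recognizer here is a trivial placeholder. All names in the sketch are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: one shared, lazily created recognizer for all requests. */
class SharedRecognizerDemo {

    /** Placeholder for an expensive recognizer that is not thread-safe. */
    static class Recognizer {
        static final AtomicInteger CREATED = new AtomicInteger();

        Recognizer() {
            CREATED.incrementAndGet(); // counts how often we pay the setup cost
        }

        String annotate(String input) {
            return input + " [annotated]";
        }
    }

    /** Stand-in for the attributes of the servlet context. */
    static final ConcurrentMap<String, Object> CONTEXT = new ConcurrentHashMap<>();

    static String handleRequest(String input) {
        // Created on the first request only; computeIfAbsent is atomic here.
        Recognizer recognizer = (Recognizer) CONTEXT.computeIfAbsent(
                "recognizer", key -> new Recognizer());
        // Only one request at a time may use the shared instance.
        synchronized (recognizer) {
            return recognizer.annotate(input);
        }
    }
}
```

The trade-off is throughput: while one request holds the lock, others wait. A limited pool of instances, as mentioned above, would reduce that waiting at the cost of more memory.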

Additionally, we handle checked exceptions that may occur during processing by building a response that contains an HTTP error code and a message explaining the problem. We then throw the unchecked WebApplicationException provided by JAX-RS, passing it the error response we built. WebLicht requires services to report errors with HTTP error codes and encourages them to provide meaningful messages about the source of the error whenever possible.
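The decision about which status code to report can be kept separate from the JAX-RS plumbing. The sketch below shows one possible mapping in plain Java. It assumes that invalid client input (such as a missing tokens layer) maps to 400 and that any other failure maps to 500; the exception type and method names are hypothetical. In the actual resource, the resulting status and message would go into the error response passed to WebApplicationException.

```java
/** Hypothetical sketch: choosing an HTTP status code for a processing failure. */
class ErrorStatusSketch {

    /** Assumed exception type for input that lacks a required layer. */
    static class MissingLayerException extends Exception {
        MissingLayerException(String message) {
            super(message);
        }
    }

    /** The client sent unusable input: 400. Anything else: 500. */
    static int statusFor(Exception e) {
        if (e instanceof MissingLayerException) {
            return 400; // Bad Request
        }
        return 500; // Internal Server Error
    }

    /** A short message that tells the client what went wrong. */
    static String messageFor(Exception e) {
        return "HTTP " + statusFor(e) + ": " + e.getMessage();
    }
}
```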

Finally, we return the String obtained from the StringWriter, which holds the output of named entity recognition. This String is sent to the client.


Test the service

To test the service with the help of NetBeans:

You can see that the token with id "t1" (the "Karin" token) was identified as a PER (person) named entity, and that the new namedEntities layer was added to the output TCF.

The test client provided by NetBeans is also useful for testing error responses. For example, our recognizer requires a tokens layer in the input. Removing the tokens layer from the input and clicking Test correctly produces a Bad Request error (HTTP status 400) and an error message pointing to the problem:


Prepare the service for deployment

To deploy your web application on a web server and make it available, you need to package it as a WAR file. A WAR file is just a JAR file used to distribute a collection of JavaServer Pages, servlets, Java classes, XML files, tag libraries and static Web pages (HTML and related files) that together constitute a Web application. With NetBeans you can package your web application as a WAR file in the following way:


Adding your service to WebLicht

Once you have developed a service and deployed it on your server, you need to register it in the WebLicht repository so that it becomes available in the WebLicht web application. To register your service, please contact the D-Spin development group at: weblicht at d-spin dot org.

If your service produces an annotation layer that is not part of the TCF specification, contact the D-Spin development group at weblicht at d-spin dot org so that the new annotation layer can be added to the specification.


Recommended reading

1. "RESTful Java with JAX-RS" by Bill Burke

2. The Java EE 6 Tutorial, "Building RESTful Web Services with JAX-RS": http://download.oracle.com/javaee/6/tutorial/doc/giepu.html