References Identifier Service

From WebLichtWiki

Latest revision as of 09:30, 23 August 2013

Introduction

This tutorial presents a workflow for creating a web service for TCF processing. The example imitates a reference identifier service: it processes POST requests containing TCF data with token, part-of-speech and named entity annotation layers, and uses these annotations to produce reference annotations.


This web service imitates the case in which the processing tool requires a model for identifying references. Since a model can consume a lot of memory and/or take a long time to load, the tool instance is created only once, when the application is created, so the corresponding model is also loaded only once. The example shows the case in which the tool is thread-safe, so it can be shared among clients without any synchronization.
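
The load-once, share-everywhere idea can be illustrated with a minimal plain-Java sketch. It is not the archetype's code: the class name SharedToolSketch, the counter MODEL_LOADS, the serve() helper and the pronoun list standing in for a model are all hypothetical, chosen only to show that one instance holding immutable state can serve concurrent callers without synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a tool whose model is expensive to load.
final class SharedToolSketch {

    // Counts how many times the (imitated) model was loaded.
    static final AtomicInteger MODEL_LOADS = new AtomicInteger();

    // Immutable after construction, so process() needs no synchronization.
    private final List<String> model;

    SharedToolSketch() {
        MODEL_LOADS.incrementAndGet();
        this.model = List.of("he", "she", "it"); // imitation of a loaded model
    }

    // Returns the tokens that the imitated model treats as references.
    List<String> process(List<String> tokens) {
        List<String> found = new ArrayList<>();
        for (String token : tokens) {
            if (model.contains(token.toLowerCase(Locale.ROOT))) {
                found.add(token);
            }
        }
        return found;
    }

    // Simulates `clients` concurrent requests against one shared tool instance.
    static List<List<String>> serve(SharedToolSketch tool, int clients, List<String> tokens) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < clients; i++) {
                futures.add(pool.submit(() -> tool.process(tokens)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("request failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SharedToolSketch tool = new SharedToolSketch(); // created once, at startup
        List<List<String>> results = serve(tool, 8, List.of("She", "saw", "it"));
        System.out.println(results.get(0) + ", model loads: " + MODEL_LOADS.get());
    }
}
```

If the tool kept mutable state between calls, sharing it this way would no longer be safe, and each call would need synchronization or its own tool instance.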

Prerequisites

The tutorial assumes you have the following software installed:

Adding Clarin Repository

The example WebLicht service is provided as a Maven archetype stored in the Clarin Repository. Therefore, you'll need to add the Clarin Repository to your list of Maven repositories. Skip this step if the Clarin Repository is already among your Maven repositories.


In NetBeans IDE, go to the list of Maven Repositories under the Services tab:

Maven-repositories-service.png

Right-click on "Maven Repositories" and select the "Add Repository" option. Fill in the following information in the "Add Repository" window:

Finish by pressing "Add".

Adding-clarin-repository.png


Creating a Project from an Archetype

Once the Clarin Repository is accessible, we can start using the archetype at once. Press the "New Project" button in the menu bar and select: Maven -> Project From Archetype

New-project-from-archetype.png

In the next screen, find and select "WebLicht References Webservice Archetype".

Select-archetype-references.png

Provide a name for your project and a directory to store it in, as you would normally do with any NetBeans project. In addition, you can provide a group name for your Maven artifact and the package name you would like to use.

Project-name-location.png

That's it! You have just created a WebLicht webservice.

References-project-created.png

Testing Webservices

To test the service, run it on your local server. Right-click on the project and select the "Run" option. In the next screen, select the Tomcat server and click the OK button.

Localserver-deploy.png


The most straightforward way to test a web service is to use the wget or curl command-line tool. For example, to POST TCF data from "input.xml" to the service and display the service's output in the terminal window, run curl:


curl -H 'content-type: text/tcf+xml' --data-binary @input.xml -X POST "http://localhost:8080/mywlproject/annotate/"


Or wget:

wget --post-file=input.xml --header='Content-Type: text/tcf+xml' "http://localhost:8080/mywlproject/annotate/"


Make sure the current directory actually contains a file named "input.xml" in TCF 0.4 format with token, part-of-speech and named entity annotation layers. Such a file, provided for testing, is located under "Web Pages" in your project; just copy it to your current directory.
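
The same request can also be sent from Java. The following is a minimal sketch, assuming Java 11 or later (for the java.net.http client) and the same local URL and "input.xml" file as in the commands above; the class name TcfPostClient and the method buildRequest() are hypothetical names used only for this illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test client mirroring the curl call above.
final class TcfPostClient {

    // Builds a POST request with the TCF content type and the given body bytes.
    static HttpRequest buildRequest(URI serviceUri, byte[] tcfBytes) {
        return HttpRequest.newBuilder(serviceUri)
                .header("Content-Type", "text/tcf+xml")
                .POST(HttpRequest.BodyPublishers.ofByteArray(tcfBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        URI uri = URI.create("http://localhost:8080/mywlproject/annotate/");
        // Read the file as raw bytes so the XML is sent unchanged.
        byte[] body = Files.readAllBytes(Path.of("input.xml"));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(uri, body), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Running main() requires the service to be deployed locally and "input.xml" to be present in the working directory, exactly as for curl and wget.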

What's next?

Of course, you will probably want to customize the provided code. Let's take a look at the files in the project:


  • ReferencesService.java


The application definition resides here. Use it to define the path to your application and/or to add more resources. In this example, the resource ReferencesResource is added as a singleton resource, which means that only one instance of the resource is created for the application.


  • ReferencesResource.java


This is the definition of a resource. If more resources are required, you can use it as a template for them (don't forget to add them to ReferencesService.java). Since the resource is registered as a singleton resource, only one instance of it is created per application. The resource initializes the TextCorpusProcessor tool used for processing (in this case a ReferencesTool object) in its constructor, so that only one instance of the tool is created per application as well. This is useful when the tool consumes a lot of memory and/or takes a long time to load. The resource method annotated with @POST processes client requests containing TCF input and sends responses containing the TCF output. To do so, it initializes a TextCorpusStreamed object, requesting the layers of interest, and uses the ReferencesTool object to identify the references and create reference annotations in TCF. It also catches exceptions and sends an HTTP error code with a short cause message if an exception occurs during processing.


  • ReferencesTool.java


The actual implementation of the tool resides here. This template provides an imitation of a reference detector. If you are writing a web service wrapper for an existing tool, this is where you would call your tool, translating the input/output data from/into TCF format. The wlfxb library, as used in this example, can help with that. In this example, the tool loads an imitation of a model in its constructor. The tool provides a process() method that takes a TCF document with the layers of interest, uses the loaded model to identify the references in the document, and adds the identified references to the TCF document as a new annotation layer. This example imitates a thread-safe implementation of the tool: client requests can share the same tool object, and no synchronization is required to call the tool's process() method.
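
The request flow described for the resource (accept a POST, hand the input to one shared tool, report failures as an HTTP error with a short cause message) can be sketched with the JDK's built-in HTTP server alone. This is only a stand-in under stated assumptions: the generated project uses its own resource classes and the wlfxb library, whereas the class AnnotateEndpointSketch, its methods start() and post(), the "/annotate/" context and the choice of status 500 are hypothetical, and the tool is reduced to a function from input text to output text.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical JDK-only stand-in for a resource with a @POST method.
final class AnnotateEndpointSketch {

    // Starts a server whose handler delegates to one shared tool instance.
    // Passing port 0 lets the operating system pick a free port.
    static HttpServer start(int port, UnaryOperator<String> tool) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/annotate/", exchange -> handle(exchange, tool));
            server.start();
            return server;
        } catch (IOException e) {
            throw new IllegalStateException("could not start server", e);
        }
    }

    private static void handle(HttpExchange exchange, UnaryOperator<String> tool)
            throws IOException {
        if (!"POST".equals(exchange.getRequestMethod())) {
            send(exchange, 405, "Only POST is supported");
            return;
        }
        try {
            String input = new String(
                    exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            send(exchange, 200, tool.apply(input));
        } catch (RuntimeException e) {
            // Processing failed: answer with an error status and a short cause message.
            send(exchange, 500, "Processing failed: " + e.getMessage());
        }
    }

    private static void send(HttpExchange exchange, int status, String body)
            throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    // Small client helper for trying the endpoint; returns "<status> <body>".
    static String post(int port, String body) {
        try {
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://localhost:" + port + "/annotate/"))
                    .header("Content-Type", "text/tcf+xml")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() + " " + response.body();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("request failed", e);
        }
    }
}
```

A server started with start(0, tool) reports its actual port through getAddress().getPort(), and stop(0) shuts it down. Because the tool is passed in once and captured by the handler, every request reuses the same instance, which mirrors the singleton registration described above.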