Named Entities Recognizer Service
Introduction
This tutorial presents a workflow for creating a webservice for TCF processing. The example imitates a named entity recognizer service: it accepts POST requests containing TCF data with a tokens layer and processes the token annotations to produce named entity annotations.
This webservice imitates the case where the processing tool needs to load a named entities list as a resource for identifying named entities. Because such resources can consume a lot of memory and/or take a long time to load, the tool instance is created only once, when the application is created, so the list resource is also loaded only once. The example also covers the case where the tool is not thread-safe and therefore requires synchronization.
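The load-once and synchronization idea described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class names SketchTool and SketchResource, the two-entry list and the loadCount counter are hypothetical and are not taken from the archetype code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a tool that is costly to create and not thread-safe.
class SketchTool {
    // Counts constructions so the "load once" behaviour can be observed.
    static int loadCount = 0;

    private final Set<String> entityList;
    // Shared scratch buffer: this mutable state is what makes the tool
    // unsafe to call from several threads at the same time.
    private final StringBuilder buffer = new StringBuilder();

    SketchTool() {
        loadCount++;
        // A real tool would load a large named entities list from a file here.
        entityList = new HashSet<>(Arrays.asList("Berlin", "Paris"));
    }

    boolean isNamedEntity(String token) {
        buffer.setLength(0);
        buffer.append(token.trim());
        return entityList.contains(buffer.toString());
    }
}

// Hypothetical stand-in for the single resource instance that serves all requests.
class SketchResource {
    // Created once, together with the resource, so the list is loaded once.
    private final SketchTool tool = new SketchTool();

    boolean process(String token) {
        // Only one request thread at a time may use the tool.
        synchronized (tool) {
            return tool.isNamedEntity(token);
        }
    }
}
```

Synchronizing on the tool keeps the sketch simple, but it also serializes all requests. If the tool were thread-safe, the synchronized block could be dropped.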
Prerequisites
The tutorial assumes you have the following software installed:
- NetBeans IDE 7.2.1
- wget or curl command line tool (optional)
Adding Clarin Repository
The example WebLicht service is provided as a Maven archetype stored in the Clarin Repository. Therefore, you need to add the Clarin Repository to your list of Maven repositories. Skip this step if the Clarin Repository is already among your Maven repositories.
In NetBeans IDE, open the Services tab and find the list of Maven Repositories.
Right-click on "Maven Repositories" and select the "Add Repository" option. Fill in the following information in the "Add Repository" window:
- Repository ID: clarin
- Repository Name: Clarin Repo
- Repository URL: http://catalog.clarin.eu/ds/nexus/content/repositories/Clarin/
Finish by pressing "Add".
Creating a Project from an Archetype
Once the Clarin Repository is accessible, we can start using the archetype right away. Press the "New Project" button in the menu bar and select: Maven -> Project From Archetype
In the next screen, find and select "WebLicht NamedEntities Webservice Archetype".
Provide a name for your project and a directory to store it in, as you would normally do with any NetBeans project. In addition, you can provide a group ID for your Maven artifact and a package name you would like to use.
That's it! You have just created a WebLicht webservice.
Testing Webservices
To test the service, run it on your local server. Right-click on the project and select the "Run" option. In the next screen, select the Tomcat server and click the OK button.
The most straightforward way to test a webservice is to use the wget or curl command line tool. For example, to POST TCF data from "input.xml" to the service and display the output of the service in the terminal window, run curl:
curl -H 'content-type: text/tcf+xml' -d @input.xml -X POST http://localhost:8080/mywlproject/annotate/
Or wget:
wget --post-file=input.xml --header='Content-Type: text/tcf+xml' http://localhost:8080/mywlproject/annotate/
Make sure the current directory actually contains a file named "input.xml" in TCF 0.4 format with a tokens layer. Such a file is provided for testing under "Web Pages" in your project; just copy it to your current directory.
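If you prefer to test from Java rather than from the command line, the same POST request can be sketched with the HTTP client that ships with Java 11 and later. This is a minimal sketch, not part of the archetype: the class name TcfPostSketch is hypothetical, and the URL assumes the project is deployed as "mywlproject", as in the commands above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical test client; requires Java 11 or later.
class TcfPostSketch {

    // Builds a POST request that carries the TCF document as its body.
    static HttpRequest buildRequest(String url, String tcfXml) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "text/tcf+xml")
                .POST(HttpRequest.BodyPublishers.ofString(tcfXml, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Reads "input.xml" from the current directory, like the commands above.
        String tcf = new String(Files.readAllBytes(Paths.get("input.xml")),
                StandardCharsets.UTF_8);
        HttpRequest request =
                buildRequest("http://localhost:8080/mywlproject/annotate/", tcf);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the HTTP status code followed by the TCF output of the service.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The main method only works while the service is running on your local server.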
What's next?
Of course, you will probably want to customize the provided code. Let's take a look at the files in the project:
- NamedEntitiesService.java
- The application definition resides here. Use it to define the path to your application and/or add more resources. In this example, the resource NamedEntitiesResource is added as a singleton resource, which means that only one instance of the resource is created for the application.
- NamedEntitiesResource.java
- This is the definition of a resource. If more resources are required, you can use it as a template for them. (Don't forget to add them to NamedEntitiesService.java.)
- NamedEntitiesTool.java
- The actual implementation of a tool resides here. This template provides an imitation of a named entity recognizer. If you are writing a service wrapper for an already existing tool, this is where you would call your tool, translating input/output data from/into TCF format. The wlfxb library can help with this, as it is used in this resource implementation.
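To give an idea of what a list-based imitation of a named entity recognizer can look like, here is a minimal sketch in plain Java. It is not the code of NamedEntitiesTool.java and deliberately does not use the wlfxb library: the class names, the two-entry list and the entity type labels are all hypothetical. In a real service the tokens would come from the TCF tokens layer and the results would be written to a named entities layer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result type: which token was recognized and with which entity type.
class RecognizedEntity {
    final int tokenIndex;
    final String text;
    final String type;

    RecognizedEntity(int tokenIndex, String text, String type) {
        this.tokenIndex = tokenIndex;
        this.text = text;
        this.type = type;
    }
}

// Hypothetical recognizer that looks every token up in a fixed list.
class ListLookupSketch {
    // Maps a known name to its entity type; a real tool would load this from a resource.
    private final Map<String, String> entityTypes = new HashMap<>();

    ListLookupSketch() {
        entityTypes.put("Anna", "PER");
        entityTypes.put("Berlin", "LOC");
    }

    // Returns one entry per token that appears in the list, in token order.
    List<RecognizedEntity> recognize(List<String> tokens) {
        List<RecognizedEntity> found = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String type = entityTypes.get(tokens.get(i));
            if (type != null) {
                found.add(new RecognizedEntity(i, tokens.get(i), type));
            }
        }
        return found;
    }
}
```

The sketch matches single tokens only. Named entities that span several tokens would need a lookup over token sequences.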