Search Library for Helma Object Publisher


Table of Contents

Requirements
Creating a new index
Mounting an existing index
The Document Object
Adding and removing Document objects
Queries
BooleanQuery
TermQuery
PhraseQuery
RangeQuery
FuzzyQuery
PrefixQuery
WildcardQuery
Searching
Displaying the results of a search
Optimizing indexes

The Search Library is a JavaScript wrapper around Apache Lucene, a "high-performance, full-featured text search engine library written entirely in Java", for use in Helma Object Publisher applications. To use the library you need to meet the following requirements:

  • Helma Object Publisher version 1.4 or greater

  • The Lucene .jar file (tested with versions >= 1.4.3) must be present either in lib/ext or in the application directory (depending on whether you need the search functionality server-wide or in just one application).

Before you can use the library you must instantiate it by calling the constructor function:

var engine = new helma.Search();

To create a new index, call the createIndex() method on the instance created above. Its signature is:

Object createIndex(String name,
                   Object dir,
                   Object analyzer);

The following example shows how to create a new index using a German analyzer:

If something goes wrong (e.g. you passed too few or the wrong arguments, or file permissions are inadequate), this method will throw an error, so you should enclose the call in a try/catch block.
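A minimal sketch of the error handling described above. The helper name createIndexSafely, the optional onError callback and the choice to return null on failure are illustrative assumptions, not part of the library; only createIndex() and its three arguments come from the text. The engine and the analyzer are passed in by the caller.

```javascript
// Hypothetical helper: wraps engine.createIndex() in a try/catch block.
// Returns the new index object, or null if creation failed.
function createIndexSafely(engine, name, dir, analyzer, onError) {
    try {
        return engine.createIndex(name, dir, analyzer);
    } catch (ex) {
        // e.g. too few or wrong arguments, or inadequate file permissions
        if (typeof onError === "function") {
            onError(ex);
        }
        return null;
    }
}
```

Returning null keeps the calling code short, but rethrowing the error is just as reasonable if a missing index should stop the application from starting.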

The Library has a second method called mountIndex() which you can use to mount an already existing index on disk. The syntax is:

Object mountIndex(String name,
                  Object dir,
                  Object analyzer);

The arguments are the same as for createIndex().

If there is no index in the directory /usr/local/helma/index/stories, this method will throw an error, so you might want to enclose the call in a try/catch block and create a new index on the fly.
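The mount-or-create fallback could be sketched as follows. The helper name mountOrCreateIndex is hypothetical; it relies only on mountIndex() and createIndex() taking the same three arguments, as stated above.

```javascript
// Hypothetical helper: mount the index if it exists, otherwise create it.
function mountOrCreateIndex(engine, name, dir, analyzer) {
    try {
        return engine.mountIndex(name, dir, analyzer);
    } catch (mountError) {
        // Mounting failed (for example because no index exists yet):
        // fall back to creating a fresh one. If createIndex() throws
        // as well, that error propagates to the caller.
        return engine.createIndex(name, dir, analyzer);
    }
}
```

Note that this sketch treats every mount failure as "no index yet". If mounting can also fail for other reasons in your setup, you may want to inspect the error before creating a new index in the same directory.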

Since you wouldn't want to mount the index every time you want to access it, you can safely store the resulting index object somewhere in app.data:

app.data.index = new Object();
app.data.index.stories = index;

The rest of this documentation uses app.data.index.stories as the example index.
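One way to avoid mounting the index more than once is to mount it lazily on first access. The sketch below is an assumption about how you might organize this; getCachedIndex is a hypothetical helper, data stands in for app.data, and mount is any function that returns an index object.

```javascript
// Hypothetical helper: return the cached index, mounting it on first use.
// `data` stands in for app.data; `mount` is a function returning an index.
function getCachedIndex(data, key, mount) {
    if (!data.index) {
        data.index = {};
    }
    if (!data.index[key]) {
        // first access: mount once and remember the result
        data.index[key] = mount();
    }
    return data.index[key];
}
```

Called as getCachedIndex(app.data, "stories", ...), this would leave the index in app.data.index.stories, matching the layout used here.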

One of the things you will most likely want to do with the index is to add documents to it. These documents will be the ones you can search for. To create a new document object call the following constructor:

var doc = new helma.Search.Document();

The document constructor doesn't take any arguments (strictly speaking, you can pass it an instance of org.apache.lucene.document.Document, but you will rarely need this). After that you can start to "populate" the document object with fields containing the values that should be part of the index. The most frequently used method of Document is:

void addField(String name,
              Object value,
              Object param);
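A small sketch of populating a document with addField(). It assumes that the third argument (param) may be omitted, which the signature above does not confirm; populateDocument and the property names passed to it are purely illustrative.

```javascript
// Hypothetical helper: copy selected properties of a plain object
// into a document by calling addField(name, value) for each of them.
function populateDocument(doc, source, fieldNames) {
    for (var i = 0; i < fieldNames.length; i++) {
        var name = fieldNames[i];
        var value = source[name];
        // skip properties that are missing on the source object
        if (value !== undefined && value !== null) {
            doc.addField(name, value);
        }
    }
    return doc;
}
```

In an application this might be called as populateDocument(new helma.Search.Document(), story, ["title", "text"]), where story and its properties are placeholders for your own objects.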

Lucene offers many different ways to query an index. The one you will most likely use is the BooleanQuery.

Once you've created a query object, you'll have to instantiate a Searcher object that will do the actual search:

var searcher = new app.data.index.stories.Searcher();

This object offers three methods: search(), sortBy() and close(). The simplest way to do a search looks like this:

Closing the searcher is very important; otherwise you'll accumulate resources that will most likely lock up your index. You can also pass a query filter to your search, so that only results matching the filter are returned:

var searcher = new app.data.index.stories.Searcher();
try {
    var filterQuery = new helma.Search.BooleanQuery("site", 2);
    var filter = new helma.Search.QueryFilter(filterQuery);
    // the result of the search will only contain documents whose field "site" contains "2"
    var hits = searcher.search(query, filter);
    // loop over searcher.hits to display them ...
} finally {
    // always close the searcher exactly once, even if the search throws
    searcher.close();
}
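Because the searcher has to be closed on every code path, it can help to centralize that in one place. The following is a sketch only: withSearcher is a hypothetical helper, and it assumes nothing beyond the Searcher constructor on the index object and the close() method described above.

```javascript
// Hypothetical helper: run `work` with a fresh searcher and always close it.
function withSearcher(index, work) {
    var searcher = new index.Searcher();
    try {
        // whatever `work` returns is handed back to the caller
        return work(searcher);
    } finally {
        // runs on success and on error alike
        searcher.close();
    }
}
```

Any processing of searcher.hits has to happen inside the callback, since the searcher is already closed by the time withSearcher() returns.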

Using sortBy() you can sort the results of a query.

var searcher = new app.data.index.stories.Searcher();
searcher.sortBy("created", "INT", true);

void sortBy(String fieldName);

void sortBy(String fieldName,
            String type);

void sortBy(String fieldName,
            Boolean reverse);

void sortBy(String fieldName,
            String type,
            Boolean reverse);

The following strings are allowed type arguments: AUTO, CUSTOM, DOC, FIELD_DOC, FIELD_SCORE, FLOAT, INT, SCORE, STRING.
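To catch typos in the type argument early, the list above can be checked before calling sortBy(). This is a sketch under assumptions: applySort and SORT_TYPES are hypothetical names, and converting the type to upper case is a convenience of this helper, not documented library behavior.

```javascript
// Type names as listed in the text above.
var SORT_TYPES = ["AUTO", "CUSTOM", "DOC", "FIELD_DOC", "FIELD_SCORE",
                  "FLOAT", "INT", "SCORE", "STRING"];

// Hypothetical helper: validate the type, then call the matching sortBy() form.
function applySort(searcher, fieldName, type, reverse) {
    var hasReverse = reverse !== undefined;
    if (type === undefined || type === null) {
        if (hasReverse) {
            searcher.sortBy(fieldName, Boolean(reverse));
        } else {
            searcher.sortBy(fieldName);
        }
        return;
    }
    var normalized = String(type).toUpperCase();
    if (SORT_TYPES.indexOf(normalized) === -1) {
        throw new Error("Unknown sort type: " + type);
    }
    if (hasReverse) {
        searcher.sortBy(fieldName, normalized, Boolean(reverse));
    } else {
        searcher.sortBy(fieldName, normalized);
    }
}
```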

All of the above queries store their results as a collection in the hits property of the Searcher object. This collection isn't an ordinary JavaScript Array, as you might expect, but a JavaScript object with two methods: get() and length() (size() can be called instead of the latter). So looping over the results of a search is pretty easy:

var hit;
var max = searcher.hits.length();
for (var i=0;i<max;i++) {
   hit = searcher.hits.get(i);
}
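If you prefer to work with an ordinary JavaScript Array, the collection can be copied using only get() and length(). The helper name hitsToArray is hypothetical.

```javascript
// Hypothetical helper: copy a hits collection into a plain JavaScript Array.
function hitsToArray(hits) {
    var result = [];
    var max = hits.length();
    for (var i = 0; i < max; i++) {
        result.push(hits.get(i));
    }
    return result;
}
```

The copy has to be made before the searcher is closed. For large result sets, iterating over the collection directly avoids building the extra array.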

Every hit in the above example contains an instance of helma.Search.Document with an additional property named score which contains a number between 0 and 1 (1 means 100%). Depending on how you created those document objects you can now either display the values directly or retrieve a value from the document object that works as a pointer to the object stored in the database. In any case you will need the method getField() from helma.Search.Document:

Object getField(String name);

The return value of this method is a JavaScript object containing the following properties:

So let's say you did one of the above queries and already have a result. To display the titles of all stories found simply do the following:

var hit;
var max = searcher.hits.length();
for (var i=0;i<max;i++) {
   hit = searcher.hits.get(i);
   res.write("found story ");
   res.write(hit.getField("title").value);
   res.writeln("<br />");
}
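The score property can be shown alongside the title. The sketch below builds the output line as a string so that it can be passed to res.write(); describeHit is a hypothetical helper and assumes the document has a "title" field, as in the example above.

```javascript
// Hypothetical helper: build a display line from a hit's title and score.
function describeHit(hit) {
    var title = hit.getField("title").value;
    // score is a number between 0 and 1, where 1 means 100%
    var percent = Math.round(hit.score * 100);
    return "found story " + title + " (" + percent + "%)";
}
```

Inside the loop this would be used as res.write(describeHit(hit)).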

But don't forget to close the searcher when you're finished.

There's one special thing about Date values inside a Lucene index: you can't retrieve them with getField(), since you would then get the internal value of the Date object in the index. Instead, use getDateField():

var createtime = hit.getDateField("createtime");

There's a third method for field retrieval: getFields() returns an Array containing all fields of a Document. For performance reasons, the resulting collection of Lucene Document objects (instances of org.apache.lucene.document.Document) isn't converted into a JavaScript Array containing instances of helma.Search.Document. This should be done by your application code if necessary (as in the following example):

var hit, fields;
var max = searcher.hits.length();
for (var i=0;i<max;i++) {
   hit = searcher.hits.get(i);
   fields = hit.getFields();
   // index loop: for...in on an Array would also visit non-index properties
   for (var j=0;j<fields.length;j++) {
      res.write(fields[j].name + ": " + fields[j].value);
      res.writeln("<br />");
   }
}
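When fields are needed by name rather than by position, the Array returned by getFields() can be turned into a lookup object. This sketch assumes each field exposes name and value properties, as the loop above does; fieldsToObject is a hypothetical helper.

```javascript
// Hypothetical helper: map field names to field values.
// If a name occurs more than once, the last value wins.
function fieldsToObject(fields) {
    var result = {};
    for (var i = 0; i < fields.length; i++) {
        result[fields[i].name] = fields[i].value;
    }
    return result;
}
```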

Depending on how fast your index changes, you should optimize it from time to time. This is fairly simple (assuming that the index is stored in app.data.index):

app.data.index.optimize();

Be aware that, depending on the size of the index, its fragmentation, the I/O speed of the machine and its amount of memory, this can take quite some time, although Lucene is fast at it.
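One possible reading of "from time to time" is to optimize after a fixed number of changes. The sketch below is an assumption about application design, not a feature of the library: makeOptimizeCounter and the threshold value are hypothetical, and only optimize() comes from the text.

```javascript
// Hypothetical helper: returns a function to call after every index change.
// It triggers index.optimize() once `threshold` changes have accumulated.
function makeOptimizeCounter(index, threshold) {
    var changes = 0;
    return function recordChange() {
        changes++;
        if (changes >= threshold) {
            index.optimize();
            changes = 0;
            return true;   // an optimize run was triggered
        }
        return false;
    };
}
```

Since optimizing can take a while, you may prefer to trigger it outside of request handling if your setup allows that.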