The Search Library is essentially a JavaScript wrapper around Apache Lucene, a "high-performance, full-featured text search engine library written entirely in Java", for use in Helma Object Publisher applications. To use the library you need to meet the following requirements:
Helma Object Publisher version 1.4 or greater
The Lucene .jar file (tested with versions >= 1.4.3) must be present either in lib/ext or in the application directory (depending on whether you need the search functionality server-wide or in just one application).
Before you can use the library you must instantiate it by calling the constructor function:
var engine = new helma.Search();
A new index is created with the createIndex() method of the object instantiated above. The syntax is:
Object createIndex(String name, Object dir, Object analyzer);
name: The name to use for the index (e.g. "stories"). This will result in a directory beneath the second argument dir, or, if dir is not given, beneath the application directory.
dir: A File object representing the base directory where you want the index to reside. This argument can be an instance of helma.File, File or java.io.File. If the directory doesn't exist, it will be created. If this argument is omitted, the index will be created in the subdirectory "index" located in the installation directory of Helma Object Publisher.
analyzer: An instance of org.apache.lucene.analysis.Analyzer to be used when adding documents to or retrieving documents from the index. Depending on the language of the indexed documents you can choose between a variety of analyzers (see the Lucene documentation for an in-depth explanation of how analyzers influence indexing and searching). The library's static method helma.Search.getAnalyzer(lang) returns such an analyzer object. Currently supported arguments are de (German analyzer), ru (Russian), si or simple (simple analyzer) and whitespace (whitespace analyzer). If you don't specify an analyzer when creating or mounting an index, Lucene's standard analyzer is used.
The following example shows how to create a new index using a German analyzer:
Example 1. Creating a new index
var engine = new helma.Search();
var dir = new helma.File("/usr/local/helma/index/");
var analyzer = helma.Search.getAnalyzer("de");
var index = engine.createIndex("stories", dir, analyzer);
If something goes wrong (e.g. you passed too few or wrong arguments, or the file permissions are inadequate), this method will throw an error, so you should enclose it in a try/catch block.
If there is already an index stored in the directory you passed to createIndex(), it will be erased. To avoid this, use Search.mountIndex(), which is described in the next section (in most applications you will try to mount the index first and create a new one only if this fails).
The Library has a second method called
mountIndex() which you can use to mount an
already existing index on disk. The syntax is:
Object mountIndex(String name,
Object dir,
Object analyzer);
The arguments are the same as for
createIndex().
Example 2. Mounting an existing index
var engine = new helma.Search();
var dir = new helma.File("/usr/local/helma/index/");
var analyzer = helma.Search.getAnalyzer("de");
var index = engine.mountIndex("stories", dir, analyzer);
If there is no index in the directory /usr/local/helma/index/stories, this method will throw an error, so you might want to enclose the above call in a try/catch block and create a new index on the fly.
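The mount-then-create pattern described above can be sketched as a small helper. This is only an illustration: the function name mountOrCreateIndex is hypothetical and not part of the library, and the sketch assumes nothing about the engine beyond the mountIndex() and createIndex() methods documented here.

```javascript
// Hypothetical helper (not part of helma.Search): mount the index if it
// exists, otherwise create it. `engine` is expected to offer mountIndex()
// and createIndex() with the arguments described in this document.
function mountOrCreateIndex(engine, name, dir, analyzer) {
    try {
        return engine.mountIndex(name, dir, analyzer);
    } catch (mountError) {
        // mountIndex() throws if no index exists yet, so fall back to
        // creating one. Errors thrown by createIndex() (e.g. because of
        // file permissions) are deliberately left to propagate.
        return engine.createIndex(name, dir, analyzer);
    }
}

// Possible use inside a Helma application (assumes the objects from Example 2):
// var index = mountOrCreateIndex(engine, "stories", dir, analyzer);
```

Because createIndex() erases an existing index, a more careful version might inspect why mounting failed before falling back to it.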
Since you wouldn't want to mount the index every time you want to access it, you can safely store the resulting index object somewhere in app.data:
app.data.index = new Object();
app.data.index.stories = index;
This documentation will use app.data.index.stories as the example.
One of the things you will most likely want to do with the index is to add documents to it. These documents will be the ones you can search for. To create a new document object call the following constructor:
var doc = new helma.Search.Document();
The document constructor doesn't take any arguments (to be precise, you can pass an instance of org.apache.lucene.document.Document to it, but you will rarely need this). After that you can start to populate the document object with fields containing the values that should be part of the index. The most frequently used method of Document is therefore:
void addField(String name, Object value, Object param);
name: Name of the field
value: Value of the field
param: (optional) A JavaScript Object containing the following properties:
  store: (Boolean) defines whether to store the value in the index (default: true)
  index: (Boolean) true if the value should be indexed (default: true)
  tokenize: (Boolean) true if the value should be tokenized (default: true)
Example 3. Adding Fields to a Document
var doc = new helma.Search.Document();
doc.addField("title", "my first indexed story", {store: false, index: true, tokenize: true});
Although Lucene handles date values differently, you can use the same method addField() to add a Date object to the index:
doc.addField("created", new Date());
Every index object has a method called addDocument() that adds a Document object to the index.
Example 4. Adding a Document Object
var doc = new helma.Search.Document();
doc.addField("title", "my first indexed story", {store: false, index: true, tokenize: true});
app.data.index.stories.addDocument(doc);
The above example assumes that the index object is stored in app.data.index.stories.
addDocument() has one special feature: you can pass it not only a single instance of helma.Search.Document but also a Hashtable or Vector containing several Document objects. This is useful, for example, when rebuilding an index: you first collect all the document objects and then populate the index with a single call to addDocument().
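The collect-first, add-once approach to rebuilding an index could look roughly like the following sketch. The helper name collectDocuments and the shape of the records (objects with id and title properties) are assumptions made for illustration. The document constructor and the collection are passed in as arguments, so the function only relies on addField() and on a collection offering an add() method, as java.util.Vector does.

```javascript
// Hypothetical helper: builds one document per record and collects them.
// `Document` is the document constructor (helma.Search.Document in a Helma
// application); `collection` is any object with an add() method, such as a
// java.util.Vector. The record properties id and title are assumptions.
function collectDocuments(records, Document, collection) {
    for (var i = 0; i < records.length; i++) {
        var doc = new Document();
        doc.addField("id", records[i].id);
        doc.addField("title", records[i].title);
        collection.add(doc);
    }
    return collection;
}

// Possible use inside a Helma application (stories is a hypothetical Array):
// var docs = collectDocuments(stories, helma.Search.Document, new java.util.Vector());
// app.data.index.stories.addDocument(docs);
```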
Removing a document object is easy as well:
app.data.index.stories.removeDocument("id", 1234);
This will remove all documents in the index whose "id" field has the value 1234 (so you might want to make sure you call this method only with a field and value that you know to be unique within the index).
Lucene offers many different ways to query an index. The one you will most likely use is the BooleanQuery.
This query form lets you combine several terms using specific operators (AND, OR, NOT). Let's say you want to search for documents in your index that contain "my" or "story" in their title. All you have to do is the following:
var query = new helma.Search.BooleanQuery("title", "my");
query.addTerm("title", "story", "or");
Since "or" is the default operator, you can make this even simpler:
var query = new helma.Search.BooleanQuery("title", "my story");
But let's say you want to search for all documents that contain the words "my" or "story" in either the title or the text. This is done by passing an Array containing field names to the Query constructor:
var query = new helma.Search.BooleanQuery(["title", "text"], "my story");
You can also combine different queries within one BooleanQuery:
var query = new helma.Search.BooleanQuery();
query.addQuery(new helma.Search.WildcardQuery(["title", "text"], "sto*"));
query.addQuery(new helma.Search.FuzzyQuery(["title", "text"], "sto~"));
For debugging purposes use the toString() method of the query objects, which will give you the exact query string that will be used by Lucene.
var query = new helma.Search.TermQuery("number", 1);
var query = new helma.Search.TermQuery("string", "two");
var query = new helma.Search.RangeQuery("number", "3", "7", true);
The last argument of RangeQuery defines whether the minimum/maximum values should be included or not.
Once you've created a query object, you'll have to instantiate a Searcher object that will do the actual search:
var searcher = new app.data.index.stories.Searcher();
This object offers three methods: search(),
sortBy() and close().
The simplest way to do a search looks like this:
Example 5. Doing a search
try {
    var searcher = new app.data.index.stories.Searcher();
    var hits = searcher.search(query);
} catch (ex) {
    throw ex;
} finally {
    searcher.close();
}
Closing the searcher is very important; otherwise you'll accumulate resources that will most likely lock up your index. You can also add a query filter to your search, which restricts the results to those that match the filter:
try {
    var searcher = new app.data.index.stories.Searcher();
    var filterQuery = new helma.Search.BooleanQuery("site", 2);
    var filter = new helma.Search.QueryFilter(filterQuery);
    // the result of the search will only contain documents whose field "site" contains "2"
    var hits = searcher.search(query, filter);
    // loop over searcher.hits to display them ...
} catch (ex) {
    throw ex;
} finally {
    // close the searcher exactly once, whether or not the search succeeded
    searcher.close();
}
Using sortBy() you can sort the results of a query.
var searcher = new app.data.index.stories.Searcher();
searcher.sortBy("created", "INT", true);
sortBy() can be called in the following ways:
void sortBy(String fieldName);
void sortBy(String fieldName, String type);
void sortBy(String fieldName, Boolean reverse);
void sortBy(String fieldName, String type, Boolean reverse);
The following strings are allowed as type arguments: AUTO, CUSTOM, DOC, FIELD_DOC, FIELD_SCORE, FLOAT, INT, SCORE, STRING.
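Since every search shown so far has to close its searcher, it may be convenient to wrap that pattern once. The following is a minimal sketch, not part of the library: the name withSearcher is hypothetical, and the only assumptions are that the index object exposes a Searcher constructor and that the searcher has a close() method, as described above.

```javascript
// Hypothetical helper: runs `callback` with a fresh searcher and always
// closes the searcher afterwards, even if the callback throws.
function withSearcher(index, callback) {
    // Created outside the try block so that close() is only attempted
    // on a searcher that actually exists.
    var searcher = new index.Searcher();
    try {
        return callback(searcher);
    } finally {
        searcher.close();
    }
}

// Possible use inside a Helma application (assumes a query object exists):
// withSearcher(app.data.index.stories, function (searcher) {
//     searcher.sortBy("created", "INT", true);
//     searcher.search(query);
//     // work with searcher.hits here, before the searcher is closed
// });
```

Any work on the results has to happen inside the callback, because the searcher is closed as soon as the callback returns.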
All of the above queries will result in a collection stored in the property hits of the Searcher object. This collection isn't an ordinary JavaScript Array, as you might expect, but a JavaScript Object with two methods: get() and length() (instead of the latter you can also call size()). Looping over the results of a search is therefore straightforward:
var hit;
var max = searcher.hits.length();
for (var i = 0; i < max; i++) {
    hit = searcher.hits.get(i);
}
Every hit in the above example contains an instance of helma.Search.Document with an additional property named score, which contains a number between 0 and 1 (1 means 100%). Depending on how you created those document objects, you can now either display the values directly or retrieve a value from the document object that works as a pointer to the object stored in the database. In any case you will need the method getField() of helma.Search.Document:
Object getField(String name);
The return value of this method is a Javascript-Object containing the following properties:
The name of the Field
an integer containing the boost factor of this Field
Boolean true if the Field is indexed
Boolean true if the value of the Field is stored in the index
Boolean true if the Field is tokenized
The string value of the Field.
So let's say you did one of the above queries and already have a result. To display the titles of all stories found simply do the following:
var hit;
var max = searcher.hits.length();
for (var i = 0; i < max; i++) {
    hit = searcher.hits.get(i);
    res.write("found story ");
    res.write(hit.getField("title").value);
    res.writeln("<br />");
}
But don't forget to close the searcher when you're finished.
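If the results are needed after the searcher has been closed, one option is to copy the relevant values into a plain JavaScript Array first. The sketch below is an illustration only: hitsToArray is a hypothetical name, and it assumes that getField() returns an object for every hit and that the score is readable as a property of the hit, as described above.

```javascript
// Hypothetical helper: copies the value of one field plus the score of
// every hit into a plain JavaScript Array. `hits` is expected to offer
// get() and length(), like the hits property of a Searcher.
function hitsToArray(hits, fieldName) {
    var result = [];
    var max = hits.length();
    for (var i = 0; i < max; i++) {
        var hit = hits.get(i);
        result.push({
            value: hit.getField(fieldName).value,
            score: hit.score
        });
    }
    return result;
}

// Possible use inside a Helma application, before closing the searcher:
// var titles = hitsToArray(searcher.hits, "title");
```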
There's one special thing with Date values inside a Lucene index: you
can't retrieve them with getField() since then
you would get the internal value of the Date object in the index. Instead,
use getDateField():
var createtime = hit.getDateField("createtime");
There's a third method for field retrieval:
getFields() will return an Array containing all
Fields of a Document. Again, to be able to call this method For
performance reasons the resulting collection of Lucene Document objects
(instances of org.apache.lucene.document.Document)
isn't converted into a Javascript Array containing instances of
helma.Search.Document. This should be done by your application code if
necessary (as in the following example):
var hit, fields;
var max = searcher.hits.length();
for (var i = 0; i < max; i++) {
    hit = searcher.hits.get(i);
    fields = hit.getFields();
    for (var j in fields) {
        res.write(fields[j].name + ": " + fields[j].value);
        res.writeln("<br />");
    }
}
Depending on how fast your index changes, you should optimize it from time to time. This is fairly simple; just use (assuming that the index is stored in app.data.index.stories):
app.data.index.stories.optimize();
Be aware that, depending on the size of the index, its fragmentation, the I/O speed of the machine and its amount of memory, this can take quite some time, although Lucene is generally fast at doing this.