Thursday, March 19, 2009

Full Text Indexing of CouchDB with Lucene

Having gotten couchdb-lucene and edge CouchDB installed and running, I'll keep my chain going by trying to get indexing and searching to work.

I am running in a local development environment (./utils/run), so I need to edit etc/couchdb/local_dev.ini to include:
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/java -jar /home/cstrom/repos/couchdb-lucene/target/couchdb-lucene-SNAPSHOT-jar-with-dependencies.jar -search

[update_notification]
indexer=/usr/bin/java -jar /home/cstrom/repos/couchdb-lucene/target/couchdb-lucene-SNAPSHOT-jar-with-dependencies.jar -index

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
The next step is to start up the CouchDB server:
cstrom@jaynestown:~/repos/couchdb$ ./utils/run 
Apache CouchDB 0.9.0a756286 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.58.0>] - - 'GET' /_all_dbs 200
[info] [<0.58.0>] - - 'GET' /eee/_design/lucene 404
[info] [<0.58.0>] - - 'GET' /eee 200
To verify that the index is working, you can access the _fti resource of the database:
cstrom@jaynestown:~/repos/couchdb-lucene/target$ curl http://localhost:5984/eee/_fti
Nice! I do have 7 documents in there, so we look to be in good shape.

To search, append a q query parameter to the request with a value in the form attribute_name:search_term. We like our greens, so, to search for all recipes (in our limited sample) that include a word starting with "green" in the summary, you would supply the query parameter q=summary:green*.
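
For scripting the same query outside of curl, here is a minimal sketch of how the URL might be assembled. The helper name build_fti_url is hypothetical (it is not part of couchdb-lucene), and the sketch assumes the server decodes percent-escaped query parameters in the usual way:

```python
from urllib.parse import urlencode


def build_fti_url(db_url, field, term):
    """Return the _fti search URL for a Lucene `field:term` query.

    Hypothetical helper for illustration; not part of couchdb-lucene.
    """
    query = f"{field}:{term}"
    # urlencode percent-escapes ':' and '*' so that shells and HTTP
    # clients pass the query through unchanged.
    return f"{db_url.rstrip('/')}/_fti?{urlencode({'q': query})}"


print(build_fti_url("http://localhost:5984/eee", "summary", "green*"))
# prints http://localhost:5984/eee/_fti?q=summary%3Agreen%2A
```

Escaping the query also sidesteps any surprises from the shell treating ? and * as glob characters.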

Giving it a try, I find that we do indeed have 2 recipes mentioning "greens":
cstrom@jaynestown:~/repos/couchdb-lucene/target$ curl http://localhost:5984/eee/_fti?q=summary:green*
"rows":[{"_id":"2006-10-08-dressing", "score":0.9224791526794434},
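
To use the results from a script, something like the following sketch could pull out the matching document IDs, best score first. It assumes the response is a JSON object with a "rows" array of {"_id", "score"} entries, as the fragment above suggests. The sample string holds only the one row visible above, closed off by hand, and ranked_ids is a hypothetical name:

```python
import json

# Trimmed sample: only the single row shown in the output above,
# with the closing brackets added by hand (an assumption about the shape).
sample = '{"rows":[{"_id":"2006-10-08-dressing","score":0.9224791526794434}]}'


def ranked_ids(response_text):
    """Return document IDs from an _fti response, highest score first."""
    rows = json.loads(response_text).get("rows", [])
    ordered = sorted(rows, key=lambda row: row["score"], reverse=True)
    return [row["_id"] for row in ordered]


print(ranked_ids(sample))
# prints ['2006-10-08-dressing']
```
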
Aside from the yak shaving needed to get edge CouchDB running, this was by far the easiest experience I have ever had in getting Lucene indexing running.

I would ultimately like to be able to search an entire document, not just individual fields, but this will do for now.
