I started worrying about indexing whole documents as soon as I got couchdb-lucene working with my prototype database. By default, couchdb-lucene indexes attributes individually. This means that the index can be searched for recipes whose titles contain the word "chilled", but there is no way to search for documents that contain the word "chilled" anywhere.
That nagging concern has remained in the intervening weeks, so I was quite excited to see an example of how to do this on Robert Newson's fork of couchdb-lucene.
To get that up and running on my local copy of CouchDB 0.9, I need to update my local copy of couchdb-lucene and rebuild:

cd ~/repos/couchdb-lucene
git pull
mvn

To point my local copy of CouchDB to the new version of the lucene indexer, I update etc/couchdb/local_dev.ini:

; CouchDB Configuration Settings
; Custom settings should be made in this file. They will override settings
; in default.ini, but unlike changes made to default.ini, this file won't be
; overwritten on server upgrade.
[couchdb]
;max_document_size = 4294967296 ; bytes
[httpd]
; port = 5985
;bind_address = 127.0.0.1
[log]
; level = debug
[update_notification]
;unique notifier name=/full/path/to/exe -with "cmd line arg"
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.
[external]
fti=/usr/bin/java -jar /home/cstrom/repos/couchdb-lucene/target/couchdb-lucene-0.3-SNAPSHOT-jar-with-dependencies.jar -search
[update_notification]
indexer=/usr/bin/java -jar /home/cstrom/repos/couchdb-lucene/target/couchdb-lucene-0.3-SNAPSHOT-jar-with-dependencies.jar -index
[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}

To choose a different indexing algorithm, a new design document is needed. In futon, choose "Design documents" from the "Select view" drop down.
Next, choose "Create Document ..." from the top of the UI, and enter "_design/lucene" for the name.
Then, add the following code (from the couchdb-lucene documentation), which indexes the entire document in the all field, to a new transform attribute (make sure to enclose it in quotes):

function(doc) {
  var ret = new Document();
  function idx(obj) {
    for (var key in obj) {
      switch (typeof obj[key]) {
        case 'object':
          idx(obj[key]);
          break;
        case 'function':
          break;
        default:
          ret.field(key, obj[key]);
          ret.field('all', obj[key]);
          break;
      }
    }
  }
  idx(doc);
  return ret;
}
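
To see what ends up in the all field, the transform function can be run outside of CouchDB against a stub. This is only a sketch: the Document constructor below is a hypothetical stand-in that records field() calls (the real one is supplied by couchdb-lucene), and the recipe's title, instructions and tags are made up for illustration.

```javascript
// Hypothetical stand-in for the Document object that couchdb-lucene
// supplies to the transform function; it only records field() calls.
function Document() {
  this.fields = [];
}
Document.prototype.field = function (name, value) {
  this.fields.push([name, value]);
};

// The transform function from the post, given a name so it can be called.
function transform(doc) {
  var ret = new Document();
  function idx(obj) {
    for (var key in obj) {
      switch (typeof obj[key]) {
        case 'object':
          idx(obj[key]);   // recurse into nested objects and arrays
          break;
        case 'function':
          break;           // skip functions
        default:
          ret.field(key, obj[key]);    // index under the attribute name
          ret.field('all', obj[key]);  // and again under "all"
          break;
      }
    }
  }
  idx(doc);
  return ret;
}

// Made-up recipe document (only the _id appears in the post).
var recipe = {
  _id: '2008-07-19-oatmeal',
  title: 'Oatmeal',
  instructions: 'Stir in the wheatberries.',
  tags: ['breakfast']
};

// Collect every value that was indexed under "all".
var allValues = transform(recipe).fields
  .filter(function (f) { return f[0] === 'all'; })
  .map(function (f) { return f[1]; });

console.log(allValues);
```

Every primitive value, including the one nested inside the tags array, is recorded under all. One side effect worth noticing: array elements are also indexed under their array index ("0" here) rather than under "tags", because the recursion only sees the innermost key.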
Finally, make sure to save the document! I always seem to forget this step, which causes all sorts of confusion.
To test the full-document all search, use curl:

cstrom@jaynestown:~$ curl http://localhost:5984/eee/_fti?q=all:wheatberries
{"q":"+_db:eee +all:wheatberries",
"etag":"1202191c3d1",
"skip":0,
"limit":25,
"total_rows":1,
"search_duration":1,
"fetch_duration":0,
"rows":[{"_id":"2008-07-19-oatmeal",
"score":0.5710114240646362}]}

Just to be sure that nothing has broken after the change, I check the same search on the instructions field, which pulls back the same result ("wheatberries" was mentioned in the instructions of the oatmeal recipe):

cstrom@jaynestown:~$ curl http://localhost:5984/eee/_fti?q=instructions:wheatberries
{"q":"+_db:eee +instructions:wheatberries",
"etag":"1202191c3d1",
"skip":0,
"limit":25,
"total_rows":1,
"search_duration":1,
"fetch_duration":1,
"rows":[{"_id":"2008-07-19-oatmeal",
"score":0.6242526769638062}]}
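
Both responses have the same shape, so a client only needs to build the q parameter and read the rows array. A minimal sketch of that follows; ftiUrl and matchingIds are hypothetical helper names, not part of couchdb-lucene, and the sample response is the one returned by the all search above.

```javascript
// Build an _fti query URL of the form used with curl above.
// The "field:term" query is URL-encoded, so the colon becomes %3A.
function ftiUrl(base, db, field, term) {
  return base + '/' + db + '/_fti?q=' + encodeURIComponent(field + ':' + term);
}

// Pull the matching document ids out of a parsed _fti response.
function matchingIds(response) {
  return response.rows.map(function (row) { return row._id; });
}

// Response copied from the "all" search in the post.
var sample = JSON.parse(
  '{"q":"+_db:eee +all:wheatberries","etag":"1202191c3d1","skip":0,' +
  '"limit":25,"total_rows":1,"search_duration":1,"fetch_duration":0,' +
  '"rows":[{"_id":"2008-07-19-oatmeal","score":0.5710114240646362}]}'
);

console.log(ftiUrl('http://localhost:5984', 'eee', 'all', 'wheatberries'));
console.log(matchingIds(sample));
```

Swapping 'all' for 'instructions' in the ftiUrl call gives the equivalent of the second curl command.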
How did this work without having to specify the design document name?
CouchDB documents do not need names, just an _id ("_design/lucene" in this case).
A note for couchdb-lucene-0.4: after some hours of rookie trial and error, I found a combination of design docs and queries that worked for me:
_design/lucene:
{
"_id": "_design/lucene",
"_rev": "9-1003076217",
"transform": "function(doc) { var ret = new Document(); function idx(obj) { for (var key in obj) { switch (typeof obj[key]) { case 'object': idx(obj[key]); break; case 'function': break; default: ret.field(key, obj[key]); ret.field('all', obj[key]); break; } } } idx(doc); return ret; }",
"fulltext": {
"by_title": {
"defaults": {
"store": "yes"
},
"index": "function(doc) { var ret=new Document(); ret.add(doc.title); return ret }"
},
"by_description": {
"defaults": {
"store": "no"
},
"index": "function(doc) { var ret=new Document(); ret.add(doc.description); return ret }"
}
}
}
A Query:
curl http://127.0.0.1:5984/notes_development/_fti/lucene/by_title?q=pop*
Here is an example for a newer version of couchdb-lucene:
http://iphylo.blogspot.com/2010/11/couchdb-and-lucene.html