When you search Google for documents containing the word "chocolate", you enter "chocolate" as the search term. When you use Google to find documents containing "chocolate" on a particular site, say http://eeecooks.com, you enter "site:eeecooks.com chocolate".
Because this is how Google works, this is how search works.
But this is not how search currently works in eee-code. To search for a recipe with "chocolate" in it and a title containing "pancake", I currently have to query couchdb-lucene with "title:pancake all:chocolate". Yesterday, I started down the path of pre-processing the search query. Today, I think better of it.
Lucene's QueryParser accepts a default field as a constructor argument. In couchdb-lucene, the parser is constructed in src/main/java/com/github/rnewson/couchdb/lucene/Config.java, so "all" can be supplied as the default field there:
static final QueryParser QP = new QueryParser("all", ANALYZER);
With that change, the QueryParser interprets "title:pancake chocolate" exactly as it would "title:pancake all:chocolate".
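To make the default-field behaviour concrete, here is a deliberately simplified Ruby model of what the QueryParser does with "all" as its default field. The method name expand_default_field is hypothetical, and the model only splits on whitespace: it ignores phrases, parentheses, boolean operators and the analyzer (so no stemming). Lucene itself builds a query object rather than rewriting a string.

```ruby
# Simplified model (not Lucene's parser): any whitespace-separated term
# without an explicit "field:" prefix is assigned to the default field.
# Phrases, parentheses, +/- operators and stemming are NOT handled.
def expand_default_field(query, default_field = "all")
  query.split.map { |term|
    term.include?(":") ? term : "#{default_field}:#{term}"
  }.join(" ")
end

puts expand_default_field("title:pancake chocolate")
# prints: title:pancake all:chocolate
puts expand_default_field("wheatberries")
# prints: all:wheatberries
```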
Just to be sure, give curl a try with the old standby of "wheatberries" (and "all:wheatberries"):
cstrom@jaynestown:~/repos/eee-code$ curl http://localhost:5984/eee/_fti?q=all:wheatberries
{"q":"+_db:eee +all:wheatberri",
"etag":"120c60536a7",
"skip":0,
"limit":25,
"total_rows":1,
"search_duration":1,
"fetch_duration":1,
"rows":[
{"_id":"2008-07-19-oatmeal",
"date":"2008/07/19",
"title":"Multi-grain Oatmeal",
"score":0.5710114240646362
}]
}
cstrom@jaynestown:~/repos/eee-code$ curl http://localhost:5984/eee/_fti?q=wheatberries
{"q":"+_db:eee +all:wheatberri",
"etag":"120c60536a7",
"skip":0,"limit":25,
"total_rows":1,
"search_duration":0,
"fetch_duration":1,
"rows":[
{"_id":"2008-07-19-oatmeal",
"date":"2008/07/19",
"title":"Multi-grain Oatmeal",
"score":0.5710114240646362
}]
}
Note that both queries are interpreted as "+_db:eee +all:wheatberri": both use the "all" field to scope the search, even though the second does not explicitly include it.
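The same requests can be made from Ruby. This is a minimal sketch assuming only Ruby's standard json, net/http and uri libraries; fti_url, summarize and search_recipes are hypothetical names, and the canned body is trimmed from the first curl response. Encoding the q parameter matters once queries contain spaces and colons, as "title:pancake chocolate" does.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: build the couchdb-lucene _fti URL for a database.
# URI.encode_www_form percent-encodes the query (":" becomes %3A and
# spaces become "+"), so multi-term queries reach couchdb-lucene intact.
def fti_url(db_url, query)
  "#{db_url}/_fti?#{URI.encode_www_form(q: query)}"
end

# Reduce an _fti response body to the fields seen in the curl output:
# the parsed query, the hit count and the matching titles.
def summarize(body)
  result = JSON.parse(body)
  {
    query:  result["q"],
    total:  result["total_rows"],
    titles: result["rows"].map { |row| row["title"] }
  }
end

# Hypothetical live call; needs a running CouchDB with couchdb-lucene.
def search_recipes(db_url, query)
  summarize(Net::HTTP.get(URI(fti_url(db_url, query))))
end

# Canned response, trimmed from the first curl output.
body = '{"q":"+_db:eee +all:wheatberri","total_rows":1,' \
       '"rows":[{"_id":"2008-07-19-oatmeal","title":"Multi-grain Oatmeal"}]}'
p summarize(body)
```

Only search_recipes touches the network; the other two methods can be exercised offline against a saved response.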
Also of note: "wheatberri" is the Porter stem of "wheatberries" (stemming was explicitly configured a few days ago). The "_db" field is how couchdb-lucene copes with multiple databases. Documents from every database (e.g. the recipe documents in the development and test databases) are stored in the same index. Couchdb-lucene automatically infers the _db value from the database being queried ("eee" in the examples above) and adds it as a required clause, so only documents from the current database match, even though the index itself is shared.
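The _db scoping can be pictured with a toy in-memory index. This is an illustration, not couchdb-lucene's implementation: the "eee-test" database name and the token lists are hypothetical, and the tokens are assumed to be already stemmed.

```ruby
# Toy model of one shared index that holds documents from several
# databases, each document tagged with the database it came from.
INDEX = [
  { "_db" => "eee",      "_id" => "2008-07-19-oatmeal",
    "all" => %w[multi grain oatmeal wheatberri] },
  # Hypothetical copy of the same recipe in a test database
  { "_db" => "eee-test", "_id" => "2008-07-19-oatmeal",
    "all" => %w[multi grain oatmeal wheatberri] }
].freeze

# Mirrors "+_db:<db> +all:<term>": a document matches only when BOTH
# required clauses hold, so other databases' documents never match.
def scoped_search(index, db, term)
  index.select { |doc| doc["_db"] == db && doc["all"].include?(term) }
       .map { |doc| doc["_id"] }
end

p scoped_search(INDEX, "eee", "wheatberri")  # prints ["2008-07-19-oatmeal"]
p scoped_search(INDEX, "eee", "chocolate")   # prints []
```

Both entries share an _id, yet a query against "eee" returns a single hit, just as the curl output reports total_rows of 1.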
With that in place, I can back out the workaround from yesterday, leaving the search action much simpler:
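get '/recipes/search' do
  # With "all" as the QueryParser default field, the query needs no pre-processing;
  # CGI.escape (require 'cgi') keeps spaces and colons intact in the URL
  data = RestClient.get "#{@@db}/_fti?q=#{CGI.escape(params[:q])}"
  @results = JSON.parse(data)
  haml :search
end
Next up: searching ingredients, then paginating and sorting (both of which couchdb-lucene supports out of the box).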
I'll be providing clear semantics for a default field in 0.3.
Ooh! Thanks for pointing that out. From the 0.3 TODO there will be a "defaults" attribute on the design document that will be able to do this, and more!
Already looking forward to it. And thanks so much for your work. It's made things *much* easier for me!