Next up in my scenarios is "Matching a word stem in the recipe instructions". Word stems reduce words to their lowest common denominator so that searching for the word "whisk" will match documents containing the word "whisking".

The entire scenario:

Scenario: Matching a word stem in the recipe instructions
Given a "pancake" recipe with instructions "mixing together dry ingredients"
And a "french toast" recipe with instructions "whisking the eggs"
And a 1 second wait to allow the search index to be updated
When I search for "whisk"
Then I should not see the "pancake" recipe in the search results
And I should see the "french toast" recipe in the search results

As with the last scenario, there are relatively few steps that need to be implemented anew. The
"Given a recipe with instructions" step can be implemented thusly:

Given /^a "(.+)" recipe with instructions "(.+)"$/ do |title, instructions|
  date = Date.new(2009, 4, 16)
  permalink = "id-#{title.gsub(/\W/, '-')}"
  recipe = {
    :title => title,
    :date => date,
    :instructions => instructions
  }
  RestClient.put "#{@@db}/#{permalink}",
                 recipe.to_json,
                 :content_type => 'application/json'
end

This is really starting to look familiar. My red-green-refactor cycle may need a little more refactor. Another day.
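For what it is worth, the refactoring hinted at above might start by pulling the permalink and document construction out of the step. This is only a sketch: `recipe_permalink` and `recipe_document` are hypothetical names that do not exist in the project, and the fixed date is simply carried over from the step definition.

```ruby
require 'date'
require 'json'

# Hypothetical helper (not in the original step definitions): turns a recipe
# title into the document ID, replacing every non-word character with a dash.
def recipe_permalink(title)
  "id-#{title.gsub(/\W/, '-')}"
end

# Hypothetical helper: returns the ID and the JSON body that a step would PUT
# to CouchDB. Extra attributes (instructions, ingredients, ...) are merged in.
def recipe_document(title, attributes = {})
  recipe = { :title => title, :date => Date.new(2009, 4, 16) }.merge(attributes)
  [recipe_permalink(title), recipe.to_json]
end

permalink, body = recipe_document("french toast",
                                  :instructions => "whisking the eggs")
puts permalink
puts body
```

A step definition could then call `recipe_document` and hand the two values to `RestClient.put`, so that every step builds its document IDs the same way.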
With that in place, I have but one failure remaining:
Feature: Search for recipes
So that I can find one recipe among many
As a web user
I want to be able search recipes
Scenario: Matching a word stem in the recipe instructions
Given a "pancake" recipe with instructions "mixing together dry ingredients"
And a "french toast" recipe with instructions "whisking the eggs"
And a 1 second wait to allow the search index to be updated
When I search for "whisk"
Then I should not see the "pancake" recipe in the search results
And I should see the "french toast" recipe in the search results
expected following output to contain a <a href='/recipes/id-french toast'>french toast</a> tag:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><table><tr>
<th>Name</th>
<th>Date</th>
</tr></table></body></html> (Spec::Expectations::ExpectationNotMetError)
./features/step_definitions/recipe_search.rb:82:in `And /^I should see the "(.+)" recipe in the search results$/'
features/recipe_search.feature:32:in `And I should see the "french toast" recipe in the search results'
1 scenario
5 steps passed
1 step failed

This failure shows that no recipes are showing up in the search results, which means that stemming is not being used in couchdb-lucene. Inspecting
src/main/java/com/github/rnewson/couchdb/lucene/Config.java, one can see that it uses the (non-stemming) StandardAnalyzer:

...
final class Config {
  static final Analyzer ANALYZER = new StandardAnalyzer();
  ...
}

To get it using a custom (stemming) analyzer, create
src/main/java/com/github/rnewson/couchdb/lucene/MyAnalyzer.java:

package com.github.rnewson.couchdb.lucene;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PorterStemFilter;
import java.io.Reader;

class MyAnalyzer extends Analyzer {
  public final TokenStream tokenStream(String fieldName, Reader reader) {
    return new PorterStemFilter(new LowerCaseTokenizer(reader));
  }
}

There is nothing fancy in there; it is taken directly from the Lucene documentation. Then, change the configuration to use
MyAnalyzer:

...
final class Config {
  static final Analyzer ANALYZER = new MyAnalyzer();
  ...
}

Finally compile the
jar files with maven by invoking mvn. My local development version of CouchDB is already pointing to the compiled jar, so all I need to do is start it up with ./utils/run
and re-run cucumber:

cstrom@jaynestown:~/repos/eee-code$ cucumber features/recipe_search.feature -n -s "Matching a word stem in the recipe instructions"
Feature: Search for recipes
So that I can find one recipe among many
As a web user
I want to be able search recipes
Scenario: Matching a word stem in the recipe instructions
Given a "pancake" recipe with instructions "mixing together dry ingredients"
And a "french toast" recipe with instructions "whisking the eggs"
And a 1 second wait to allow the search index to be updated
When I search for "whisk"
Then I should not see the "pancake" recipe in the search results
And I should see the "french toast" recipe in the search results
expected following output to contain a <a href='/recipes/id-french toast'>french toast</a> tag:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><table>
<tr>
<th>Name</th>
<th>Date</th>
</tr>
<tr class="row0">
<td>
<a href="/recipes/id-french-toast">french toast</a>
</td>
<td>2009-04-16</td>
</tr>
</table></body></html> (Spec::Expectations::ExpectationNotMetError)
./features/step_definitions/recipe_search.rb:82:in `And /^I should see the "(.+)" recipe in the search results$/'
features/recipe_search.feature:32:in `And I should see the "french toast" recipe in the search results'
1 scenario
5 steps passed
1 step failed

Hunh?! The french toast recipe (that requires "whisking") is now showing up in the search results, so why is it failing?
Ah nuts, the link being tested for is missing a dash. Add a gsub to the step:

Then /^I should see the "(.+)" recipe in the search results$/ do |title|
  response.should have_selector("a",
                                :href => "/recipes/id-#{title.gsub(/\W/, '-')}",
                                :content => title)
end

And we have verified stemming working!
cstrom@jaynestown:~/repos/eee-code$ cucumber features/recipe_search.feature -n -s "Matching a word stem in the recipe instructions"
Feature: Search for recipes
So that I can find one recipe among many
As a web user
I want to be able search recipes
Scenario: Matching a word stem in the recipe instructions
Given a "pancake" recipe with instructions "mixing together dry ingredients"
And a "french toast" recipe with instructions "whisking the eggs"
And a 1 second wait to allow the search index to be updated
When I search for "whisk"
Then I should not see the "pancake" recipe in the search results
And I should see the "french toast" recipe in the search results
1 scenario
6 steps passed
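To make concrete why "whisk" now finds "whisking", here is a deliberately naive Ruby sketch of stem matching. It is not the Porter algorithm that Lucene's PorterStemFilter implements, and `naive_stem`, `stems`, and `matches?` are made-up names for illustration; the letters-only, lowercased tokenizing only roughly mirrors what LowerCaseTokenizer does.

```ruby
# Deliberately naive suffix stripper, for illustration only. This is NOT the
# Porter algorithm used by Lucene's PorterStemFilter.
SUFFIXES = %w[ing ed s].freeze

def naive_stem(word)
  word = word.downcase
  # Strip the first listed suffix that fits, keeping at least three letters.
  suffix = SUFFIXES.find { |s| word.end_with?(s) && word.length > s.length + 2 }
  suffix ? word[0...-suffix.length] : word
end

# Split text into runs of letters and reduce each word to its stem.
def stems(text)
  text.scan(/[a-zA-Z]+/).map { |word| naive_stem(word) }
end

# A document matches when the stemmed query term is among its stems.
def matches?(text, query)
  stems(text).include?(naive_stem(query))
end

puts matches?("whisking the eggs", "whisk")
puts matches?("mixing together dry ingredients", "whisk")
```

The important part is that the same reduction is applied to both the indexed text and the query, which is why the analyzer is configured in one place in couchdb-lucene.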
(commit)
Update: I forked couchdb-lucene so that I could continue to use the stemming analyzer, while still tracking changes to the master.