Sunday, May 10, 2009

Deliberate CouchDB Views

I was able to get CouchDB map-reduce working yesterday. Even though it is working, I am not sure that I fully understand why it is working. I want to avoid programming by coincidence, so I am taking some time to step back and understand my tool of choice.

To recap, I want a list of every ingredient across all recipes, with each ingredient pointing to the recipes in which it is used.

I was able to accomplish this via CouchDB map-reduces. In temporary view form it looks like the following:

In design document form, this looks like:
{
  "_id": "_design/recipes",
  "_rev": "16-2289953674",
  "language": "javascript",
  "views": {
    "by_ingredients": {
      "map": "function (doc) { for (var i in doc['preparations']) { var ingredient = doc['preparations'][i]['ingredient']['name']; var value = [doc['_id'], doc['title']]; emit(ingredient, value); } }",
      "reduce": "function(keys, values, rereduce) { return values; }"
    }
  }
}

But when I access the view via HTTP, I get this:
cstrom@jaynestown:~/repos/eee-code$ curl "http://localhost:5984/eee/_design/recipes/_view/by_ingredients"
{"key":null,"value":[[["2006-06-17-fish","Green Chutney Covered Fish"],
["2006-06-17-shrimp","Curried Shrimp"],
["2007-01-15-soup","Crockpot Lentil Andouille Soup"],
CouchDB query parameters are well documented, and I probably just need to read that documentation a little more closely. For instance, the documentation explains why I am getting the result above:
If a view contains both a map and reduce function, querying that view will by default return the result of the reduce function. The result of the map function only may be retrieved by passing reduce=false as a query parameter.
Indeed, the view has both a map and a reduce function, so I must be getting the reduce result above (though not the one that I want). Passing reduce=false does get me the non-reduced (mapped) results that I was getting yesterday before adding the reduce:
cstrom@jaynestown:~/repos/eee-code$ curl "http://localhost:5984/eee/_design/recipes/_view/by_ingredients?reduce=false"
{"id":"2008-07-21-spinach","key":"artichoke hearts","value":["2008-07-21-spinach","Spinach and Artichoke Pie"]},
{"id":"2008-07-19-oatmeal","key":"barley","value":["2008-07-19-oatmeal","Multi-grain Oatmeal"]},
{"id":"2006-10-08-dressing","key":"black pepper","value":["2006-10-08-dressing","Mustard Vinaigrette"]},
{"id":"2008-07-19-oatmeal","key":"brown sugar","value":["2008-07-19-oatmeal","Multi-grain Oatmeal"]},
{"id":"2002-01-13-hollandaise_sauce","key":"butter","value":["2002-01-13-hollandaise_sauce","Hollandaise Sauce"]},
{"id":"2006-07-26-fish","key":"butter","value":["2006-07-26-fish","Pan-Fried Fish with Potato Crust"]},
{"id":"2006-06-17-shrimp","key":"cardamom pod","value":["2006-06-17-shrimp","Curried Shrimp"]},
{"id":"2007-01-15-soup","key":"celery","value":["2007-01-15-soup","Crockpot Lentil Andouille Soup"]},
{"id":"2006-06-17-fish","key":"cilantro","value":["2006-06-17-fish","Green Chutney Covered Fish"]},
{"id":"2006-06-17-raita","key":"cilantro","value":["2006-06-17-raita","Yogurt Raita"]},
{"id":"2006-06-17-shrimp","key":"cinnamon","value":["2006-06-17-shrimp","Curried Shrimp"]},
{"id":"2008-07-19-oatmeal","key":"cinnamon","value":["2008-07-19-oatmeal","Multi-grain Oatmeal"]},
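
To convince myself where these mapped rows come from, here is a minimal sketch that runs the map function outside of CouchDB. The emit() stub and the sample document are my own stand-ins (the "yogurt" ingredient is made up for illustration); CouchDB supplies the real emit() and the real documents.

```javascript
// Stand-in for CouchDB's emit(): collect the rows in an array.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Same logic as the map function in the design document.
function map(doc) {
  for (var i in doc['preparations']) {
    var ingredient = doc['preparations'][i]['ingredient']['name'];
    var value = [doc['_id'], doc['title']];
    emit(ingredient, value);
  }
}

// Hypothetical document, shaped the way the map function expects.
const doc = {
  _id: '2006-06-17-raita',
  title: 'Yogurt Raita',
  preparations: [
    { ingredient: { name: 'cilantro' } },
    { ingredient: { name: 'yogurt' } }
  ]
};

map(doc);
console.log(JSON.stringify(rows, null, 2));
```

One document produces one row per ingredient, each keyed by the ingredient name and carrying the recipe's ID and title as the value.
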
So what was that first result? That is not the reduced set that I want. I want the reduced set that I saw above in Futon. Again, actually reading the documentation, I find:
Keep in mind that the Futon Web-Client silently adds group=true to your views
In desperation last night I added the group=true option in order to get the results I desired, but I did not fully understand why it worked. At the very least, I now know that I am not relying on undocumented behavior that is likely to change in a future CouchDB release. For that alone, I feel much better.

But what does group=true actually do? According to the documentation:
The group option controls whether the reduce function reduces to a set of distinct keys or to a single result row.
It would seem that group=false is the default. When I access it with group=true, I get my desired results (as I did last night):
cstrom@jaynestown:~/repos/eee-code$ curl "http://localhost:5984/eee/_design/recipes/_view/by_ingredients?group=true"
{"key":"artichoke hearts","value":[["2008-07-21-spinach","Spinach and Artichoke Pie"]]},
{"key":"barley","value":[["2008-07-19-oatmeal","Multi-grain Oatmeal"]]},
{"key":"black pepper","value":[["2006-10-08-dressing","Mustard Vinaigrette"]]},
{"key":"brown sugar","value":[["2008-07-19-oatmeal","Multi-grain Oatmeal"]]},
{"key":"butter","value":[["2006-07-26-fish","Pan-Fried Fish with Potato Crust"],["2002-01-13-hollandaise_sauce","Hollandaise Sauce"]]},
{"key":"cardamom pod","value":[["2006-06-17-shrimp","Curried Shrimp"]]},
{"key":"celery","value":[["2007-01-15-soup","Crockpot Lentil Andouille Soup"]]},
{"key":"cilantro","value":[["2006-06-17-raita","Yogurt Raita"],["2006-06-17-fish","Green Chutney Covered Fish"]]},
{"key":"cinnamon","value":[["2008-07-19-oatmeal","Multi-grain Oatmeal"],["2006-06-17-shrimp","Curried Shrimp"]]},
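
The difference between the two query styles can be sketched in plain JavaScript. This is only my mental model, not how CouchDB is implemented: reduceRows() and queryView() are hypothetical helpers, the rows are a few copied from the reduce=false output, and the sketch ignores rereduce and whatever ordering CouchDB applies internally.

```javascript
// The reduce function from the design document: hand the values back untouched.
function reduce(keys, values, rereduce) {
  return values;
}

// A few mapped rows copied from the reduce=false output.
const mapped = [
  { id: '2002-01-13-hollandaise_sauce', key: 'butter',
    value: ['2002-01-13-hollandaise_sauce', 'Hollandaise Sauce'] },
  { id: '2006-07-26-fish', key: 'butter',
    value: ['2006-07-26-fish', 'Pan-Fried Fish with Potato Crust'] },
  { id: '2006-06-17-shrimp', key: 'cardamom pod',
    value: ['2006-06-17-shrimp', 'Curried Shrimp'] }
];

// Hypothetical helper: run reduce over one batch of rows.
function reduceRows(key, rows) {
  const keys = rows.map(function (r) { return [r.key, r.id]; }); // [key, docid] pairs
  const values = rows.map(function (r) { return r.value; });
  return { key: key, value: reduce(keys, values, false) };
}

// Hypothetical stand-in for querying the view with or without group=true.
function queryView(rows, group) {
  if (!group) {
    return [reduceRows(null, rows)]; // everything reduced to a single row
  }
  const buckets = new Map(); // one bucket of rows per distinct key
  for (const r of rows) {
    if (!buckets.has(r.key)) buckets.set(r.key, []);
    buckets.get(r.key).push(r);
  }
  return Array.from(buckets, function (entry) {
    return reduceRows(entry[0], entry[1]);
  });
}

const ungrouped = queryView(mapped, false);
const grouped = queryView(mapped, true);
console.log(JSON.stringify(ungrouped));
console.log(JSON.stringify(grouped));
```

Without grouping, every value lands in one row with a null key, which matches the shape of my first result. With grouping, reduce runs once per distinct ingredient.
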
I understand this well enough to proceed. I am still not sure why the default is not to group the results. Why would I want a single result row? It certainly does not provide useful results in the case where I reduce to a list of recipes. Perhaps it proves useful when reducing to a count or when making use of the rereduce parameter to the reduce function. Maybe, but why make it the default?
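
My best guess at a case where a single row makes sense is a count. The sketch below is an assumption on my part, not something from my design document: countReduce is a hypothetical reduce function, and the two batches are made up, since CouchDB decides for itself how mapped rows are split up before the reduce function is called again with rereduce set to true.

```javascript
// Hypothetical reduce function that counts mapped rows.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls: add them up
    return values.reduce(function (sum, n) { return sum + n; }, 0);
  }
  // values are mapped values: count them
  return values.length;
}

// Made-up batches of mapped values; keys are unused here, so null is passed.
const batch1 = ['butter-1', 'butter-2', 'cilantro-1'];
const batch2 = ['cilantro-2', 'cinnamon-1'];

const partials = [
  countReduce(null, batch1, false),
  countReduce(null, batch2, false)
];
const total = countReduce(null, partials, true);
console.log(total); // 5
```

Reduced to a single row, this would give the total number of ingredient uses across all recipes, which is at least a meaningful answer.
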

Something to learn another day. For now, I am satisfied that I can build on my current understanding. It should prove more than adequate for the meal / blog work that is coming up this week.
