Thursday, March 4, 2010

Fully Operational CouchDocs Design Documents dump/load



I was not quite able to mark off another item on my couch_docs 1.1 TODO list:
  • Better command line experience.
    • Should default to current directory.
    • Should print help without args / better format
  • Should use the bulk docs
  • Should support the !json and !code macros from couchapp
  • Should support a flag to only work on design docs (mostly for export).
  • Should create the DB if it doesn't already exist
I am able to dump design documents, but....

The dump is too slow for large databases and fails for couchapp design documents.

The dump is slow for large databases because I iterate over every document in the DB, even when I want only the design documents. Ultimately, I have the feeling that iterating over every document will lessen the usefulness of this gem for others. It takes ~2 minutes to dump 1,000 large (10 KB) documents to the file system. This is sufficient for my needs, but I imagine it would be unacceptable for those with millions of documents. Ah well, something for version 1.2. For now, I will do what is needed to speed up design document dumps.

The easiest way to do that is by iterating over only design documents. All CouchDB design documents have an ID that begins with _design. Exploiting this, one can request only design documents with judicious use of startkey/endkey:
cstrom@whitefall:~/repos/couch_docs$ curl http://localhost:5984/eee/_all_docs\?startkey=\"_design\"\&endkey=\"_design0\"
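To see why that key range picks out only design documents, here is a minimal Ruby sketch. The helper name `design_docs_url` and the sample IDs are made up for illustration and are not part of couch_docs. Plain string comparison stands in for CouchDB's own collation, which is only an approximation, but it shows the idea: "/" sorts just before "0", so every "_design/..." ID falls between the two keys.

```ruby
require 'uri'

# Hypothetical helper (not part of couch_docs): build the _all_docs URL
# restricted to design documents.
def design_docs_url(db_url)
  # CouchDB expects startkey/endkey as JSON strings, so the double quotes
  # are part of each value and get URL-encoded along with it.
  query = URI.encode_www_form('startkey' => '"_design"', 'endkey' => '"_design0"')
  "#{db_url}/_all_docs?#{query}"
end

# Made-up IDs; simple string comparison approximates the key range.
ids = ['_design/relax', 'a_regular_doc', '0001', '_design/zzz']
in_range = ids.select { |id| id >= '_design' && id <= '_design0' }

puts design_docs_url('http://localhost:5984/eee')
puts in_range.inspect
```

The encoded query string comes out the same as the hand-escaped one in the curl command.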
The question is, how to get those query parameters into my document store iterator:
    def each
      Store.get("#{url}/_all_docs")['rows'].each do |rec|
        yield Store.get("#{url}/#{rec['id']}?attachments=true")
      end
    end
The simplest way is to calculate the "all docs" URL that is passed to Store.get:
    def each
      all_url = "#{url}/_all_docs" +
        (design_docs_only ? '?startkey=%22_design%22&endkey=%22_design0%22' : "")

      Store.get(all_url)['rows'].each do |rec|
        yield Store.get("#{url}/#{rec['id']}?attachments=true")
      end
    end
An optional second argument to the store's constructor sets that boolean attribute. After testing out the command on my test database, I try it on my real database only to hit my last issue (but much faster!):
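For anyone who wants to poke at this without a running CouchDB, the following is a rough, self-contained sketch of the idea. The class name `SketchStore`, the injected `fetcher` argument, and the canned responses are all assumptions made for illustration; the real couch_docs store calls `Store.get` and its constructor may look different.

```ruby
# Hypothetical stand-in for the document store (not the couch_docs API).
class SketchStore
  include Enumerable

  DESIGN_QUERY = '?startkey=%22_design%22&endkey=%22_design0%22'

  attr_reader :url, :design_docs_only

  # The optional second argument sets the boolean flag. The third argument
  # is an assumption of this sketch: anything that responds to call(url)
  # and returns parsed JSON, so no real HTTP request is needed.
  def initialize(url, design_docs_only = false, fetcher = nil)
    @url = url
    @design_docs_only = design_docs_only
    @fetcher = fetcher
  end

  # Append the design-document key range only when the flag is set.
  def all_url
    "#{url}/_all_docs" + (design_docs_only ? DESIGN_QUERY : '')
  end

  def each
    @fetcher.call(all_url)['rows'].each do |rec|
      yield @fetcher.call("#{url}/#{rec['id']}?attachments=true")
    end
  end
end

# Record the requested URLs instead of talking to CouchDB.
requested = []
fake_fetcher = lambda do |u|
  requested << u
  if u.include?('_all_docs')
    { 'rows' => [{ 'id' => '_design/relax' }] }
  else
    { '_id' => '_design/relax' }
  end
end

store = SketchStore.new('http://localhost:5984/eee', true, fake_fetcher)
docs = store.to_a
puts requested
```

With the flag set, the first recorded URL carries the startkey/endkey range; without it, `all_url` is the plain `_all_docs` URL.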
cstrom@whitefall:~/tmp/dump$ couch-docs dump http://localhost:5984/eee -d
/home/cstrom/.gem/ruby/1.8/gems/couch_docs-1.1.0/lib/couch_docs/design_directory.rb:62:in `initialize': No such file or directory - ./_design/relax/couchapp/signatures/vendor/couchapp/jquery.couchapp.js.js (Errno::ENOENT)
from /home/cstrom/.gem/ruby/1.8/bin/couch-docs:19:in `load'
from /home/cstrom/.gem/ruby/1.8/bin/couch-docs:19
Ah, the problem here is that the key is "vendor/couchapp/jquery.couchapp.js". When I try to save that to the file system, the directory store tries to put it in the non-existent "vendor/couchapp" sub-directory.

The solution is to encode the slashes before saving. The easiest place to do this in couch_docs is in the save_js method, which takes the current key as the second argument. In RSpec format, an example of the desired behavior:
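    it "should store the attributes with slashes to the filesystem" do
      # File.new is inferred from the `initialize' frame in the stack trace
      File.
        should_receive(:new).
        with("/tmp/_design/foo/bar%2Fbaz.js", "w+")

      @it.save_js("_design/foo", "bar/baz", "json")
    end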
To make this example pass, I use a simple gsub:
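    def save_js(rel_path, key, value)
      if value.is_a? Hash
        # code to recurse further down into the Hash
      else
        path = couch_view_dir + '/' + rel_path

        # Encode slashes so the key is not mistaken for a sub-directory path
        file = File.new("#{path}/#{key.gsub(/\//, '%2F')}.js", "w+")
        # ...
      end
    end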
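The encoding is easy to try in isolation. In the sketch below, `key_to_filename` and `filename_to_key` are hypothetical helpers, not couch_docs methods; the second one only illustrates how a load step might reverse the encoding, which this post does not cover. A key that already contains a literal "%2F" would not survive the round trip.

```ruby
# Hypothetical helpers for illustration only (not taken from couch_docs).

# Turn a design document key into a flat file name.
def key_to_filename(key)
  "#{key.gsub('/', '%2F')}.js"
end

# Reverse the encoding: strip the trailing ".js" and restore the slashes.
def filename_to_key(filename)
  File.basename(filename, '.js').gsub('%2F', '/')
end

name = key_to_filename('vendor/couchapp/jquery.couchapp.js')
puts name
puts filename_to_key(name)
```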
With that, I am done. A quick check shows that what once took 2 minutes and failed now works and completes in less than 2 seconds:
cstrom@whitefall:~/tmp/dump$ time couch-docs dump http://localhost:5984/eee -d

real 0m1.593s
user 0m1.056s
sys 0m0.152s
Finally, another item crossed off the checklist.

Day #32
