Sunday, August 9, 2009

couch_docs 1.0


Dumping and restoring CouchDB is almost mine. The only things known to be lacking in couch_docs are:
  • omitting the revision number from the dumped files (CouchDB tries to resolve a non-existent conflict with the revision number present)
  • attachments—restoring documents with stubs in them does not work
First up, an RSpec example describing the stripping of the revision attribute:
    it "should strip revision numbers" do
and_return([{'_id' => 'foo', '_rev' => '1-1234'}])
with({'_id' => 'foo'})

CouchDocs.dump("uri", "fixtures")
When that example is run, it fails because the revision number is still included:
Spec::Mocks::MockExpectationError in 'CouchDocs dumping CouchDB documents to a directory should strip revision numbers'
Mock 'Document Directory' expected :store_document with ({"_id"=>"foo"}) but received it with ({"_rev"=>"1-1234", "_id"=>"foo"})
/home/cstrom/repos/couch_docs/lib/couch_docs.rb:58:in `dump'
/home/cstrom/repos/couch_docs/lib/couch_docs.rb:55:in `each'
/home/cstrom/repos/couch_docs/lib/couch_docs.rb:55:in `dump'
As always, failure is a good thing—it means my example is testing what I think it ought to be testing. I can make that pass with a simple delete in the dump method:
  def self.dump(db_uri, dir)
store =
dir =
reject { |doc| doc['_id'] =~ /^_design/ }.
each { |doc| doc.delete('_rev'); dir.store_document(doc) }
Maybe I am letting Erlang influence me too much here, but that side-effect really bothers me. I let it slide here because there is no idiomatic way in Ruby to pass a copy of a modified hash. I could dup the hash, but nothing is driving me to do that. So I leave it, even though it bothers me.
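For comparison, here is a minimal sketch of a side-effect-free variant. It relies on `Hash#reject`, which returns a new hash instead of modifying the receiver. The `without_revision` helper name is hypothetical and not part of couch_docs, and the copy is shallow, so nested values are still shared with the original:

```ruby
# Hypothetical helper (not part of couch_docs): returns a copy of the
# document without its '_rev' key, leaving the original hash untouched.
def without_revision(doc)
  doc.reject { |key, _value| key == '_rev' }
end

doc = { '_id' => 'foo', '_rev' => '1-1234' }
stripped = without_revision(doc)

puts stripped.inspect   # the copy, minus the revision
puts doc.key?('_rev')   # the original still carries its revision
```

With something like this, the block in the dump method could become `each { |doc| dir.store_document(without_revision(doc)) }`, at the cost of one extra hash per document.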

As for the attachments, I only need to alter the get of each document from the database to include ?attachments=true. There is no need for a new spec, just a slight change to an existing example:
    it "should be able to load each document" do
and_return({ "total_rows" => 2,
"offset" => 0,
"rows" => [{"id"=>"1", "value"=>{}, "key"=>"1"},
{"id"=>"2", "value"=>{}, "key"=>"2"}]})


@it.each { }
Similarly, to get that example passing, only a slight change is needed in the code that iterates over each document in the CouchDB store:
    def each
      Store.get("#{url}/_all_docs")['rows'].each do |rec|
        yield Store.get("#{url}/#{rec['id']}?attachments=true")
      end
    end
That should do it.
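To see the request pattern in isolation, here is a small self-contained sketch. `FakeHttp` and `SketchStore` are stand-ins invented for illustration (they are not couch_docs classes); the fake records each requested URL and returns canned data, so nothing talks to a real CouchDB server:

```ruby
# Stand-in for the HTTP layer (an assumption for illustration): records
# every URL requested and returns canned responses instead of calling CouchDB.
class FakeHttp
  attr_reader :requested

  def initialize
    @requested = []
  end

  def get(url)
    @requested << url
    if url.end_with?('/_all_docs')
      { 'rows' => [{ 'id' => '1' }, { 'id' => '2' }] }
    else
      # Echo the document id taken from the last path segment of the URL.
      { '_id' => url.split('/').last.split('?').first }
    end
  end
end

# Simplified store whose each method has the same shape as the one above.
class SketchStore
  include Enumerable

  def initialize(url, http)
    @url = url
    @http = http
  end

  def each
    @http.get("#{@url}/_all_docs")['rows'].each do |rec|
      yield @http.get("#{@url}/#{rec['id']}?attachments=true")
    end
  end
end

http = FakeHttp.new
store = SketchStore.new('http://localhost:5984/test', http)
ids = store.map { |doc| doc['_id'] }

puts ids.inspect
puts http.requested.inspect
```

The recorded URLs show one `_all_docs` request followed by one request per document, each carrying `?attachments=true`.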

It could be argued that I ought to create a spec that exercises the full CouchDB stack at this point: an example that starts with a JSON document in a seed directory, uses CouchDocs.upload_dir to store that document in a test CouchDB database, uses CouchDocs.dump to write it out to a separate directory, and finally compares the original document with the dumped copy to ensure that they are the same. I must confess laziness here. I use examples to drive clean implementation. That they provide some measure of regression testing is pure bonus for me. That said, I will be sure to add such a regression test the first time I introduce a bug in the future.
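The comparison step of such a round-trip test might look roughly like the sketch below. It assumes one JSON file per document in each directory; the `load_docs` helper is hypothetical, and the upload and dump calls are only simulated by writing files directly, since a real test would need a running CouchDB:

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: loads every *.json file in a directory into a hash
# keyed by file name, dropping '_rev' so that dumps compare cleanly with seeds.
def load_docs(dir)
  Dir.glob(File.join(dir, '*.json')).each_with_object({}) do |path, docs|
    doc = JSON.parse(File.read(path))
    docs[File.basename(path)] = doc.reject { |key, _value| key == '_rev' }
  end
end

Dir.mktmpdir do |seed|
  Dir.mktmpdir do |dumped|
    File.write(File.join(seed, 'foo.json'),
               JSON.generate({ '_id' => 'foo', 'title' => 'Bar' }))

    # A real regression test would call CouchDocs.upload_dir and
    # CouchDocs.dump against a test database here. The line below only
    # simulates their combined effect (note the different key order).
    File.write(File.join(dumped, 'foo.json'),
               JSON.generate({ 'title' => 'Bar', '_id' => 'foo' }))

    puts load_docs(seed) == load_docs(dumped)
  end
end
```

Comparing parsed hashes rather than raw file contents keeps the check insensitive to key order and whitespace.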

Before claiming completeness, I try the couch-docs scripts on my 1,000+ document database:
cstrom@jaynestown:~/repos/eee-code$ time couch-docs dump http://localhost:5984/eee couch/seed/

real 0m56.536s
user 0m7.048s
sys 0m0.516s
Wow, that is a significant increase over the 5 seconds it took to dump the documents without the attachments. I certainly expected an increase, but maybe not that much. I make a mental note of that, but optimization will come later (if it becomes necessary).

Before restoring, I need a target database:

Now to test a CouchDB restore (again with timing):
cstrom@jaynestown:~/repos/eee-code$ time couch-docs load couch/seed/ http://localhost:5984/couch-docs-test

real 0m52.946s
user 0m5.068s
sys 0m0.476s
Well, it seems that roughly 50 seconds for 1,000 documents is going to be typical.

Checking the database in the browser, I see that there are, indeed, documents:

And, clicking through to one document's attachments:

I update the README, History and the VERSION number in couch_docs.rb. Prior to publishing the code to Github, I update the gemspec, using the rake task from Bones:
cstrom@jaynestown:~/repos/couch_docs$ rake gem:spec          # Write the gemspec
Also from Bones, I create a tag for this version of the code:
cstrom@jaynestown:~/repos/couch_docs$ rake git:create_tag VERSION=1.0.0    # Create a new tag in the Git repository
(in /home/cstrom/repos/couch_docs)
Creating Git tag 'couch_docs-1.0.0'
Counting objects: 1, done.
Writing objects: 100% (1/1), 180 bytes, done.
Total 1 (delta 0), reused 0 (delta 0)
* [new tag] couch_docs-1.0.0 -> couch_docs-1.0.0
Github uses tags to create download files—they are not too useful for gems, but still nice to have.

With that, I am now able to load all of my seed data onto my new server in less than a minute. That will make it much easier to get started than populating this from my legacy Rails app.

