japh(r) by Chris Strom: couch

Sunday, August 9, 2009

couch_docs 1.0


Dumping and restoring CouchDB is almost mine. The only things known to be lacking in couch_docs are:

omitting the revision number from the dumped files (CouchDB tries to resolve a non-existent conflict with the revision number present)
attachments—restoring documents with stubs in them does not work

First up, an RSpec example describing the stripping of the revision attribute:

    it "should strip revision numbers" do
      @store.stub!(:map).
        and_return([{'_id' => 'foo', '_rev' => '1-1234'}])
      @dir.
        should_receive(:store_document).
        with({'_id' => 'foo'})

      CouchDocs.dump("uri", "fixtures")
    end

When that example is run, it fails because the revision number is still included:

    1)
    Spec::Mocks::MockExpectationError in 'CouchDocs dumping CouchDB documents to a directory should strip revision numbers'
    Mock 'Document Directory' expected :store_document with ({"_id"=>"foo"}) but received it with ({"_rev"=>"1-1234", "_id"=>"foo"})
    /home/cstrom/repos/couch_docs/lib/couch_docs.rb:58:in `dump'
    /home/cstrom/repos/couch_docs/lib/couch_docs.rb:55:in `each'
    /home/cstrom/repos/couch_docs/lib/couch_docs.rb:55:in `dump'
    ./spec/couch_docs_spec.rb:79:

As always, failure is a good thing—it means my example is testing what I think it ought to be testing. I can make that pass with a simple delete in the dump method:

  def self.dump(db_uri, dir)
    store = Store.new(db_uri)
    dir = DocumentDirectory.new(dir)
    store.
      map.
      reject { |doc| doc['_id'] =~ /^_design/ }.
      each   { |doc| doc.delete('_rev'); dir.store_document(doc) }
  end

Maybe I am letting Erlang influence me too much here, but that side-effect really bothers me. I let it slide here because I do not know of a tidy, idiomatic way in Ruby to pass along a modified copy of a hash. I could dup the hash, but nothing is driving me to do that. So I leave it, even though it bothers me.
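
For comparison, a non-mutating variant can be sketched with Hash#reject, which returns a new hash and leaves the receiver alone. The helper name without_rev is hypothetical and not part of couch_docs; this is only an illustration of the idea, not what the code above does:

```ruby
# Hypothetical helper (not part of couch_docs): returns a copy of the
# document without its '_rev' key, leaving the original hash untouched.
def without_rev(doc)
  doc.reject { |key, _value| key == '_rev' }
end

doc  = { '_id' => 'foo', '_rev' => '1-1234' }
copy = without_rev(doc)

p copy  # the copy holds only '_id'
p doc   # the original still holds '_rev'
```

In the dump method, that would mean calling dir.store_document(without_rev(doc)) instead of deleting the key in place.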

As for the attachments, I only need to alter the get of each document from the database to include ?attachments=true. There is no need for a new spec, just a slight change to an existing example:

    it "should be able to load each document" do
      Store.stub!(:get).
        with("uri/_all_docs").
        and_return({ "total_rows" => 2,
                     "offset"     => 0,
                     "rows"       => [{"id"=>"1", "value"=>{}, "key"=>"1"},
                                      {"id"=>"2", "value"=>{}, "key"=>"2"}]})

      Store.stub!(:get).with("uri/1?attachments=true")
      Store.should_receive(:get).with("uri/2?attachments=true")

      @it.each { }
    end

Similarly, to get that example passing, only a slight change is needed in the code that iterates over each document in the CouchDB store:

    def each
      Store.get("#{url}/_all_docs")['rows'].each do |rec|
        yield Store.get("#{url}/#{rec['id']}?attachments=true")
      end
    end

That should do it.
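
To make the difference concrete: without attachments=true, CouchDB returns each attachment as a stub carrying only metadata, and with it the Base64-encoded body comes back in a data field. The sketch below uses made-up document contents (the file name, content type and body are illustrative assumptions), and restorable? is a hypothetical helper, not part of couch_docs:

```ruby
# Shape of a document fetched WITHOUT ?attachments=true (illustrative values).
stub_doc = {
  '_id' => 'foo',
  '_attachments' => {
    'photo.jpg' => { 'stub' => true, 'content_type' => 'image/jpeg', 'length' => 5 }
  }
}

# Shape of the same document fetched WITH ?attachments=true: the body is
# inlined as Base64 ('m0' packs without line breaks).
full_doc = {
  '_id' => 'foo',
  '_attachments' => {
    'photo.jpg' => { 'content_type' => 'image/jpeg',
                     'data'         => ['hello'].pack('m0') }
  }
}

# Hypothetical helper: a dumped document can be restored with its
# attachments only if every attachment carries inline data, not a stub.
def restorable?(doc)
  (doc['_attachments'] || {}).values.all? do |att|
    att.key?('data') && !att['stub']
  end
end

puts restorable?(stub_doc)  # false
puts restorable?(full_doc)  # true
```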

It could be argued that I ought to create a spec that exercises the full CouchDB stack at this point: an example that starts with a JSON document in a seed directory, uses CouchDocs.upload_dir to store that document in a test CouchDB database, uses CouchDocs.dump to write it to a separate directory, and finally compares the original document with the dumped copy to ensure that they are the same. I must confess laziness here. I use examples to drive clean implementation. That they provide some measure of regression testing is pure bonus for me. That said, I will be sure to add such a regression test the first time I introduce a bug in the future.
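
The shape of such a round-trip check can be sketched without a server. Everything below is a stand-in: FakeStore, upload_dir and dump are hypothetical substitutes for a test CouchDB database, CouchDocs.upload_dir and CouchDocs.dump, and the one-file-per-document layout named after the _id is an assumption made for the sketch:

```ruby
require 'json'
require 'tmpdir'

# Hypothetical in-memory stand-in for a CouchDB database; it adds a
# revision on save, roughly as the real server would.
class FakeStore
  def initialize
    @docs = {}
  end

  def put(doc)
    @docs[doc['_id']] = doc.merge('_rev' => '1-abc')
  end

  def each(&block)
    @docs.values.each(&block)
  end
end

# Stand-in for CouchDocs.upload_dir: load every JSON file into the store.
def upload_dir(store, dir)
  Dir[File.join(dir, '*.json')].sort.each do |path|
    store.put(JSON.parse(File.read(path)))
  end
end

# Stand-in for CouchDocs.dump: write each document, minus '_rev', to a file.
def dump(store, dir)
  store.each do |doc|
    body = doc.reject { |key, _| key == '_rev' }
    File.write(File.join(dir, "#{doc['_id']}.json"), JSON.generate(body))
  end
end

# Seed -> store -> dump, then compare the dumped copy with the original.
def round_trip_ok?
  Dir.mktmpdir do |seed|
    Dir.mktmpdir do |out|
      original = { '_id' => 'foo', 'title' => 'Pancakes' }
      File.write(File.join(seed, 'foo.json'), JSON.generate(original))

      store = FakeStore.new
      upload_dir(store, seed)
      dump(store, out)

      JSON.parse(File.read(File.join(out, 'foo.json'))) == original
    end
  end
end

puts round_trip_ok?
```

A real regression spec would replace the stand-ins with calls against a throwaway CouchDB database and keep only the final comparison.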

Before claiming completeness, I try the couch-docs scripts on my 1,000+ document database:

    cstrom@jaynestown:~/repos/eee-code$ time couch-docs dump http://localhost:5984/eee couch/seed/

    real    0m56.536s
    user    0m7.048s
    sys     0m0.516s

Wow, that is a significant increase over the 5 seconds it took to dump the documents without the attachments. I certainly expected an increase, but maybe not that much. I make a mental note of that, but optimization will come later (if it becomes necessary).

Before restoring, I need a target database:
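
One way to do that, sketched in Ruby: CouchDB creates a database when it receives a PUT on the database URL. The helper names here are hypothetical, and the URL is the one passed to the load command further down:

```ruby
require 'net/http'
require 'uri'

# Builds the HTTP request that asks CouchDB to create a database.
# CouchDB creates a database in response to a PUT on the database URL.
def create_database_request(db_url)
  Net::HTTP::Put.new(URI.parse(db_url).path)
end

# Sends the request. Only call this with a CouchDB server running.
def create_database(db_url)
  uri = URI.parse(db_url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(create_database_request(db_url))
  end
end

# Left commented out so the sketch can be loaded without a server:
# create_database("http://localhost:5984/couch-docs-test")
```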

Now to test a CouchDB restore (again with timing):

    cstrom@jaynestown:~/repos/eee-code$ time couch-docs load couch/seed/ http://localhost:5984/couch-docs-test

    real    0m52.946s
    user    0m5.068s
    sys     0m0.476s

Well, it seems that 50 seconds or so for 1,000 documents is going to be typical.

Checking the database in the browser, I see that there are, indeed, documents:

And, clicking through to one document's attachments:

I update the README, History and the VERSION number in couch_docs.rb. Prior to publishing the code to GitHub, I update the gemspec, using the rake task from Bones:

    cstrom@jaynestown:~/repos/couch_docs$ rake gem:spec          # Write the gemspec

Also from Bones, I create a tag for this version of the code:

    cstrom@jaynestown:~/repos/couch_docs$ rake git:create_tag VERSION=1.0.0    # Create a new tag in the Git repository
    (in /home/cstrom/repos/couch_docs)
    Creating Git tag 'couch_docs-1.0.0'
    Counting objects: 1, done.
    Writing objects: 100% (1/1), 180 bytes, done.
    Total 1 (delta 0), reused 0 (delta 0)
    To git@github.com:eee-c/couch_docs.git
     * [new tag]         couch_docs-1.0.0 -> couch_docs-1.0.0

GitHub uses tags to create download files. They are not too useful for gems, but still nice to have.

With that, I am now able to load all of my seed data onto my new server in less than a minute. That will make it much easier to get started than populating this from my legacy Rails app.

