Last night, I played with CouchDB's replication features and came away very much impressed. Before moving on, I wanted to give the automated replication features a try.
There is no way to set this up from the Futon web interface, so I drop down to curl. I want to establish replication between whitefall (it's a small border planet moon) and jaynestown (where the legend of Jayne was born). Both have stock CouchDB 0.10.0 installs on Ubuntu 9.10. The curl command to get the replicating:
cstrom@whitefall:~$ curl -X POST http://localhost:5984/_replicate \That is the correct address, so what's up? Thankfully CouchDB is all HTTP so testing access is trivial:
-d '{"source":"eee", "target":"http://jaynestown.local:5984/eee-replica", "continuous":true}'
{"error":"db_not_found","reason":"could not open http://jaynestown.local:5984/eee-replica/"}
cstrom@whitefall:~$ telnet jaynestown.local 5984Ah, networking is order, but the CouchDB server on jaynestown is refusing access because it is listening only for localhost connections. To change that, I add this to
Trying 192.168.1.153...
telnet: Unable to connect to remote host: Connection refused
/etc/couchdb/local.ini
on jaynestown:[httpd]And then restart CouchDB on that server. Now when I issue the replicate directive:
;port = 5984
bind_address = 0.0.0.0
cstrom@whitefall:~$ curl -X POST http://localhost:5984/_replicate \I get this response:
-d '{"source":"eee", "target":"http://jaynestown.local:5984/eee-replica", "continuous":true}'
{"ok":true,"_local_id":"8cea26f1f5f59fa4d1adfbc99db8a1b0"}To try things out, I create a test document in which I will increment the number attribute as I edit the document:
cstrom@whitefall:~$ curl -X POST http://localhost:5984/eee -d '{"_id":"test-sequence","number":1}'Then I switch over to jaynestown and the document is already there!
{"ok":true,"id":"test-sequence","rev":"1-bb2537d8b2cb441f7bef451c103682e4"}
jaynestown% curl -X GET http://localhost:5984/eee-replica/test-sequenceThat is pretty freaking cool.
{"_id":"test-sequence","_rev":"1-bb2537d8b2cb441f7bef451c103682e4","number":1}
The log for this commit on whitefall shows:
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.8966.3>] 'POST' /eee {1,1}Checking the log on jaynestown:
Headers: [{'Accept',"*/*"},
{'Content-Length',"34"},
{'Content-Type',"application/x-www-form-urlencoded"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15"}]
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.8966.3>] OAuth Params: []
[Mon, 15 Feb 2010 02:05:23 GMT] [info] [<0.8966.3>] 127.0.0.1 - - 'POST' /eee 201
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.8974.3>] missing_revs updating committed seq to 5977
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.50.0>] New task status for 8cea26: eee -> http://jaynestown.local:5984/eee-replica/: W Processed source update #5977
[Mon, 15 Feb 2010 02:05:28 GMT] [info] [<0.8967.3>] recording a checkpoint at source update_seq 5977
[Mon, 15 Feb 2010 02:05:23 GMT] [info] [<0.872.0>] 192.168.1.150 - - 'POST' /eee-replica/_missing_revs 200Next, I shut down the network connection on jaynestown and update the sequence document on whitefall (the revision number is required for CouchDB updates as if optimistic locking were being used in a traditional RDBMS):
[Mon, 15 Feb 2010 02:05:23 GMT] [info] [<0.968.0>] 192.168.1.150 - - 'POST' /eee-replica/_bulk_docs 201
[Mon, 15 Feb 2010 02:05:28 GMT] [info] [<0.969.0>] 192.168.1.150 - - 'POST' /eee-replica/_ensure_full_commit 201
[Mon, 15 Feb 2010 02:05:28 GMT] [info] [<0.975.0>] 192.168.1.150 - - 'PUT' /eee-replica/_local%2F8cea26f1f5f59fa4d1adfbc99db8a1b0 201
cstrom@whitefall:~$ curl -X PUT http://localhost:5984/eee/test-sequence \Checking the whitefall log, I find:
-d '{"_rev":"1-bb2537d8b2cb441f7bef451c103682e4","number":2}'
{"ok":true,"id":"test-sequence","rev":"2-f7c422b92035df3ba4a9195162caefe7"}
[Mon, 15 Feb 2010 02:26:53 GMT] [debug] [<0.11203.3>] 'PUT' /eee/test-sequence {1,1}After restarting the network connection, I find:
Headers: [{'Accept',"*/*"},
{'Content-Length',"56"},
{'Content-Type',"application/x-www-form-urlencoded"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15"}]
[Mon, 15 Feb 2010 02:26:53 GMT] [debug] [<0.11203.3>] OAuth Params: []
[Mon, 15 Feb 2010 02:26:53 GMT] [info] [<0.11203.3>] 127.0.0.1 - - 'PUT' /eee/test-sequence 201
[Mon, 15 Feb 2010 02:26:58 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 0.5 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:27:03 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 1.0 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:28:36 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 64.0 seconds due to {error, conn_failed}Just to be sure, the updated sequence is now on jaynestown:
[Mon, 15 Feb 2010 02:29:39 GMT] [debug] [<0.8974.3>] missing_revs updating committed seq to 5978
[Mon, 15 Feb 2010 02:29:39 GMT] [debug] [<0.50.0>] New task status for 8cea26: eee -> http://jaynestown.local:5984/eee-replica/: W Processed source update #5978
[Mon, 15 Feb 2010 02:29:44 GMT] [info] [<0.8967.3>] recording a checkpoint at source update_seq 5978
jaynestown% curl http://localhost:5984/eee-replica/test-sequenceLastly, I will try a conflict. Again I disconnect the network on jaynestown. On whitefall, I update the sequence document:
{"_id":"test-sequence","_rev":"2-f7c422b92035df3ba4a9195162caefe7","number":2}
cstrom@whitefall:~$ curl -X PUT http://localhost:5984/eee/test-sequence \I do the same on jaynestown (while it is still disconnected), but with a conflicting number attribute:
-d '{"_rev":"2-f7c422b92035df3ba4a9195162caefe7","number":3}'
{"ok":true,"id":"test-sequence","rev":"3-14682b2da24cda8ca701f70ed4ccc441"}
jaynestown% curl -X PUT http://localhost:5984/eee-replica/test-sequence \Then I reconnect:
-d '{"_rev":"2-f7c422b92035df3ba4a9195162caefe7","number":4}'
{"ok":true,"id":"test-sequence","rev":"3-a0e4440ec7418788ce04cb06c579c11f"}
[Mon, 15 Feb 2010 02:38:10 GMT] [debug] [<0.11249.3>] 'PUT' /eee/test-sequence {1,1}I then check the document on whitefall:
Headers: [{'Accept',"*/*"},
{'Content-Length',"56"},
{'Content-Type',"application/x-www-form-urlencoded"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15"}]
[Mon, 15 Feb 2010 02:38:10 GMT] [debug] [<0.11249.3>] OAuth Params: []
[Mon, 15 Feb 2010 02:38:10 GMT] [info] [<0.11249.3>] 127.0.0.1 - - 'PUT' /eee/test-sequence 201
[Mon, 15 Feb 2010 02:38:15 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 0.5 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:38:21 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 1.0 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:38:27 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 2.0 seconds due to {error, conn_failed}
...
[Mon, 15 Feb 2010 02:39:53 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 64.0 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:40:56 GMT] [debug] [<0.8974.3>] missing_revs updating committed seq to 5979
[Mon, 15 Feb 2010 02:40:56 GMT] [debug] [<0.50.0>] New task status for 8cea26: eee -> http://jaynestown.local:5984/eee-replica/: W Processed source update #5979
[Mon, 15 Feb 2010 02:41:01 GMT] [info] [<0.8967.3>] recording a checkpoint at source update_seq 5979
cstrom@whitefall:~$ curl -X GET http://localhost:5984/eee/test-sequenceAnd on jaynestown I do the same:
{"_id":"test-sequence","_rev":"3-14682b2da24cda8ca701f70ed4ccc441","number":3}
jaynestown% curl -X GET http://localhost:5984/eee-replica/test-sequenceHunh, I had expected them to be synchronized now. Specifically, the jaynestown document should have won because it has a higher
{"_id":"test-sequence","_rev":"3-a0e4440ec7418788ce04cb06c579c11f","number":4}
_rev
number.What happens if I update the document on whitefall now?
cstrom@whitefall:~$ curl -X PUT http://localhost:5984/eee/test-sequence -d '{"_rev":"3-14682b2da24cda8ca701f70ed4ccc441","number":4}'Since jaynestown is still connected, replication takes place. Now on that machine I find:
{"ok":true,"id":"test-sequence","rev":"4-c349b24565da84f0571fe9c274252a6d"}
jaynestown% curl -X GET http://localhost:5984/eee-replica/test-sequenceThat is the document from whitefall (it has the same revision number as the most recent whitefall change).
{"_id":"test-sequence","_rev":"4-c349b24565da84f0571fe9c274252a6d","number":4}
There is a mechanism for finding conflicts in CouchDB. I create a temporary view on both machines:
function(doc) {Whitefall has no conflicts. Jaynestown has a conflict on
if(doc._conflicts) {
emit(doc._conflicts, null);
}
}
3-a0e4440ec7418788ce04cb06c579c11f
, which still seems wrong to me. The conflict should have been on 3-14682b2da24cda8ca701f70ed4ccc441
(the lower of the two values).I am not quite sure what is going on here, but I do not seem to be getting the eventual consistency that I expect with CouchDB. I will go defect hunting tomorrow and file a new one if there is not already an open ticket for this.
Day #14
the revs are meant to be opaque values. to really check eventual consistency you need more than 2 machines. the more the better.
ReplyDeletethen you create conflicts between 2 machines, replicate everything, and see that everyone sees the same doc value in views shows etc. now resolve the conflict on one node and see the resolution replicate.
I'm still not sure about this.
ReplyDeleteThe day before, I manually replicated with a conflict and both machines chose the same winning version of the document. With automated replication, each machine chose a different winning version -- the one that they had created. Why the difference?
I would expect that both machines would choose the same winner, but mark the document as having a conflict that could be resolved if need be. But to have both choose different versions seems problematic if, for example, they were being load balanced in the same cluster.
Am I not thinking about this right?
Replication is unidirectional. Set it up in both directions and you'll see that the doc is the same in both places.
ReplyDelete