‹prev |
My Chain |
next›Last night, I played with
CouchDB's replication features and came away very much impressed. Before moving on, I wanted to give the automated replication features a try.
There is no way to set this up from the Futon web interface, so I drop down to curl. I want to establish replication between whitefall (it's a small border planet moon) and jaynestown (where the legend of Jayne was born). Both have stock CouchDB 0.10.0 installs on Ubuntu 9.10. The curl command to get the replicating:
cstrom@whitefall:~$ curl -X POST http://localhost:5984/_replicate \
-d '{"source":"eee", "target":"http://jaynestown.local:5984/eee-replica", "continuous":true}'
{"error":"db_not_found","reason":"could not open http://jaynestown.local:5984/eee-replica/"}
That is the correct address, so what's up? Thankfully CouchDB is all HTTP so testing access is trivial:
cstrom@whitefall:~$ telnet jaynestown.local 5984
Trying 192.168.1.153...
telnet: Unable to connect to remote host: Connection refused
Ah, networking is order, but the CouchDB server on jaynestown is refusing access because it is listening only for localhost connections. To change that, I add this to
/etc/couchdb/local.ini
on jaynestown:
[httpd]
;port = 5984
bind_address = 0.0.0.0
And then restart CouchDB on that server. Now when I issue the replicate directive:
cstrom@whitefall:~$ curl -X POST http://localhost:5984/_replicate \
-d '{"source":"eee", "target":"http://jaynestown.local:5984/eee-replica", "continuous":true}'
I get this response:
{"ok":true,"_local_id":"8cea26f1f5f59fa4d1adfbc99db8a1b0"}
To try things out, I create a test document in which I will increment the number attribute as I edit the document:
cstrom@whitefall:~$ curl -X POST http://localhost:5984/eee -d '{"_id":"test-sequence","number":1}'
{"ok":true,"id":"test-sequence","rev":"1-bb2537d8b2cb441f7bef451c103682e4"}
Then I switch over to jaynestown and the document is already there!
jaynestown% curl -X GET http://localhost:5984/eee-replica/test-sequence
{"_id":"test-sequence","_rev":"1-bb2537d8b2cb441f7bef451c103682e4","number":1}
That is pretty freaking cool.
The log for this commit on whitefall shows:
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.8966.3>] 'POST' /eee {1,1}
Headers: [{'Accept',"*/*"},
{'Content-Length',"34"},
{'Content-Type',"application/x-www-form-urlencoded"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15"}]
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.8966.3>] OAuth Params: []
[Mon, 15 Feb 2010 02:05:23 GMT] [info] [<0.8966.3>] 127.0.0.1 - - 'POST' /eee 201
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.8974.3>] missing_revs updating committed seq to 5977
[Mon, 15 Feb 2010 02:05:23 GMT] [debug] [<0.50.0>] New task status for 8cea26: eee -> http://jaynestown.local:5984/eee-replica/: W Processed source update #5977
[Mon, 15 Feb 2010 02:05:28 GMT] [info] [<0.8967.3>] recording a checkpoint at source update_seq 5977
Checking the log on jaynestown:
[Mon, 15 Feb 2010 02:05:23 GMT] [info] [<0.872.0>] 192.168.1.150 - - 'POST' /eee-replica/_missing_revs 200
[Mon, 15 Feb 2010 02:05:23 GMT] [info] [<0.968.0>] 192.168.1.150 - - 'POST' /eee-replica/_bulk_docs 201
[Mon, 15 Feb 2010 02:05:28 GMT] [info] [<0.969.0>] 192.168.1.150 - - 'POST' /eee-replica/_ensure_full_commit 201
[Mon, 15 Feb 2010 02:05:28 GMT] [info] [<0.975.0>] 192.168.1.150 - - 'PUT' /eee-replica/_local%2F8cea26f1f5f59fa4d1adfbc99db8a1b0 201
Next, I shut down the network connection on jaynestown and update the sequence document on whitefall (the revision number is required for CouchDB updates as if optimistic locking were being used in a traditional RDBMS):
cstrom@whitefall:~$ curl -X PUT http://localhost:5984/eee/test-sequence \
-d '{"_rev":"1-bb2537d8b2cb441f7bef451c103682e4","number":2}'
{"ok":true,"id":"test-sequence","rev":"2-f7c422b92035df3ba4a9195162caefe7"}
Checking the whitefall log, I find:
[Mon, 15 Feb 2010 02:26:53 GMT] [debug] [<0.11203.3>] 'PUT' /eee/test-sequence {1,1}
Headers: [{'Accept',"*/*"},
{'Content-Length',"56"},
{'Content-Type',"application/x-www-form-urlencoded"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15"}]
[Mon, 15 Feb 2010 02:26:53 GMT] [debug] [<0.11203.3>] OAuth Params: []
[Mon, 15 Feb 2010 02:26:53 GMT] [info] [<0.11203.3>] 127.0.0.1 - - 'PUT' /eee/test-sequence 201
[Mon, 15 Feb 2010 02:26:58 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 0.5 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:27:03 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 1.0 seconds due to {error, conn_failed}
After restarting the network connection, I find:
[Mon, 15 Feb 2010 02:28:36 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 64.0 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:29:39 GMT] [debug] [<0.8974.3>] missing_revs updating committed seq to 5978
[Mon, 15 Feb 2010 02:29:39 GMT] [debug] [<0.50.0>] New task status for 8cea26: eee -> http://jaynestown.local:5984/eee-replica/: W Processed source update #5978
[Mon, 15 Feb 2010 02:29:44 GMT] [info] [<0.8967.3>] recording a checkpoint at source update_seq 5978
Just to be sure, the updated sequence is now on jaynestown:
jaynestown% curl http://localhost:5984/eee-replica/test-sequence
{"_id":"test-sequence","_rev":"2-f7c422b92035df3ba4a9195162caefe7","number":2}
Lastly, I will try a conflict. Again I disconnect the network on jaynestown. On whitefall, I update the sequence document:
cstrom@whitefall:~$ curl -X PUT http://localhost:5984/eee/test-sequence \
-d '{"_rev":"2-f7c422b92035df3ba4a9195162caefe7","number":3}'
{"ok":true,"id":"test-sequence","rev":"3-14682b2da24cda8ca701f70ed4ccc441"}
I do the same on jaynestown (while it is still disconnected), but with a conflicting number attribute:
jaynestown% curl -X PUT http://localhost:5984/eee-replica/test-sequence \
-d '{"_rev":"2-f7c422b92035df3ba4a9195162caefe7","number":4}'
{"ok":true,"id":"test-sequence","rev":"3-a0e4440ec7418788ce04cb06c579c11f"}
Then I reconnect:
[Mon, 15 Feb 2010 02:38:10 GMT] [debug] [<0.11249.3>] 'PUT' /eee/test-sequence {1,1}
Headers: [{'Accept',"*/*"},
{'Content-Length',"56"},
{'Content-Type',"application/x-www-form-urlencoded"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.19.5 (i486-pc-linux-gnu) libcurl/7.19.5 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.15"}]
[Mon, 15 Feb 2010 02:38:10 GMT] [debug] [<0.11249.3>] OAuth Params: []
[Mon, 15 Feb 2010 02:38:10 GMT] [info] [<0.11249.3>] 127.0.0.1 - - 'PUT' /eee/test-sequence 201
[Mon, 15 Feb 2010 02:38:15 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 0.5 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:38:21 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 1.0 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:38:27 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 2.0 seconds due to {error, conn_failed}
...
[Mon, 15 Feb 2010 02:39:53 GMT] [debug] [<0.8975.3>] retrying couch_rep_httpc post request in 64.0 seconds due to {error, conn_failed}
[Mon, 15 Feb 2010 02:40:56 GMT] [debug] [<0.8974.3>] missing_revs updating committed seq to 5979
[Mon, 15 Feb 2010 02:40:56 GMT] [debug] [<0.50.0>] New task status for 8cea26: eee -> http://jaynestown.local:5984/eee-replica/: W Processed source update #5979
[Mon, 15 Feb 2010 02:41:01 GMT] [info] [<0.8967.3>] recording a checkpoint at source update_seq 5979
I then check the document on whitefall:
cstrom@whitefall:~$ curl -X GET http://localhost:5984/eee/test-sequence
{"_id":"test-sequence","_rev":"3-14682b2da24cda8ca701f70ed4ccc441","number":3}
And on jaynestown I do the same:
jaynestown% curl -X GET http://localhost:5984/eee-replica/test-sequence
{"_id":"test-sequence","_rev":"3-a0e4440ec7418788ce04cb06c579c11f","number":4}
Hunh, I had expected them to be synchronized now. Specifically, the jaynestown document should have won because it has a higher
_rev
number.
What happens if I update the document on whitefall now?
cstrom@whitefall:~$ curl -X PUT http://localhost:5984/eee/test-sequence -d '{"_rev":"3-14682b2da24cda8ca701f70ed4ccc441","number":4}'
{"ok":true,"id":"test-sequence","rev":"4-c349b24565da84f0571fe9c274252a6d"}
Since jaynestown is still connected, replication takes place. Now on that machine I find:
jaynestown% curl -X GET http://localhost:5984/eee-replica/test-sequence
{"_id":"test-sequence","_rev":"4-c349b24565da84f0571fe9c274252a6d","number":4}
That is the document from whitefall (it has the same revision number as the most recent whitefall change).
There is a mechanism for finding conflicts in CouchDB. I create a temporary view on both machines:
function(doc) {
if(doc._conflicts) {
emit(doc._conflicts, null);
}
}
Whitefall has no conflicts. Jaynestown has a conflict on
3-a0e4440ec7418788ce04cb06c579c11f
, which still seems wrong to me. The conflict should have been on
3-14682b2da24cda8ca701f70ed4ccc441
(the lower of the two values).
I am not quite sure what is going on here, but I do not seem to be getting the eventual consistency that I expect with CouchDB. I will go defect hunting tomorrow and file a new one if there is not already an open ticket for this.
Day #14