Today I would like to play with CouchDB replication on some VirtualBox VMs. First up, I clone a bunch of VMs:
cstrom@whitefall:~/.VirtualBox/HardDisks$ VBoxManage clonehd couch-0.11-base.vdi couch-0.11a.vdi
VirtualBox Command Line Management Interface Version 3.0.8_OSE
(C) 2005-2009 Sun Microsystems, Inc.
All rights reserved.
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Clone hard disk created in format 'VDI'. UUID: aaa53dee-9d51-4eba-b56e-de8eacee9708
cstrom@whitefall:~/.VirtualBox/HardDisks$ VBoxManage clonehd couch-0.11-base.vdi couch-0.11b.vdi
VirtualBox Command Line Management Interface Version 3.0.8_OSE
(C) 2005-2009 Sun Microsystems, Inc.
All rights reserved.
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Clone hard disk created in format 'VDI'. UUID: aafb5319-8364-42dd-8ec3-a12a1235bd15
...
cstrom@whitefall:~/.VirtualBox/HardDisks$ VBoxManage clonehd couch-0.11-base.vdi couch-0.11i.vdi
VirtualBox Command Line Management Interface Version 3.0.8_OSE
(C) 2005-2009 Sun Microsystems, Inc.
All rights reserved.
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Clone hard disk created in format 'VDI'. UUID: 6cae9857-1353-4570-b518-ec8ac3a79a86
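Since that is nine nearly identical clonehd invocations, the cloning itself could just as easily be scripted. A minimal sketch, assuming the same base image and naming scheme as above:

# Clone couch-0.11-base.vdi into couch-0.11a.vdi through couch-0.11i.vdi
cd ~/.VirtualBox/HardDisks
for letter in a b c d e f g h i; do
  VBoxManage clonehd couch-0.11-base.vdi "couch-0.11${letter}.vdi"
done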
I then add them to the VirtualBox Virtual Media Manager. I can then create VMs from the cloned hard drives. Unfortunately, for each VM, I will need to set a different hostname (if I want to rely on avahi hostnames). I fire up a VM, change the hostname and check connectivity only to find that there is none. In fact, there isn't even a network interface.
It took me a bit to recall, but I have run into this problem when cloning VMs in the past. The problem is that each cloned VM is assigned a new MAC address, but the udev persistent-net rules still reference the MAC address of the VM from which the clones were made, so the clone's network interface comes up as eth1 (or higher) rather than the eth0 that the network configuration expects. To get around this, I edit /etc/udev/rules.d/70-persistent-net.rules such that the ATTR{address} attribute matches a wildcard MAC:

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:00:27:*", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"

I also delete any entry for NAME="eth1" or above. After fixing that and editing the hostname on VMs a-f, I have six running VMs with CouchDB 0.11.
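Rather than opening an editor on every clone, the same two edits could be scripted. A rough sketch with sed (a hypothetical helper, assuming the stock rules file shown above):

# Wildcard the VirtualBox MAC prefix and drop any eth1-or-higher entries
# so the clone's first interface always comes up as eth0.
sudo sed -i \
  -e 's/\(ATTR{address}==\)"08:00:27:[^"]*"/\1"08:00:27:*"/' \
  -e '/NAME="eth[1-9]"/d' \
  /etc/udev/rules.d/70-persistent-net.rules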
To test out replication, I need a common database, so I use couch_docs to create and populate a database:

cstrom@whitefall:~/tmp/seed$ couch-docs push http://couch-011a.local:5984/test . -R
Updating documents on CouchDB Server...

Before doing the same on b-f, I check Futon on couch-011a. Yup, the DB is really there. I create and populate the test database on couch-011b all the way through couch-011f:

cstrom@whitefall:~/tmp/seed$ couch-docs push http://couch-011b.local:5984/test . -R
Updating documents on CouchDB Server...
cstrom@whitefall:~/tmp/seed$ couch-docs push http://couch-011c.local:5984/test . -R
Updating documents on CouchDB Server...
cstrom@whitefall:~/tmp/seed$ couch-docs push http://couch-011d.local:5984/test . -R
Updating documents on CouchDB Server...
cstrom@whitefall:~/tmp/seed$ couch-docs push http://couch-011e.local:5984/test . -R
Updating documents on CouchDB Server...
cstrom@whitefall:~/tmp/seed$ couch-docs push http://couch-011f.local:5984/test . -R
Updating documents on CouchDB Server...

Now it is time to replicate. The easiest replication that I can think of for 6 servers is a round-robin:
             +-----+
      +----->|  a  |------+
      |      +-----+      |
      |                   v
   +-----+             +-----+
   |  f  |             |  b  |
   +-----+             +-----+
      ^                   |
      |                   v
   +-----+             +-----+
   |  e  |             |  c  |
   +-----+             +-----+
      ^                   |
      |      +-----+      |
      +------|  d  |<-----+
             +-----+

To accomplish that, I need to POST to the _replicate resource on each server with the source database and the target database. Using curl, it looks like:

cstrom@whitefall:~/tmp/seed$ curl -X POST http://couch-011a.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011b.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"5cd41a28a497587c2853e0b9dc8acd01"}
cstrom@whitefall:~/tmp/seed$ curl -X POST http://couch-011b.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011c.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"fdd443d544821e2a905e30fb1f6fa6a3"}
cstrom@whitefall:~/tmp/seed$ curl -X POST http://couch-011c.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011d.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"a0540360cc03c1f51647bd46514216e8"}
cstrom@whitefall:~/tmp/seed$ curl -X POST http://couch-011d.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011e.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"fae56e64fa4c8b0ada968c5d4861304a"}
cstrom@whitefall:~/tmp/seed$ curl -X POST http://couch-011e.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011f.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"dd061f67c1d9eb6de8b10d472764b0c6"}
cstrom@whitefall:~/tmp/seed$ curl -X POST http://couch-011f.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011a.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"f6b8bf78a7c7e542f071abeb7ff5293d"}

I tell couch-011a to replicate its test database to the test database on couch-011b. I tell couch-011b to replicate its test database to the test database on couch-011c, and so on until couch-011f, which I tell to replicate back to couch-011a. That should give me the circuit in the diagram. Now to test...
I create a new, empty directory and create a single JSON file in it, optimistically named to_be_replicated.json:

cstrom@whitefall:~/tmp/seed2$ echo '{"foo":"bar"}' > to_be_replicated.json

I then push this to the couch-011b server using couch_docs:

cstrom@whitefall:~/tmp/seed2$ couch-docs push http://couch-011b.local:5984/test . -w
Updating documents on CouchDB Server...

As expected, this file is now visible in Futon on couch-011b. But how about couch-011a? Yup! It made it all the way around the circuit.
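Futon is fine for spot checks, but the same round trip can be verified from the shell by asking every node in the ring for the document. A quick sketch, using the same hostnames:

# Fetch the replicated document from each node in the circuit.
for n in a b c d e f; do
  echo "== couch-011${n}"
  curl -s "http://couch-011${n}.local:5984/test/to_be_replicated"
done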
So what happens when two of the servers go down and updates are made? The last time I tried this, I messed up because I confused replication with synchronization. Replication in CouchDB, even automatic replication, is unidirectional. Today the replication is still unidirectional, but it forms a closed loop, so any conflicts I create while the circuit is broken should ultimately get resolved when the circuit is restored.
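For contrast with my earlier confusion: two-way synchronization between a pair of nodes would simply be two one-way replications, one in each direction. A sketch, using the same test database:

# Two one-way continuous replications approximate a two-way sync
# between couch-011a and couch-011b (not part of the ring above).
curl -X POST http://couch-011a.local:5984/_replicate \
     -d '{"source":"test", "target":"http://couch-011b.local:5984/test", "continuous":true}'
curl -X POST http://couch-011b.local:5984/_replicate \
     -d '{"source":"test", "target":"http://couch-011a.local:5984/test", "continuous":true}'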
So let's test... I manually stop couch-011c and couch-011e. Then I push conflicting changes to couch-011a and couch-011d:
cstrom@whitefall:~/tmp/seed2$ echo '{"foo":"bob"}' > to_be_replicated.json
cstrom@whitefall:~/tmp/seed2$ couch-docs push http://couch-011a.local:5984/test .
Updating documents on CouchDB Server...
cstrom@whitefall:~/tmp/seed2$ echo '{"foo":"bar"}' > to_be_replicated.json
cstrom@whitefall:~/tmp/seed2$ couch-docs push http://couch-011d.local:5984/test .
Updating documents on CouchDB Server...

The server after couch-011a is still online, so the couch-011a change gets replicated to couch-011b, but no further. Servers couch-011a, couch-011b, and couch-011d are now in conflict:

cstrom@whitefall:~/tmp/seed2$ curl http://couch-011d.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-d2c8433606378e67445d1455713b6f93","foo":"bar"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011b.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-01dee71e62dbf26d07613511e6d2cd14","foo":"bob"}

The server after couch-011e in the circuit, couch-011f, has seen none of the changes and is still at the original version of the doc (as evidenced by the "1" at the start of the revision):

cstrom@whitefall:~/tmp/seed2$ curl http://couch-011f.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"1-f0ce1cb7c380b09ebd91c5829a9f7f40","foo":"bar"}

So what happens when I start the couch-011c and couch-011e servers back up? Well, nothing:

cstrom@whitefall:~/tmp/seed2$ curl http://couch-011b.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-01dee71e62dbf26d07613511e6d2cd14","foo":"bob"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011d.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-d2c8433606378e67445d1455713b6f93","foo":"bar"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011f.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"1-f0ce1cb7c380b09ebd91c5829a9f7f40","foo":"bob"}

Servers b and d still conflict, and server f still has the old document. This is because automatic replication does not stay in place between CouchDB restarts. So I need to redo the replication statements for c & e:

cstrom@whitefall:~/tmp/seed2$ curl -X POST http://couch-011c.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011d.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"a0540360cc03c1f51647bd46514216e8"}
cstrom@whitefall:~/tmp/seed2$ curl -X POST http://couch-011e.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011f.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"dd061f67c1d9eb6de8b10d472764b0c6"}

With that, I should have the same document on each server:

cstrom@whitefall:~/tmp/seed2$ curl http://couch-011b.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-01dee71e62dbf26d07613511e6d2cd14","foo":"bob"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011d.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-d2c8433606378e67445d1455713b6f93","foo":"bar"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011f.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"1-f0ce1cb7c380b09ebd91c5829a9f7f40","foo":"bob"}

Bah! What's up with that?
It turns out that replication was disabled on the servers that were trying to reach couch-011c and couch-011e while they were down (the log appeared to indicate that this happened after 10 failed replication attempts).
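One way to confirm that without digging through the logs should be the _active_tasks resource, which lists the tasks a server is currently running; a cancelled continuous replication would presumably no longer show up there. A sketch:

# List couch-011b's running tasks; the continuous replication to
# couch-011c should be missing if it really was disabled.
curl http://couch-011b.local:5984/_active_tasks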
So, I need to re-enable replication on servers b and d as well:

cstrom@whitefall:~/tmp/seed2$ curl -X POST http://couch-011b.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011c.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"fdd443d544821e2a905e30fb1f6fa6a3"}
cstrom@whitefall:~/tmp/seed2$ curl -X POST http://couch-011d.local:5984/_replicate \
> -d '{"source":"test", "target":"http://couch-011e.local:5984/test", "continuous":true}'
{"ok":true,"_local_id":"fae56e64fa4c8b0ada968c5d4861304a"}

With that, I finally have the same document on each CouchDB database in the circuit:

cstrom@whitefall:~/tmp/seed2$ curl http://couch-011b.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-d2c8433606378e67445d1455713b6f93","foo":"bar"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011d.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-d2c8433606378e67445d1455713b6f93","foo":"bar"}
cstrom@whitefall:~/tmp/seed2$ curl http://couch-011f.local:5984/test/to_be_replicated
{"_id":"to_be_replicated","_rev":"5-d2c8433606378e67445d1455713b6f93","foo":"bar"}

How CouchDB chooses the "winning" version of the document is meant to be opaque, and if it chooses the wrong one, a conflict resolution view is available. I'm just happy to see this working as expected this time around.
Day #48
Great writeup as usual :) I'm pretty sure we'll make the 'what to do after a failure' behaviour more tuneable.

Cheers,
Jan