Thursday, September 24, 2009

Full Stack ETag Support


This is how I am using Rack::Cache, Sinatra, and CouchDB:
   1. Web client         ^
      Request  |         |  5. Respond to client
               |         |     and store in cache
               v         |
        +------+---------+--------+
        |       Rack::Cache       +------> file system cache
        |                         |
        |         Sinatra         +-----------+
        +---+-----------------+---+           |
            |                 ^               |
   2. RestClient       3. Response            | 4. Ancillary
      Request             _rev: 1234          |    Requests
            v                 |               |
        +---+-----------------+---+           |
        |         CouchDB         |<----------+
        +-------------------------+

The nice thing about this stack is that it is all web-based, which will allow me to make certain assumptions when optimizing.

Yesterday, I was able to bypass step #4 in that diagram, which should cut down significantly on the total request time. I used the _rev (revision) attribute returned from CouchDB in step #3 as the argument to Sinatra's etag method. Rack::Cache, in turn, uses that value to decide whether it can serve a previously cached copy of the HTML that Sinatra generated from the assembled bits of several CouchDB requests.
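
To make the matching step concrete, here is a minimal sketch of how an If-None-Match value can be compared against a current ETag. The etag_match? helper is hypothetical: it is not the code that Sinatra or Rack::Cache actually runs, and it simplifies things by splitting the header on commas.

```ruby
# Hypothetical helper, not Sinatra's or Rack::Cache's implementation.
# Returns true when the validator(s) sent by the client match the
# current ETag, meaning a 304 Not Modified would be appropriate.
def etag_match?(if_none_match, current_etag)
  return false if if_none_match.nil? || if_none_match.strip.empty?
  return true if if_none_match.strip == '*'

  # If-None-Match may carry a comma-separated list. Weak validators
  # (W/"...") are compared by their quoted tag only.
  normalize = ->(tag) { tag.strip.sub(/\AW\//, '') }
  candidates = if_none_match.split(',').map { |tag| normalize.call(tag) }
  candidates.include?(normalize.call(current_etag))
end

puts etag_match?('"2-2471836896"', '"2-2471836896"')
puts etag_match?('"1-1111111111"', '"2-2471836896"')
```

Note that the comparison is on the quoted form of the tag, so a bare revision string is not the same thing as an ETag built from it.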

The action in question:
get '/recipes/:permalink' do
  # Step 2/3: fetch the recipe and use its _rev as the ETag
  data = RestClient.get "#{@@db}/#{params[:permalink]}"
  @recipe = JSON.parse(data)
  etag @recipe['_rev']

  # Step 4: ancillary request, skipped when the ETag matches
  url = "#{@@db}/_design/recipes/_view/by_date_short"
  data = RestClient.get url
  @recipes_by_date = JSON.parse(data)['rows']

  @url = request.url

  haml :recipe
end
If the _rev of the CouchDB recipe document matches the ETag of the HTML document stored in the cache, all other processing stops and the cached copy is immediately returned. If they do not match, a new document is generated.

As of yesterday, with the code above, I have that working. What I would like to accomplish today is skipping step #2. In the case that the web browser already has a cached copy of the web page (and hence knows the HTML document's ETag), why bother requesting the entire document from CouchDB? As long as the ETag and CouchDB _rev match, the request life cycle should stay very close to the top of that diagram.

In order to make that happen, I need the RestClient call at the start of the Sinatra action to supply the If-None-Match HTTP request header attribute that corresponds to the ETag response header attribute.

RestClient supports request headers via an optional second argument to the get method. To tell CouchDB to return a recipe only if it has been updated, I can use this form:
>> RestClient.get 'http://localhost:5984/eee/2001-09-02-potatoes', :if_none_match =>  "2-2471836896"
=> "{"_id":"2001-09-02-potatoes",
"title":"Roasted Potatoes"
Hmmm... Well, at least I think I should be able to use that form. I am not sure what is going wrong there, but the entire document is being returned. After other troubleshooting fails, I drop down to packet sniffing with tcpdump:
jaynestown% sudo tcpdump -i lo port 5984 -A -s3000
05:48:49.278717 IP localhost.53793 > localhost.5984: P 1:150(149) ack 1 win 513
.[.:.[.:GET /eee/2001-09-02-potatoes HTTP/1.1
If-None-Match: 2-2471836896
Accept: application/xml
Accept-Encoding: gzip, deflate
Host: localhost:5984

05:48:49.278736 IP localhost.5984 > localhost.53793: . ack 150 win 190
E..4;.@.@.. .........`.!.....y.............
05:48:49.282277 IP localhost.5984 > localhost.53793: P 1:221(220) ack 150 win 192
.[.;.[.:HTTP/1.1 200 OK
Server: CouchDB/0.9.0a756286 (Erlang OTP/R12B)
Etag: "2-2471836896"
Date: Thu, 24 Sep 2009 09:48:49 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 1908
Cache-Control: must-revalidate

05:48:49.282309 IP localhost.53793 > localhost.5984: . ack 221 win 530
05:48:49.282352 IP localhost.5984 > localhost.53793: P 221:2129(1908) ack 150 win 192
.[.;.[.;{"_id":"2001-09-02-potatoes","_rev":"2-2471836896","prep_time":10,"title":"Roasted Potatoes",...
Now c'mon! The request header attribute is being set correctly. It is the same as the ETag and the CouchDB _rev. What am I missing?!

After much head banging, I realize that it is the quotes that I am missing:
>> RestClient.get 'http://localhost:5984/eee/2001-09-02-potatoes', :if_none_match =>  '"2-2471836896"'
RestClient::NotModified: RestClient::NotModified
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:189:in `process_result'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:125:in `transmit'
from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:123:in `transmit'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:49:in `execute_inner'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:39:in `execute'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:17:in `execute'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient.rb:65:in `get'
from (irb):13
from :0
Interesting. I am not sure that this is an exceptional case, but I can certainly catch that exception and signal Rack::Cache to immediately send back its copy.
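
The control flow of catching that exception can be sketched without RestClient or a running CouchDB. Everything here is a stand-in: the NotModified class, fake_couch_get, and handle are hypothetical names used only to show the two paths, and the status codes are returned as plain values rather than real HTTP responses.

```ruby
# Hypothetical stand-in for RestClient::NotModified so this runs without gems.
class NotModified < StandardError; end

# Hypothetical fake datastore: raises NotModified when the client's
# validator equals the quoted revision, otherwise returns the body.
def fake_couch_get(doc, if_none_match)
  raise NotModified if if_none_match == %("#{doc[:rev]}")
  doc[:body]
end

# Returns [status, body], mirroring the two paths through an action
# that rescues the not-modified case.
def handle(doc, if_none_match)
  body = fake_couch_get(doc, if_none_match)
  [200, body]
rescue NotModified
  [304, nil]
end

doc = { :rev => '2-2471836896', :body => '{"title":"Roasted Potatoes"}' }
p handle(doc, '"2-2471836896"')  # quoted validator
p handle(doc, '2-2471836896')    # unquoted validator
```

With the quoted validator the fetch is short-circuited; with the unquoted one the full body comes back, which is the same behavior seen in the two irb sessions above.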

Just to be sure that I know what is happening, I check the output of tcpdump in this case as well. Indeed, the quotes are doing the trick:
jaynestown% sudo tcpdump -i lo port 5984 -A -s3000
05:49:10.661038 IP localhost.53802 > localhost.5984: P 1:152(151) ack 1 win 513
.\...\..GET /eee/2001-09-02-potatoes HTTP/1.1
If-None-Match: "2-2471836896"
Accept: application/xml
Accept-Encoding: gzip, deflate
Host: localhost:5984

05:49:10.663062 IP localhost.5984 > localhost.53802: P 1:156(155) ack 152 win 192
.\...\..HTTP/1.1 304 Not Modified
Server: CouchDB/0.9.0a756286 (Erlang OTP/R12B)
Etag: "2-2471836896"
Date: Thu, 24 Sep 2009 09:49:10 GMT
Content-Length: 0
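
Since the quoting is so easy to get wrong, a small helper can build the header value from a bare _rev. This is only a sketch: quoted_etag and conditional_get are hypothetical names, and it uses Ruby's standard Net::HTTP rather than RestClient so that it runs without any gems. The request is built but never sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: wrap a bare CouchDB _rev in the double quotes
# that an ETag value requires (any quotes already present are dropped first).
def quoted_etag(rev)
  %("#{rev.delete('"')}")
end

# Hypothetical helper: build (but do not send) a conditional GET.
def conditional_get(url, rev)
  request = Net::HTTP::Get.new(URI(url))
  request['If-None-Match'] = quoted_etag(rev)
  request
end

request = conditional_get('http://localhost:5984/eee/2001-09-02-potatoes', '2-2471836896')
puts request['If-None-Match']
```

The printed header value includes the double quotes, matching the If-None-Match line in the second tcpdump capture.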
Now that I understand how to make proper RestClient.get calls with an If-None-Match header, I can wrap the call in a begin/rescue block in my Sinatra action:
get '/recipes/:permalink' do
  data =
    begin
      # Pass the browser's If-None-Match (quotes included) on to CouchDB
      RestClient.get "#{@@db}/#{params[:permalink]}",
        :if_none_match => request.env["HTTP_IF_NONE_MATCH"]
    rescue RestClient::NotModified
      # CouchDB answered 304, so re-assert the same ETag; the quotes are
      # stripped because etag adds them back itself
      etag request.env["HTTP_IF_NONE_MATCH"].gsub(/"/, '')
    end

  @recipe = JSON.parse(data)
  etag @recipe['_rev']

  url = "#{@@db}/_design/recipes/_view/by_date_short"
  data = RestClient.get url
  @recipes_by_date = JSON.parse(data)['rows']

  @url = request.url

  haml :recipe
end
That behaves as I expected, but what does this all mean? To answer that, I break out Apache Bench to measure response times:
# Access the rack app with Rack::Cache and with full stack etag support:
ab -H 'If-None-Match: "2-2471836896"' -n 100 http://localhost:9292/recipes/2001-09-02-potatoes
# Access the rack app with Rack::Cache, but without full stack etag support:
ab -n 100 http://localhost:9292/recipes/2001-09-02-potatoes
# Access the Thin server (no Rack::Cache, no etag support):
ab -n 100 http://localhost:4567/recipes/2001-09-02-potatoes
The results:
Stack               Average Req./sec
Full stack etag     133.59
No etag/no cache    10.71
The conclusion that I draw is that I definitely want to use Rack::Cache: a 100% improvement over reassembling the HTML on each request is too good to pass up. As for the 20% speed boost that full stack ETag support buys me, I am not sure that it warrants the added complexity. If nothing else, it is worth considering in certain cases.


  1. Mucho respect Chris for such a fascinating detailed account of the building of your app. Keep up the work mate!

  2. Nice post. So by 'full stack' you mean using the ETag to determine if the data is fresh in two places, client->server + server->datastore, correct? And you determined that the bulk of the benefit comes from the client->server caching via etag. The 'additional complexity' of the full-stack etag support is passing the client's If-None-Match along to couchdb, right? Thanks!