This is how I am using Rack::Cache, Sinatra, and CouchDB:
```
     1. Web client           ^
        Request              |  5. Respond to client
         |                   |     and
         |                   |     Store in
     +---+-------------------+-+     /-----> file system
     |   |    Rack::Cache    +-+----/        cache
     +---+-------------------+-+
         |                   |
     +---v-------------------+-+
     |                         |
     |         Sinatra         |
     |                         +----------+
+----+                         |          |
|    +-------------------------+          |
|                     ^                   |
| 2. RestClient       |  3. Response      |
|    Request          |     _rev: 1234    |
|                     |                   |
|    +----------------+--------+          |
+--->|                         |          |
     |         CouchDB         |<---------+
     |                         |  4. Ancillary
     |                         |     Requests
     +-------------------------+
```

The nice thing about this stack is that it is all web-based, which will allow me to make certain assumptions when optimizing.
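Because every layer speaks HTTP, the caching layer only needs the standard `ETag` / `If-None-Match` handshake to decide whether a stored page is still good. The sketch below is a toy, in-memory illustration of that handshake as a Rack-style middleware. `ToyValidatingCache` is hypothetical and much simpler than Rack::Cache: it keeps pages in a Hash instead of the file system, ignores freshness rules, and does not honor the client's own `If-None-Match`.

```ruby
# Hypothetical toy stand-in for Rack::Cache (not its real implementation).
# It follows the Rack convention: call(env) returns [status, headers, body].
class ToyValidatingCache
  def initialize(app)
    @app = app
    @store = {} # PATH_INFO => { :etag, :headers, :body }
  end

  def call(env)
    cached = @store[env['PATH_INFO']]
    # Revalidate a stored copy by sending its ETag to the app as If-None-Match.
    env = env.merge('HTTP_IF_NONE_MATCH' => cached[:etag]) if cached
    status, headers, body = @app.call(env)

    if status == 304 && cached
      # The app reports no change: serve the stored page without re-rendering.
      return [200, cached[:headers], cached[:body]]
    end

    if status == 200 && headers['ETag']
      # Read the body into memory and keep it, keyed by path, for next time.
      parts = []
      body.each { |part| parts << part }
      @store[env['PATH_INFO']] = { :etag => headers['ETag'], :headers => headers, :body => parts }
      body = parts
    end
    [status, headers, body]
  end
end
```

The wrapped app only has to compare the incoming `If-None-Match` with its current `ETag` and answer 304 on a match. In this post, Sinatra's `etag` helper plays that role.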
Yesterday, I was able to bypass step #4 in that diagram, which should significantly cut down the total request time. I used the `_rev` (revision) attribute returned from CouchDB in step #3 as the argument to Sinatra's `etag` method. Rack::Cache, in turn, uses that value to decide whether it can reuse a previously stored copy of the HTML that Sinatra generates from the assembled bits of several CouchDB requests. The action in question:
```ruby
get '/recipes/:permalink' do
  # Step 2: fetch the recipe document from CouchDB.
  data = RestClient.get "#{@@db}/#{params[:permalink]}"
  @recipe = JSON.parse(data)

  # Step 3: use the document's revision as the page's ETag. If it matches
  # the request's If-None-Match, Sinatra halts here with a 304.
  etag @recipe['_rev']

  # Step 4: ancillary request for the by-date listing.
  url = "#{@@db}/_design/recipes/_view/by_date_short"
  data = RestClient.get url
  @recipes_by_date = JSON.parse(data)['rows']

  @url = request.url
  haml :recipe
end
```

If the `_rev` of the CouchDB recipe document matches the `ETag` of the HTML document stored in the cache, all other processing stops and the cached copy is immediately returned. If they do not match, a new document is generated.

As of yesterday, with the code above, I have that working. What I would like to accomplish today is skipping step #2. If the web browser already has a cached copy of the web page (and hence knows the HTML document's `ETag`), why bother requesting the entire document from CouchDB? As long as the `ETag` and the CouchDB `_rev` match, the request life cycle should stay very close to the top of that diagram.

In order to make that happen, I need the `RestClient` call at the start of the Sinatra action to supply the `If-None-Match` HTTP request header, which corresponds to the `ETag` response header.

`RestClient` supports request headers via an optional second argument to the `get` method. To tell CouchDB to only return a recipe if it has been updated, I can use this form:

```
>> RestClient.get 'http://localhost:5984/eee/2001-09-02-potatoes', :if_none_match => "2-2471836896"
=> "{"_id":"2001-09-02-potatoes",
"_rev":"2-2471836896",
"prep_time":10,
"title":"Roasted Potatoes"
...
}
```

Hmm... Well, at least I think I should be able to use that form. I am not sure what is going wrong there, but the entire document is being returned. After other troubleshooting fails, I drop down to packet sniffing with `tcpdump`:
```
jaynestown% sudo tcpdump -i lo port 5984 -A -s3000
05:48:49.278717 IP localhost.53793 > localhost.5984: P 1:150(149) ack 1 win 513
E...L5@.@............!.`.y.
...............
.[.:.[.:GET /eee/2001-09-02-potatoes HTTP/1.1
If-None-Match: 2-2471836896
Accept: application/xml
Accept-Encoding: gzip, deflate
Host: localhost:5984
05:48:49.278736 IP localhost.5984 > localhost.53793: . ack 150 win 190
E..4;.@.@.. .........`.!.....y.............
.[.:.[.:
05:48:49.282277 IP localhost.5984 > localhost.53793: P 1:221(220) ack 150 win 192
E...;.@.@..,.........`.!.....y.............
.[.;.[.:HTTP/1.1 200 OK
Server: CouchDB/0.9.0a756286 (Erlang OTP/R12B)
Etag: "2-2471836896"
Date: Thu, 24 Sep 2009 09:48:49 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 1908
Cache-Control: must-revalidate
05:48:49.282309 IP localhost.53793 > localhost.5984: . ack 221 win 530
E..4L6@.@............!.`.y....._.....b.....
.[.;.[.;
05:48:49.282352 IP localhost.5984 > localhost.53793: P 221:2129(1908) ack 150 win 192
E...;.@.@............`.!..._.y.............
.[.;.[.;{"_id":"2001-09-02-potatoes","_rev":"2-2471836896","prep_time":10,"title":"Roasted Potatoes",...
```

Now c'mon! The request header attribute is being set correctly. It is the same as the `ETag` and the CouchDB `_rev`. What am I missing?!

After much head banging, I realize that it is the quotes that I am missing:
```
>> RestClient.get 'http://localhost:5984/eee/2001-09-02-potatoes', :if_none_match => '"2-2471836896"'
RestClient::NotModified: RestClient::NotModified
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:189:in `process_result'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:125:in `transmit'
from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:123:in `transmit'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:49:in `execute_inner'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:39:in `execute'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient/request.rb:17:in `execute'
from /usr/lib/ruby/gems/1.8/gems/rest-client-1.0.3/bin/../lib/restclient.rb:65:in `get'
from (irb):13
from :0
```

Interesting. I am not sure that this is an exceptional case, but I can certainly catch that exception and signal Rack::Cache to immediately send back its copy.

Just to be sure that I know what is happening, I do check the output of `tcpdump` in this case. Indeed, the quotes are doing the trick:

```
jaynestown% sudo tcpdump -i lo port 5984 -A -s3000
05:49:10.661038 IP localhost.53802 > localhost.5984: P 1:152(151) ack 1 win 513
E.....@.@..m.........*.`.e....6............
.\...\..GET /eee/2001-09-02-potatoes HTTP/1.1
If-None-Match: "2-2471836896"
Accept: application/xml
Accept-Encoding: gzip, deflate
Host: localhost:5984
05:49:10.663062 IP localhost.5984 > localhost.53802: P 1:156(155) ack 152 win 192
E...5.@.@............`.*..6..e.?...........
.\...\..HTTP/1.1 304 Not Modified
Server: CouchDB/0.9.0a756286 (Erlang OTP/R12B)
Etag: "2-2471836896"
Date: Thu, 24 Sep 2009 09:49:10 GMT
Content-Length: 0
```

Now that I understand how to make proper `RestClient.get` calls with an `If-None-Match` header, I can wrap the call in a `begin`/`rescue` block in my Sinatra action:

```ruby
get '/recipes/:permalink' do
  # Forward the incoming If-None-Match (from the browser or Rack::Cache), and
  # only when the request actually has one, so that CouchDB can answer an
  # unchanged recipe with a bodiless 304.
  if_none_match = request.env["HTTP_IF_NONE_MATCH"]
  couch_headers = if_none_match ? { :if_none_match => if_none_match } : {}

  data =
    begin
      RestClient.get "#{@@db}/#{params[:permalink]}", couch_headers
    rescue RestClient::NotModified
      # CouchDB confirmed the stored copy is current. Strip the quotes because
      # Sinatra's etag adds them back; it then matches and halts with a 304.
      etag if_none_match.gsub(/"/, '')
    end

  @recipe = JSON.parse(data)
  etag @recipe['_rev']

  url = "#{@@db}/_design/recipes/_view/by_date_short"
  data = RestClient.get url
  @recipes_by_date = JSON.parse(data)['rows']

  @url = request.url
  haml :recipe
end
```

That behaves as I expected, but what does this all mean? To answer that, I break out Apache Bench to measure response times:
```sh
# Access the rack app with Rack::Cache and with full stack etag support:
ab -H "If-None-Match: '2-2471836896'" -n 100 http://localhost:9292/recipes/2001-09-02-potatoes

# Access the rack app with Rack::Cache, but without full stack etag support:
ab -n 100 http://localhost:9292/recipes/2001-09-02-potatoes

# Access the Thin server (no Rack::Cache, no etag support):
ab -n 100 http://localhost:4567/recipes/2001-09-02-potatoes
```

The results:
| Stack | Average requests/sec |
|---|---:|
| Full stack ETag | 133.59 |
| Rack::Cache | 106.01 |
| No ETag / no cache | 10.71 |
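The relative speed-ups follow directly from the averages in the table. This small Ruby snippet does nothing more than divide them. The hash keys are my own labels for the table rows.

```ruby
# Average requests per second from the Apache Bench runs.
results = {
  'Full stack ETag'    => 133.59,
  'Rack::Cache'        => 106.01,
  'No ETag / no cache' => 10.71
}

# Rack::Cache alone versus re-rendering every page.
cache_speedup = results['Rack::Cache'] / results['No ETag / no cache']

# Full stack ETag support versus Rack::Cache alone (fractional gain).
etag_gain = results['Full stack ETag'] / results['Rack::Cache'] - 1

puts format('Rack::Cache vs. no cache: %.1fx', cache_speedup)             # Rack::Cache vs. no cache: 9.9x
puts format('Full stack ETag vs. Rack::Cache: +%.0f%%', etag_gain * 100)  # Full stack ETag vs. Rack::Cache: +26%
```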
Rack::Cache is a clear win: a roughly tenfold improvement (106.01 vs. 10.71 requests per second) over reassembling the HTML on each request is too good to pass up. As for the roughly 26% boost (133.59 vs. 106.01 requests per second) that full stack ETag support buys me, I am not sure that it warrants the complexity it introduces. If nothing else, it is worth considering in certain cases.
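To make the difference between the two cached set-ups concrete, here is a toy, gem-free model of the recipe action. `FakeCouch`, `FakeResponse` and `recipe_action` are hypothetical stand-ins for CouchDB, its responses, and the Sinatra action. The exact string comparison is a simplification, because a real `If-None-Match` header may list several tags. The point is what "full stack" changes: forwarding the client's tag lets CouchDB answer 304 without sending the document body at all.

```ruby
# Hypothetical model; not the real CouchDB, RestClient or Sinatra code.
FakeResponse = Struct.new(:status, :body)

class FakeCouch
  attr_reader :bodies_sent

  def initialize(docs)
    @docs = docs      # id => document hash with a '_rev'
    @bodies_sent = 0  # how many full documents were transferred
  end

  # Conditional GET: a bodiless 304 when If-None-Match carries the current
  # revision as a *quoted* entity tag, otherwise the full document.
  def get(id, if_none_match = nil)
    doc = @docs.fetch(id)
    return FakeResponse.new(304, nil) if if_none_match == %("#{doc['_rev']}")
    @bodies_sent += 1
    FakeResponse.new(200, doc)
  end
end

# Returns the status the action would produce for a given If-None-Match.
def recipe_action(couch, id, if_none_match, full_stack)
  # Full stack: forward the client's tag to CouchDB (step 2).
  response = couch.get(id, full_stack ? if_none_match : nil)
  return 304 if response.status == 304
  # Rack::Cache only: compare after the whole document was fetched (step 3).
  return 304 if if_none_match == %("#{response.body['_rev']}")
  200 # would continue with the ancillary view request (step 4) and render
end
```

The model also reproduces the earlier pitfall: an unquoted revision such as `2-2471836896` never matches, so CouchDB sends the whole document.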
Mucho respect Chris for such a fascinating detailed account of the building of your app. Keep up the work mate!

Nice post. So by 'full stack' you mean using the ETag to determine if the data is fresh in two places, client->server + server->datastore, correct? And you determined that the bulk of the benefit comes from the client->server caching via etag. The 'additional complexity' of the full-stack etag support is passing the client's If-None-Match along to couchdb, right? Thanks!