Wednesday, September 23, 2009

Rack::Cache

The last time I benchmarked the new site, it was running a little slow. Nothing too serious, but I think I can do better.

Currently, with each request of a recipe or a meal through my Sinatra application:
127.0.0.1 - - [23/Sep/2009 21:24:09] "GET /recipes/2001-09-02-potatoes HTTP/1.1" 200 6936 0.6164
There are multiple CouchDB requests:
[info] [<0.3275.10>] 127.0.0.1 - - 'GET' /eee/2001-09-02-potatoes 200
[info] [<0.5733.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/by_date_short 200
[info] [<0.5734.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/updated_by?key=%222001-09-02-potatoes%22 200
[info] [<0.5737.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/update_of?key=%222001-09-02-potatoes%22 200
[info] [<0.5738.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/alternatives?key=%222001-09-02-potatoes%22 200
[info] [<0.5739.10>] 127.0.0.1 - - 'GET' /eee/2001-09-02-potatoes/roasted_potatoes.jpg 200
[info] [<0.5740.10>] 127.0.0.1 - - 'GET' /eee/2001-09-02-potatoes/roasted_potatoes.jpg 200
There is the initial request for the meal/recipe data itself, plus ancillary requests for next/previous records, records referenced within the data, etc.

I can avoid all of those requests if I cache the output after fulfilling the request the first time. Since I am using Rack, I can implement caching with Rack::Cache.

Rack::Cache uses ETag headers (among other things) to decide when to expire the cache. Since I am using CouchDB, I can use document revisions for the ETag value. In fact, this is what CouchDB itself does:
jaynestown% curl http://localhost:5984/eee/2001-09-02-potatoes -i
HTTP/1.1 200 OK
Server: CouchDB/0.9.0a756286 (Erlang OTP/R12B)
Etag: "1-1030813362"
Date: Thu, 24 Sep 2009 00:30:22 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 1923
Cache-Control: must-revalidate

{"_id":"2001-09-02-potatoes",
"_rev":"1-1030813362",
"prep_time":10,
"title":"Roasted Potatoes",
"published":true,
...}
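The correspondence between the `_rev` field and the Etag header in that response can be expressed in a few lines of Ruby. This is only a sketch: `etag_for` is a hypothetical helper, and it assumes (as the response above suggests) that the ETag is simply the revision wrapped in double quotes.

```ruby
require 'json'

# Hypothetical helper: the ETag expected for a CouchDB document body,
# assuming it is the document's _rev value in double quotes.
def etag_for(document_json)
  rev = JSON.parse(document_json).fetch('_rev')
  %("#{rev}")
end

doc = '{"_id":"2001-09-02-potatoes","_rev":"1-1030813362","title":"Roasted Potatoes"}'
puts etag_for(doc)
```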
To set an ETag, Sinatra provides a handy etag method. By default, it takes a single argument specifying the value to be used for the ETag:
get '/recipes/:permalink' do
  data = RestClient.get "#{@@db}/#{params[:permalink]}"
  @recipe = JSON.parse(data)
  etag @recipe['_rev']

  url = "#{@@db}/_design/recipes/_view/by_date_short"
  data = RestClient.get url
  @recipes_by_date = JSON.parse(data)['rows']

  @url = request.url

  haml :recipe
end
By placing the etag immediately after the recipe document is retrieved, I prevent execution of the remainder of the code in the action when the cache is still valid.
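The short-circuit works because etag compares the value it is given with the If-None-Match header of the incoming request and halts with a 304 Not Modified when they match. Here is a minimal sketch of that comparison in plain Ruby. The `etag_matches?` helper is hypothetical and simplified; it is not Sinatra's actual implementation.

```ruby
# Hypothetical helper: true when the client's If-None-Match header already
# names the current revision, i.e. when a 304 could be sent instead of a body.
def etag_matches?(if_none_match, revision)
  return false if if_none_match.nil?

  current = %("#{revision}")                   # ETag values are quoted strings
  candidates = if_none_match.split(/\s*,\s*/)  # the header may list several tags
  candidates.include?(current) || candidates.include?('*')
end

puts etag_matches?('"1-1030813362"', '1-1030813362')  # same revision
puts etag_matches?('"1-1030813362"', '2-2471836896')  # document was updated
puts etag_matches?(nil, '1-1030813362')               # unconditional request
```

Only the first call reports a match, so only in that case could the rest of the action be skipped.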

That sets the desired response header:
jaynestown% curl http://localhost:4567/recipes/2001-09-02-potatoes -i
HTTP/1.1 200 OK
ETag: "1-1030813362"
Content-Type: text/html;charset=UTF-8
Content-Length: 6936
Connection: keep-alive
Server: thin 1.2.2 codename I Find Your Lack of Sauce Disturbing

<html>
<title>EEE Cooks</title>
<link href='/stylesheets/style.css' rel='stylesheet' type='text/css' />
</html>
...
To make use of that ETag header for server-side caching, I need Rack::Cache installed:
jaynestown% gem install rack-cache
WARNING: Installing to ~/.gem since /usr/lib/ruby/gems/1.8 and
/usr/bin aren't both writable.
Successfully installed rack-cache-0.5
1 gem installed
To use it, I add the appropriate use call to my rackup file:
require 'eee.rb'
require 'rubygems'
require 'sinatra'
require 'rack/cache'

use Rack::Cache,
  :verbose => true,
  :metastore => 'file:/tmp/cache/rack/meta',
  :entitystore => 'file:/tmp/cache/rack/body'


root_dir = File.dirname(__FILE__)

set :environment, :development
set :root, root_dir
set :app_root, root_dir
set :app_file, File.join(root_dir, 'eee.rb')
disable :run

run Sinatra::Application
Since I am just trying this out in a spike, I am storing the cached files in /tmp. Were this the real thing, I would use a dedicated, persistent location such as a directory under /var.

Also for spike purposes, I start the application directly (I would use rack-aware Thin in a live situation):
jaynestown% rackup config.ru
When I access the document, I see Rack::Cache headers, so it would seem that it is working:
jaynestown% curl http://localhost:9292/recipes/2001-09-02-potatoes -i
HTTP/1.1 200 OK
Connection: close
Date: Thu, 24 Sep 2009 01:36:28 GMT
ETag: "1-1030813362"
X-Rack-Cache: miss, store
X-Content-Digest: f88a43e32dcfcbc15b7b91d760f761670cdd32eb
Content-Type: text/html;charset=UTF-8
Content-Length: 6936
Age: 0

<html>
<title>EEE Cooks</title>
<link href='/stylesheets/style.css' rel='stylesheet' type='text/css' />
</html>
...
If I access that same resource again, Rack::Cache recognizes that the cache is still valid:
jaynestown% curl http://localhost:9292/recipes/2001-09-02-potatoes -i
HTTP/1.1 200 OK
Connection: close
Date: Thu, 24 Sep 2009 02:01:40 GMT
ETag: "1-1030813362"
X-Rack-Cache: stale, valid, store
X-Content-Digest: f88a43e32dcfcbc15b7b91d760f761670cdd32eb
Content-Type: text/html;charset=UTF-8
Content-Length: 6936
Age: 0
Better yet, on that second request only the main CouchDB document is fetched (to verify that the ETag has not changed); no ancillary requests are made:
[info] [<0.8491.10>] 127.0.0.1 - - 'GET' /eee/2001-09-02-potatoes 200
To verify that cache expiry is working, I make a small update to the CouchDB document and retry:
jaynestown% curl http://localhost:9292/recipes/2001-09-02-potatoes -i
HTTP/1.1 200 OK
Connection: close
Date: Thu, 24 Sep 2009 02:03:52 GMT
ETag: "2-2471836896"
X-Rack-Cache: stale, invalid, store
X-Content-Digest: 66e55ff8cac5557e7ac73d6ac058a503c7ff5e8f
Content-Type: text/html;charset=UTF-8
Content-Length: 6938
Age: 0
Nice! As expected, the document itself is retrieved from CouchDB, as are the ancillary resources:
[info] [<0.8678.10>] 127.0.0.1 - - 'GET' /eee/2001-09-02-potatoes 200
[info] [<0.8685.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/by_date_short 200
[info] [<0.8686.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/updated_by?key=%222001-09-02-potatoes%22 200
[info] [<0.8689.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/update_of?key=%222001-09-02-potatoes%22 200
[info] [<0.8690.10>] 127.0.0.1 - - 'GET' /eee/_design/recipes/_view/alternatives?key=%222001-09-02-potatoes%22 200
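The three traces seen so far ("miss, store", "stale, valid, store" and "stale, invalid, store") follow from a simple validation flow, which can be mimicked with a toy model. This is a sketch only: `ToyValidatingCache` and the stand-in backend are hypothetical and do not reflect how Rack::Cache is implemented. The model also assumes that, with no freshness lifetime set, every stored entry must be revalidated.

```ruby
# Toy model of a validating cache. NOT Rack::Cache's implementation; the
# class is hypothetical and only mirrors the trace strings seen above.
class ToyValidatingCache
  Entry = Struct.new(:etag, :body)

  # backend: callable taking (path, if_none_match) and returning
  # [status, etag, body], like an app that honors conditional GETs.
  def initialize(backend)
    @backend = backend
    @store = {}
  end

  # Returns [body, trace] for the requested path.
  def fetch(path)
    entry = @store[path]
    if entry.nil?
      _status, etag, body = @backend.call(path, nil)
      @store[path] = Entry.new(etag, body)
      return [body, 'miss, store']
    end

    # Assumption: no freshness lifetime, so the entry is always revalidated.
    status, etag, body = @backend.call(path, entry.etag)
    if status == 304
      [entry.body, 'stale, valid, store']
    else
      @store[path] = Entry.new(etag, body)
      [body, 'stale, invalid, store']
    end
  end
end

# Stand-in for the Sinatra action: a cheap revision check first, and the
# expensive rendering only when the cached revision is out of date.
revision = '1-1030813362'
renders = 0
backend = lambda do |_path, if_none_match|
  etag = %("#{revision}")
  next [304, etag, nil] if if_none_match == etag
  renders += 1
  [200, etag, "<html>rendered #{revision}</html>"]
end

cache = ToyValidatingCache.new(backend)
traces = []
traces << cache.fetch('/recipes/2001-09-02-potatoes').last
traces << cache.fetch('/recipes/2001-09-02-potatoes').last
revision = '2-2471836896' # simulate editing the CouchDB document
traces << cache.fetch('/recipes/2001-09-02-potatoes').last
puts traces
puts "full renders: #{renders}"
```

In this model the backend is consulted on every request, but the expensive work happens only on a miss or after the revision changes, which matches the CouchDB logs above.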
Tomorrow, I will further explore the REST-like nature of CouchDB. As mentioned earlier, CouchDB has ETag support. I should be able to use that support to do conditional retrieval from inside the Sinatra application. I will implement that and then do some benchmarks to see how much of a performance boost I can achieve.
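As a rough preview of what such a conditional retrieval might look like, here is a sketch that builds a GET request carrying an If-None-Match header, using Ruby's standard Net::HTTP. The `conditional_get_request` helper is hypothetical, and the request is only constructed, not sent, since sending it needs a running CouchDB.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET for a CouchDB document that asks the server
# to skip the body when the revision already held is still current.
def conditional_get_request(doc_url, known_rev)
  uri = URI.parse(doc_url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request['If-None-Match'] = %("#{known_rev}") if known_rev
  [uri, request]
end

uri, request = conditional_get_request(
  'http://localhost:5984/eee/2001-09-02-potatoes', '1-1030813362'
)
puts "#{request.method} #{request.path}"
puts "If-None-Match: #{request['If-None-Match']}"

# Sending it requires a live server, so it is left commented out:
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   A '304' response code would mean the revision is unchanged.
```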
