Sunday, August 16, 2009

HAProxy


As pointed out by an astute reader, my current architecture is... limited.

In the two previous incarnations of EEE Cooks, I scraped by with either completely static HTML or a single Mongrel instance on a shared hosting service. Now that I have an entire Linode all to myself, I have more options available.

Specifically, I can run HAProxy to balance multiple thin instances of my Sinatra application. I do not know that I will ever need that many, but YAGNI rarely applies to infrastructure, especially when infrastructure is cheap. And cheap it is: getting four thin instances requires a single change to the YAML configuration file:
# sudo vi /etc/thin/eee.yml
---
user: www-data
group: www-data
pid: log/thin.pid
log: log/thin.log
timeout: 60
port: 8000
address: 127.0.0.1
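# servers: 4 below starts one thin instance per port, counting up from 8000 (so 8000-8003)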
servers: 4
chdir: /var/www/eee-code
environment: production
daemonize: true
rackup: config.ru
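With that saved, restarting thin should spin up all four instances. Assuming thin's Debian-style init script is in place (thin install sets it up), something like this confirms each port is listening:
sh-3.2$ sudo /etc/init.d/thin restart
sh-3.2$ netstat -lnt | grep '127.0.0.1:800'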
I could configure the nginx server that I am already using to round-robin through each of the thin instances. The trouble with vanilla round-robin proxying is that, should one instance get bogged down with a long-running request, subsequent requests will eventually round-robin back to the "stuck" instance and hang.
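For reference, that nginx-only approach would look something like this (the upstream name is made up for illustration; the ports match the thin YAML above):
upstream thin_cluster {
  # vanilla round-robin: requests rotate through this list regardless of
  # whether an instance is stuck on a slow request
  server 127.0.0.1:8000;
  server 127.0.0.1:8001;
  server 127.0.0.1:8002;
  server 127.0.0.1:8003;
}

server {
  listen 80;
  location / {
    proxy_pass http://thin_cluster;
  }
}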

There is the fair proxy balancer patch for nginx, but I seem to recall some instability with it. HAProxy has always been rock solid for me, so I prefer it for load balancing. It does suffer from dense documentation, though more and more is popping up.

Soo...
sh-3.2$ sudo apt-get install haproxy
Annoyingly, HAProxy on Debian will not start via its init.d script without the following change to /etc/default/haproxy:
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"
I could accept the default of ENABLED=0 if the init.d script printed a warning when run, but it just fails silently. Ah well, such is life with a tool of such pure awesomeness whose only documentation is an 80-column text file.
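A quick sanity check that the change took, and that the init script will now actually start the daemon (it will run with the stock config until I replace it below):
sh-3.2$ grep '^ENABLED' /etc/default/haproxy
ENABLED=1
sh-3.2$ sudo /etc/init.d/haproxy start
sh-3.2$ pgrep -l haproxy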

With it enabled, I can get HAProxy to proxy my thin servers with the following config file, /etc/haproxy/haproxy.cfg (based on the config from the Rails Machine wiki):
global
# Maximum number of simultaneous connections the haproxy process will accept
maxconn 500

# Logging to syslog facility local0
# log 127.0.0.1 local0

# Distribute the health checks with a bit of randomness
spread-checks 5

# Uncomment the statement below to turn on verbose logging
#debug

# Settings in the defaults section apply to all services (unless you change it,
# this configuration defines one service, called rails).
defaults

# apply log settings from the global section above to services
log global

# Proxy incoming traffic as HTTP requests
mode http

# Distribute incoming requests between the thin instances by round-robin.
# Note that because of the 'maxconn 1' settings in the listen section,
# instances that are busy processing another request will be skipped,
# so the actual load-balancing behavior is smarter than simple round robin.
balance roundrobin

# Maximum number of simultaneous client connections per service
maxconn 500

# Log details about HTTP requests
option httplog

# Abort request if client closes its output channel while waiting for the
# request. HAProxy documentation has a long explanation for this option.
option abortonclose

# Check whether a "Connection: close" header is already set in each direction,
# and add one if missing.
option httpclose

# If sending a request to one thin instance fails, try another, up to 3 times,
# before aborting the request
retries 3

# Do not enforce session affinity (i.e., an HTTP session can be served by any
# thin instance, not just the one that started the session)
option redispatch

# Time out a request if the client shows no activity for 120 seconds
timeout client 120000

# Time out a request if a thin instance does not accept a connection within
# 120 seconds
timeout connect 120000

# Time out a request if a thin instance does not accept data on the connection,
# or does not send a response back, within 120 seconds
timeout server 120000

# Remove the server from the farm gracefully if the health check URI returns
# a 404. This is useful for rolling restarts.
http-check disable-on-404

# Enable the statistics page
stats enable
stats uri /haproxy?stats
stats realm Haproxy\ Statistics
stats auth haproxy:stats

# Create a monitorable URI which returns a 200 if haproxy is up
monitor-uri /haproxy?monitor

# Specify the HTTP method and URI to check to ensure the server is alive.
# see http://github.com/jnewland/pulse
#option httpchk GET /pulse

# Amount of time after which a health check is considered to have timed out
timeout check 2000

# Thin service section.
listen eee :80
server eee-1 localhost:8000 maxconn 1 check inter 20000 fastinter 1000 fall 1
server eee-2 localhost:8001 maxconn 1 check inter 20000 fastinter 1000 fall 1
server eee-3 localhost:8002 maxconn 1 check inter 20000 fastinter 1000 fall 1
server eee-4 localhost:8003 maxconn 1 check inter 20000 fastinter 1000 fall 1
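With the config in place and HAProxy restarted, the monitor and stats URIs from the config above make for a quick smoke test (the ? needs quoting in the shell):
sh-3.2$ sudo /etc/init.d/haproxy restart
# should return a 200 if haproxy itself is up
sh-3.2$ curl -i 'http://localhost/haproxy?monitor'
# the stats page, behind the haproxy:stats basic auth from the config
sh-3.2$ curl -u haproxy:stats 'http://localhost/haproxy?stats'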
Nice! For the astute, I have left the stats page available for the time being (this is a beta site). Tomorrow, I will set up some monitoring. For real this time.
