Monday, August 17, 2009

God


I have things on the beta site in something of a final state. I have been meaning to keep a closer eye on things, so:
sh-3.2$ sudo gem install god
God is a Ruby-based monitoring tool. I do more than enough monit stuff at the day job, so there is no way that I am going to do it on the side as well. Besides, god is kinda fun.

I start with this configuration for my thin server:
DAEMON = '/var/lib/gems/1.8/bin/thin'
CONFIG_PATH = '/etc/thin'
APP_ROOT = '/var/www/eee-code'

%w{8000 8001 8002 8003}.each do |port|
  server_number = port.to_i - 7999
  God.watch do |w|
    w.name = "eee-thin-#{port}"
    w.start = "#{DAEMON} start --all #{CONFIG_PATH} --only #{server_number}"
    w.stop = "#{DAEMON} stop --all #{CONFIG_PATH} --only #{server_number}"
    w.restart = "#{DAEMON} restart --all #{CONFIG_PATH} --only #{server_number}"
    w.interval = 30.seconds # default
    w.start_grace = 10.seconds
    w.restart_grace = 10.seconds
    w.pid_file = File.join(APP_ROOT, "log/thin.#{port}.pid")

    w.behavior(:clean_pid_file)

  end
end
The nice thing about god is the readability of the configuration file—most of it is self-explanatory. So far, I have a name (for reporting purposes), start/stop/restart commands, interval between checks, and grace periods after start/restart before monitoring kicks back in.

The only out-of-the-ordinary thing in there is the server_number, which is thin-specific. The servers are indexed from 1. The first thin server in my cluster is located at port 8000 (then 8001, 8002, and 8003). Thus the server number can be calculated by subtracting 7999 from the port number.
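
The port-to-server-number arithmetic can be checked on its own in plain Ruby. This is only a sketch: FIRST_PORT and the server_number method are names made up for illustration, and the ports are the four from the configuration above.

```ruby
# First port in the thin cluster, as used in the god configuration.
# (FIRST_PORT is an illustrative name, not part of god or thin.)
FIRST_PORT = 8000

# thin numbers the servers in a cluster starting from 1, so the
# first port maps to 1, the next to 2, and so on.
def server_number(port)
  port.to_i - FIRST_PORT + 1
end

numbers = %w{8000 8001 8002 8003}.map { |port| server_number(port) }
puts numbers.inspect
```

Subtracting 7999 directly, as the configuration does, gives the same result; naming the first port just makes the off-by-one explicit.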

Try that with monit.

The next few bits are from the sample configuration file provided along with the god gem. I add a check, run every 5 seconds, to start the server if it is found not running:
    w.start_if do |start|
      start.condition(:process_running) do |c|
        c.interval = 5.seconds
        c.running = false
      end
    end
I add a check to restart the server if memory usage (above 30 MB in 3 out of 5 checks) or CPU usage (above 50% in 5 checks) gets too high:
    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 30.megabytes
        c.times = [3, 5] # 3 out of 5 intervals
      end

      restart.condition(:cpu_usage) do |c|
        c.above = 50.percent
        c.times = 5
      end
    end
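To make the "3 out of 5 intervals" idea concrete, here is a minimal plain-Ruby sketch of a sliding-window trigger. It is not god's implementation: the SlidingTrigger class and the memory readings are hypothetical, and it only assumes that the condition fires once enough of the most recent checks were over the limit.

```ruby
# Hypothetical helper, not part of god: remembers the last `window`
# check results and fires when at least `needed` of them were over
# the limit.
class SlidingTrigger
  def initialize(needed, window)
    @needed = needed
    @window = window
    @history = []
  end

  # Record one check result (true means "over the limit") and return
  # true when the trigger should fire.
  def record(over_limit)
    @history << over_limit
    @history.shift if @history.size > @window
    @history.count(true) >= @needed
  end
end

trigger = SlidingTrigger.new(3, 5)

# Made-up memory readings in megabytes, one per check interval.
readings_mb = [25, 31, 28, 33, 35]
results = readings_mb.map { |mb| trigger.record(mb > 30) }
puts results.inspect
```

With these readings the trigger only fires on the fifth check, the third reading above 30 MB within the window. A single spike never restarts the server.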
And finally, a check to handle a flailing system (to force god to stop paying attention):
    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 5.minute
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
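The flapping idea (5 starts or restarts within 5 minutes) can be sketched in plain Ruby as well. This is an illustration under assumptions, not god's code: FlapDetector is a hypothetical class, timestamps are plain integers in seconds, and the retry settings are left out entirely.

```ruby
# Hypothetical sketch, not god's implementation: reports flapping when
# `times` transitions have happened within the last `within` seconds.
class FlapDetector
  def initialize(times, within)
    @times = times
    @within = within
    @timeline = []
  end

  # Record a transition at time `at` (seconds) and return true when
  # the process looks like it is flapping.
  def transition(at)
    @timeline << at
    # Forget transitions that have fallen out of the time window.
    @timeline.reject! { |t| t <= at - @within }
    @timeline.size >= @times
  end
end

# One restart per minute: five of them fit inside a 5-minute window.
busy = FlapDetector.new(5, 5 * 60)
busy_flags = [0, 60, 120, 180, 240].map { |t| busy.transition(t) }

# One restart every 100 seconds: never five within 5 minutes.
calm = FlapDetector.new(5, 5 * 60)
calm_flags = [0, 100, 200, 300, 400].map { |t| calm.transition(t) }

puts busy_flags.inspect
puts calm_flags.inspect
```

In the first series the fifth restart trips the detector. That is the point at which the configuration above would have god move the watch to unmonitored.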
Last, but certainly not least, I would like to be notified when the server enters one of these states. Since I am running the lightweight sendmail replacement, ssmtp, I need to employ god's sendmail configuration:
God::Contacts::Email.message_settings = {
  :from => 'god@___cooks.com'
}

God::Contacts::Email.delivery_method = :sendmail

God::Contacts::Email.sendmail_settings = {
  :location => '/usr/sbin/sendmail',
  :arguments => '-i -t'
}

God.contact(:email) do |c|
  c.name = 'chris'
  c.email = 'chris@___cooks.com'
end
And, to get notified when a start takes place, I add a notify directive to the start condition:
    w.start_if do |start|
      start.condition(:process_running) do |c|
        c.interval = 5.seconds
        c.running = false
        c.notify = 'chris'
      end
    end
To check my monitoring out, I intentionally kill the thin server running on port 8001 and watch the god logs:
...
I [2009-08-18 02:16:46] INFO: eee-thin-8001 sent email to chris@___cooks.com (Email)
I [2009-08-18 02:16:46] INFO: eee-thin-8001 move 'up' to 'start'
I [2009-08-18 02:16:46] INFO: eee-thin-8001 before_start: no pid file to delete (CleanPidFile)
I [2009-08-18 02:16:46] INFO: eee-thin-8001 start: /var/lib/gems/1.8/bin/thin start --all /etc/thin --only 2
I [2009-08-18 02:16:46] INFO: eee-thin-8002 [ok] process is running (ProcessRunning)
...
I [2009-08-18 02:16:57] INFO: eee-thin-8001 moved 'up' to 'up'
I [2009-08-18 02:16:57] INFO: eee-thin-8001 [ok] process is running (ProcessRunning)
I [2009-08-18 02:16:57] INFO: eee-thin-8001 [ok] memory within bounds [21512kb] (MemoryUsage)
I [2009-08-18 02:16:57] INFO: eee-thin-8001 [ok] cpu within bounds [4.52029520294135%] (CpuUsage)
Nice!

I will get god working under init.d tomorrow and configure it to monitor HAProxy and CouchDB as well.
