As part of a scale-out effort for a customer moving from a single
node hosted on Slicehost to a
multi-node environment hosted in the US and EU on Amazon EC2, I wanted a way to
introduce a combination of application and host-level monitoring for
the nodes. I settled on the combination RRDTool graphs served by Cacti and an alive check provided by
a third party (Monitis), but
there was no immediately obvious way to bridge the gap between the
Java services and the Cacti convenience wrapper around RRDTool.
This was before the recent announcement
by Amazon of monitoring
functionality for EC2 nodes, but that service wouldn't meet the
primary use case of application versus host monitoring. A tool like
JConsole didn't make sense because I was interested in getting a
single portal view across the fleet and in having retrospective data
to make visual day-to-day or week-to-week comparisons.
This post describes how to bring the pieces together, and the
technique is equally applicable to non-Java systems — any system
that can serve HTTP requests can be instrumented. In the end, about a
day's worth of experimentation and work was enough to get me the level
of instrumentation I was after.
Host Configuration Requirements
Each of the nodes in the fleet runs on a slightly modified CentOS
5.2 AMI (based on one
(ami-1363877a) provided by Rightscale), and getting basic host
information exposed over SNMP is straightforward:
$ yum install net-snmp
[... lots of output ...]
$ mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf-old
$ echo 'rocommunity public' > /etc/snmp/snmpd.conf
$ /etc/init.d/snmpd restart
The underlying assumption, of course, is that the instance was
launched under a security group that exposes UDP ports 161 and 162 to
the host that will be running Cacti. This can all be made to work
without assigning elastic IP numbers to the nodes and to the Cacti
host, but it's easier.
For the Cacti host, more or less any modern Linux distribution
(e.g., Ubuntu or CentOS) will do, and I'd recommend following Eric Hammond's very nice tutorial
about setting up MySQL on an EBS volume before doing the Cacti
install. For the same reason it makes sense to have MySQL on the
attached EBS volume (survive instance termination, support backups,
etc.), it makes sense to store RRDTool's backing data there as
well.
Instrumentation and Collection
The Java application in question (SmartFox) has no explicit
support for exporting metrics and no MBeans exposed for access via
JMX, but it does provide some API-level support for basic information
and an embedded servlet container (Jetty, of course). (SmartFox does
bundle a Flash-based administrative tool, but like JConsole it's
single-node and does not provide much beyond in the way of
retrospective data.)
After some poking around (i.e., reading PHP source code) in Cacti,
I found that Cacti's standard "Script/Command" data input method
consumes data as space-separated name/value pairs on a single
line:
name1:value1 name2:value2 ...
So I put together a simple servlet to grab the server singleton
object from the SmartFox API and print metrics out on a
text/plain response. This could just as easily be done
with an MBean instance looked up via the JVM's default JMX
infrastructure or a metric facade injected into the servlet as part of
the overall web application — the point is that the single line
of name/value pairs is the required interface to Cacti.
The data is then accessed via a curl invocation
templated for variables:
curl http://<host>:<port>/<webapp>/sfs-status?zone=<zone>
The fields in angle brackets are input fields that will be
filled-in by other objects in Cacti, and the output fields for the
data input method should be named to match the names in the name/value
pairs from above.
The downside of this approach is that there is quite a bit of
configuration that goes on top of this one-liner (graphs instantiate
graph templates and pull from data sources that reference data
templates that in turn reference data input methods, or something like
that), but it more or less just works. (Even at that, it is less
painful and more forgiving than some other tools I've worked with,
e.g., ZenOSS.) A couple of hours of experimentation should be enough
to get a decent set of basic graphs customized for the application at
hand.
![[an RRD graph]](http://mult.ifario.us/files/rrd_graph.png)
As mentioned above, it goes without saying that the EC2 security groups for
the instances need to be set up so that this data is not generally
accessible but can still be seen by the Cacti host.
Tips and Tricks
The only real issues that I encountered in the process were some
disconnects between what Cacti allows you to enter and what RRDTool
accepts as input. Once you're done with the necessary setup or some
tweaks, if your graphs either don't appear or disappear, there's a
good chance that RRDTool doesn't like what Cacti is asking it to do.
In that case, turn on the "graph debug" option to see what Cacti is
sending to RRDTool and adjust your configuraiton accordingly.