How to Monitor Java Applications on EC2 with Cacti

Paul R. Brown @ 2009-05-19T05:53:21Z

As part of a scale-out effort for a customer moving from a single node hosted on Slicehost to a multi-node environment hosted in the US and EU on Amazon EC2, I wanted a way to introduce a combination of application and host-level monitoring for the nodes. I settled on the combination of RRDTool graphs served by Cacti and an alive check provided by a third party (Monitis), but there was no immediately obvious way to bridge the gap between the Java services and the Cacti convenience wrapper around RRDTool.

This was before the recent announcement by Amazon of monitoring functionality for EC2 nodes, but that service wouldn't meet the primary use case of application versus host monitoring. A tool like JConsole didn't make sense because I was interested in getting a single portal view across the fleet and in having retrospective data to make visual day-to-day or week-to-week comparisons.

This post describes how to bring the pieces together, and the technique is equally applicable to non-Java systems — any system that can serve HTTP requests can be instrumented. In the end, about a day's worth of experimentation and work was enough to get me the level of instrumentation I was after.

Host Configuration Requirements

Each of the nodes in the fleet runs on a slightly modified CentOS 5.2 AMI (based on one (ami-1363877a) provided by Rightscale), and getting basic host information exposed over SNMP is straightforward:

$ yum install net-snmp
[... lots of output ...]
$ mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf-old
$ echo 'rocommunity public' > /etc/snmp/snmpd.conf
$ /etc/init.d/snmpd restart

The underlying assumption, of course, is that the instance was launched under a security group that exposes UDP ports 161 and 162 to the host that will be running Cacti. This can all be made to work without assigning elastic IP addresses to the nodes and to the Cacti host, but having them makes it easier.

For the Cacti host, more or less any modern Linux distribution (e.g., Ubuntu or CentOS) will do, and I'd recommend following Eric Hammond's very nice tutorial about setting up MySQL on an EBS volume before doing the Cacti install. For the same reason it makes sense to have MySQL on the attached EBS volume (survive instance termination, support backups, etc.), it makes sense to store RRDTool's backing data there as well.

Instrumentation and Collection

The Java application in question (SmartFox) has no explicit support for exporting metrics and no MBeans exposed for access via JMX, but it does provide some API-level support for basic information and an embedded servlet container (Jetty, of course). (SmartFox does bundle a Flash-based administrative tool, but like JConsole it's single-node and does not provide much in the way of retrospective data.)

After some poking around (i.e., reading PHP source code) in Cacti, I found that Cacti's standard "Script/Command" data input method consumes data as space-separated name/value pairs on a single line:

name1:value1 name2:value2 ...

So I put together a simple servlet to grab the server singleton object from the SmartFox API and print metrics out on a text/plain response. This could just as easily be done with an MBean instance looked up via the JVM's default JMX infrastructure or a metric facade injected into the servlet as part of the overall web application — the point is that the single line of name/value pairs is the required interface to Cacti.
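
As a rough illustration of the MBean variant, here is a minimal sketch that reads a few values from the JVM's platform MXBeans and serves them as a single text/plain line. It is not the SmartFox servlet described above: the JDK's built-in HttpServer stands in for Jetty, and the metric names (heap_used, threads, uptime), the /status path, and port 8080 are all assumptions chosen for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.lang.management.ManagementFactory;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Serves a few JVM metrics as one line of space-separated name:value pairs. */
final class StatusEndpoint {
    private StatusEndpoint() {}

    /** Builds the status line from the JVM's platform MXBeans. */
    static String statusLine() {
        long heapUsed = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        long uptimeSeconds = ManagementFactory.getRuntimeMXBean().getUptime() / 1000;
        // Hypothetical metric names; they must match the output fields
        // defined on the Cacti data input method.
        return "heap_used:" + heapUsed + " threads:" + threads + " uptime:" + uptimeSeconds;
    }

    /** Starts a server on the given port (0 picks a free one) exposing /status. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = (statusLine() + "\n").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080); // arbitrary port for the example
        System.out.println("Serving /status on port " + server.getAddress().getPort());
    }
}
```

Whatever the container, the only part Cacti cares about is the string returned by statusLine().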

The data is then accessed via a curl invocation templated for variables:

curl http://<host>:<port>/<webapp>/sfs-status?zone=<zone>

The fields in angle brackets are input fields that will be filled in by other objects in Cacti, and the output fields for the data input method should be named to match the names in the name/value pairs from above.
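
Because the output field names have to line up with the names in the response, it can help to check a captured response before wiring it into Cacti. The following is a small sketch of such a check. It is not Cacti's own parsing (which is PHP), and the field names used in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sanity checks for a line of space-separated name:value pairs. */
final class CactiOutputCheck {
    private CactiOutputCheck() {}

    /** Parses "name1:value1 name2:value2" into an ordered map of numeric values. */
    static Map<String, Double> parse(String line) {
        Map<String, Double> fields = new LinkedHashMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return fields;
        }
        for (String pair : trimmed.split("\\s+")) {
            int colon = pair.indexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("not a name:value pair: " + pair);
            }
            // parseDouble throws NumberFormatException for non-numeric values.
            fields.put(pair.substring(0, colon), Double.parseDouble(pair.substring(colon + 1)));
        }
        return fields;
    }

    /** Returns the expected output field names that do not appear in the line. */
    static List<String> missingFields(String line, List<String> expectedNames) {
        Map<String, Double> fields = parse(line);
        List<String> missing = new ArrayList<>();
        for (String name : expectedNames) {
            if (!fields.containsKey(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the curl output as the first argument, or fall back to a sample line.
        String line = args.length > 0 ? args[0] : "users:42 rooms:7";
        System.out.println(parse(line));
        System.out.println(missingFields(line, Arrays.asList("users", "rooms", "zones")));
    }
}
```

With the sample line, the second call reports zones as missing, which is the kind of mismatch that would otherwise show up later as an empty graph.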

The downside of this approach is that there is quite a bit of configuration that goes on top of this one-liner (graphs instantiate graph templates and pull from data sources that reference data templates that in turn reference data input methods, or something like that), but it more or less just works. (Even at that, it is less painful and more forgiving than some other tools I've worked with, e.g., ZenOSS.) A couple of hours of experimentation should be enough to get a decent set of basic graphs customized for the application at hand.

[an RRD graph]

As mentioned above, the EC2 security groups for the instances need to be set up so that this data is not generally accessible but can still be seen by the Cacti host.

Tips and Tricks

The only real issues that I encountered in the process were some disconnects between what Cacti allows you to enter and what RRDTool accepts as input. Once you're done with the necessary setup or some tweaks, if your graphs either don't appear or disappear, there's a good chance that RRDTool doesn't like what Cacti is asking it to do. In that case, turn on the "graph debug" option to see what Cacti is sending to RRDTool and adjust your configuration accordingly.


Product Management for the Busy Entrepreneur

Paul R. Brown @ 2009-03-16T18:59:23Z

I was talking with a budding entrepreneur in the open source "big data" space (with my Entreprementor hat on), and we talked about his sales pipeline and potential customers. He had a list of a couple dozen companies that had expressed some interest, and that much was great. Just the same, I asked about the elephant in the room: Interest in what? Interest in your wonderful open source doohickey might get you Internet Famous like the Star Wars Kid but isn't likely to pay your bills let alone be something to build a company on.

This brings me to the subject of product management for the busy entrepreneur.

Product Versus Business

It's easy to get confused about the difference between a product and a business. A business is a machine that turns something you have into money. (One that produces more money than it consumes is a good business...) A product is a describable, sellable thing that your business can produce over and over and customers can consume over and over without too many changes. Products can be broad, like professional services, or very specific, like machine parts, but the aspects of commonality in description and delivery need to be there.

The point of a product is that the commonalities enable scale in a business's internal processes, from production to sales to accounting. It is also that commonality that makes a business an investment prospect because you can make reasonable inferences about capital in versus capital out (ergo value). Defining a product involves a bit of intuition and guesswork, but refining that definition is simple: Ask potential customers if they would spend money on it. I've never been able to understand the relative reluctance of some entrepreneurs to get on the phone or hit the street with an idea, maybe out of a reluctance to have their idea trashed by reality, but the customer's money is the one source of Truth. If it's difficult to explain, if it's not something that the customer's business can readily consume, or if the customer doesn't "get" it, then it needs to change.

Rows and Columns

Once you have a panel of potential customers assembled, it's time to sit down with a spreadsheet and figure things out. Customers go down column "A", and potential products go across row 1. Put an "x" and maybe a note in a cell if that customer would pay for that product, and then look for the column with the most x's. Alternatively, you can use the price that the customer would pay as the value for the cell in the column and try to make a more refined decision based on the profitability of the offerings, but the idea is the same — Get real data on what customers want.

There's an aspect to survey design that's important when interviewing a potential customer. To get a real response, ask specific questions and set the expectation with that potential customer that you'll very likely be back to get a check from them. You should expect to iterate on the process a few times, as customers may help you add columns to the spreadsheet, but you should avoid open-ended questions.

Real product management is quite a bit more involved and detailed but equally necessary as your product and base of customers grow and evolve. Nonetheless, this should be enough to get started.

Basic product management is one of those times where stating the obvious is useful: Data is helpful in making decisions.


OK... Who put the JUnit JAR in my WAR?

Paul Brown @ 2007-03-07T02:26:15Z

My odyssey with Maven continues. This entry is spurred by having a WAR built with Maven come out to be three times the size of the one built with the original Ant build. The extra weight was JUnit, JMock, a couple of different Log4J's, and other assorted goodies. With multiple modules and liberal use of open source components, the question is not whether someone did but who peed in which POM?
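
One quick way to confirm which stowaways actually landed in a WAR is to list its WEB-INF/lib entries and match them against some suspect names. Here is a minimal sketch in Java; the prefixes "junit" and "jmock" are assumptions taken from the list above, and the WAR path comes from the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Lists JARs under WEB-INF/lib whose file names start with a suspect prefix. */
final class WarLibScan {
    private static final String LIB_DIR = "WEB-INF/lib/";

    private WarLibScan() {}

    static List<String> suspectJars(InputStream war, List<String> prefixes) throws IOException {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(war)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                String name = entry.getName();
                if (!name.startsWith(LIB_DIR) || !name.endsWith(".jar")) {
                    continue; // only bundled libraries are of interest
                }
                String jar = name.substring(LIB_DIR.length());
                for (String prefix : prefixes) {
                    if (jar.startsWith(prefix)) {
                        hits.add(jar);
                        break;
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java WarLibScan path/to/app.war
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String jar : suspectJars(in, Arrays.asList("junit", "jmock"))) {
                System.out.println(jar);
            }
        }
    }
}
```

This only says what is in the archive, not which POM put it there.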

Open source reminds me of college. I had the opportunity to enjoy some eclectic people during my education at Reed and Berkeley. Rent a room and then sublet the closet? That's cool. Eat what others would otherwise throw away in the dining commons? That's cool. (Off topic, at least one former "scrounger" has done just fine...) These sorts of situations came with their own etiquette, e.g., tell a "scrounger" if you have a cold when you drop off your tray and leave items intact and relatively unmolested. The bohemian analogy cuts both ways with open source: you can probably find whatever you are looking for, but it may not be in quite the state that you'd like.

Some shell scripting (find, grep, xargs, and friends) identified commons-httpclient as the likely culprit, and sure enough, it's there plain as day:

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>3.8.1</version>
</dependency>

There should be a <scope>test</scope>, but there isn't. Since he helps steward the commons, I pinged Henri about it, and it looks like the issue was already fixed for versions 3.1 and on. This was only part of the battle, however, because commons-httpclient wasn't an explicit dependency; it was only getting included as a transitive dependency of some other dependency of one of the modules that the web application used, and the module hierarchy was already four levels deep. Surely someone else has already experienced issues with dependencies of unknown provenance and come up with a way to navigate the graph, and it turns out that there are (at least) two solutions.

First up, for playing Heracles to my Odysseus or Anchises to my Aeneas or Virgil to my Dante or Laurel to my Hardy or whatever in JAR hell, Henri gets a hat-tip for pointing me at the pomtools plugin, which provides an interactive interface for navigating the graph of dependencies and can alter and serialize the underlying model of the project to fix version conflicts. I didn't end up trying it, but I will, since I have a soft spot for anything with a terminal interface.

Instead, since I also have a soft spot for GraphViz, I used the depgraph plugin from the EL4J project, which I found via Philipp Oser's blog. In my case, the plugin produced the following graph:

The graph showed commons-httpclient referenced by a variety of XFire components, and some exclusions got me out of JAR purgatory for the moment. (I ate a couple of whole pomegranates down there, so I'm sure I'll be headed back sometime soon...) This isn't just a Jakarta Commons issue. XFire has a little of the same kind of POM-rot as of 1.2.3, but that will disappear in the forthcoming 1.2.5. For those keeping score at home, AXIS2 has some (xmlunit should be <scope>'d to test), too. This makes me wish for a Maven "lint" that would flag common errors like test libraries listed as runtime dependencies or dependencies not referenced from runtime source code.
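
The "lint" wished for here could start out very small. The sketch below flags dependencies in a single POM whose artifactId is on a list of test-only libraries but whose scope is anything other than test. The list (junit, jmock, xmlunit) is an assumption drawn from the libraries mentioned in this entry, and the check does not follow parent or transitive POMs. It is not an existing Maven plugin.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Flags test libraries declared in a POM without <scope>test</scope>. */
final class PomScopeLint {
    // Assumed list of test-only artifactIds; extend as needed.
    private static final Set<String> TEST_LIBS =
        new HashSet<>(Arrays.asList("junit", "jmock", "xmlunit"));

    private PomScopeLint() {}

    static List<String> unscopedTestLibs(InputStream pom) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(pom);
        // Matches every <dependency>, including those under dependencyManagement.
        NodeList deps = doc.getElementsByTagName("dependency");
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            String artifactId = childText(dep, "artifactId");
            String scope = childText(dep, "scope");
            if (TEST_LIBS.contains(artifactId) && !"test".equals(scope)) {
                findings.add(artifactId);
            }
        }
        return findings;
    }

    /** Returns the trimmed text of the first direct child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java PomScopeLint path/to/pom.xml
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String artifactId : unscopedTestLibs(in)) {
                System.out.println("missing <scope>test</scope>: " + artifactId);
            }
        }
    }
}
```

Run against the commons-httpclient POM fragment quoted earlier, a check like this would report junit.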

Getting the depgraph plugin wired up was straightforward. I just added a plugin repository to the master POM:

<pluginRepositories>
  <pluginRepository>
    <id>elca-services</id>
    <url>http://el4.elca-services.ch/el4j/maven2repository</url>
    <releases>
      <enabled>true</enabled>
    </releases>
  </pluginRepository>
</pluginRepositories>

Then the plugin to the build:

<build>
[...]
  <plugins>
    [...]
    <plugin>
      <groupId>ch.elca.el4j.maven.plugins</groupId>
      <artifactId>maven-depgraph-plugin</artifactId>
      <configuration>
        <outDir>target/site/images</outDir>
        <outFile>${pom.artifactId}.png</outFile>
      </configuration>
    </plugin>
  </plugins>
</build>

And then it's just a mvn depgraph:depgraph to get a view of the dependency graph. The real lesson here is to aggressively scope your dependencies as a service to the community.


Then Do Something Different

Paul Brown @ 2006-11-07T03:05:00Z

Bill Grosso pointed me at Mark Himelstein's book "100 Questions to Ask Your Software Organization", and I've been surprised not to find more mentions of the book or Mark's blog out in the blogosphere. (That said, he is an accomplished VP of Engineering, not Marketing.) The book is a little rough around the edges, but it's a very realistic picture of what it means to lead a software organization. It's a good exercise to take a question a day and honestly assess how you've dealt with it. Especially in a context where you don't have more experienced leaders to learn from, forced introspection is valuable.

A recent blog entry from Mark reminded me of one of the most difficult things about being a leader, which is being wrong:

When an organization is facing challenges around meeting their commitments, hiring goals, retention goals, and quality goals it is often suggested that teams improve their processes. While I have seldom seen teams succeed without processes, I have seen numerous teams fail with them. In my mind they are required but they are not sufficient. In the end, competent and courageous leadership will win the day.

I've seen the inappropriate emphasis on "how" instead of "what" in companies large and small. If what you're doing isn't working, then you need to do something different. In metaphorical terms, it's not a question of your velocity, attitude, hairstyle, or the angle of incidence of your head with the brick wall. Recognize and acknowledge that it's a brick wall and find a way over or around rather than trying to go through and dragging your organization along with you. (Or, for that matter, find a way to use the brick wall in your favor; a barrier to execution can turn into an advantage if you're able to use it against your competitors.) The right amount of process is the least amount of rigidity that ensures that good information flows up to the leadership and that the effectiveness of changes in direction can be quickly and objectively assessed.

As to the right kinds and levels of processes, two key challenges come to mind:

  • Select the right metrics for the goals. Hard, quantitative measurements recorded regularly are a must, but it takes discipline to ensure that those measurements have a causal relationship with the goals at hand. Is reducing bug counts going to enhance engineer morale? Is marketing effectiveness going to enhance sales effectiveness, or is the pipeline kinked-up elsewhere?
  • Ensure that locality of action and locality of information coincide. A well-factored organization has the same smell as a well-factored API, and no process should involve information and decisions moving across more than one tier at a time. Should an executive care what flavor of XP a particular team is using? Absolutely not, other than to ensure that the experiences of that lead/manager are disseminated to others. Should an executive carefully review deliverable definitions, ruthlessly track progress, and drive consensus between the teams that collaborate on a product? Absolutely. Effective processes preserve the autonomy and cleverness of people at different levels in the organization, and if your organization lacks that kind of trust, then gut it — you have the wrong people.