Crunching Java Class Versions with bash-fu

Paul R. Brown @ 2010-08-20T22:59:49Z

I recently needed to root out the JDK 6 classes lurking in an application that was supposed to run on JDK 5, and it turns out that it's not that difficult with a little bash-fu. After unpacking all of the constituent JAR files:

$ find . -name '*.class' | tee -a classes | xargs -n 1 head -n 1 | \
  cut -b 8 | xargs -IX printf '%d\n' "'X" | \
  paste -d ' ' - classes | grep '^50'

Et voilà! I have the culprit:

50 ./jlayer-1.0.1.jar/javazoom/jl/converter/Converter$PrintWriterProgressListener.class
50 ./jlayer-1.0.1.jar/javazoom/jl/converter/Converter$ProgressListener.class
[...]

A rebuild of the JLayer library, and all's well again.
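
The same check can be sketched in Java for anyone who would rather not lean on shell quoting. This is a minimal sketch, not the method used above: the class and method names (ClassVersion, majorVersion) are hypothetical, and it assumes only the standard class file header layout, where a four-byte magic number 0xCAFEBABE is followed by a two-byte minor version and a two-byte major version (49 for JDK 5, 50 for JDK 6).

```java
import java.nio.ByteBuffer;

public class ClassVersion {
    /**
     * Returns the major version stored in the first eight bytes of a class file.
     * The name and signature are illustrative, not part of any library.
     */
    static int majorVersion(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // big-endian by default, like class files
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        buf.getShort();                 // skip the minor version
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    public static void main(String[] args) {
        // A hand-built header with major version 50, standing in for a real file.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50};
        System.out.println(majorVersion(header));
    }
}
```

Feeding it the first eight bytes of each unpacked .class file and flagging anything above 49 would pick out the same files as the pipeline.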

How to Monitor Java Applications on EC2 with Cacti

Paul R. Brown @ 2009-05-19T05:53:21Z

As part of a scale-out effort for a customer moving from a single node hosted on Slicehost to a multi-node environment hosted in the US and EU on Amazon EC2, I wanted a way to introduce a combination of application and host-level monitoring for the nodes. I settled on the combination of RRDTool graphs served by Cacti and an alive check provided by a third party (Monitis), but there was no immediately obvious way to bridge the gap between the Java services and the Cacti convenience wrapper around RRDTool.

This was before the recent announcement by Amazon of monitoring functionality for EC2 nodes, but that service wouldn't meet the primary use case of application versus host monitoring. A tool like JConsole didn't make sense because I was interested in getting a single portal view across the fleet and in having retrospective data to make visual day-to-day or week-to-week comparisons.

This post describes how to bring the pieces together, and the technique is equally applicable to non-Java systems — any system that can serve HTTP requests can be instrumented. In the end, about a day's worth of experimentation and work was enough to get me the level of instrumentation I was after.

Host Configuration Requirements

Each of the nodes in the fleet runs on a slightly modified CentOS 5.2 AMI (based on one (ami-1363877a) provided by Rightscale), and getting basic host information exposed over SNMP is straightforward:

$ yum install net-snmp
[... lots of output ...]
$ mv /etc/snmp/snmpd.conf /etc/snmp/snmpd.conf-old
$ echo 'rocommunity public' > /etc/snmp/snmpd.conf
$ /etc/init.d/snmpd restart

The underlying assumption, of course, is that the instance was launched under a security group that exposes UDP ports 161 and 162 to the host that will be running Cacti. This can all be made to work without assigning elastic IP addresses to the nodes and to the Cacti host, but it's easier with them.

For the Cacti host, more or less any modern Linux distribution (e.g., Ubuntu or CentOS) will do, and I'd recommend following Eric Hammond's very nice tutorial about setting up MySQL on an EBS volume before doing the Cacti install. For the same reason it makes sense to have MySQL on the attached EBS volume (survive instance termination, support backups, etc.), it makes sense to store RRDTool's backing data there as well.

Instrumentation and Collection

The Java application in question (SmartFox) has no explicit support for exporting metrics and no MBeans exposed for access via JMX, but it does provide some API-level support for basic information and an embedded servlet container (Jetty, of course). (SmartFox does bundle a Flash-based administrative tool, but like JConsole it's single-node and does not provide much in the way of retrospective data.)

After some poking around (i.e., reading PHP source code) in Cacti, I found that Cacti's standard "Script/Command" data input method consumes data as space-separated name/value pairs on a single line:

name1:value1 name2:value2 ...

So I put together a simple servlet to grab the server singleton object from the SmartFox API and print metrics out on a text/plain response. This could just as easily be done with an MBean instance looked up via the JVM's default JMX infrastructure or a metric facade injected into the servlet as part of the overall web application — the point is that the single line of name/value pairs is the required interface to Cacti.
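
The formatting half of such a servlet is small enough to sketch. This is a minimal illustration, not the servlet described above: the class name (CactiLine) and the metric names are hypothetical, and the SmartFox lookup and the servlet plumbing are left out so that the example stays self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CactiLine {
    /**
     * Renders metrics as one line of space-separated name:value pairs,
     * in the map's iteration order.
     */
    static String format(Map<String, ? extends Number> metrics) {
        StringJoiner line = new StringJoiner(" ");
        for (Map.Entry<String, ? extends Number> e : metrics.entrySet()) {
            line.add(e.getKey() + ":" + e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Hypothetical metric names; real ones would come from the application.
        Map<String, Integer> metrics = new LinkedHashMap<>();
        metrics.put("users", 42);
        metrics.put("rooms", 7);
        System.out.println(format(metrics)); // users:42 rooms:7
    }
}
```

A servlet would write the returned string to a text/plain response. An insertion-ordered map keeps the field order stable from one poll to the next.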

The data is then accessed via a curl invocation templated for variables:

curl http://<host>:<port>/<webapp>/sfs-status?zone=<zone>

The fields in angle brackets are input fields that will be filled-in by other objects in Cacti, and the output fields for the data input method should be named to match the names in the name/value pairs from above.

The downside of this approach is that there is quite a bit of configuration that goes on top of this one-liner (graphs instantiate graph templates and pull from data sources that reference data templates that in turn reference data input methods, or something like that), but it more or less just works. (Even at that, it is less painful and more forgiving than some other tools I've worked with, e.g., ZenOSS.) A couple of hours of experimentation should be enough to get a decent set of basic graphs customized for the application at hand.

[an RRD graph]

As with SNMP above, the EC2 security groups for the instances need to be set up so that this data is not generally accessible but can still be seen by the Cacti host.

Tips and Tricks

The only real issues that I encountered in the process were some disconnects between what Cacti allows you to enter and what RRDTool accepts as input. Once you're done with the necessary setup or some tweaks, if your graphs either don't appear or disappear, there's a good chance that RRDTool doesn't like what Cacti is asking it to do. In that case, turn on the "graph debug" option to see what Cacti is sending to RRDTool and adjust your configuration accordingly.

@Override Idiosyncrasies

Paul R. Brown @ 2008-11-20T07:07:08Z

Just to prove that you frequently learn something new about a language you're familiar with, I recently learned that the following will compile on JDK6:

interface I {
  void m();
}
class C implements I {
  @Override
  public void m() {
  }
}

(It even compiles on JDK6 with -source 1.5, which seems like a bug under the circumstances, but no matter.)

I'm sure that this was discussed ad nauseam by the JSR-175 expert group and then subsequently when @Override behavior was overridden for JDK6, but this seems wrong if you ask me. (I'd argue that the JLSv3 concurs.) The @Override annotation, under the JDK5 interpretation, indicates the replacement of a piece of code in the superclass, along with all of its side-effects. Implementing a method on an interface does not change behavior and requires no more attention from the developer than an understanding of the contract that the interface represents. If anything, an @Implements annotation would have been a better choice than mucking up the @Override annotation to better align with the murky definition of overriding in the JLS.

To make things more confusing, the current JDK6 documentation is incorrect. The JDK6 API documentation says the same thing as the JDK5 documentation does, but the JDK7 API documentation describes the JDK6 behavior.

The Haskell Platform and Lessons Learned Elsewhere

Paul R. Brown @ 2008-10-02T20:19:43Z

Duncan Coutts posted some slides from ICFP about the developing Haskell platform — a set of "known good" and well-maintained libraries — and it is indeed on its way. (Compare with the "batteries included" effort for OCaml.) Here's the stack from slide 13:

Haskell Platform    Linux Distro
GHC                 kernel
Hackage             SourceForge
Cabal               .rpm/.deb
cabal-install       yum/apt-get

It's not in the charter for the Haskell Platform to make general improvements to Hackage, but looking at the stack diagram I couldn't help but think about a comparison against a language stack like Java (JDK, Maven, Codehaus, JCP) or Ruby (C ruby, rake, RubyGems, RubyForge) instead of Linux. Quality, collaboration, and liveness are important aspects to assess for a project, but that's next to impossible without publicly accessible (and relatively standardized) bug tracking, source control, and discussion. (Things like continuous integration are niceties but provide somewhat lower utility than the big three I just mentioned.) With 754 libraries, there's a lot of meat and a lot of mystery that's impossible to assess at a glance from the current HackageDB package list web interface. The "heuristic" requirements for libraries in the Haskell platform do include a bug tracker and good basic hygiene (build portably with standard tools, proper releases, maintainer), but they're missing those other touchpoints.

Lest I be perceived as casting stones, my experience is that even in the current circumstances and in spite of the lack of visibility, the Haskell community is vibrant and active. The Cafe and IRC are bustling, and troll-free and flame-free with very rare exceptions. Community leaders are friendly and approachable. Where I've found issues with libraries (resource leak in HTTP, QName equality bug in HXT, attribute qualification issue in "light" xml), maintainers have been helpful and receptive. It would be even better if it was straightforward to collaborate with other users on dedicated mailing lists, post or consume bug fixes via darcs or git branches (the good kind of fork), and track updates. (Use of distributed VCS tools like Darcs and git exponentially increases the utility of a publicly available project VCS.) There is no reason that this should be limited to just a blessed core, as it will help in the growth and refinement of other libraries as well. Posting to Hackage should entail the creation of a mailing list, the provisioning of a bug tracker, and either a check-in of the source for the posted library to a central repository or a reference to a publicly available git or darcs repository.

More than most and less than some, I appreciate the difficulties in providing community infrastructure, having observed Bob and Ben in action at the Haus and watched a top-level project at Apache (ODE) from proposal through graduation. I also appreciate the role of the infrastructure in creating and reinforcing community: it's no accident that a fair proportion of the most widely used open source libraries on the Java platform have grown either at Apache or at the Codehaus.

Beust Sequence Ruminations

Paul R. Brown @ 2008-07-03T06:10:59Z

Cédric posted a nice puzzle on his blog:

[W]rite a counter function that counts from 1 to max but only returns numbers whose digits don't repeat.

Bob tweeted about a minimal solution in terms of execution time, although (like the guy in the cartoon) I still like my Haskell version of the sequence (up to some details with use of Data.Int.Int64 and typing the enumeration):

import Data.List (nub)
f k = [ n | n <- [1..k], (length . show $ n) == (length . nub . show $ n) ]

And the function to compute the maximum gap:

drop_tail k = reverse . (drop k) . reverse
by_pairs u = zip (drop 1 u) (drop_tail 1 u)
g k = maximum . map (uncurry (-)) . by_pairs . f $ k

This approach, while visually appealing, is unacceptably slow, even with some of the usual optimizations applied. The fact is that any solution that is implemented in terms of testing all of the numbers between 0 and k will perform orders of magnitude more poorly than the recursive style that Bob's using. But how would I know that...?
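
To make the contrast concrete, here is a rough Java sketch of the two styles. It is not Bob's code: the names (DistinctDigits, bruteForce, enumerate) are made up for illustration, and the recursive version is just one plausible way to build numbers digit by digit from unused digits, assuming base 10.

```java
import java.util.ArrayList;
import java.util.List;

public class DistinctDigits {
    /** True if no decimal digit of n occurs twice. */
    static boolean hasDistinctDigits(long n) {
        int seen = 0; // bitmask of digits seen so far
        while (n > 0) {
            int bit = 1 << (int) (n % 10);
            if ((seen & bit) != 0) return false;
            seen |= bit;
            n /= 10;
        }
        return true;
    }

    /** Iteration style: test every number from 1 to max. */
    static List<Long> bruteForce(long max) {
        List<Long> out = new ArrayList<>();
        for (long n = 1; n <= max; n++) {
            if (hasDistinctDigits(n)) out.add(n);
        }
        return out;
    }

    /** Enumeration style: build numbers from unused digits only, in ascending order. */
    static List<Long> enumerate(long max) {
        List<Long> out = new ArrayList<>();
        for (int len = 1; len <= 10; len++) {
            int before = out.size();
            for (int d = 1; d <= 9; d++) {
                extend(d, 1 << d, len - 1, max, out);
            }
            if (out.size() == before) break; // longer numbers can only be larger
        }
        return out;
    }

    private static void extend(long prefix, int used, int remaining, long max, List<Long> out) {
        if (prefix > max) return; // appending digits only makes the number bigger
        if (remaining == 0) {
            out.add(prefix);
            return;
        }
        for (int d = 0; d <= 9; d++) {
            if ((used & (1 << d)) == 0) {
                extend(prefix * 10 + d, used | (1 << d), remaining - 1, max, out);
            }
        }
    }
}
```

The brute-force loop does work proportional to max, while the recursive version only ever visits prefixes whose digits are already distinct.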

Subfactorials, Derangements, and Chooses

A derangement is a permutation with no fixed points, e.g., (123) is a derangement of the set {1,2,3}, but (12) is not because it maps 3 to 3. Like the number of permutations of an n-element set (n!), the number of derangements has its own function called the subfactorial and written !n. MathWorld has a decent write-up, with the major takeaway for present purposes being that !n ~ n!/e.

The number of elements in Cédric's set, for a fixed radix n, is the sum from k=0 up to n of !k (n choose k), and that's much smaller than n^n. (This reminds me of the whole "bad shuffle" meme from a long time ago.)

Just how much less work does the enumeration approach require than the iteration approach? Here's a quick snippet to compute the number of derangements and the corresponding Beust number for a given radix:

fac :: Integer -> Integer
fac 0 = 1
fac n = n * (fac (n-1))

choose :: Integer -> Integer -> Integer
choose n k = (fac n) `div` ((fac k) * (fac (n-k)))

d :: Integer -> Integer
d 1 = 0
d 2 = 1
d n = (n-1) * ( d (n-1) + d (n-2) )

b :: Integer -> Integer
b n = sum [ (d k) * (n `choose` k) | k <- [ 1..n ] ]

Then, b 10 is 3628799, so a solution like Bob's should have around a 2500-fold advantage over a more brute force method. And that advantage gets huge in short order. As a reminder of how much bigger n^n is than n! and friends, 26^26 is roughly 15,264,691,107 times larger than b 26...
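
For readers who want to redo that arithmetic outside of GHCi, here is a rough Java transliteration of the snippet using BigInteger. The class and method names (BeustCount, derangements, choose, b) are made up for this sketch. It mirrors the definitions of d and b above, including starting the sum at k = 1, and is only meant to reproduce the numbers.

```java
import java.math.BigInteger;

public class BeustCount {
    /** Derangements of k elements for k >= 1: d(1) = 0, d(2) = 1, d(k) = (k-1)(d(k-1) + d(k-2)). */
    static BigInteger derangements(int k) {
        BigInteger prev = BigInteger.ZERO; // d(1)
        BigInteger cur = BigInteger.ONE;   // d(2)
        if (k == 1) return prev;
        for (int i = 3; i <= k; i++) {
            BigInteger next = BigInteger.valueOf(i - 1).multiply(cur.add(prev));
            prev = cur;
            cur = next;
        }
        return cur;
    }

    /** Binomial coefficient, built up so that every intermediate division is exact. */
    static BigInteger choose(int n, int k) {
        BigInteger c = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            c = c.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return c;
    }

    /** Sum of d(k) * (n choose k) for k = 1..n, as in the Haskell b. */
    static BigInteger b(int n) {
        BigInteger sum = BigInteger.ZERO;
        for (int k = 1; k <= n; k++) {
            sum = sum.add(derangements(k).multiply(choose(n, k)));
        }
        return sum;
    }

    public static void main(String[] args) {
        BigInteger b10 = b(10);
        System.out.println(b10);                                          // the count for radix 10
        System.out.println(BigInteger.TEN.pow(10).divide(b10));           // 10^10 relative to b 10
        System.out.println(BigInteger.valueOf(26).pow(26).divide(b(26))); // 26^26 relative to b 26
    }
}
```

The first line printed is 3628799, matching the figure above, and the integer quotient of 10^10 by it is 2755.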

ElementTraversal == Pig Lipstick

Paul Brown @ 2007-08-28T16:10:00Z

Elliotte's right — the ElementTraversal spec is lipstick on a very ugly pig that's already wearing a good amount of makeup (serialization) and a ridiculous hat (XML namespaces "support"). (I've already given the DOM a few deserved kicks.)

So what does it take to deprecate the DOM? It takes a better API on equivalent licensing terms, as more liberal licenses will tend to trump better software and many of the customers of a better XML API are at the more liberal end of the licensing spectrum, i.e., Apache. I'll second Dan's call to get XOM — or something that sucks as little as XOM does — packaged as a DOM killer, and at least from my perspective, that does not include a JSR or the JCP.

Do I smell bacon? It is, after all, the Year of the Pig...

Java Brainteaser on Regular Expressions

Paul Brown @ 2007-03-20T00:32:11Z

Suppose that you have a Java web application where regular expressions are used deep down in the implementation to do some work, but you observe that an array index exception is occurring sporadically where the regular expressions are being used.

What's causing the exceptions? What's the solution?
