Sure, it runs like a top. How does it idle?

Paul Brown @ 2006-11-04T03:59:00Z

A couple of weeks back, I wrote a simple but well-instrumented Java framework to handle SEDA-like use cases (thread pools linked by queues) for a consulting customer. The java.util.concurrent package and friends make this sort of thing much easier than it used to be, and it was surprisingly easy to crank it out. As a smoke test, I set up a torture test for a simple configuration and left it running overnight, and it appeared solid: no memory or thread leaks, no lock-ups.

Someone working on a different problem needed something similar and took the framework for a quick test drive, ending up with an out-of-memory error after a night of doing nothing! It turns out that a bug causes poll(long,TimeUnit) on an empty LinkedBlockingQueue to leak memory, and I'd never run across it in testing because I hadn't tested what happens when the system has no load for an extended period.

The lesson is that no load doesn't mean that the system is actually doing nothing, so an extended idle period is an important scenario to add to a test plan.


First Steps with Haskell for Web Applications

Paul Brown @ 2006-10-11T06:19:00Z

As I blogged yesterday, I'm planning to build a simplified personal publishing system to host this blog, partially to get around resource consumption issues with the current platform and partially to get some exercise with a new language or two. I thought about Smalltalk, Erlang, and Io, but Haskell gets the initial nod if for no other reason than it's a third side of the coin that Ruby and Java are two sides of — rigorously defined, "purely" functional, lazy, "typeful" and compiles to native code via GHC. (And, of course, the syntax warms the cockles of my mathematician's heart.) Like Ruby with gems, the GHC runtime also has excellent modularity, with a minimal and standard core and good package management via Cabal. (Hello? Java?)

The first question is how to integrate an application written in Haskell into a web container, preferably a web server like lighttpd or Apache via FastCGI. (CGI would be a consideration, too, but that's just too retro for me.) Thankfully, as of the forthcoming 6.6 version, GHC has good CGI support via the Network.CGI module, and Björn Bringert has a FastCGI binding that built on the GHC 6.5 tip with only a little tinkering. (I wanted to use the core Network.CGI module in place of Björn's cgi-compat module.)

A "Hello, World" implementation using the FastCGI binding and then compiled to native code performed well on a basic smoke benchmark. Here's the relevant line from top for an instance of the handler:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  234 hello.fcgi   0.0%  0:06.83   1    13    21   692K  1.63M  1.69M  29.0M
[...]

Benchmarking with ab shows that 5 handlers can happily crank through around 4000 requests/second with 99% of the requests requiring <2ms.

For comparison purposes and with an identical FastCGI configuration, the simplest possible Ruby on Rails "Hello, World" implementation (create test controller, edit the .rhtml to return content, wire-up FastCGI) consumes considerably more memory:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  537 ruby1.8     12.1%  0:26.49   1    14    94  22.5M  3.35M  24.5M  54.5M
[...]

and only manages around 100 requests/second with ~50ms response time for the 50th percentile and ~400ms at the 99th percentile. (I recognize that I should probably put a [sic] after the "only", since 100 requests/second is significantly in excess of the peak throughput that my blog sees on a good day.)

This is far from apples-to-apples, as the RoR version is doing a lot more work under the covers, but it does give me the expectation that I can probably get a Haskell blog implementation that will have a memory footprint smaller than a base irb and provide Slashdottable performance.

Next up, deciding on how to store/represent an entry and how to implement Atom for syndication.


Lo-Fi Profiling of Typo

Paul Brown @ 2006-08-13T06:54:00Z

For anyone who's been wondering why this blog has been up and down over the past week, it's a slow-motion battle between the memory police at TextDrive killing the Typo instance that hosts the blog and either a FastCGI dispatcher or a nanny cron job starting it back up. The onus is clearly on me to figure out what's burning memory, and my first inclination was to naively google for Ruby profilers. Here's a rambling account of what I did to conclude that I'm probably out of luck as far as a quick cure for the issues and then to address them.

There are a couple of speed-oriented Ruby performance profilers, the built-in one and ruby-prof, but there are no space-oriented profilers. There was a brute-force approach based on ObjectSpace.each_object in an old mailing list post from Michael Garniss that looked suitable, so I integrated it into the main controller in Typo as an after_filter and fired up several concurrent wget commands to walk around on a production configuration on my development box at home:

while true; \
do wget -nv -r --delete-after http://localhost:3000; \
done

(There is no reason to try to set it on fire with something like ab.) That won't catch any issues with the vanilla two-dispatcher lighttpd/FastCGI configuration that I use on TextDrive, but it should catch any issues with Typo internals, badly behaved sidebars, etc.

With the profiling code integrated, a request that includes the dump takes several seconds to complete, and there are several hits per page, so I added a class variable (@@no_sooner_than) and a little logic so that profiling requests would only run once a minute or so. With several wget walkers working, top reports that the server runs along at a happy 80-90MB, and eyeballing the profiling output shows memory usage oscillating between <7MB and ~20MB without any perceptible upward trend over the course of an hour and a half. (That said, that's all the data I captured, as WEBrick locked up completely after that hour and a half.)
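
For anyone who wants to try the same trick, here is a minimal sketch of the general idea, not the code from the mailing list post or from my Typo controller. It counts live objects per class with ObjectSpace.each_object and uses a class variable to skip dumps that come too soon after the previous one. The ObjectCensus class, the maybe_dump and snapshot methods, and the 60-second interval are names and values made up for illustration; it also only counts objects and makes no attempt to estimate bytes.

```ruby
# Hypothetical helper: brute-force census of live objects, throttled so that
# the expensive heap walk runs at most once per MIN_INTERVAL seconds.
class ObjectCensus
  MIN_INTERVAL = 60 # seconds between dumps (assumed value)

  # Earliest time at which the next dump is allowed to run.
  @@no_sooner_than = Time.at(0)

  # Walk the whole heap and count live objects by class. This is slow.
  def self.snapshot
    counts = Hash.new(0)
    ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
    counts
  end

  # Write the ten most numerous classes to io unless a dump ran recently.
  # Returns true if a dump was written, false if it was skipped.
  def self.maybe_dump(io = $stderr, now = Time.now)
    return false if now < @@no_sooner_than
    @@no_sooner_than = now + MIN_INTERVAL
    snapshot.sort_by { |_klass, n| -n }.first(10).each do |klass, n|
      io.puts(format("%8d %s", n, klass))
    end
    true
  end
end
```

Calling ObjectCensus.maybe_dump from something like an after_filter would then write a dump on roughly one request per minute and return immediately on the rest.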

Armed with the information that there wasn't an easy fix for the memory issues, I switched the FastCGI configuration for the production instance to a single dispatcher from the previous two, pointed a couple of wget walkers at it, and tracked memory usage and process id at the command line, like so:

while true; \
do ps mux | grep ruby | grep -v grep; \
read -t 30; done

I also changed the wget walker command to provide more useful information:

wget -S -r -b -l 4 --delete-after http://mult.ifario.us \
-a /tmp/log_id

where id is a unique number per walker, and so far, so good. Crunching the wget output through shell commands (awk, grep, cut, sort, uniq -c, etc.), e.g.:

cat log* | grep HTTP/1.1 | cut -f 4 -d ' ' | sort | uniq -c

says that mult.ifario.us is consistently returning snappy HTTP/1.1 200 responses about two nines (99.x%) of the time, which isn't great but isn't awful. (Really it's more like 2.5 nines, i.e., −log10(0.003), but who's counting?)
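
The "nines" arithmetic is easy to check. The sketch below tallies status codes from header lines of the kind wget -S logs and converts a success count into nines as the negative base-10 logarithm of the failure rate. The status_counts and nines method names are made up for illustration, and the figure of 997 successes out of 1000 is an assumed example chosen to match the 0.003 failure rate above, not measured data.

```ruby
# Tally HTTP status codes from header lines such as "  HTTP/1.1 200 OK".
# Lines that do not contain an HTTP/1.1 status are ignored.
def status_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    counts[$1] += 1 if line =~ %r{HTTP/1\.1 (\d{3})}
  end
  counts
end

# Number of "nines" of success: -log10 of the fraction of failed requests.
# With zero failures this is Float::INFINITY.
def nines(ok, total)
  -Math.log10((total - ok).to_f / total)
end

# Assumed example: 997 good responses out of 1000 is about 2.52 nines.
puts nines(997, 1000).round(2)
```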

This is one time when I've missed some of the Java runtime environment's capabilities (i.e., the JVMTI) in other language runtimes, but no rocket science was required to get Typo under control.
