First Steps with Haskell for Web Applications

Paul Brown @ 2006-10-11T06:19:00Z

As I blogged yesterday, I'm planning to build a simplified personal publishing system to host this blog, partially to get around resource consumption issues with the current platform and partially to get some exercise with a new language or two. I thought about Smalltalk, Erlang, and Io, but Haskell gets the initial nod if for no other reason than it's a third side of the coin that Ruby and Java are two sides of — rigorously defined, "purely" functional, lazy, "typeful" and compiles to native code via GHC. (And, of course, the syntax warms the cockles of my mathematician's heart.) Like Ruby with gems, the GHC runtime also has excellent modularity, with a minimal and standard core and good package management via Cabal. (Hello? Java?)

The first question is how to integrate an application written in Haskell into a web container, preferably a web server like lighttpd or Apache via FastCGI. (CGI would be a consideration, too, but that's just too retro for me.) Thankfully, as of the forthcoming 6.6 version, GHC has good CGI support via the Network.CGI module, and Björn Bringert has a FastCGI binding that built on the GHC 6.5 tip with only a little tinkering. (I wanted to use the core Network.CGI module in place of Björn's cgi-compat module.)

A "Hello, World" implementation using the FastCGI binding and then compiled to native code performed well on a basic smoke benchmark. Here's the relevant line from top for an instance of the handler:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  234 hello.fcgi   0.0%  0:06.83   1    13    21   692K  1.63M  1.69M  29.0M
[...]

Benchmarking with ab shows that 5 handlers can happily crank through around 4000 requests/second with 99% of the requests requiring <2ms.

For comparison purposes and with an identical FastCGI configuration, the simplest possible Ruby on Rails "Hello, World" implementation (create test controller, edit the .rhtml to return content, wire-up FastCGI) consumes considerably more memory:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  537 ruby1.8     12.1%  0:26.49   1    14    94  22.5M  3.35M  24.5M  54.5M
[...]

and only manages around 100 requests/second with ~50ms response time for the 50th percentile and ~400ms at the 99th percentile. (I recognize that I should probably put a "sic" after the "only", since 100 requests/second is significantly in excess of the peak throughput that my blog sees on a good day.)

This is far from apples-to-apples, as the RoR version is doing a lot more work under the covers, but it does give me the expectation that I can probably get a Haskell blog implementation that will have a memory footprint smaller than a base irb and provide Slashdottable performance.

Next up, deciding on how to store/represent an entry and how to implement Atom for syndication.


Typo + TextDrive != Happy

Paul Brown @ 2006-10-10T06:18:58Z

The logs say that mult.ifario.us throws a fair number of HTTP 500 response codes back at visitors, and that's sad. It is certainly not the impression I want to make on visitors and readers (although subscribers are insulated from failures by FeedBurner's excellent service). In a perfect world, something as simple as a weblog wouldn't throw any 500s, ever. The problems come from running Typo on TextDrive. There isn't anything intrinsically wrong with the Typo engine, with Ruby, or even with TextDrive, as a similar setup runs like a top in my test environment, but TextDrive's resource limits make Typo's design impractical.

This got me thinking about the design of the simplest possible weblog publishing software, a design that would eschew the use of a database and all runtime configuration in favor of a system that is ultra-lightweight and quick to "boot". Almost all of the content in the blog is relatively static — display of an entry, feeds, archives, various paginations and groupings only require lightweight decoration of the XHTML for a given entry. Paginations and groups, e.g., by multiple tags or by tags plus date, require some dynamic behavior on the server, but not that much. A complexity-ectomy doesn't have to come at the expense of chrome and eye candy, as modern browsers make it possible to inject dynamic content (images from Flickr, links from del.icio.us, free-associations from Google AdSense, etc.) into the browser directly in the form of JavaScript.
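The "not that much" dynamic behavior described above can be sketched in a few pure functions. This is only an illustration under stated assumptions: the `Entry` record, its field names, and the helper names (`withAllTags`, `inMonth`, `newestFirst`, `paginate`) are all hypothetical and not part of any existing design.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- Hypothetical in-memory view of an entry; the field names are illustrative only.
data Entry = Entry
  { entrySlug :: String
  , entryDate :: (Int, Int, Int)   -- (year, month, day)
  , entryTags :: [String]
  } deriving (Eq, Show)

-- Keep only the entries that carry every one of the requested tags.
withAllTags :: [String] -> [Entry] -> [Entry]
withAllTags wanted = filter (\e -> all (`elem` entryTags e) wanted)

-- Keep only the entries published in the given year and month.
inMonth :: Int -> Int -> [Entry] -> [Entry]
inMonth y m = filter matches
  where
    matches e = let (ey, em, _) = entryDate e in ey == y && em == m

-- Order entries newest first.
newestFirst :: [Entry] -> [Entry]
newestFirst = sortBy (comparing (Down . entryDate))

-- Split a list into pages of at most n items; a non-positive n yields no pages.
paginate :: Int -> [a] -> [[a]]
paginate n xs
  | n <= 0 || null xs = []
  | otherwise         = let (page, rest) = splitAt n xs
                        in page : paginate n rest
```

Composing these, something like `paginate 10 (newestFirst (withAllTags ["haskell", "web"] entries))` would give the pages for a two-tag grouping, with the per-entry XHTML decoration left out of the sketch.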

The one difficult bit (and the only thing that would require a POST) would be comments. Comments don't need a database or use of dynamic content, either, and using email for comment workflow would solve multiple problems. Here's a sketch:

  • Comment is made on the weblog by submitting a form.
  • Server-side executable wraps the comment as an email and sends it to the blog's author.
  • Normal email filtering machinery is applied to the comment, i.e., spam filtering, and the blog content author either chooses to reply to the message, in which case the comment is added to the relevant entry (e.g., via a procmail recipe), or simply ignores it.
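
The second step of the sketch above (wrapping a submitted comment as an email) might look roughly like the following. It is a minimal sketch, not a finished design: the `Comment` record, the `X-Comment-Author` header, and the `[comment]` subject prefix are assumptions made up for illustration, and actually handing the message to a mail transport is left out.

```haskell
import Data.Char (isControl)

-- Hypothetical shape of a submitted comment; the field names are illustrative only.
data Comment = Comment
  { commentAuthor :: String
  , commentEntry  :: String   -- slug of the entry being commented on
  , commentBody   :: String
  } deriving (Eq, Show)

-- Drop control characters (including CR and LF) so that form input
-- cannot smuggle extra header lines into the message.
sanitizeHeader :: String -> String
sanitizeHeader = filter (not . isControl)

-- Render the comment as a plain-text message: headers, a blank line, then the body.
-- The subject prefix and the X- header are assumptions; a filter such as a
-- procmail recipe could key on either one.
renderCommentMail :: String -> Comment -> String
renderCommentMail to c = unlines
  [ "To: " ++ sanitizeHeader to
  , "Subject: [comment] " ++ sanitizeHeader (commentEntry c)
  , "X-Comment-Author: " ++ sanitizeHeader (commentAuthor c)
  , ""
  , commentBody c
  ]
```

Keeping the rendering pure means the only side effect in the POST handler would be passing the resulting string to whatever delivers mail on the host.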

Akismet is apparently effective (if, at the same time, a statement about the sad state of the signal-to-noise ratio of the present-day internet), but it makes sense to leverage the filtering technology and massive corpus (~10^7 messages) of spam and ham that I already use for email.

I've experimented with different publishing platforms (Radio Userland, SnipSnap, MT, WordPress, Typo), and they all fell short for me in one way or another.

As the saying goes, if you want something done right... I'm going to embark on a project to replace Typo with something simple, dense/terse, and home-grown. It's also a chance to experiment with a new language or two, so it should be both fun and educational. Java's out due to footprint, but my mind is open otherwise: Smalltalk, Haskell, Lisp, Io, ...?
