Ahhh... That's Better

Paul R. Brown @ 2008-01-11T21:51:03Z

I cut over to perpubplat about a week ago, so it's worth a quick check to see if I met my goal of getting a few more nines.

On the left, we have an every-30-minutes response time chart for the Typo version of this site, and on the right, the perpubplat version. (The charts are from the free monitoring service from mon.itor.us.) The Typo configuration used Apache2 with mod_fcgid (see my earlier post on the subject) configured for seven processes with a maximum 60-minute lifetime; the perpubplat configuration uses Apache2 with mod_fastcgi configured for a single process that runs on 50 lightweight Haskell threads internally. (N.B.: The graphs use different scales on the y-axis.)

[Figure: mon.itor.us graph for Typo install of mult.ifario.us]
[Figure: mon.itor.us graph for perpubplat install of mult.ifario.us]

The graphs hint that things are better, but some quick text processing on the server logs makes the difference more explicit. Here's an analysis of the response codes from the most recent log file for perpubplat:

$ head -n 1 multifarious-combined.log | awk '{print $4}'
[07/Jan/2008:00:32:50
$ ^head^tail
[11/Jan/2008:15:03:50
$ awk '{print $9}' multifarious-combined.log | sort | uniq -c | sort
      1 400
      2 206
     73 304
   1572 302
   2976 404
   5574 301
  14772 200

No 500s. (Most of the 404's are comment spammers trying to hit old URLs for comments.) Here's the same analysis for Typo from a week back in December of last year (2007):

$ zcat mult.ifario.us-access.log.3.gz | head -n 1 | awk '{print $4}'
[10/Dec/2007:19:49:06
$ ^head^tail
[19/Dec/2007:14:28:17
$ zcat mult.ifario.us-access.log.3.gz | awk '{print $9}' | sort | uniq -c | sort
      5 206
      5 400
     23 503
     30 302
    237 404
   1098 301
   2259 500
   9319 304
  20885 200

A change from one-ish nines (-log10(2282/33861) ~ 1.17, counting the 2259 500s and 23 503s as failures out of 33861 requests) to 100% uptime is a positive change, and the CPU trace for the virtual server suggests that the perpubplat configuration uses a tiny fraction of the machine resources of the Typo configuration.
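
The nines arithmetic can be reproduced in a few lines of Haskell. This is a minimal sketch: the names `nines`, `typoFailures`, `typoTotal`, and `typoNines` are made up for illustration, and it assumes that only the 500 and 503 responses in the Typo log count as failures.

```haskell
-- Number of "nines" of availability: -log10 of the failed fraction.
-- `nines` is an illustrative name, not part of perpubplat.
nines :: Int -> Int -> Double
nines failures total =
  negate (logBase 10 (fromIntegral failures / fromIntegral total))

-- Counts taken from the Typo log above: 2259 500s plus 23 503s,
-- out of the sum of all response-code counts.
typoFailures, typoTotal :: Int
typoFailures = 2259 + 23
typoTotal    = 5 + 5 + 23 + 30 + 237 + 1098 + 2259 + 9319 + 20885

typoNines :: Double
typoNines = nines typoFailures typoTotal   -- roughly 1.17
```

With zero failures, as in the perpubplat log, the failed fraction is 0 and the function returns Infinity, which is the floating-point way of saying "no failures observed" rather than a guarantee.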

perpubplat 0.9 — You're Looking at It

Paul R. Brown @ 2008-01-03T07:10:14Z

I started thinking about replacing Typo back in October of 2006, and my home-brew project to do so is at the point where it's usable as a replacement. In fact, I cut over this morning, so you're looking at it right now.

The current implementation represents an investment of around 60 hours of learning and hacking time spent on Apache, FastCGI, Atom, XHTML, and Haskell.

Why Not X?

It is reasonable to ask "Why not use X?" where reasonable values for X include default blogging tools like WordPress or Roller, a more recent version of Typo, some language other than Haskell (like Erlang or OCaml or Scala), or an existing Haskell framework like HAppS or Hope. The short answer is that I rolled my own because I wanted to roll my own.

Tooling and Methodology

I used the default tooling stack of Emacs (the Aquamacs flavor) with haskell-mode, GHC, and Darcs for revision control. My workflow loop was equally simple: I worked in Emacs until I thought that something might work and then loaded a module into ghci to experiment with it.

Basic Architecture

The basic architecture is straightforward; here are the highlights:

  1. Container: FastCGI handler implemented in Haskell that parses request URIs to determine how to respond. Apache with mod_fastcgi is configured as described in an earlier post.
  2. Storage: Plain text, simple file storage of entities (posts, comments, drafts, etc.) in a human editable, human readable format.
  3. Concurrency: An event loop that listens on a Haskell Chan and manages a single in-memory instance of the data. The event loop runs on a lightweight background thread that gets forked when Apache spins up the FastCGI handler, and the approach is equivalent to the one I used for the sequence number generator experiment.
  4. Browser Views: XHTML rendering via the Text.XHtml combinator package.
  5. Atom Feeds: A modified version of the lightweight Atom library that I posted on previously.
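
The concurrency item above can be sketched roughly as follows. This is not the perpubplat source: the `Request` type, the function names, and the use of a `Data.Map` from permatitle to body as the "data" are all assumptions made for illustration. The point is only the shape: one forked thread owns the state and every other thread talks to it through a `Chan`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import qualified Data.Map as M

-- Hypothetical messages understood by the event loop.
data Request
  = PutPost String String                  -- permatitle, body
  | GetPost String (MVar (Maybe String))   -- permatitle, slot for the reply

-- The loop is the only code that ever touches the map, so no locking
-- is needed; requests are handled one at a time in arrival order.
eventLoop :: Chan Request -> M.Map String String -> IO ()
eventLoop chan posts = do
  req <- readChan chan
  case req of
    PutPost key body -> eventLoop chan (M.insert key body posts)
    GetPost key reply -> do
      putMVar reply (M.lookup key posts)
      eventLoop chan posts

-- Fork the loop on a lightweight thread and hand back its channel.
startStore :: IO (Chan Request)
startStore = do
  chan <- newChan
  _ <- forkIO (eventLoop chan M.empty)
  return chan

-- Fire-and-forget write.
storePost :: Chan Request -> String -> String -> IO ()
storePost chan key body = writeChan chan (PutPost key body)

-- Synchronous read: send a request, then block until the loop replies.
fetchPost :: Chan Request -> String -> IO (Maybe String)
fetchPost chan key = do
  reply <- newEmptyMVar
  writeChan chan (GetPost key reply)
  takeMVar reply
```

A request handler would call `startStore` once at startup and then use `storePost` and `fetchPost` from any thread; because a `Chan` is first-in, first-out, a read queued after a write from the same thread sees that write.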

Early Returns and Open Items

My expectations are met so far. The implementation is deployed on my virtual server at Linode. Some ad hoc benchmarking shows that it will support sustained 50 req/sec loads with 10 or 20 concurrent clients without issue, and that's a stark contrast to Typo's performance of four requests per second (downhill and with the wind). (Benchmarking traffic may well be saturating the network between here and there, so it might even do a bit more.) Turning on profiling shows that most of the time is taken in string concatenation for a page view or a feed. One option would be to set up conditional GET and some basic caching, and switching to Data.ByteString would be another.

There are two open items I'm still thinking about: dynamic content in the browser and comments.

For dynamic content in the browser, i.e., integration via JavaScript clients to HTTP APIs, I experimented a bit with in-page widgets that do a bit of DOM rewriting to display fancy badges from del.icio.us or Reddit or shared items from Google Reader, but pages load more slowly, it requires clients to have JavaScript enabled, and it's incompatible with XHTML. (The one compromise for the moment is the Flickr montage, since it adds a splash of color.) Instead, I'm planning to add additional background threads to the application to poll del.icio.us, Flickr, and other services via HTTP APIs and then vend cached data to sidebar widgets.
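
One possible shape for such a polling thread, as a sketch only: the names `startPoller`, `readCache`, and `tryAny` are invented here, and the actual HTTP call is left as an `IO` action passed in by the caller, since the post does not say which client library would be used.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Exception (SomeException, try)
import Control.Monad (forever)
import Data.IORef (IORef, atomicWriteIORef, newIORef, readIORef)

-- `try` pinned to SomeException so that any failure of the fetch is caught.
-- (A production version would probably want to be more selective.)
tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- Fork a background thread that runs `fetch` every `intervalMicros`
-- microseconds and keeps the latest successful result in a cache.
-- `fetch` stands in for the call to del.icio.us, Flickr, etc.
startPoller :: Int -> IO a -> a -> IO (IORef a)
startPoller intervalMicros fetch initial = do
  cache <- newIORef initial
  _ <- forkIO $ forever $ do
    result <- tryAny fetch
    case result of
      Right fresh -> atomicWriteIORef cache fresh
      Left _      -> return ()   -- on failure, keep serving the old value
    threadDelay intervalMicros
  return cache

-- What a sidebar widget would call while rendering a page.
readCache :: IORef a -> IO a
readCache = readIORef
```

Page rendering then never waits on a remote service: it reads whatever the poller last stored, and a slow or failing service only means slightly stale sidebar data.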

The second item is support for comments, trackbacks, referrers, and the like, and that's just a matter of me deciding how I want to manage the workflow and ensure a good experience for repeat commenters or correspondents (i.e., people who link here).

Open Source?

I will make source code available as a Darcs repository at some point in the near future, but it's not a priority for me. The milestone I'd like to achieve before releasing the source is to have the cabal builds all squeaky clean, and that's not far off. (Right now, things are just built with ghc --make.) Even then, I still wouldn't call it "open source". Calling something "open source", to me, should carry with it an implicit promise of usefulness and fitness for purpose, but as the name might suggest, I intend this to be a personal publishing platform.

Administrivia

Paul R. Brown @ 2008-01-02T20:03:19Z

I've switched my publishing engine from Typo to a home-brew platform called "perpubplat" which appears to be stable enough (i.e., better than Typo) but will remain a work in progress. (I'll post more on perpubplat later; I'm happy to share source code with folks and will post Darcs repository information once I've cleaned things up a bit.)

Of relevance to feed subscribers (and perhaps how you ended up here):

  • RSS feeds are no longer provided in any form. What would have been a request for an RSS feed directs to an Atom feed containing this post.
  • As a corollary of the non-existence of RSS, categories (as semantic artifacts of RSS) no longer exist. Instead, you can pick out a relevant by-tag feed, e.g., for posts tagged Haskell, Java, or entrepreneurship. (For any by-tag or by-date view, there is an associated feed available via autodiscovery.)
  • Comment feeds by article internal identifier are no longer supported. (Comment feeds by article are supported with a different, permatitle-based URI; check the autodiscovery links on a single article page.)

Of relevance to visitors:

  • For the time being, new comments aren't supported directly on this blog, but that shouldn't stop you from commenting on your own blog or posting to a community site like Reddit or DZone. The plumbing for comments (as well as trackbacks and backreferences) is present, but I haven't decided how I want to restrict content and control spam yet. Historical comments and trackbacks are present.
  • I've made some effort (i.e., using FeedValidator and xmllint) to ensure that specification compliance is provided when advertised, including cleaning up the content of older posts, but please let me know if something falls short.

A Little Life Support for Typo

Paul Brown @ 2007-09-13T04:31:00Z

I've been busy, but I haven't been so busy that I couldn't find a little time to blog. I have been busy enough that I couldn't find time to diagnose why Typo would give me an HTTP 500 after a very long pause whenever I tried to post. I finally found the time, and a little poking around was enough to suggest that 250k rows in the sessions table was a bit much for the way I have MySQL configured.

The bloated sessions table was the result of the default configuration that uses the database to manage sessions. Truncating the table plus a switch over to memcached for session management (as per Err) have me back to a usable configuration.

Damn Bellybutton

Paul Brown @ 2007-04-19T02:15:08Z

I don't look at my real world belly button all that often, and realistically, if it's not there, it's not a big deal; I was done with it a long time ago. On the other hand, when I do indulge in a little metaphorical navel gazing, I'll be damned if my weblog isn't missing about half the time. Again, it's not a big deal, since FeedBurner keeps the fire burning for any subscribers, but it's still annoying if not a bit embarrassing.

Back when I started having trouble with typo, I signed up for an account on mon.itor.us to get an idea of how much of a problem it was, and it's been pretty boring since I moved it off of TextDrive. As configured and hosted, it provides "one-and-a-half nines" or around 95% uptime, except when it really sucks wind. Speaking of which, here are the response time graphs for the 16th-18th:

The period of about two days out of the three where the graph is pegged at the top represents effective downtime. Flickr (or rather the lightweight integration of Flickr into typo) was the culprit, again, but I opted to disable it rather than fix it.

JSON as a Migration Format

Paul Brown @ 2007-02-22T16:11:12Z

I'm making slow progress on my personal publishing platform rewrite in Haskell (see earlier posts here and here), so herein part 3 of n, wherein I experiment with data migration and an embarrassingly simple data model. A forthcoming part 4 will be really simple Atom serialization.

Data Out, Data In

As I experiment with the new platform, I'd like a way to move the data from the typo instance into the new environment on-demand; this post is my lab notebook for the export/import experiment.

One of the things that Rails has gotten 100% right is the ability to (easily) access configured environments via interactive (script/console) or scripting (script/runner) front-ends. (Using a framework like Spring in the Java space can provide similar functionality by constructing an application context, but it's more awkward to separate out the services that the runtime container would be providing.) My first thought on exporting was to use YAML, but the significant whitespace and cryptic annotations ("|" for a free-form text block with a trailing linebreak and "|-" for a free-form text block without a trailing linebreak) just rubbed me the wrong way. JSON turns out to be a better choice because ActiveRecord supports JSON export (via ActiveSupport::JSON), and there are a couple of JSON libraries for Haskell. One is a predecessor version of the other, and I'm going to work with the earlier version because it has no dependencies other than a baseline GHC 6.6 install.

Getting an entry out to play with is a piece of cake:

./script/runner 'puts Article.find_by_state("Published").to_json' \
  > /tmp/entry.json

Parsing the JSON is similarly straightforward:

$ ghci
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
Prelude> :load /tmp/JSON.hs
[1 of 1] Compiling JSON             ( /tmp/JSON.hs, interpreted )
Ok, modules loaded: JSON.
*JSON> entry <- P.parseFromFile json "/tmp/entry.json"
Loading package parsec-2.0 ... linking ... done.
Right (Object (fromList [("attributes",Object (fromList [
[...]

(JSON.hs aliases Data.Map as M and Text.ParserCombinators.Parsec as P, so that's where the P.parseFromFile is coming from.) The map of values is wrapped up a bit, but a couple of simple functions will get it out from behind the type constructors:

*JSON> let unR = \(Right r) -> r
*JSON> let unO = \(Object o) -> o
*JSON> :t (unO.unR) entry
(unO.unR) entry :: M.Map String Value

which gets us down to the level of the first map with one entry under the key "attributes". To get the map of attributes we want out:

*JSON> let m = unO ((((M.!).unO.unR) entry) "attributes")
*JSON> m
fromList [("allow_comments",String "1"),("allow_pings",String "1"),...

(Haskell uses ! for dereferencing keys in a Data.Map.) And now the components of the entry are easy to extract:

*JSON> let atts = ((M.!) m)
*JSON> atts "allow_comments"
String "1"
*JSON> atts "updated_at"
String "2006-09-15 02:12:45"
*JSON> atts "body" 
String "<p>Although I really do like the...
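
The `unR` and `unO` helpers and `M.!` are fine at the prompt, but each of them fails with an exception on unexpected input. For an actual import, a total lookup might look like the sketch below. The `Value` type here is a stand-in with only the two constructors visible in the transcript (the real type in JSON.hs presumably has more), and `attribute` is a made-up name.

```haskell
import qualified Data.Map as M

-- Stand-in for the Value type from JSON.hs; only the constructors
-- that appear in the transcript above are modeled.
data Value
  = Object (M.Map String Value)
  | String String
  deriving (Show, Eq)

-- Look up a string-valued attribute under the top-level "attributes"
-- key, returning Nothing instead of failing on any unexpected shape.
attribute :: String -> Value -> Maybe String
attribute key (Object top) =
  case M.lookup "attributes" top of
    Just (Object atts) ->
      case M.lookup key atts of
        Just (String s) -> Just s
        _               -> Nothing
    _ -> Nothing
attribute _ _ = Nothing
```

The same idea extends to the `Either` returned by the parser: pattern match on `Left` and `Right` rather than assuming `Right`.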

The first-cut data model for entries looks like this:

import System.Time (CalendarTime)

-- A post. Comments reuse the same structure (see p_comments).
data BlogPost = BlogPost { p_title     :: String,
                           p_summary   :: Maybe String,
                           p_permalink :: String,
                           p_metadata  :: PostMetadata,
                           p_body      :: String,
                           p_tags      :: [String],
                           p_uid       :: String,
                           p_comments  :: [BlogPost]
                         }
              deriving (Show)

-- Timestamps, authorship, and publication state for a post.
data PostMetadata = PostMetadata { m_created   :: CalendarTime,
                                   m_publish   :: CalendarTime,
                                   m_updated   :: CalendarTime,
                                   m_author    :: PostAuthor,
                                   m_published :: Bool
                                 }
                  deriving (Show)

-- Author (or commenter) details.
data PostAuthor = PostAuthor { p_name      :: String,
                               p_uri       :: Maybe String,
                               p_email     :: Maybe String,
                               p_showEmail :: Bool
                             }
                deriving (Show)

And interpolating from typo's model to the new model is just putting the fields in the right place with a little bit of date munging, since the new model has the expectation that dates are represented as Haskell CalendarTime. The reuse of the BlogPost structure for comments is intentional, both for Atom syndication and to support threaded comments.
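
The date munging amounts to parsing strings like "2006-09-15 02:12:45" from the export. As a sketch, the following uses `UTCTime` from the `time` package in place of `CalendarTime`, which is a deliberate swap rather than what the model above uses, and it assumes the exported timestamps are in UTC, which the export itself does not state. The function names are invented.

```haskell
import Data.Time (UTCTime, defaultTimeLocale, formatTime, parseTimeM)

-- Parse a timestamp in the format seen in the export, e.g.
-- "2006-09-15 02:12:45". Returns Nothing for anything else.
-- Assumption: the value is in UTC; the export carries no zone.
parseTypoTime :: String -> Maybe UTCTime
parseTypoTime = parseTimeM True defaultTimeLocale "%Y-%m-%d %H:%M:%S"

-- Render a timestamp in the RFC 3339 form that Atom expects.
atomTimestamp :: UTCTime -> String
atomTimestamp = formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ"
```

Keeping the parser total (a `Maybe` result) means a malformed date in one entry can be reported and skipped instead of aborting the whole import.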

Pulling all of the entries out is also straightforward:

$ ./script/runner 'puts (Article.find_all_by_state("Published")).to_json' \
  > /tmp/entry.json
$ ./script/runner 'puts (Article.find_all_by_state("ContentState::Published")).to_json' \
  >> /tmp/entry.json

and pulling comments and trackbacks is a similar exercise:

$ ./script/runner 'puts (Comment.find_all_by_state("ContentState::Ham")).to_json' \
  > /tmp/comments.json
$ ./script/runner 'puts (Trackback.find_all_by_state("ContentState::Ham")).to_json' \
  > /tmp/trackbacks.json

although it takes a little doing to collate comments and trackbacks with their parent posts. So far, so good — unlike any of the other migrations (Radio Userland→SnipSnap, SnipSnap→WordPress, and WordPress→typo) I've done, this looks to be neither lossy nor labor-intensive.
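
The collation step can be sketched as a grouping by parent identifier. This assumes each exported comment or trackback carries the identifier of its parent article; the export above does not show that field, so the accessor is passed in as a parameter, and the names `collate` and `childrenOf` are made up.

```haskell
import qualified Data.Map as M

-- Group children (comments, trackbacks) by the key of their parent,
-- preserving the order in which they appear in the input list.
-- `parentOf` is a caller-supplied accessor for the parent identifier.
collate :: Ord k => (c -> k) -> [c] -> M.Map k [c]
collate parentOf children =
  M.fromListWith (flip (++)) [ (parentOf c, [c]) | c <- children ]

-- Children for one parent, or the empty list if it has none.
childrenOf :: Ord k => k -> M.Map k [c] -> [c]
childrenOf = M.findWithDefault []
```

Building the map once and then calling `childrenOf` while constructing each post avoids rescanning the full comment list for every article.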

As an aside, over about four years of blogging (2003-02-17 through the present), I've accumulated the equivalent of ~110 single-spaced pages of content.

Cure for irbarrhea

Paul Brown @ 2007-02-22T00:19:14Z

I've been working with the ActiveRecord model for my typo installation using irb (via script/console), but some expressions were producing a little too much output for convenience, e.g.:

Article.find_all.collect { |a| a.permalink }

Fortunately, there is a convenient configuration property that does the trick by suppressing the output of evaluated expressions:

conf.return_format = ""

After which, what you get is what you puts.
