Seems Clear Enough from the Specification

Paul R. Brown @ 2008-08-12T18:40:35Z

I can understand how people get confused with RSS since what passes for a specification isn't exactly precise, but the Atom syndication spec (a.k.a. RFC 4287) is nice work and comes with an unambiguous RelaxNG grammar. Which is why I find this (sent by a friend who reads this blog in Google Reader) irritating:

mis-rendered title

This is wrong. (For what it's worth, this doesn't happen for me in Google Reader running in Firefox 3.0.1.) Extracting a bit of the grammar, the title of the feed is an atom:title element, and XHTML content is supported:

atomTitle = element atom:title { atomTextConstruct }

anyXHTML = element xhtml:* {
   (attribute * { text }
    | text
    | anyXHTML)*
}

atomPlainTextConstruct =
   atomCommonAttributes,
   attribute type { "text" | "html" }?,
   text

atomXHTMLTextConstruct =
   atomCommonAttributes,
   attribute type { "xhtml" },
   xhtmlDiv

atomTextConstruct = atomPlainTextConstruct | atomXHTMLTextConstruct

xhtmlDiv = element xhtml:div {
   (attribute * { text }
    | text
    | anyXHTML)*
}

So the construct from my feed is valid:

<title type="xhtml">
  <div xmlns="http://www.w3.org/1999/xhtml">mult.ifario.us - All Posts</div>
</title>

I spent time with the excellent and useful Feed Validator when I was writing perpubplat, even compromising on things that I felt I shouldn't have to compromise on, e.g., namespace prefixes, in the name of interoperability. The validator awards the feed a squeaky clean bill of health. Nonetheless, this issue makes my feed ugly in Google Reader, since the UI shows an abbreviation like <div xmlns="... in the sidebar and elsewhere. (FWIW, this Google Reader issue isn't caught by the title conformance tests.)

I've posted a bug in the relevant Google Group here.


New Features for Perpubplat and Ruminations on Service APIs for the Web

Paul R. Brown @ 2008-02-18T20:02:39Z

I've added some new features to perpubplat, and each one presented a nice exercise in Haskell, working with Haskell libraries, and the design and consumption of web APIs.

Collage of Random Flickr Photos

The first feature is the collage of photos that uses the Flickr JSON API. The collage appears at the bottom of the sidebar under the "Photos" heading.

The implementation of the collage (Blog.Widgets.FlickrCollage; source here) uses a polite (i.e., supports conditional GET) HTTP poller (Blog.BackEnd.HttpPoller; source here) to call flickr.people.getPublicPhotos (docs here) every fifteen minutes and pull down the data for my most recent 500 photos. (I'll discuss the HTTP poller below.) To deal with concurrency — many readers (HTTP requests) and one writer (the polling thread) — an MVar holds the list of photos, with the writer taking the old value and putting the new and the reader taking the old value and then putting it right back. The implementation of MVar ensures that waiters are awakened in FIFO order, so this should (and does) work great.
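As a minimal sketch of the take-and-put pattern described above (the names Photo, currentPhotos, and replacePhotos are hypothetical and are not taken from Blog.Widgets.FlickrCollage), the reader and writer sides could look something like this:

```haskell
import Control.Concurrent.MVar

-- Hypothetical stand-in for the photo records held by the collage.
type Photo = String

-- Reader: take the current list and put it straight back.
-- (readMVar from Control.Concurrent.MVar packages up the same idea.)
currentPhotos :: MVar [Photo] -> IO [Photo]
currentPhotos box = do
  photos <- takeMVar box
  putMVar box photos
  return photos

-- Writer: take the old list, discard it, and put the fresh one.
replacePhotos :: MVar [Photo] -> [Photo] -> IO ()
replacePhotos box fresh = do
  _ <- takeMVar box
  putMVar box fresh
```

The sketch ignores exceptions between the take and the put; modifyMVar_ and friends cover that case if it matters.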

The JSON parser that I've been using uses Haskell's datatype polymorphism to model polymorphism in JSON, and this means that you work with wrapped (JSON Array wrapped around a list, JSON String wrapped around a Haskell String, etc.) primitive values instead of primitive values. To make things a little more ergonomic, I've bundled up some one-line utility functions in Blog.Widgets.JsonUtilities (source here). My favorite of the bunch is </>:

(</>) :: J.Value -> String -> J.Value
(J.Object o) </> s = o M.! s
(J.Array a) </> s = J.Array $ map (flip (</>) $ s) a

This makes it possible to compactly express access to nested JSON objects. For example, from the Flickr integration:

to_photo :: J.Value -> FlickrPhoto
to_photo m = FlickrPhoto { photo_id = uns $ m </> "id"
                         , owner = uns $ m </> "owner"
                         , secret = uns $ m </> "secret"
                         , server = uns $ m </> "server"
                         , photo_title = uns $ m </> "title"
                         , farm = unn $ m </> "farm" }

The uns function pulls the value out of a wrapped JSON String, and the unn function pulls the value out of a wrapped JSON Number. With a bit more thought, someone could probably come up with a nice library for JSON handling along the lines of Jaql or something like Pig Latin.
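The </> operator above is partial: a missing key or an unexpected shape is a runtime error. A total variant is one possible refinement. The sketch below is only an illustration under stated assumptions: it uses a toy Value type with association lists in place of the parser's own types and Data.Map, and the names <?>, unsMaybe, unnMaybe, and photoTitle are hypothetical.

```haskell
-- A toy JSON value type, standing in for the parser library's wrapped values.
data Value
  = Object [(String, Value)]
  | Array [Value]
  | Str String
  | Num Double
  deriving (Show, Eq)

-- Total variant of (</>): Nothing for a missing key or a non-object.
-- On an array, the lookup is applied to every element and fails if any fails.
(<?>) :: Value -> String -> Maybe Value
Object fields <?> key = lookup key fields
Array items   <?> key = Array <$> mapM (<?> key) items
_             <?> _   = Nothing

-- Total variants of uns and unn.
unsMaybe :: Value -> Maybe String
unsMaybe (Str s) = Just s
unsMaybe _       = Nothing

unnMaybe :: Value -> Maybe Double
unnMaybe (Num n) = Just n
unnMaybe _       = Nothing

-- Nested access chains through the Maybe monad.
photoTitle :: Value -> Maybe String
photoTitle v = v <?> "photo" >>= (<?> "title") >>= unsMaybe
```

The trade-off is the usual one: the call sites get a little noisier, but a change in the shape of the service's JSON shows up as a Nothing instead of a crashed polling thread.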

HTTP Polling

My rough cut at an HTTP polling library built on top of Network.HTTP is Blog.BackEnd.HttpPoller (source here), and it supports the bare minimum of features that I needed:

  • Call a supplied function with signature String -> IO () with the body of a 200 response and ignore others.
  • Use "conditional GET" (RFC 2616, section 9.3) via ETag/If-None-Match and Last-Modified/If-Modified-Since.
  • Support for basic authentication via a header configured on the template request passed to the poller.
  • Tolerant of temporary failures but able to gracefully exit.
  • Detailed-enough logging in case APIs, endpoints, or policies change. (I omitted redirect support on purpose.)
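The conditional GET bookkeeping in the list above can be sketched without any networking at all. This is not the code from Blog.BackEnd.HttpPoller; the Validators and Response types and both function names are assumptions made for the illustration, and header names are matched exactly for simplicity even though real HTTP header names are case-insensitive.

```haskell
-- Validators remembered from the most recent 200 response.
data Validators = Validators
  { etag         :: Maybe String
  , lastModified :: Maybe String
  } deriving (Show, Eq)

-- A simplified response: status code, headers, and body.
data Response = Response
  { status  :: Int
  , headers :: [(String, String)]
  , body    :: String
  } deriving (Show, Eq)

-- Headers to add to the next poll, based on what the server sent last time.
conditionalHeaders :: Validators -> [(String, String)]
conditionalHeaders v =
  [ ("If-None-Match", e)     | Just e <- [etag v] ] ++
  [ ("If-Modified-Since", d) | Just d <- [lastModified v] ]

-- A 200 yields a body for the callback and fresh validators; anything else
-- (including a 304 Not Modified) keeps the old validators and yields no body.
handleResponse :: Validators -> Response -> (Validators, Maybe String)
handleResponse old r = case status r of
  200 -> ( Validators (lookup "ETag" (headers r))
                      (lookup "Last-Modified" (headers r))
         , Just (body r) )
  _   -> (old, Nothing)
```

A polling loop would thread the Validators value from one iteration to the next and hand any Just body to the supplied String -> IO () function.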

del.icio.us Bookmarks on an Entry

The second feature is integration with del.icio.us bookmarks pointing to an entry via the del.icio.us JSON API, and it shows up as a trailer on entries in the detail view:

del.icio.us entry trailer screenshot

I've already blogged about most of the interesting stuff from integrating with the del.icio.us JSON API using Network.HTTP; see Haskell, del.icio.us, and JSON (encodings and non-standard JSON) and A Short Adventure with simpleHTTP (unclosed sockets).

The part I didn't cover was how to schedule queries against del.icio.us, and I'll probably go back to both simplify and enhance it. At present, it's a bit convoluted; three threads interact as follows:

  1. The driver triggers the scheduler on a fixed interval.
  2. The scheduler manages an ordered list of scheduled times and entries. In response to a trigger from the driver, if the head of the list is past due, the scheduler pops the head of the list, refreshes the data about bookmarks for that entry, sends it to the controller, and schedules the next refresh for that entry based on its age in days. The scheduler also receives information about new entries and adds them to the schedule.
  3. The controller manages a Data.Map of data about bookmarks per entry and either updates data in response to the scheduler or returns the current data for rendering a response.
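The scheduler's part of that interaction boils down to a pure step function over a sorted list. The sketch below is an illustration only: the integer timestamps, the refreshInterval thresholds, and the names schedule and tick are all assumptions, not the code that perpubplat runs.

```haskell
import Data.List (insertBy)
import Data.Ord (comparing)

type Time    = Int     -- seconds since some epoch (an assumption of this sketch)
type EntryId = String

-- The schedule is kept sorted by due time, earliest first.
type Schedule = [(Time, EntryId)]

-- Hypothetical back-off: the older the entry, the less often it is refreshed.
refreshInterval :: Int -> Int
refreshInterval ageDays
  | ageDays < 1 = 15 * 60
  | ageDays < 7 = 60 * 60
  | otherwise   = 24 * 60 * 60

-- Add an entry to the schedule, preserving the ordering by due time.
schedule :: Time -> EntryId -> Schedule -> Schedule
schedule due entry = insertBy (comparing fst) (due, entry)

-- One trigger from the driver: if the head of the list is past due, pop it,
-- report it for refreshing, and reschedule it according to its age in days.
tick :: (EntryId -> Int) -> Time -> Schedule -> (Maybe EntryId, Schedule)
tick ageOf now ((due, entry) : rest)
  | due <= now =
      (Just entry, schedule (now + refreshInterval (ageOf entry)) entry rest)
tick _ _ sched = (Nothing, sched)
```

Keeping the step pure leaves the threads with nothing to do but shuttle values around: the scheduler thread calls tick on each trigger, performs the refresh for any Just entry, and forwards the result to the controller.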

The current design is in-memory only, so it gets repopulated each time the service is booted. I intend to add simple file-based persistence along the same lines used for entries and comments. The other major missing features are support for conditional GET and grouping requests into groups of 15, as allowed by the del.icio.us API.

I would have liked to use the delicious API, but Network.HTTP doesn't currently support HTTPS.

Personal Aggregation

The third feature is aggregation of my del.icio.us bookmarks (via RSS feed), Google Reader shared items (via Atom feed), and Twitter "tweets" (via JSON API). The aggregated flotsam, jetsam, dross, and detritus shows up in the sidebar under the "Stream of Consciousness" heading.

The feature is a bit like Movable Type's Action Streams plugin, but the perpubplat implementation benefits from the fact that a Haskell FastCGI application can have background threads (so no crontab hacking).

The implementation is in the Blog.Widgets.StreamOfConsciousness.* modules:

  • Thought is a data structure that represents a tweet, post, shared item, etc. — date, link, content.
  • Twitter, GoogleReader, and DeliciousPosts encapsulate access to the respective services and parsing data into lists of Thoughts. Each worker uses an HTTP poller (same as with the Flickr collage) to poll a feed.
  • Controller manages the aggregate data structure and a pre-rendered HTML fragment.

To handle the multiple writers and multiple readers, I implemented a lightweight version of multi-version concurrency control where readers can always get data but writers may have to repeat a computation if someone else updated the data in the meantime. Here's a fragment from B.W.S.Controller (full source here):

commit :: SoCController -> [Thought] -> IO ()
commit socc new_items =
    do { snap <- get_data socc
       ; let items' = take (max_size snap) $ merge new_items $ items snap
       ; let rendered' = thoughts_to_xhtml items' 
       ; let snap' = snap { items = items'
                          , rendered = rendered' }
       ; ok <- update socc snap'
       ; if ok then
             return ()
         else 
             do { threadDelay collision_delay
                ; commit socc new_items }
       }

loop :: Chan SoCRequest -> Snapshot -> IO ()
loop ch snap = 
    do { req <- readChan ch
       ; snap' <- case req of
                   GetHtmlFragment c ->
                       do { putMVar c $ rendered snap
                          ; return snap }
                   GetData h ->
                       do { putMVar h snap
                          ; return snap }
                   Update ok snap'' ->
                       if (version snap) == (version snap'') then
                           do { putMVar ok True
                              ; let snap' = snap'' { version = (version snap) + 1 }
                              ; return snap' }
                       else
                           do { putMVar ok False
                              ; return snap }
       ; loop ch snap' }

The commit function runs in the HTTP polling thread doing the updating, and it's responsible both for merging the items into the sorted data and for updating the HTML representation that will get handed to the page rendering process.

The other interesting nut to crack was extracting data from XML using Haskell. I could have used the del.icio.us JSON feed and the JSON feed that the Google Reader shared items Javascript widget uses, but those lack the timestamps that I need to fold the streams together.

Extracting Data from RSS and Atom

I followed the standard trail for learning HXT, which involves building from source, reading the gentle introduction, and trying some of the practical examples. The only issue I had was with namespace handling.

Here's a code fragment from B.W.S.DeliciousPosts (source here) to read the RSS feed of my del.icio.us bookmarks:

import Text.XML.HXT.Arrow

handle_posts :: SoCController -> String -> IO ()
handle_posts socc body = do { posts <- runX ( readString parse_opts body >>> getItems )
                            ; commit socc posts }

parse_opts = [(a_validate, v_0), (a_check_namespaces,v_1)]
                                
atElemQName qn = deep (isElem >>> hasQName qn)
text = getChildren >>> getText
textOf qn = atElemQName qn >>> text

rdf_uri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf_RDF = QN "rdf" "RDF" rdf_uri

rss_uri = "http://purl.org/rss/1.0/"
rss_item = QN "rss" "item" rss_uri
rss_title = QN "rss" "title" rss_uri
rss_link = QN "rss" "link" rss_uri

dc_uri = "http://purl.org/dc/elements/1.1/"
dc_date = QN "dc" "date" dc_uri


getItem = atElemQName rss_item >>>
          proc i -> do
            t <- textOf rss_title -< i
            u <- textOf rss_link -< i
            d <- textOf dc_date -< i
            returnA -< Thought Delicious d u t

getItems = atElemQName rdf_RDF >>>
           proc r -> do
             items <- getItem -< r
             returnA -< items

HXT uses arrow notation; the quick and dirty explanation is that proc is like λ (but for arrows instead of functions), the <- is the usual monadic "bind" operator, and the -< feeds a value to the expression on the shaft of the arrow.
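Arrow notation is easier to get a feel for away from XML. Plain functions are arrows too, so a toy example needs nothing beyond Control.Arrow and GHC's Arrows extension. The function names here are made up for the illustration and have nothing to do with HXT.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (returnA, (&&&))

-- proc binds the input, each "name <- arrow -< input" line runs an arrow on a
-- value, and returnA hands back the result.
sumAndProduct :: (Int, Int) -> (Int, Int)
sumAndProduct = proc (x, y) -> do
  s <- uncurry (+) -< (x, y)
  p <- uncurry (*) -< (x, y)
  returnA -< (s, p)

-- The same computation written with an arrow combinator instead of the notation.
sumAndProduct' :: (Int, Int) -> (Int, Int)
sumAndProduct' = uncurry (+) &&& uncurry (*)
```

In getItem above the shape is the same, except that the arrows are HXT's list arrows, so each textOf line can produce zero or more results per item.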

The first time I ran this against the RSS from del.icio.us, I got nothing back, so after looking at the XML for the RSS, I switched the prefix for the RSS QNames to the empty string to match the input file, and it worked. Grrr... That means that the (==) for QName is broken, and a quick look at the source in Text.XML.HXT.DOM.TypeDefs showed why:

data QName = QN { namePrefix    :: String  -- ^ the prefix part of a qualified name \"namePrefix:localPart\"
                , localPart     :: String  -- ^ the local part of a qualified name \"namePrefix:localPart\"
                , namespaceUri  :: String  -- ^ the associated namespace uri
                }
             deriving (Eq, Ord, Show, Read, Typeable)

The derived (==) will just and together the (==) for the three components (prefix, local, uri), but XML QNames are equal if their local parts and URIs (as strings) are equal. It's easy to fix by dropping the derivation of Eq and supplying a good version:

-              deriving (Eq, Ord, Show, Read, Typeable)
+              deriving (Ord, Show, Read, Typeable)
+ 
+ instance Eq QName where
+     q1 == q2 = ((localPart q1) == (localPart q2))
+                && ((namespaceUri q1) == (namespaceUri q2))

After which, it works according to my expectations for namespace handling.
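The difference is easy to see in isolation. The following sketch uses a cut-down stand-in type (QualName and its field names are hypothetical, not HXT's) with the same hand-written equality:

```haskell
-- A simplified stand-in for HXT's QName: prefix, local part, namespace URI.
data QualName = QualName
  { qnPrefix :: String
  , qnLocal  :: String
  , qnUri    :: String
  } deriving (Show)

-- Two qualified names are equal when local part and namespace URI agree;
-- the prefix is only a document-local abbreviation and is ignored.
instance Eq QualName where
  a == b = qnLocal a == qnLocal b && qnUri a == qnUri b

rssUri :: String
rssUri = "http://purl.org/rss/1.0/"

-- The name as written in the code (with a prefix) and as it appears in a
-- feed that binds the RSS namespace as the default namespace (no prefix).
prefixedItem, defaultItem :: QualName
prefixedItem = QualName "rss" "item" rssUri
defaultItem  = QualName ""    "item" rssUri
```

With this instance, prefixedItem == defaultItem is True, whereas a derived Eq would compare the prefixes as well and say False.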

Couldn't You Do All That With JavaScript...?

Yes. I could. I didn't. Here are a few of the reasons that I chose not to:

  • My experiments showed that page loads would be several seconds instead of a fraction of a second. Other people have had the same experience. (It reminds me of the opening scene of I'm Gonna Git You Sucka where Junebug dies of an OG. Don't let your blog die of an OW...)
  • Some of the widgets are just plain fugly, IMHO. I'm looking at you, Google Reader shared item "clip" and Twitter Flash widget, although the availability of JSON for the Google Reader shared item "clip" (look in the JavaScript) and Twitter would allow me to come up with something more pleasing (to me).
  • Even though it's not a good idea — e.g., IE7 is broken, Firefox <3 doesn't do incremental display, etc. — I would like to be able to serve application/xhtml+xml, and document.write doesn't work.
  • The availability of background threads on the server side means that Javascript on the client side isn't the only option.

Other Integrations and Aggregations

The other two features that I'd like to add are backlinks to other blogs and backlinks to posts on community sites like Reddit and DZone. (I'm on the fence about implementing trackback support; you could twist my arm.)

Nonetheless, I'm on the fence about directing people to comment threads in other locations, i.e., Reddit. (My reasons are similar to Reg Braithwaite's.) It would be a simple matter to sniff referring URLs, deduce where an entry is posted on Reddit, and then integrate the comments together, but Reddit's draconian User Agreement forbids it:

The content, organization, graphics, text, images, video, design, compilation, advertising and all other material on the Website, including without limitation, the "look and feel" of this website, are protected under applicable copyrights and other proprietary (including but not limited to intellectual property) rights and are the property of Website Provider or its licensors. The copying, rearrangement, redistribution, modification, use or publication by you, directly or indirectly, of any such matters or any part of the website, including but not limited to the removal or alteration of advertising, except for the limited rights of use granted hereunder, is strictly prohibited.

Someone should implement a community hub that integrates discussion threads, followup posts, and blog comments on an original entry in a transparent and open fashion...

Postmortem

My first observation from this experiment is that APIs are preferable to feeds are preferable to widgets when it comes to integration of services on the web. (Note that I didn't say web services...) Even listing widgets is somewhat questionable in my opinion, since it's more of a "put my stuff on your page" than a "use my service".

My second observation is nothing new, but I now have experimental evidence — JSON is preferable to XML, whether or not the target client runs in a browser. If I were building a service, I'm not sure that I'd bother with supporting an XML API.

My third observation is that I would use Haskell to build a product or service, and I mean that in the sense that I can see how to train a team and build processes (prototyping, implementation, quality, deployment, support) around Haskell. The language does have a relatively steep learning curve (q.v. Kevin Scaldeferri's post on the subject and the comments that follow or Reg Braithwaite's general ruminations on learning languages), but the real problem is collectively getting through the challenges once. It reminds me of learning spectral sequences as a graduate student; fifteen minutes with my advisor to work an example was better than a week of staring at otherwise inscrutable notation. As a measure of the view from my current location on the learning curve, I coded up a working rough cut of the "stream of consciousness" feature in an evening plus an afternoon cup of coffee, and I wouldn't regard myself as being fully around the curve yet (FFI, custom monads/transformers, etc. await).


Really Simple Atom Syndication

Paul Brown @ 2007-02-27T04:06:00Z

Herein post 4 of n on my hobby project to rewrite my personal publishing software in Haskell. Herein, I create an economically driven approach to the Atom syndication format for entries and comments. By citing economics as the prime motivator, I mean that I'm aiming neither for the most complete nor the most elegant implementation except where those two overlap with exigency.

Required Reading

To make sure that I knew enough about Atom, I read through the Atom Syndication Format RFC (I prefer the plain text to the pretty version), the introduction at AtomEnabled.org, and Mark Pilgrim's note on How to make a good ID in Atom. After reading so many specifications that either use flabby XML Schema (like WSDL) or ad hoc XML (like RSS 2.0), the use of RELAX NG compact syntax in the RFC was a breath of fresh air.

Really Simple Atom Data Model

The first set of design decisions I made was what to throw out from Atom. I decided to omit the atom:contributor construct entirely, as atom:author is sufficient for my purposes, and atom:source and attributes on atom:link other than @rel and @href weren't going to be any use to me, either. I decided to omit the @scheme and @label attributes on atom:category since all of my @term values will be human readable and I don't plan on using any scheme other than my own. I decided to omit any specific constraints on components that might otherwise be URIs, RFC3339-formatted date/time, or other; String will do for now, and I'll make sure that properly formatted data (including escaping as necessary) is used in the first place. I also decided to leave any of the sequencing and quantity constraints out of the model, as this will be an internal model only.

Here's the way it looks, and if you squint at it just right, it doesn't look that different from the RELAX NG compact schema:

data AtomElement = Feed [AtomElement]
                 | Entry [AtomElement]
                 | Content AtomContent
                 | Author { author_name :: String,
                            author_uri :: Maybe String,
                            author_email :: Maybe String }
                 | Category String 
                 | Generator { gen_name :: String,
                               gen_uri :: String,
                               gen_version :: String }
                 | Id String
                 | Icon String
                 | Link { rel :: String,
                          href :: String }
                 | Logo String
                 | Published String
                 | Rights AtomContent
                 | Subtitle AtomContent
                 | Summary AtomContent
                 | Title AtomContent
                 | Updated String
                   deriving (Show)

The Maybe String for the author_uri and author_email components of the Author representation are intended to allow for comments where the author may omit an email address or link. (Of course, I may just omit their comment under those circumstances...) Next, one more type for AtomContent, where I elected to eliminate the possibility of HTML content:

data ContentType = XHTML | TEXT
                 deriving (Eq,Show,Enum)

data AtomContent = AtomContent { contentType :: ContentType,
                                 body :: String }
                 deriving (Show)

XML Output

With a few (non-limiting) assumptions, getting XML out is simple. First up: the Atom URI, my choice to bind it to the atom prefix, and my assumption that XHTML will always be in the default namespace:

_prefix :: String
_prefix = "atom"

_uri :: String
_uri = "http://www.w3.org/2005/Atom"

start_div :: String
start_div = "<div xmlns=\"http://www.w3.org/1999/xhtml\">"

end_div :: String
end_div =  "</div>"

It's worth noting that the XHTML specification (via either the transitional DTD or the strict DTD) requires that the XHTML namespace be the default namespace, but there is no requirement that an XHTML fragment in an Atom feed use the default namespace.

Next, some really simple functions to wrap content in elements:

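-- Qualify an element name with the atom prefix, e.g. "entry" becomes
-- "atom:entry". (This helper is used throughout but is not shown in the
-- listing; this definition is inferred from the output further down.)
prefix :: String -> String
prefix s = _prefix ++ (':':s)

-- Format a clopen element with a list of attributes.
clopen :: String -> [(String,String)] -> String
clopen s [] = "<" ++ (prefix s) ++ "/>"
clopen s xs = "<" ++ (prefix s) ++ (nv_to_s "" xs) ++ "/>"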

-- Wrap a string in an element.
wrap :: String -> String -> String
wrap s t = "<" ++ (prefix s) ++ ">" ++ t ++ "</" ++ (prefix s) ++ ">"

-- If a value is present (i.e., not Nothing), wrap it in an element.
wrap_m :: String -> Maybe String -> String
wrap_m _ Nothing = ""
wrap_m s (Just t) = wrap s t

-- Wrap an element with attributes around a string.
wrap_ :: String -> [(String,String)] -> String -> String
wrap_ s [] t = wrap s t
wrap_ s xs t = "<" ++ (prefix s) ++ (nv_to_s "" xs) ++ ('>':t)
               ++ "</" ++ (prefix s) ++ ">"

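-- Wrap a string in an element that carries the Atom namespace attribute.
-- Note that the attribute name used here is the bare prefix, so the output
-- is atom="..." rather than the xmlns:atom="..." that XML namespaces require;
-- using ("xmlns:" ++ _prefix) as the attribute name would declare the namespace.
wrap_ns :: String -> String -> String
wrap_ns s t = wrap_ s [(_prefix,_uri)] t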

-- Format a list of name-value pairs as attributes.
nv_to_s :: String -> [(String,String)] -> String
nv_to_s = foldl att

att :: String -> (String,String) -> String
att s (n,v) = s ++ (' ':(n ++ "=\"" ++ v ++ "\""))

And then just map the various shades of AtomElement onto the functions:

toXml :: AtomElement -> String
toXml (Feed xs) = wrap_ns "feed" (content_ xs)
toXml (Entry xs) = wrap_ns "entry" (content_ xs)

toXml' :: AtomElement -> String
toXml' (Entry xs) = wrap "entry" (content_ xs)
toXml' (Category s) = clopen "category" [("term",s)]
toXml' (Id s) = wrap "id" s
toXml' (Icon s) = wrap "icon" s
toXml' (Link r h) = clopen "link" [("rel",r),("href",h)]
toXml' (Logo s) = wrap "logo" s
toXml' (Published s) = wrap "published" s
toXml' (Updated s) = wrap "updated" s
toXml' (Author s u e) = wrap "author" ((wrap "name" s)
                                       ++ (wrap_m "uri" u)
                                       ++ (wrap_m "email" e))
toXml' (Generator n u v) = wrap_ "generator" [("uri",u),("version",v)] n
toXml' (Content a) = atom_text "content" a
toXml' (Rights a) = atom_text "rights" a
toXml' (Subtitle a) = atom_text "subtitle" a
toXml' (Summary a) = atom_text "summary" a
toXml' (Title a) = atom_text "title" a

content_ :: [AtomElement] -> String
content_ = concat.(map toXml')

-- Render an Atom text construct as XML.
atom_text :: String -> AtomContent -> String
atom_text s (AtomContent XHTML t) = wrap_ s [("type","xhtml")] (start_div ++ t ++ end_div)
atom_text s (AtomContent TEXT t) = wrap s t

(The Atom spec allows @type="text" to be omitted.) The toXml function and the AtomElement, AtomContent, and ContentType types are all that would be exported from the module.
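Since the functions above splice strings into markup as-is, the escaping has to happen before the data reaches them. A minimal sketch of a helper for that job follows; the name escapeXml is hypothetical and it is not part of the module as described here.

```haskell
-- Escape the characters that are significant in XML text and attribute values.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '&'  = "&amp;"
    esc '<'  = "&lt;"
    esc '>'  = "&gt;"
    esc '"'  = "&quot;"
    esc '\'' = "&apos;"
    esc c    = [c]
```

It would apply to TEXT content and to attribute values such as @term and @href, but not to XHTML content, which is markup already.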

A quick check with ghci shows that this does the right thing:

[...]
*Text.Atom> let entry = Entry [Title (AtomContent TEXT "Atom-Powered Robots Run Amok"), Id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a", Updated "2003-12-12T18:30Z", Author "John Doe" Nothing Nothing, Content (AtomContent XHTML "<p>Some text.</p>")]
*Text.Atom> toXml entry
"<atom:entry atom=\"http://www.w3.org/2005/Atom\"><atom:title type=\"text\">Atom-Powered Robots Run Amok</atom:title><atom:id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</atom:id><atom:updated>2003-12-12T18:30Z</atom:updated><atom:author><atom:name>John Doe</atom:name></atom:author><atom:content type=\"xhtml\"><div xmlns=\"http://www.w3.org/1999/xhtml\"><p>Some text.</p></div></atom:content></atom:entry>"

The let entry=... line makes more sense with some whitespace thrown in:

let entry = Entry [
  Title (AtomContent TEXT "Atom-Powered Robots Run Amok"),
  Id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
  Updated "2003-12-12T18:30Z",
  Author "John Doe" Nothing Nothing,
  Content (AtomContent XHTML "<p>Some text.</p>")
]

Other Available XML Wheels

While the above is a small wheel, it is a wheel nonetheless, and I looked at three Haskell XML libraries before reinventing it:

  • Haskell XML Toolbox, a.k.a., HXT, (link) appears to be under active development and supports my basic requirements of XML output and namespace support. The API looks agreeable, and there is an RSS aggregator in 50 lines as an example. If I choose to implement the Atom Publishing Protocol, HXT is the way I'll go to get atom:entry turned into the right kind of internal structure.
  • HaXml (link) appears to lack namespace support, so I dismissed it without looking deeply at it.
  • HXML (link) lacks namespace support, so I dismissed it without looking deeply at it. That said, the validation concept in HXML has the same heritage as the one used in the RELAX NG validator Jing.

What's Left?

There's enough real work left for at least three more blog entries: storage/state management for entries (probably STM with persistence via the filesystem), a commenting facility, human-facing content display and navigation (probably Haskell via the Text.XHtml package), and making sure that the FCGI wrapper works well multi-threaded. (I want a multi-threaded FCGI handler so that STM can serve as the concurrency control for the application; otherwise, the persistence layer will need to provide that functionality.)

State's the next one I'll tackle.


First Steps with Haskell for Web Applications

Paul Brown @ 2006-10-11T06:19:00Z

As I blogged yesterday, I'm planning to build a simplified personal publishing system to host this blog, partially to get around resource consumption issues with the current platform and partially to get some exercise with a new language or two. I thought about Smalltalk, Erlang, and Io, but Haskell gets the initial nod if for no other reason than it's a third side of the coin that Ruby and Java are two sides of — rigorously defined, "purely" functional, lazy, "typeful" and compiles to native code via GHC. (And, of course, the syntax warms the cockles of my mathematician's heart.) Like Ruby with gems, the GHC runtime also has excellent modularity, with a minimal and standard core and good package management via Cabal. (Hello? Java?)

The first question is how to integrate an application written in Haskell into a web container, preferably a web server like lightTPD or Apache via FastCGI. (CGI would be a consideration, too, but that's just too retro for me.) Thankfully, as of the forthcoming 6.6 version, GHC has good CGI support via the Network.CGI module, and Björn Bringert has a FastCGI binding that built on the GHC 6.5 tip with only a little tinkering. (I wanted to use the core Network.CGI module in place of Björn's cgi-compat module.)

A "Hello, World" implementation using the FastCGI binding and then compiled to native code performed well on a basic smoke benchmark. Here's the relevant line from top for an instance of the handler:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  234 hello.fcgi   0.0%  0:06.83   1    13    21   692K  1.63M  1.69M  29.0M
[...]

Benchmarking with ab shows that 5 handlers can happily crank through around 4000 requests/second with 99% of the requests requiring <2ms.

For comparison purposes and with an identical FastCGI configuration, the simplest possible Ruby on Rails "Hello, World" implementation (create test controller, edit the .rhtml to return content, wire-up FastCGI) consumes considerably more memory:

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
[...]
  537 ruby1.8     12.1%  0:26.49   1    14    94  22.5M  3.35M  24.5M  54.5M
[...]

and only manages around 100 requests/second with ~50ms response time for the 50th percentile and ~400ms at the 99th percentile. (I recognize that I should probably put a sic. after the "only", since 100 requests/second is significantly in excess of the peak throughput that my blog sees on a good day.)

This is far from apples-to-apples, as the RoR version is doing a lot more work under the covers, but it does give me the expectation that I can probably get a Haskell blog implementation that will have a memory footprint smaller than a base irb and provide Slashdottable performance.

Next up, deciding on how to store/represent an entry and how to implement Atom for syndication.


Feedburnerizing Typo, Part II

Paul Brown @ 2006-07-03T19:49:34Z

Last year, I wrote a rudimentary sidebar to display Feedburner feed links in Typo, but I didn't really get it to the point I wanted at the time. So, I took another fifteen minutes to rewrite the sidebar to work with the enhanced API, ditch the auto-subscribe chiclets, add links for category feeds, and muck with routes.rb. In routes.rb, I mapped a new set of feed URLs for Feedburner onto the controller that currently serves feeds, switched the existing mappings to a two-line controller that 301's to the Feedburner equivalents, and left holes so that people can subscribe directly to article-specific or tag-specific feeds if they wish. (The bonus in this approach is that autodiscovery gets taken care of for free, because the autodiscovery feed is one that gets 301'd.)

Just for grins, here's the two-line controller implementation:

class FbController < ContentController
  def redirect   
    headers["Status"] = "301 Moved Permanently"
    redirect_to "http://feeds.feedburner.com/Multifarious" +
      params[:type].to_s.capitalize + params[:id].to_s.capitalize
  end
end

Sometimes I think that the cornucopia of methods on some of the Ruby core classes (like capitalize on String) is overkill, and sometimes, it's convenient.

I hope that the enhanced setup is useful to any readers (since Feedburner should ensure QoS), but mostly I hope that it's transparent. (FWIW, NetNewsWire did the Right Thing and changed the feed URL for my self-subscription to the new one in response to the 301.) If for some reason you can't see this, let me know...


More on Meeting-Making for Google Calendar

Paul Brown @ 2006-05-06T05:55:00Z

After having posted about how it would be possible to take the Atom feeds from Google Calendar and make a collaborative appointment scheduler (meeting time picker for multiple people), I decided to give it a shot using the Atom parsing library for Ruby from Martin Traverso and Brian McCallister.

The Atom library is slick, and doing some simple extensions to the basic binding to support the Google Data elements is straightforward. For example, here's a Ruby snippet that will read the start time, end time, and reminder settings from the feed:

require 'atom'
require 'xmlmapping'
require 'time'
require 'date'
require 'net/http'
require 'uri'

module GoogleData

  NAMESPACE = 'http://schemas.google.com/g/2005'
  
  def GoogleData.int_or_nil(s)
    if s.nil?
      nil
    else
      s.to_i
    end 
  end
  
  def GoogleData.date_or_datetime(s)
    if s.length == 10
      Date.parse(s)
    else
      Time.iso8601(s)
    end  
  end
  
  class Reminder
    include XMLMapping
    
    namespace NAMESPACE
    
    has_attribute :absolute_time, :name => 'absoluteTime',
      :transform => lambda { |t| Time.iso8601(t) }
    has_attribute :days, :name => 'days',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
    has_attribute :hours, :name => 'hours',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
    has_attribute :minutes, :name => 'minutes',
      :transform => lambda { |s| GoogleData.int_or_nil(s) }
  end
  
  class When 
    include XMLMapping

    namespace NAMESPACE
    
    # The following little hack is required because the
    # datatype switches between xs:date for all-day
    # appointments and xs:dateTime for non-all-day
    # appointments.

    has_attribute :start_time, :name => 'startTime',
      :transform => lambda { |s| GoogleData.date_or_datetime(s) }
    has_attribute :end_time, :name => 'endTime',
      :transform => lambda { |s| GoogleData.date_or_datetime(s) }
    has_attribute :valueString
    
    has_many :reminders, :name => 'reminder', :type => Reminder
  end

  class Entry < Atom::Entry
    namespace NAMESPACE
    has_one :when, :name => 'when', :type => When
  end
  
  class Feed < Atom::Feed
    has_many :entries, :name => 'entry', :type => Entry
  end

end # module GoogleData

The two key tricks above are extending Atom::Feed and Atom::Entry to add explicit handling for the extension elements that we're after. (Without any changes, Atom::Entry does capture an array of extension elements, but I'd prefer to work with objects.) Similar approaches can be applied to the other "kinds" of things in the feed. As an editorial comment, I'm lukewarm about the datatype of an attribute value determining its semantics; normally the semantics would determine the datatype.
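The length-based switch between dates and date-times can be tried on its own, without the Atom library. This is a sketch that repeats the same test as GoogleData.date_or_datetime as a top-level function, with sample values chosen for illustration:

```ruby
require 'date'
require 'time'

# Same length test as GoogleData.date_or_datetime: an xs:date such as
# "2006-04-03" is exactly 10 characters long; anything else is parsed
# as an xs:dateTime.
def date_or_datetime(s)
  s.length == 10 ? Date.parse(s) : Time.iso8601(s)
end

all_day = date_or_datetime("2006-04-03")
timed   = date_or_datetime("2006-04-03T16:00:00.000Z")

puts all_day.class
puts timed.class
```

The caller gets back either a Date or a Time, so any code that compares or subtracts the two has to check which one it received.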

To grab the data from the Atom feed of the calendar:

response = Net::HTTP.get_response(URI.parse(GCAL_FULL_URL))

# TODO: Limit the number of redirects to follow.
# TODO: Gracefully handle other non-200's here, too.
while response.kind_of? Net::HTTPRedirection
  response = Net::HTTP.get_response(URI.parse(response['location']))
end

feed = GoogleData::Feed.new(response.body)

feed.entries.each { |event|
  puts '---'
  puts event.title
  puts event.when.start_time.to_s + ' -- ' + event.when.end_time.to_s
}
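One way to address the two TODOs in the fetch loop is sketched below. The helper name fetch_following_redirects, the default limit of 5, and the optional block for substituting the HTTP call are all assumptions made for illustration:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follow at most `limit` redirects, then give up.
# The optional block performs the request, so the loop can be exercised
# without a network; by default it calls Net::HTTP.get_response.
def fetch_following_redirects(url, limit = 5, &fetch)
  fetch ||= lambda { |u| Net::HTTP.get_response(URI.parse(u)) }
  response = fetch.call(url)
  while response.kind_of?(Net::HTTPRedirection)
    raise "too many redirects" if limit <= 0
    limit -= 1
    response = fetch.call(response['location'])
  end
  # Anything other than a 2xx at this point is treated as a failure.
  unless response.kind_of?(Net::HTTPSuccess)
    raise "unexpected response: #{response.code}"
  end
  response
end
```

With this in place, the feed could be built from fetch_following_redirects(GCAL_FULL_URL).body. The sketch assumes the Location header is always an absolute URL, which is not guaranteed in general.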

Back to the original goal of building a "meeting maker" for Google Calendar based on the Atom feeds for participants' calendars, the additional work to properly handle recurrence and recurrenceException makes the problem look quite a bit more complicated (and interesting). (Fortunately, there does appear to be an iCalendar (RFC2445) library available as well...) So this is turning into more than a one-evening project.

With the added complexity of supporting recurring events and exceptions, there is probably a tidy approach that augments the list merge I suggested before with generators and sequence comprehensions for the recurring events — just enumerate possible meeting times from the complement of the merged list of "busy" times for non-recurring meetings and test for overlaps in the union (i.e., "or") of the sequences for each participant. (If I recall correctly, the meeting makers in the usual Exchange clients don't support optimal scheduling of recurring meetings, so that would be a nice feature as well, i.e., schedule the recurring meeting at the time with the fewest conflicts or at least minimize the conflicts for some subset of the participants.)
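For the non-recurring part, enumerating candidate times from the complement of the merged busy list might look like the sketch below. The helper free_slots and its parameters are hypothetical, and it assumes the busy list is already merged, sorted, and free of overlaps. Recurring events are not handled here.

```ruby
require 'time'

# Hypothetical helper: given busy intervals as sorted, non-overlapping
# [start, end] pairs of Time, return the gaps inside [from, to] that
# are at least min_length seconds long.
def free_slots(busy, from, to, min_length)
  slots = []
  cursor = from
  busy.each do |s, e|
    break if s >= to
    # The gap between the end of the previous busy block and this one.
    slots << [cursor, s] if s - cursor >= min_length
    cursor = e if e > cursor
  end
  # Whatever is left after the last busy block.
  slots << [cursor, to] if to - cursor >= min_length
  slots
end

t = lambda { |s| Time.iso8601(s) }
busy = [
  [t["2006-04-03T16:00:00Z"], t["2006-04-03T17:00:00Z"]],
  [t["2006-04-03T18:30:00Z"], t["2006-04-03T19:00:00Z"]]
]
free_slots(busy, t["2006-04-03T15:00:00Z"], t["2006-04-03T20:00:00Z"], 3600).each do |s, e|
  puts s.utc.iso8601 + ' -- ' + e.utc.iso8601
end
```

Each gap returned is a window in which a meeting of the requested length fits for everyone whose busy times went into the merged list.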


Some Homebrew Sauce for Google Calendar

Paul Brown @ 2006-04-15T03:50:00Z

As I looked at some cool ideas from Elias Torrez (via James Snell), it occurred to me that I could make some homebrew sauce for Google Calendar to address one of my wants, namely a meeting time chooser for one or more participants. Here's how it could work.

In the Atom feed for the calendar, there are elements in a Google namespace like so:

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
      xmlns:gd="http://schemas.google.com/g/2005">
  [...]
  <entry>
    [...]
    <gd:when startTime="2006-04-03T16:00:00.000Z"
             endTime="2006-04-03T17:00:00.000Z"/>
  </entry>
</feed>

We'd want a query that accepted a tuple of URLs for Atom calendar feeds and performed an iterative merge:

  1. Copy the <gd:when> elements from the first feed on the list into working storage of some kind. We'll write a @startTime and @endTime pair as (s,e) in what follows.
  2. Iterate through the elements of the next feed in the list; suppose that the current one is (x,y). If there is an element (w,z) in the scratch list that overlaps it, i.e., x<z and w<y, replace (w,z) by (min(x,w),max(y,z)). (If (x,y) overlaps several scratch elements, fold them all into one.) Otherwise, add (x,y) to the scratch list.
  3. Repeat #2 with each feed.
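A minimal Ruby sketch of the merge, assuming that two intervals (x,y) and (w,z) overlap when x<z and w<y, and that each feed has already been reduced to an array of [start, end] pairs. The function name merge_busy and the sample data are hypothetical:

```ruby
# Merge busy intervals from several calendars into one list.
# Each feed is an array of [start, end] pairs; any mutually comparable
# values work (Time objects, or ISO 8601 strings with the same offset).
def merge_busy(feeds)
  scratch = []
  feeds.each do |feed|
    feed.each do |x, y|
      # Pull out every scratch interval (w, z) that overlaps (x, y) ...
      overlapping, scratch = scratch.partition { |w, z| x < z && w < y }
      # ... and fold them all into one interval covering the union.
      overlapping.each do |w, z|
        x = w if w < x
        y = z if z > y
      end
      scratch << [x, y]
    end
  end
  scratch.sort
end

alice = [["2006-04-03T16:00:00Z", "2006-04-03T17:00:00Z"]]
bob   = [["2006-04-03T16:30:00Z", "2006-04-03T18:00:00Z"],
         ["2006-04-03T20:00:00Z", "2006-04-03T21:00:00Z"]]

merge_busy([alice, bob]).each { |s, e| puts s + ' -- ' + e }
```

Because the scratch list never contains two overlapping intervals, folding in everything that overlaps the incoming interval is enough to keep it that way. Intervals that merely touch, where one ends exactly when the other starts, are kept separate by the strict comparisons.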

As additional sugar, one additional feed could be used to represent desired days or time ranges by exclusion. The combined feed would contain the collective busy times for the group, and the publicly visible Atom feeds for each calendar would be all that would be needed.

This could be done in a browser with a script written in E4X, with the caveat of having to perform date arithmetic. (Date-times in the XML Schema variant of ISO 8601 compare cleanly as strings, provided they share a time zone offset and precision, but the JavaScript Date object is based on a different syntax.) XQuery supports operations and comparisons on dateTime and duration values, so it would be another good candidate. So would Ruby, thanks to Atom support and date support in a compatible format.
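A short Ruby illustration of the string-comparison point, with sample values made up for the purpose. It shows that string order and time order agree when the offsets match, and that they can disagree when the offsets differ:

```ruby
require 'time'

a = "2006-04-03T16:00:00.000Z"
b = "2006-04-03T17:00:00.000Z"

# Same offset and same precision, so string order matches time order.
puts((a < b) == (Time.iso8601(a) < Time.iso8601(b)))

# Different offsets: the strings sort one way, the instants the other.
c = "2006-04-03T18:00:00+02:00"   # 16:00 UTC
d = "2006-04-03T17:00:00Z"
puts c < d
puts Time.iso8601(c) < Time.iso8601(d)
```

Parsing with Time.iso8601 before comparing avoids the problem, at the cost of losing the plain string comparison.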
