I've added some new features to perpubplat, and each one
presented a nice exercise in Haskell, working with Haskell libraries,
and the design and consumption of web APIs.
Collage of Random Flickr Photos
The first feature is the collage of photos that uses the Flickr JSON API. The
collage appears at the bottom of the sidebar under the "Photos"
heading.
The implementation of the collage
(Blog.Widgets.FlickrCollage; source here)
uses a polite (i.e., supports conditional GET) HTTP poller
(Blog.BackEnd.HttpPoller; source here)
to call flickr.people.getPublicPhotos (docs here)
every fifteen minutes and pull down the data for my most recent 500
photos. (I'll discuss the HTTP poller below.) To deal with concurrency
— many readers (HTTP requests) and one writer (the polling
thread) — an MVar holds the list of photos, with
the writer taking the old value and putting the new and the reader
taking the old value and then putting it right back. The
implementation of MVar ensures that waiters are awakened
in FIFO order, so this should (and does) work great.
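Here is a minimal sketch of the same one-writer, many-readers pattern using Control.Concurrent.MVar. The Photo type and the function names are hypothetical; they are not the ones in Blog.Widgets.FlickrCollage. swapMVar performs the writer's take-then-put in one call. readMVar gives a reader the current value without leaving the MVar empty, and in recent versions of GHC's base library it does so atomically rather than as a separate take and put.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar, swapMVar)

-- Hypothetical stand-in for the collage's photo record.
data Photo = Photo { photoId :: String, photoTitle :: String }
  deriving (Show, Eq)

-- Shared cell holding the current list of photos (initially none).
newPhotoStore :: IO (MVar [Photo])
newPhotoStore = newMVar []

-- Writer (the polling thread): take the old list and put the new one.
-- swapMVar does exactly that take-then-put.
publishPhotos :: MVar [Photo] -> [Photo] -> IO ()
publishPhotos store photos = do
  _old <- swapMVar store photos
  return ()

-- Reader (a request handler): observe the current list without
-- leaving the MVar empty for other readers.
currentPhotos :: MVar [Photo] -> IO [Photo]
currentPhotos = readMVar
```

Because the writer replaces the whole list at once, a reader sees either the old batch of photos or the new one, never a mix.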
The JSON
parser that I've been using uses Haskell's datatype polymorphism
to model polymorphism in JSON, and this means that you work with
wrapped values (a JSON Array wrapped around a list, a JSON
String wrapped around a Haskell String,
etc.) instead of bare primitive values. To make things a
little more ergonomic, I've bundled up some one-line utility functions
in Blog.Widgets.JsonUtilities (source here).
My favorite of the bunch is </>:
(</>) :: J.Value -> String -> J.Value
(J.Object o) </> s = o M.! s
(J.Array a) </> s = J.Array $ map (flip (</>) $ s) a
This makes it possible to compactly express access to nested JSON
objects. For example, from the Flickr integration:
to_photo :: J.Value -> FlickrPhoto
to_photo m = FlickrPhoto { photo_id    = uns $ m </> "id"
                         , owner       = uns $ m </> "owner"
                         , secret      = uns $ m </> "secret"
                         , server      = uns $ m </> "server"
                         , photo_title = uns $ m </> "title"
                         , farm        = unn $ m </> "farm" }
The uns function pulls the value out of a wrapped JSON
String, and the unn function pulls the value
out of a wrapped JSON Number. With a bit more thought,
someone could probably come up with a nice library for JSON handling
along the lines of Jaql or something
like Pig Latin.
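The parser's exact constructors aren't shown here, so this sketch defines a small hypothetical Value type (JObject, JArray, JString, and JNumber are my names, not the parser's) to let </>, uns, and unn be tried out in isolation. Having unn round to an Int is an assumption about what the original does.

```haskell
import qualified Data.Map as M

-- Hypothetical minimal JSON value type standing in for the parser's own
-- (the real constructor names may differ).
data Value
  = JObject (M.Map String Value)
  | JArray [Value]
  | JString String
  | JNumber Double
  deriving (Show, Eq)

infixl 9 </>

-- Look up a field in an object; on an array, look it up in every element.
-- Like the original, this is partial: a missing key or a non-container fails.
(</>) :: Value -> String -> Value
(JObject o) </> s = o M.! s
(JArray a)  </> s = JArray (map (</> s) a)
v           </> s = error ("cannot look up " ++ show s ++ " in " ++ show v)

-- Unwrap a JSON string (the role uns plays above).
uns :: Value -> String
uns (JString s) = s
uns v           = error ("not a JSON string: " ++ show v)

-- Unwrap a JSON number as an Int (one plausible reading of unn).
unn :: Value -> Int
unn (JNumber n) = round n
unn v           = error ("not a JSON number: " ++ show v)
```

For example, on a Flickr-style response, v </> "photos" </> "photo" </> "id" yields a JArray holding the id of every photo in the array, which is what keeps accessors like to_photo compact.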
HTTP Polling
My rough cut at an HTTP polling library built on top of
Network.HTTP is Blog.BackEnd.HttpPoller (source here),
and it supports the bare minimum of features that I needed:
- Call a supplied function with signature
String -> IO () with the body of a 200 response and ignore others.
- Use "conditional GET" (RFC
2616, section 9.3) via
ETag/If-None-Match
and Last-Modified/If-Modified-Since.
- Support for basic authentication via a header configured on the
template request passed to the poller.
- Tolerant of temporary failures but able to gracefully exit.
- Detailed-enough logging in case APIs, endpoints, or policies
change. (I omitted redirect support on purpose.)
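The conditional GET bookkeeping is the part of a polite poller that is easiest to get subtly wrong, so here is a pure sketch of it. The Validators record and both functions are hypothetical (this is not the HttpPoller API), and the network call itself is left out. The idea: remember ETag and Last-Modified from each 200 response, then send them back as If-None-Match and If-Modified-Since. If the server answers 304 Not Modified, there is nothing to hand to the callback.

```haskell
-- Validators remembered from the last 200 response (hypothetical record).
data Validators = Validators
  { etag         :: Maybe String
  , lastModified :: Maybe String
  } deriving (Show, Eq)

noValidators :: Validators
noValidators = Validators Nothing Nothing

-- Request headers for the next poll: echo back whatever validators we have.
conditionalHeaders :: Validators -> [(String, String)]
conditionalHeaders v =
  [ ("If-None-Match", t)     | Just t <- [etag v] ] ++
  [ ("If-Modified-Since", d) | Just d <- [lastModified v] ]

-- Decide what to do with a response given its status code, headers, and body.
-- On 200: hand the body to the callback and remember the new validators.
-- Anything else (304 Not Modified, errors): no callback, keep old validators.
-- Simplification: header names are matched case-sensitively here, whereas
-- HTTP header names are case-insensitive.
handleResponse :: Validators -> Int -> [(String, String)] -> String
               -> (Maybe String, Validators)
handleResponse _ 200 hdrs body =
  (Just body, Validators (lookup "ETag" hdrs) (lookup "Last-Modified" hdrs))
handleResponse old _ _ _ = (Nothing, old)
```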
del.icio.us Bookmarks on an Entry
The second feature is integration with del.icio.us bookmarks pointing to an
entry via the del.icio.us
JSON API, and it shows up as a trailer on entries in the detail
view.
I've already blogged about most of the interesting stuff from
integrating with the del.icio.us JSON API using
Network.HTTP; see Haskell,
del.icio.us, and JSON (encodings and non-standard JSON) and A
Short Adventure with simpleHTTP (unclosed
sockets).
The part I didn't cover was how to schedule queries against
del.icio.us, and I'll probably go back to both simplify and enhance
it. At present, it's a bit convoluted; three threads interact as
follows:
- The driver triggers the scheduler on a fixed interval.
- The scheduler manages an ordered list of scheduled times
and entries. In response to a trigger from the driver, if
the head of the list is past due, the scheduler pops the head
of the list, refreshes the data about bookmarks for that entry, sends
it to the controller, and schedules the next refresh for that
entry based on its age in days. The scheduler also receives
information about new entries and adds them to the schedule.
- The controller manages a
Data.Map of data
about bookmarks per entry and either updates data in response to the
scheduler or returns the current data for rendering a
response.
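Here is a rough sketch of the scheduler's core data structure, assuming times are plain Integer seconds and entries are identified by strings. A Data.Set of (time, entry) pairs keeps the schedule ordered, so the head of the list is simply the set's minimum. The refreshInterval thresholds are invented for illustration; the text only says that the next refresh depends on the entry's age in days.

```haskell
import qualified Data.Set as S

type EntryId = String
type Time    = Integer  -- seconds since some epoch (simplification)

-- Ordered schedule: the smallest (time, entry) pair is the next refresh due.
type Schedule = S.Set (Time, EntryId)

-- Hypothetical policy: refresh young entries often, old ones rarely.
refreshInterval :: Integer -> Time
refreshInterval ageDays
  | ageDays < 1  = 15 * 60
  | ageDays < 7  = 60 * 60
  | ageDays < 30 = 6 * 60 * 60
  | otherwise    = 24 * 60 * 60

-- On a trigger from the driver: if the head is past due, pop it.
popDue :: Time -> Schedule -> Maybe (EntryId, Schedule)
popDue now sched = case S.minView sched of
  Just ((t, e), rest) | t <= now -> Just (e, rest)
  _                              -> Nothing

-- After refreshing an entry, schedule its next refresh based on its age.
reschedule :: Time -> Integer -> EntryId -> Schedule -> Schedule
reschedule now ageDays e = S.insert (now + refreshInterval ageDays, e)
```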
The current design is in-memory only, so it gets repopulated each
time the service is booted. I intend to add simple file-based
persistence along the same lines used for entries and comments. The
other major missing features are support for conditional GET and
batching requests into groups of 15, as allowed by the del.icio.us
API.
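Batching lookups in groups of 15 amounts to list chunking. Here is a minimal sketch. batchesOf15 is a hypothetical helper, and the shape of the batched del.icio.us request itself is not shown.

```haskell
-- Split a list into consecutive groups of at most n elements (n > 0).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Hypothetical: group the entry URLs whose bookmark data needs refreshing
-- into batches of 15, one del.icio.us request per batch.
batchesOf15 :: [String] -> [[String]]
batchesOf15 = chunksOf 15
```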
I would have liked to use the main del.icio.us API, but
Network.HTTP doesn't currently support HTTPS.
Personal Aggregation
The third
feature is aggregation of my del.icio.us bookmarks (via RSS feed), Google Reader shared
items (via Atom feed), and Twitter "tweets" (via JSON
API). The aggregated flotsam, jetsam, dross, and detritus shows
up in the sidebar under the "Stream of Consciousness" heading.
The feature is a bit like Movable Type's Action
Streams plugin, but the perpubplat implementation benefits from
the fact that a Haskell FastCGI application can have background
threads (so no crontab hacking).
The implementation is in the
Blog.Widgets.StreamOfConsciousness.* modules:
- Thought is a data structure that represents a tweet, post, shared
item, etc.: date, link, content.
- Twitter, GoogleReader, and DeliciousPosts encapsulate access to
the respective services and parsing data into lists of Thoughts. Each
worker uses an HTTP poller (same as with the Flickr collage) to poll a
feed.
- Controller manages the aggregate data structure and a
pre-rendered HTML fragment.
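The commit function below calls merge to fold newly polled items into the existing list, but its definition isn't shown. Here is one plausible version, together with a guess at Thought. The field order follows the Thought Delicious d u t call in the HXT fragment later in this post. The GoogleReader and Twitter constructors, the key function, and the de-duplication behavior are my assumptions. Dates are assumed to be ISO 8601 UTC strings in a single format, so that string order matches chronological order.

```haskell
-- Hypothetical source tag; only Delicious appears in the post's code.
data Source = Delicious | GoogleReader | Twitter
  deriving (Show, Eq)

-- One item in the stream: a source tag plus date, link, and content.
-- Assumption: dates are ISO 8601 UTC strings in one fixed format.
data Thought = Thought
  { source  :: Source
  , date    :: String
  , link    :: String
  , content :: String
  } deriving (Show, Eq)

-- Ordering key: newest first, ties broken by link.
key :: Thought -> (String, String)
key t = (date t, link t)

-- Merge two lists, each sorted newest-first by key and free of internal
-- duplicates, into one such list. An item present in both (for example,
-- because the poller re-fetched the whole feed) is kept once.
merge :: [Thought] -> [Thought] -> [Thought]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) = case compare (key x) (key y) of
  GT -> x : merge xs (y:ys)
  LT -> y : merge (x:xs) ys
  EQ -> x : merge xs ys
```

De-duplication matters because a 200 response to a conditional GET carries the whole feed, not just the new items.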
To handle the multiple writers and multiple readers, I implemented
a lightweight version of multi-version
concurrency control where readers can always get data but writers
may have to repeat a computation if someone else updated the data in
the meantime. Here's a fragment from
B.W.S.Controller (full
source here):
commit :: SoCController -> [Thought] -> IO ()
commit socc new_items =
    do { snap <- get_data socc
       ; let items' = take (max_size snap) $ merge new_items $ items snap
       ; let rendered' = thoughts_to_xhtml items'
       ; let snap' = snap { items = items'
                          , rendered = rendered' }
       ; ok <- update socc snap'
       ; if ok then
             return ()
         else
             do { threadDelay collision_delay
                ; commit socc new_items }
       }
loop :: Chan SoCRequest -> Snapshot -> IO ()
loop ch snap =
    do { req <- readChan ch
       ; snap' <- case req of
             GetHtmlFragment c ->
                 do { putMVar c $ rendered snap
                    ; return snap }
             GetData h ->
                 do { putMVar h snap
                    ; return snap }
             Update ok snap'' ->
                 if (version snap) == (version snap'') then
                     do { putMVar ok True
                        ; let snap' = snap'' { version = (version snap) + 1 }
                        ; return snap' }
                 else
                     do { putMVar ok False
                        ; return snap }
       ; loop ch snap' }
The commit function runs in the HTTP polling thread
doing the updating, and it's responsible both for merging the items
into the sorted data and for updating the HTML representation that
will get handed to the page rendering process.
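The heart of the scheme is the version check in the Update branch. As a hedged sketch, the same optimistic read, compute, validate, retry cycle can be written with a Data.IORef and atomicModifyIORef' in place of the channel-driven controller thread. That substitution, the Snapshot type below, and the names tryUpdate and commitWith are mine. Unlike commit above, this sketch retries immediately instead of sleeping for a collision delay.

```haskell
import Data.IORef (IORef, newIORef, readIORef, atomicModifyIORef')

-- Hypothetical snapshot: the shared data plus a version counter.
data Snapshot a = Snapshot { version :: Int, payload :: a }
  deriving Show

-- Core of the "Update" branch: accept a proposed snapshot only if it was
-- computed from the current version; on success, bump the version.
tryUpdate :: Snapshot a -> Snapshot a -> (Snapshot a, Bool)
tryUpdate current proposed
  | version current == version proposed =
      (proposed { version = version current + 1 }, True)
  | otherwise = (current, False)

-- Optimistic commit: read a snapshot, compute on it without holding any
-- lock, then install the result only if nobody else committed meanwhile.
commitWith :: IORef (Snapshot a) -> (a -> a) -> IO ()
commitWith ref f = do
  snap <- readIORef ref
  let proposed = snap { payload = f (payload snap) }
  ok <- atomicModifyIORef' ref (\cur -> tryUpdate cur proposed)
  if ok then return () else commitWith ref f
```

Readers just call readIORef and never block, which mirrors the property that readers can always get data while writers may have to redo their work.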
The other interesting nut to crack was extracting data from XML
using Haskell. I could have used the del.icio.us JSON feed and the
JSON feed that the Google Reader shared items JavaScript widget uses,
but those lack the timestamps that I need to fold the streams
together.
Extracting Data from RSS and Atom
I followed the standard trail for learning HXT,
which involves building from source,
reading the gentle
introduction, and trying some of the practical
examples. The only issue I had was with namespace handling.
Here's a code fragment from B.W.S.DeliciousPosts (source here) to read the RSS feed of my del.icio.us bookmarks:
{-# LANGUAGE Arrows #-}
-- proc notation below requires the Arrows extension
import Text.XML.HXT.Arrow
handle_posts :: SoCController -> String -> IO ()
handle_posts socc body =
    do { posts <- runX ( readString parse_opts body >>> getItems )
       ; commit socc posts }
parse_opts = [(a_validate, v_0), (a_check_namespaces, v_1)]
atElemQName qn = deep (isElem >>> hasQName qn)
text = getChildren >>> getText
textOf qn = atElemQName qn >>> text
rdf_uri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf_RDF = QN "rdf" "RDF" rdf_uri
rss_uri = "http://purl.org/rss/1.0/"
rss_item = QN "rss" "item" rss_uri
rss_title = QN "rss" "title" rss_uri
rss_link = QN "rss" "link" rss_uri
dc_uri = "http://purl.org/dc/elements/1.1/"
dc_date = QN "dc" "date" dc_uri
getItem = atElemQName rss_item >>>
          proc i -> do
              t <- textOf rss_title -< i
              u <- textOf rss_link -< i
              d <- textOf dc_date -< i
              returnA -< Thought Delicious d u t
getItems = atElemQName rdf_RDF >>>
           proc r -> do
               items <- getItem -< r
               returnA -< items
HXT uses arrow
notation; the quick and dirty explanation is that proc is
like λ (but for arrows instead of functions), the
<- binds an arrow's output to a name, much like monadic "bind", and the
-< feeds a value to the expression on the shaft of the
arrow.
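Because ordinary functions are themselves arrows, proc notation can be tried without HXT at all. This minimal example (addThenDouble is a made-up name) needs GHC's Arrows extension, as any code using proc does.

```haskell
{-# LANGUAGE Arrows #-}
import Control.Arrow (arr, returnA, (>>>))

-- Plain functions are arrows too, so proc notation works on them directly.
-- Computes (x + 1) * 2.
addThenDouble :: Int -> Int
addThenDouble = proc x -> do
  y <- (+ 1) -< x   -- feed x into the arrow (+ 1); call its output y
  z <- (* 2) -< y   -- feed y into the arrow (* 2); call its output z
  returnA -< z      -- returnA is the identity arrow: hand z back out

-- Roughly what the proc block amounts to: a pipeline of arrows.
addThenDouble' :: Int -> Int
addThenDouble' = arr (+ 1) >>> arr (* 2)
```

In getItem, the arrows on the left of -< are HXT XML filters such as textOf rss_title instead of plain functions, but the plumbing is the same.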
The first time I ran this against the RSS from del.icio.us, I got
nothing back, so after looking at the XML for the RSS, I switched the
prefix for the RSS QNames to the empty string to match the input file,
and it worked. Grrr... That means that the (==) for
QName is broken, and a quick look at the source in
Text.XML.HXT.DOM.TypeDefs showed why:
data QName = QN { namePrefix   :: String  -- ^ prefix of a qualified name \"namePrefix:localPart\"
                , localPart    :: String  -- ^ local part of a qualified name \"namePrefix:localPart\"
                , namespaceUri :: String  -- ^ associated namespace URI
                }
             deriving (Eq, Ord, Show, Read, Typeable)
The derived (==) just combines the (==) of the three components
(prefix, local part, URI) with &&, but
XML QNames are
equal if their
local parts and URIs (as
strings) are equal. It's easy to fix by dropping the derivation
of Eq and supplying a good version:
- deriving (Eq, Ord, Show, Read, Typeable)
+ deriving (Ord, Show, Read, Typeable)
+
+ instance Eq QName where
+     q1 == q2 = ((localPart q1) == (localPart q2))
+                && ((namespaceUri q1) == (namespaceUri q2))
After which, it works according to my expectations for namespace
handling.
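One caveat about that patch: it keeps the derived Ord, which still compares namePrefix first. Two QNames that are == can therefore compare as LT or GT, and a Data.Map or Data.Set keyed on QName would treat them as different keys. Here is a sketch of a consistent pair of instances, on a standalone copy of the three-field type rather than HXT's own definition.

```haskell
-- Hypothetical standalone type with the same three fields as HXT's QName;
-- it is not HXT's definition.
data QName = QN { namePrefix :: String, localPart :: String, namespaceUri :: String }
  deriving (Show)

-- Namespaces in XML: names are equal when local part and namespace URI match;
-- the prefix is only a local abbreviation for the URI.
instance Eq QName where
  q1 == q2 = localPart q1 == localPart q2 && namespaceUri q1 == namespaceUri q2

-- Compare the same two fields, so compare q1 q2 == EQ exactly when q1 == q2.
instance Ord QName where
  compare q1 q2 =
    compare (namespaceUri q1, localPart q1) (namespaceUri q2, localPart q2)
```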
Couldn't You Do All That With JavaScript...?
Yes. I could. I didn't. Here are a few of the reasons that I chose not to:
- My experiments showed that page loads would take several seconds
instead of a fraction of a second. Other people
have had the same experience. (It reminds me of the opening scene of
I'm Gonna Git You
Sucka where Junebug dies of an OG.
Don't let your blog die of an OW...)
- Some of the widgets are just plain fugly, IMHO. I'm looking at you,
Google Reader shared item "clip" and Twitter Flash widget, although
the availability of JSON for the Google Reader shared item "clip"
(look in the JavaScript) and Twitter would allow me to come up with
something more pleasing (to me).
- Even though it's not a good idea — e.g., IE7 is broken,
Firefox <3 doesn't
do incremental display, etc. — I would like to be able to serve
application/xhtml+xml,
and document.write doesn't
work in documents served that way.
- The availability of background threads on the server side means
that JavaScript on the client side isn't the only option.
Other Integrations and Aggregations
The other two features that I'd like to add are backlinks to other
blogs and backlinks to posts on community sites like Reddit and DZone. (I'm on the fence about
implementing trackback
support; you could twist my arm.)
That said, I'm ambivalent about directing people to comment
threads in other locations, e.g., Reddit. (My reasons are similar to
Reg
Braithwaite's.) It would be a simple matter to sniff referring
URLs, deduce where an entry is posted on Reddit, and then integrate
the comments together, but Reddit's draconian User Agreement forbids
it:
The content, organization, graphics, text, images,
video, design, compilation, advertising and all other material on the
Website, including without limitation, the "look and feel" of this
website, are protected under applicable copyrights and other
proprietary (including but not limited to intellectual property)
rights and are the property of Website Provider or its licensors. The
copying, rearrangement, redistribution, modification, use or
publication by you, directly or indirectly, of any such matters or any
part of the website, including but not limited to the removal or
alteration of advertising, except for the limited rights of use
granted hereunder, is strictly prohibited.
Someone should implement a community hub that integrates discussion
threads, followup posts, and blog comments on an original entry in a
transparent and open fashion...
Postmortem
My first observation from this experiment is that APIs are
preferable to feeds are preferable to widgets when it comes to
integration of services on the web. (Note that I didn't say web
services...) Even listing widgets is somewhat questionable in my
opinion, since it's more of a "put my stuff on your page" than a "use
my service".
My second observation is nothing new, but I now have experimental
evidence — JSON is preferable to XML, whether or not the target
client runs in a browser. If I were building a service, I'm not sure
that I'd bother with supporting an XML API.
My third observation is that I would use Haskell to build a
product or service, and I mean that in the sense that I can see how to
train a team and build processes (prototyping, implementation,
quality, deployment, support) around Haskell. The language does have
a relatively steep learning curve (cf. Kevin Scaldeferri's post
on the subject and the comments that follow or Reg Braithwaite's general
ruminations on learning languages), but the real problem is
collectively getting through the challenges once. It reminds me of
learning spectral
sequences as a graduate student; fifteen minutes with my advisor
to work an example was better than a week of staring at otherwise
inscrutable notation. As a measure of the view from my current
location on the learning curve, I coded up a working rough cut of the
"stream of consciousness" feature in an evening plus an afternoon cup of
coffee, and I wouldn't regard myself as being fully around the curve
yet (FFI, custom monads/transformers, etc. await).