This is post 4 of n on my hobby project to rewrite my
personal publishing software in Haskell. Herein, I create an
economically driven approach to the Atom syndication format for entries
and comments. By citing economics as the prime motivator, I
mean that I'm aiming neither for the most complete nor the most
elegant implementation except where those two overlap with
exigency.
Required Reading
To make sure that I knew enough about Atom, I read through the Atom Syndication Format RFC (I prefer the plain text to the pretty version), the introduction at AtomEnabled.org, and Mark Pilgrim's note on How to make a good ID in Atom. After reading so many specifications that either use flabby XML Schema (like WSDL) or ad hoc XML (like RSS 2.0), the use of RELAX NG compact syntax in the RFC was a breath of fresh air.
Really Simple Atom Data Model
The first set of design decisions I made was what to throw out
from Atom. I decided to omit the atom:contributor
construct entirely, as atom:author is sufficient
for my purposes, and atom:source and attributes on
atom:link other than @rel and
@href weren't going to be any use to me, either. I
decided to omit the @scheme and @label
attributes on atom:category since all of my
@term values will be human readable and I don't plan on
using any scheme other than my own. I decided to omit any specific
constraints on components that might otherwise be URIs, RFC3339-formatted
date/time, or other — String will do for now, and
I'll make sure that properly formatted data (including escaping as
necessary) is used in the first place. I also decided to leave any of
the sequencing and quantity constraints out of the model, as this will
be an internal model only.
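Since the model leaves escaping to whoever builds the values, something along these lines could be applied to plain-text fields before they go in. This is only a minimal sketch: the name escapeXml is hypothetical and is not part of the module in this post, and it assumes the input is plain text rather than markup that should pass through untouched.

```haskell
-- Hypothetical helper (not part of the original module): escape the
-- characters that are significant in XML text and attribute values.
escapeXml :: String -> String
escapeXml = concatMap escapeChar
  where
    escapeChar '&'  = "&amp;"
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&apos;"
    escapeChar c    = [c]
```

XHTML bodies would have to skip this step, since their angle brackets are meant to survive.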
Here's the way it looks, and if you squint at it just right, it doesn't look that different from the RELAX NG compact schema:
data AtomElement = Feed [AtomElement]
                 | Entry [AtomElement]
                 | Content AtomContent
                 | Author { author_name :: String,
                            author_uri :: Maybe String,
                            author_email :: Maybe String }
                 | Category String
                 | Generator { gen_name :: String,
                               gen_uri :: String,
                               gen_version :: String }
                 | Id String
                 | Icon String
                 | Link { rel :: String,
                          href :: String }
                 | Logo String
                 | Published String
                 | Rights AtomContent
                 | Subtitle AtomContent
                 | Summary AtomContent
                 | Title AtomContent
                 | Updated String
                 deriving (Show)
The Maybe String for the author_uri and
author_email components of the Author
representation are intended to allow for comments where the author may
omit an email address or link. (Of course, I may just omit their
comment under those circumstances...) Next, one more type for
AtomContent, where I elected to eliminate the possibility
of HTML content:
data ContentType = XHTML | TEXT
                 deriving (Eq,Show,Enum)
data AtomContent = AtomContent { contentType :: ContentType,
                                 body :: String }
                 deriving (Show)
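Because the quantity constraints are left out of the types, they could be checked separately if that ever became necessary. The sketch below covers only one rule from the RFC (an entry carries exactly one id, one title, and one updated element) and uses a reduced stand-in type so that it runs on its own. Element, count, and validEntry are hypothetical names, not part of the module.

```haskell
-- Reduced stand-in for AtomElement, just enough for the check.
-- (Hypothetical; the real type has many more constructors.)
data Element = Id String | Title String | Updated String | Other String
  deriving (Show)

-- Count the children satisfying a predicate.
count :: (a -> Bool) -> [a] -> Int
count p = length . filter p

isId, isTitle, isUpdated :: Element -> Bool
isId (Id _) = True
isId _      = False
isTitle (Title _) = True
isTitle _         = False
isUpdated (Updated _) = True
isUpdated _           = False

-- An entry's children pass if each required element occurs exactly once.
validEntry :: [Element] -> Bool
validEntry xs = all (\p -> count p xs == 1) [isId, isTitle, isUpdated]
```

A check like this would run once, just before serialization, which keeps the data model itself as loose as described above.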
XML Output
With a few (non-limiting) assumptions, getting XML out is simple.
First up, the Atom URI, my choice to bind it to the atom
prefix, and my assumption that XHTML will always be in the default
namespace:
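_prefix :: String
_prefix = "atom"
_uri :: String
_uri = "http://www.w3.org/2005/Atom"
start_div :: String
start_div = "<div xmlns=\"http://www.w3.org/1999/xhtml\">"
end_div :: String
end_div = "</div>"
-- Qualify an element name with the Atom prefix. The functions that follow
-- call prefix, but its definition is not shown in the post; this one is an
-- assumption based on the output (e.g., atom:entry).
prefix :: String -> String
prefix s = _prefix ++ (':':s)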
It's worth noting that the XHTML specification (via either the transitional DTD or the strict DTD) requires that the XHTML namespace be the default namespace, but there is no requirement that an XHTML fragment in an Atom feed use the default namespace.
Next, some really simple functions to wrap content in elements:
-- Format a clopen element with a list of attributes.
clopen :: String -> [(String,String)] -> String
clopen s [] = "<" ++ (prefix s) ++ "/>"
clopen s xs = "<" ++ (prefix s) ++ (nv_to_s "" xs) ++ "/>"
-- Wrap a string in an element.
wrap :: String -> String -> String
wrap s t = "<" ++ (prefix s) ++ ">" ++ t ++ "</" ++ (prefix s) ++ ">"
-- If a value is present (i.e., not Nothing), wrap it in an element.
wrap_m :: String -> Maybe String -> String
wrap_m _ Nothing = ""
wrap_m s (Just t) = wrap s t
-- Wrap an element with attributes around a string.
wrap_ :: String -> [(String,String)] -> String -> String
wrap_ s [] t = wrap s t
wrap_ s xs t = "<" ++ (prefix s) ++ (nv_to_s "" xs) ++ ('>':t)
               ++ "</" ++ (prefix s) ++ ">"
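-- Wrap a string in an element that also carries the Atom namespace binding.
-- As written, the attribute comes out as atom="..."; a namespace
-- declaration proper would be named xmlns:atom.
wrap_ns :: String -> String -> String
wrap_ns s t = wrap_ s [(_prefix,_uri)] t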
-- Format a list of name-value pairs as attributes.
nv_to_s :: String -> [(String,String)] -> String
nv_to_s = foldl att
att :: String -> (String,String) -> String
att s (n,v) = s ++ (' ':(n ++ "=\"" ++ v ++ "\""))
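To make the attribute formatting concrete, here is a small standalone sketch of the same idea written with concatMap instead of foldl. The names renderAttrs and emptyElement are hypothetical stand-ins for nv_to_s and clopen, and the prefix handling is left out so that the block runs on its own.

```haskell
-- Hypothetical equivalent of nv_to_s "": each pair becomes  name="value"
-- with a leading space, and the pieces are concatenated in order.
renderAttrs :: [(String, String)] -> String
renderAttrs = concatMap render
  where
    render (n, v) = ' ' : n ++ "=\"" ++ v ++ "\""

-- Hypothetical equivalent of clopen, without the namespace prefix.
emptyElement :: String -> [(String, String)] -> String
emptyElement name attrs = "<" ++ name ++ renderAttrs attrs ++ "/>"
```

With these definitions, emptyElement "link" [("rel","alternate"),("href","http://example.org/")] evaluates to <link rel="alternate" href="http://example.org/"/>. Neither version escapes attribute values, so that remains the caller's job.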
And then just map the various shades of AtomElement
onto the functions:
toXml :: AtomElement -> String
toXml (Feed xs) = wrap_ns "feed" (content_ xs)
toXml (Entry xs) = wrap_ns "entry" (content_ xs)
toXml' :: AtomElement -> String
toXml' (Entry xs) = wrap "entry" (content_ xs)
toXml' (Category s) = clopen "category" [("term",s)]
toXml' (Id s) = wrap "id" s
toXml' (Icon s) = wrap "icon" s
toXml' (Link r h) = clopen "link" [("rel",r),("href",h)]
toXml' (Logo s) = wrap "logo" s
toXml' (Published s) = wrap "published" s
toXml' (Updated s) = wrap "updated" s
toXml' (Author s u e) = wrap "author" ((wrap "name" s)
                                       ++ (wrap_m "uri" u)
                                       ++ (wrap_m "email" e))
toXml' (Generator n u v) = wrap_ "generator" [("uri",u),("version",v)] n
toXml' (Content a) = atom_text "content" a
toXml' (Rights a) = atom_text "rights" a
toXml' (Subtitle a) = atom_text "subtitle" a
toXml' (Summary a) = atom_text "summary" a
toXml' (Title a) = atom_text "title" a
content_ :: [AtomElement] -> String
content_ = concat.(map toXml')
-- Render an Atom text construct as XML.
atom_text :: String -> AtomContent -> String
atom_text s (AtomContent XHTML t) = wrap_ s [("type","xhtml")] (start_div ++ t ++ end_div)
atom_text s (AtomContent TEXT t) = wrap s t
(The Atom spec allows @type="text" to be omitted.)
The toXml function and the AtomElement,
AtomContent, and ContentType types are all
that would be exported from the module.
A quick check with ghci shows that this does the right
thing:
[...]
*Text.Atom> let entry = Entry [Title (AtomContent TEXT "Atom-Powered Robots Run Amok"), Id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a", Updated "2003-12-12T18:30Z", Author "John Doe" Nothing Nothing, Content (AtomContent XHTML "<p>Some text.</p>")]
*Text.Atom> toXml entry
"<atom:entry atom=\"http://www.w3.org/2005/Atom\"><atom:title type=\"text\">Atom-Powered Robots Run Amok</atom:title><atom:id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</atom:id><atom:updated>2003-12-12T18:30Z</atom:updated><atom:author><atom:name>John Doe</atom:name></atom:author><atom:content type=\"xhtml\"><div xmlns=\"http://www.w3.org/1999/xhtml\"><p>Some text.</p></div></atom:content></atom:entry>"
The let entry=... line makes more sense with some
whitespace thrown in:
let entry = Entry [
      Title (AtomContent TEXT "Atom-Powered Robots Run Amok"),
      Id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
      Updated "2003-12-12T18:30Z",
      Author "John Doe" Nothing Nothing,
      Content (AtomContent XHTML "<p>Some text.</p>")
    ]
Other Available XML Wheels
While the above is a small wheel, it is a wheel nonetheless, and I looked at three Haskell XML libraries before reinventing it:
- Haskell XML Toolbox, a.k.a. HXT, (link) appears to be under active development and supports my basic requirements of XML output and namespace support. The API looks agreeable, and there is an RSS aggregator in 50 lines as an example. If I choose to implement the Atom Publishing Protocol, HXT is the way I'll go to get atom:entry turned into the right kind of internal structure.
- HaXml (link) appears to lack namespace support, so I dismissed it without looking deeply at it.
- HXML (link) lacks namespace support, so I dismissed it without looking deeply at it. That said, the validation concept in HXML has the same heritage as the one used in the RELAX NG validator Jing.
What's Left?
There's enough real work left for at least three more blog entries: storage/state management for entries (probably STM with persistence via the filesystem), a commenting facility, human-facing content display and navigation (probably Haskell via the Text.XHtml package), and making sure that the FCGI wrapper works well multi-threaded. (I want a multi-threaded FCGI handler so that STM can serve as the concurrency control for the application; otherwise, the persistence layer will need to provide that functionality.)
State's the next one I'll tackle.

Comment from Chris Eidhof @ 2007-02-27T10:53:03Z
Comment from Paul Brown @ 2007-02-27T12:41:12Z