Really Simple Atom Syndication

Paul Brown @ 2007-02-27T04:06:00Z

This is post 4 of n on my hobby project to rewrite my personal publishing software in Haskell. Herein, I create an economically-driven approach to the Atom syndication format for entries and comments. By citing economics as the prime motivator, I mean that I'm aiming neither for the most complete nor the most elegant implementation except where those two overlap with exigency.

Required Reading

To make sure that I knew enough about Atom, I read through the Atom Syndication Format RFC (I prefer the plain text to the pretty version), the introduction at AtomEnabled.org, and Mark Pilgrim's note on How to make a good ID in Atom. After reading so many specifications that either use flabby XML Schema (like WSDL) or ad hoc XML (like RSS 2.0), the use of RELAX NG compact syntax in the RFC was a breath of fresh air.

Really Simple Atom Data Model

The first set of design decisions I made was what to throw out from Atom. I decided to omit the atom:contributor construct entirely, as atom:author is sufficient for my purposes, and atom:source and attributes on atom:link other than @rel and @href weren't going to be any use to me, either. I decided to omit the @scheme and @label attributes on atom:category since all of my @term values will be human readable and I don't plan on using any scheme other than my own. I decided to omit any specific constraints on components that might otherwise be URIs, RFC3339-formatted date/time, or other; String will do for now, and I'll make sure that properly formatted data (including escaping as necessary) is used in the first place. I also decided to leave any of the sequencing and quantity constraints out of the model, as this will be an internal model only.

Here's the way it looks, and if you squint at it just right, it doesn't look that different from the RELAX NG compact schema:

data AtomElement = Feed [AtomElement]
                 | Entry [AtomElement]
                 | Content AtomContent
                 | Author { author_name :: String,
                            author_uri :: Maybe String,
                            author_email :: Maybe String }
                 | Category String 
                 | Generator { gen_name :: String,
                               gen_uri :: String,
                               gen_version :: String }
                 | Id String
                 | Icon String
                 | Link { rel :: String,
                          href :: String }
                 | Logo String
                 | Published String
                 | Rights AtomContent
                 | Subtitle AtomContent
                 | Summary AtomContent
                 | Title AtomContent
                 | Updated String
                   deriving (Show)
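
Since Published and Updated are plain String in this model, the formatting has to happen before the value is constructed. As a minimal sketch, assuming the time package's Data.Time module is available (the post itself does not use it), a helper along these lines could produce an RFC 3339 timestamp in UTC. The names rfc3339 and sampleTime are hypothetical:

```haskell
import Data.Time (UTCTime (..), defaultTimeLocale, formatTime,
                  fromGregorian, secondsToDiffTime)

-- Hypothetical helper (not part of the post's module): render a UTC time
-- as an RFC 3339 timestamp with whole seconds and a literal "Z" offset.
rfc3339 :: UTCTime -> String
rfc3339 = formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ"

-- A fixed instant for experimenting in ghci: 2003-12-12 at 18:30:00 UTC.
sampleTime :: UTCTime
sampleTime = UTCTime (fromGregorian 2003 12 12)
                     (secondsToDiffTime (18 * 3600 + 30 * 60))
```

With these definitions, rfc3339 sampleTime evaluates to "2003-12-12T18:30:00Z", which can then be handed to Updated or Published.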

The Maybe String for the author_uri and author_email components of the Author representation are intended to allow for comments where the author may omit an email address or link. (Of course, I may just omit their comment under those circumstances...) Next, one more type for AtomContent, where I elected to eliminate the possibility of HTML content:

data ContentType = XHTML | TEXT
                 deriving (Eq,Show,Enum)

data AtomContent = AtomContent { contentType :: ContentType,
                                 body :: String }
                 deriving (Show)
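
Because the quantity constraints are left out of the types, nothing stops an Entry from carrying two Id elements or no Title at all. If that ever matters, a check can live beside the model rather than inside it. Here is a minimal sketch, using an abbreviated copy of the model so that it loads on its own; count, the is* predicates, and validEntry are hypothetical names, and only the "exactly one atom:id, atom:title, and atom:updated" rule from RFC 4287 is covered:

```haskell
-- Abbreviated copy of the post's model, just enough to make the sketch
-- self-contained; the real AtomElement has more constructors.
data ContentType = XHTML | TEXT
                 deriving (Eq, Show, Enum)

data AtomContent = AtomContent { contentType :: ContentType,
                                 body :: String }
                 deriving (Show)

data AtomElement = Entry [AtomElement]
                 | Category String
                 | Id String
                 | Title AtomContent
                 | Updated String
                 deriving (Show)

-- Count the children that satisfy a predicate.
count :: (AtomElement -> Bool) -> [AtomElement] -> Int
count p = length . filter p

isId, isTitle, isUpdated :: AtomElement -> Bool
isId (Id _) = True
isId _ = False
isTitle (Title _) = True
isTitle _ = False
isUpdated (Updated _) = True
isUpdated _ = False

-- An entry passes when it has exactly one Id, one Title, and one Updated.
-- Anything that is not an Entry is rejected.
validEntry :: AtomElement -> Bool
validEntry (Entry xs) = all (\p -> count p xs == 1) [isId, isTitle, isUpdated]
validEntry _ = False
```

Keeping the check separate leaves the data type as loose as described above while still allowing a sanity test before anything is serialized.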

XML Output

With a few (non-limiting) assumptions, getting XML out is simple. First up, the Atom URI, my choice to bind it to the atom prefix, and my assumption that XHTML will always be in the default namespace:

_prefix :: String
_prefix = "atom"

_uri :: String
_uri = "http://www.w3.org/2005/Atom"

start_div :: String
start_div = "<div xmlns=\"http://www.w3.org/1999/xhtml\">"

end_div :: String
end_div =  "</div>"

It's worth noting that the XHTML specification (via either the transitional DTD or the strict DTD) requires that the XHTML namespace be the default namespace, but there is no requirement that an XHTML fragment in an Atom feed use the default namespace.

Next, some really simple functions to wrap content in elements:

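-- Qualify an element name with the Atom prefix, e.g., "entry" becomes
-- "atom:entry". (Definition inferred from the atom:-qualified output.)
prefix :: String -> String
prefix s = _prefix ++ (':':s)

-- Format a clopen element with a list of attributes.
clopen :: String -> [(String,String)] -> String
clopen s [] = "<" ++ (prefix s) ++ "/>"
clopen s xs = "<" ++ (prefix s) ++ (nv_to_s "" xs) ++ "/>"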

-- Wrap a string in an element.
wrap :: String -> String -> String
wrap s t = "<" ++ (prefix s) ++ ">" ++ t ++ "</" ++ (prefix s) ++ ">"

-- If a value is present (i.e., not Nothing), wrap it in an element.
wrap_m :: String -> Maybe String -> String
wrap_m _ Nothing = ""
wrap_m s (Just t) = wrap s t

-- Wrap an element with attributes around a string.
wrap_ :: String -> [(String,String)] -> String -> String
wrap_ s [] t = wrap s t
wrap_ s xs t = "<" ++ (prefix s) ++ (nv_to_s "" xs) ++ ('>':t)
               ++ "</" ++ (prefix s) ++ ">"

wrap_ns :: String -> String -> String
wrap_ns s t = wrap_ s [(_prefix,_uri)] t

-- Format a list of name-value pairs as attributes.
nv_to_s :: String -> [(String,String)] -> String
nv_to_s = foldl att

att :: String -> (String,String) -> String
att s (n,v) = s ++ (' ':(n ++ "=\"" ++ v ++ "\""))
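
One detail worth checking in wrap_ns: pairing _prefix with _uri yields the plain attribute atom="http://www.w3.org/2005/Atom" (visible in the ghci output further down), whereas Namespaces in XML binds a prefix with an attribute named xmlns:atom. A minimal sketch of a variant follows, with the helpers repeated so that it loads on its own. The definition of prefix is inferred from the atom:-qualified output, and wrap_ns' is a hypothetical name:

```haskell
_prefix :: String
_prefix = "atom"

_uri :: String
_uri = "http://www.w3.org/2005/Atom"

-- Qualify an element name with the Atom prefix (definition inferred from
-- the post's output, where elements appear as atom:entry, atom:title, ...).
prefix :: String -> String
prefix s = _prefix ++ (':' : s)

-- Format a list of name-value pairs as attributes (as in the post).
nv_to_s :: String -> [(String, String)] -> String
nv_to_s = foldl att

att :: String -> (String, String) -> String
att s (n, v) = s ++ (' ' : (n ++ "=\"" ++ v ++ "\""))

-- Hypothetical variant of wrap_ns: declare the namespace with an
-- xmlns:atom attribute instead of a plain attribute named atom.
wrap_ns' :: String -> String -> String
wrap_ns' s t = "<" ++ prefix s ++ nv_to_s "" [("xmlns:" ++ _prefix, _uri)]
               ++ ('>' : t) ++ "</" ++ prefix s ++ ">"
```

With this variant, the root element opens as <atom:entry xmlns:atom="http://www.w3.org/2005/Atom">, so the atom: prefix on the child elements resolves for a namespace-aware parser.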

And then just map the various shades of AtomElement onto the functions:

toXml :: AtomElement -> String
toXml (Feed xs) = wrap_ns "feed" (content_ xs)
toXml (Entry xs) = wrap_ns "entry" (content_ xs)

toXml' :: AtomElement -> String
toXml' (Entry xs) = wrap "entry" (content_ xs)
toXml' (Category s) = clopen "category" [("term",s)]
toXml' (Id s) = wrap "id" s
toXml' (Icon s) = wrap "icon" s
toXml' (Link r h) = clopen "link" [("rel",r),("href",h)]
toXml' (Logo s) = wrap "logo" s
toXml' (Published s) = wrap "published" s
toXml' (Updated s) = wrap "updated" s
toXml' (Author s u e) = wrap "author" ((wrap "name" s)
                                       ++ (wrap_m "uri" u)
                                       ++ (wrap_m "email" e))
toXml' (Generator n u v) = wrap_ "generator" [("uri",u),("version",v)] n
toXml' (Content a) = atom_text "content" a
toXml' (Rights a) = atom_text "rights" a
toXml' (Subtitle a) = atom_text "subtitle" a
toXml' (Summary a) = atom_text "summary" a
toXml' (Title a) = atom_text "title" a

content_ :: [AtomElement] -> String
content_ = concat.(map toXml')

-- Render an Atom text construct as XML.
atom_text :: String -> AtomContent -> String
atom_text s (AtomContent XHTML t) = wrap_ s [("type","xhtml")] (start_div ++ t ++ end_div)
atom_text s (AtomContent TEXT t) = wrap s t

(The Atom spec allows @type="text" to be omitted.) The toXml function and the AtomElement, AtomContent, and ContentType types are all that would be exported from the module.
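
None of the wrappers above escape their input, so a TEXT title containing & or < would produce ill-formed XML, and the same goes for attribute values passed to att. Here is a minimal sketch of the escaping step mentioned earlier, assuming it is applied to text content and attribute values before they reach wrap or att (XHTML bodies are assumed to be well-formed markup already and are left alone). The name escapeXml is hypothetical:

```haskell
-- Hypothetical helper: escape the characters that are significant in XML
-- text and in double-quoted attribute values.
escapeXml :: String -> String
escapeXml = concatMap esc
  where
    esc '&' = "&amp;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc '"' = "&quot;"
    esc c   = [c]
```

For instance, the TEXT case of atom_text could become wrap s (escapeXml t), leaving the XHTML case untouched.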

A quick check with ghci shows that this does the right thing:

[...]
*Text.Atom> let entry = Entry [Title (AtomContent TEXT "Atom-Powered Robots Run Amok"), Id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a", Updated "2003-12-12T18:30Z", Author "John Doe" Nothing Nothing, Content (AtomContent XHTML "<p>Some text.</p>")]
*Text.Atom> toXml entry
"<atom:entry atom=\"http://www.w3.org/2005/Atom\"><atom:title type=\"text\">Atom-Powered Robots Run Amok</atom:title><atom:id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</atom:id><atom:updated>2003-12-12T18:30Z</atom:updated><atom:author><atom:name>John Doe</atom:name></atom:author><atom:content type=\"xhtml\"><div xmlns=\"http://www.w3.org/1999/xhtml\"><p>Some text.</p></div></atom:content></atom:entry>"

The let entry=... line makes more sense with some whitespace thrown in:

let entry = Entry [
  Title (AtomContent TEXT "Atom-Powered Robots Run Amok"),
  Id "urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a",
  Updated "2003-12-12T18:30Z",
  Author "John Doe" Nothing Nothing,
  Content (AtomContent XHTML "<p>Some text.</p>")
]

Other Available XML Wheels

While the above is a small wheel, it is a wheel nonetheless, and I looked at three Haskell XML libraries before reinventing it:

  • Haskell XML Toolbox, a.k.a. HXT, appears to be under active development and supports my basic requirements of XML output and namespace support. The API looks agreeable, and there is an RSS aggregator in 50 lines as an example. If I choose to implement the Atom Publishing Protocol, HXT is the way I'll go to get atom:entry turned into the right kind of internal structure.
  • HaXml appears to lack namespace support, so I dismissed it without looking deeply at it.
  • HXML lacks namespace support, so I dismissed it without looking deeply at it. That said, the validation concept in HXML has the same heritage as the one used in the RELAX NG validator Jing.

What's Left?

There's enough real work left for at least three more blog entries: storage/state management for entries (probably STM with persistence via the filesystem), a commenting facility, human-facing content display and navigation (probably Haskell via the Text.XHtml package), and making sure that the FCGI wrapper works well multi-threaded. (I want a multi-threaded FCGI handler so that STM can serve as the concurrency control for the application; otherwise, the persistence layer will need to provide that functionality.)

State's the next one I'll tackle.

Comment from Chris Eidhof @ 2007-02-27T10:53:03Z # permalink

I really like this series, very interesting! I'm currently trying to find some good XML processing tools. So far, I also dismissed HaXml and HXml. HXT looks promising indeed, although Arrows still scare me a bit... ;) By the way: you could also consider plugging your system in to HAppS.

Comment from Paul Brown @ 2007-02-27T12:41:12Z # permalink

@Chris: HAppS looks interesting, and I've been experimenting with it on and off. The upcoming changes to support deployment on Amazon EC2 with S3 as a backing store look particularly shiny... Even so, my goal is to have the experience of building the system, so using HAppS would be cheating.