I'd like to add both a sidebar with my bookmarks and some per-entry chrome for posts bookmarked on del.icio.us, but I don't want to use client-side JavaScript to do it. The alternative is to pull, cache, and manage the data on the server side. As a prototype, I whipped up a simple Haskell program that uses the del.icio.us JSON APIs (for posts and for URLs), and it contained a couple of surprising detours.
Some Haskell
First up, some Haskell. After going shopping on Hackage, I
installed Network.HTTP,
Thomas DuBuisson's pureMD5
package, and the JSON package from Masahiro Sakai and Jun
Mukai (cabalized version is here). Like all
code that builds on a decent set of libraries, the Haskell code to hit
del.icio.us is straightforward; full source is here, so
I'll just post some fragments below to give a flavor of the code.
Create a structure to hold the data:
data DeliciousBookmark = DeliciousBookmark { bookmark_url :: String
                                           , description  :: String
                                           , tags         :: [String] }
                         deriving ( Show, Eq, Ord )
Build the request:
bookmarks_fragment :: String
bookmarks_fragment = "http://del.icio.us/feeds/json/"

request_for_bookmarks :: String -> Request
request_for_bookmarks user = Request ( fromJust . parseURI $
                                         bookmarks_fragment ++ user ++ "?raw" )
                                     GET [] ""
Send it:
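fetch_bookmarks :: String -> IO [DeliciousBookmark]
fetch_bookmarks user = do { res <- simpleHTTP . request_for_bookmarks $ user
                          ; case res of
                              Right (Response (2,0,0) _ _ body) ->
                                return $ process_bookmarks_body body
                              -- Report non-200 responses and connection errors
                              -- instead of dying on an incomplete pattern match.
                              Right (Response code reason _ _) ->
                                fail $ "del.icio.us returned " ++ show code ++ " " ++ reason
                              Left err -> fail $ show err
                          }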
And then parse and walk through the response body:
parse_crufty_json :: String -> J.Value
parse_crufty_json = parse_json . unescape . utf8_decode
  where
    parse_json = \s -> case (parse J.json "" s) of
                         Left err -> error . show $ err
                         Right v  -> v

process_bookmarks_body :: String -> [DeliciousBookmark]
process_bookmarks_body body =
  case parse_crufty_json body of
    J.Array a ->
      map (process_bookmark . uno) a
process_bookmark :: M.Map String J.Value -> DeliciousBookmark
process_bookmark m =
  DeliciousBookmark { bookmark_url = uns $ M.findWithDefault blank "u" m
                    , description  = uns $ M.findWithDefault blank "d" m
                    , tags         = map uns $ una $ M.findWithDefault empty_array "t" m }

blank = J.String ""
empty_array = J.Array []

-- Partial unwrappers: each one assumes the expected JSON constructor.
uno (J.Object o) = o
una (J.Array a)  = a
uns (J.String s) = s
And that's all there is to it, except that — as might be expected from the parse_crufty_json function — there were a few things
that didn't work on the first pass.
Bytes and Characters
The first wrinkle I ran into with the simple del.icio.us client
occurred in process_bookmarks_body. The Haskell
String that comes from the HTTP response structure is
just a straight conversion of the response body from bytes to
character ordinals. This is all well and good if the body is encoded
in ISO-8859-1,
but it's fraught with peril otherwise. The del.icio.us service sends
back UTF-8 (and
ignores an Accept-Charset header
instead of either returning a correctly encoded response or a
406 response code), so any interesting characters will
cause problems. In this case, what should be Solutoire.com
\8250 Plotr is coming through as Solutoire.com
\226\128\186 Plotr. Writing a decoder is no big deal and an
opportunity to play a quick round of golf.
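For a flavor of what such a decoder might look like, here is a minimal sketch. It is not the golfed version from the original program, which isn't reproduced in this post. It assumes every Char in the input holds a single byte of the response body (an ordinal from 0 to 255), and the decision to substitute U+FFFD for malformed sequences is an assumption of the sketch. The name utf8_decode simply matches the call in parse_crufty_json.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- | Decode a String whose Chars are really UTF-8 bytes (ordinals 0-255)
-- into a String of Unicode code points. Malformed or truncated sequences
-- become U+FFFD. Overlong encodings and surrogates are NOT rejected,
-- so this is a sketch rather than a strict validator.
utf8_decode :: String -> String
utf8_decode = go . map ord
  where
    go [] = []
    go (b:bs)
      | b < 0x80              = chr b : go bs            -- single byte (ASCII)
      | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F) bs  -- lead byte of a 2-byte sequence
      | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F) bs  -- lead byte of a 3-byte sequence
      | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07) bs  -- lead byte of a 4-byte sequence
      | otherwise             = '\xFFFD' : go bs         -- stray continuation or invalid byte

    -- Consume n continuation bytes and fold their low six bits into acc.
    multi n acc bs =
      case splitAt n bs of
        (conts, rest)
          | length conts == n
          , all isCont conts
          , let cp = foldl step acc conts
          , cp <= 0x10FFFF -> chr cp : go rest
        _ -> '\xFFFD' : go bs

    isCont c = c .&. 0xC0 == 0x80
    step a c = (a `shiftL` 6) .|. (c .&. 0x3F)
```

Applied to the example above, utf8_decode "Solutoire.com \226\128\186 Plotr" yields "Solutoire.com \8250 Plotr".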
In terms of making HTTP in Haskell better, there was apparently a
Google SoC project proposed
to integrate cURL via FFI and
Haskell's ByteString API, but it doesn't look like
anything's come of it.
RFC-compliant JSON versus Works For Me in JavaScript
The second wrinkle with the simple del.icio.us client is more pernicious. After I resolved the string encoding issues, I started getting errors of the form:
parse error at (line 1, column 1552): unexpected "'" expecting "\"", "\\", "/", "b", "f", "n", "r", "t" or "u"
And sure enough, on inspection, there's an escaped apostrophe lurking in the JSON. This probably wouldn't bother a client who simply evaluated the JSON as literal JavaScript (which seems to be the intent of the API), but it's not legal JSON and the parser correctly signals an error.
The JSON grammar (per RFC 4627) permits a few escapes, and apostrophe is not among them. To wit:
string = quotation-mark *char quotation-mark

char = unescaped /
       escape (
           %x22 /          ; "    quotation mark  U+0022
           %x5C /          ; \    reverse solidus U+005C
           %x2F /          ; /    solidus         U+002F
           %x62 /          ; b    backspace       U+0008
           %x66 /          ; f    form feed       U+000C
           %x6E /          ; n    line feed       U+000A
           %x72 /          ; r    carriage return U+000D
           %x74 /          ; t    tab             U+0009
           %x75 4HEXDIG )  ; uXXXX                U+XXXX

escape = %x5C              ; \

quotation-mark = %x22      ; "

unescaped = %x20-21 / %x23-5B / %x5D-10FFFF
Apostrophe is U+0027.
As with the UTF-8 issues, it's a quick job to implement a filter to scan for escaped apostrophes and unescape them, but it would be nice if what is advertised as JSON was actually JSON.
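A filter along those lines might look like the following sketch. The name unescape matches the call in parse_crufty_json, but the body here is an illustration rather than the code from the original program. It assumes the only nonconforming escape in the input is \' and passes every other escape through untouched.

```haskell
-- | Rewrite the non-standard escape \' to a bare apostrophe so that a
-- strict JSON parser will accept the input. All other escape sequences
-- are copied through unchanged.
unescape :: String -> String
unescape ('\\':'\'':rest) = '\'' : unescape rest
-- Copy any other escaped character as a pair, so that an escaped
-- backslash followed by an apostrophe is not mistaken for \'.
unescape ('\\':c:rest)    = '\\' : c : unescape rest
unescape (c:rest)         = c : unescape rest
unescape []               = []
```

Consuming escapes a pair at a time is the one subtlety: a naive search-and-replace on the two characters \' would corrupt a string that ends in an escaped backslash right before an apostrophe.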

Comment from Roberto @ 2008-01-27T03:21:56Z
Any programming language not having UTF-8 support these days can not be considered for production.
Sure, del.icio.us needs to fix its JSON serialization if it's not right, but parsers should be able to have a permissive mode where some errors are accepted, just like most important web browsers' HTML parsers.
Comment from Paul Brown @ 2008-01-27T05:09:11Z
Haskell does support Unicode internally (which is the reason that Haskell Strings burn so much memory). Like most other languages (e.g., Java), it reads handles as text using whatever default encoding the operating system uses. Unlike most other languages (e.g., Java again), it doesn't provide direct support for encodings when serializing Strings or deserializing bytes.
Comment from timb @ 2008-01-27T12:00:47Z
dammit, i remember fixing that apostrophe-escaping bug YEARS ago
Comment from Andreas Krey @ 2008-01-27T12:10:02Z
Haskell primarily burns lots of memory because of the decision that strings are lists of chars; regular lists. Needing the next pointer anyway, there is not much point in using only 8 bits for the char. :-)
Btw. the recent trend to use bytestrings in parsec and elsewhere feels a bit like throwing out the kid (unicode support) with the bath (massive storage overhead).
Json being parseable as javascript was indeed a design goal; as far as I know there exists a regular expression which checks whether a json string actually only contains data literals and won't do anything bad when evaluated.
Comment from duncan @ 2008-01-27T12:54:24Z
Try using the utf8-string package from hackage to decode the strings.
Comment from Joshua @ 2008-01-27T19:17:26Z
I sent this on to the relevant folks.
Comment from dons @ 2008-01-27T20:12:45Z
The utf8-string package specifically allows you to use the existing IO operations, with encoded Strings. It is used in production systems.
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string
Comment from Paul Brown @ 2008-01-27T23:58:56Z
@dons and @duncan - I missed the utf8 package when I was rummaging on Hackage. I'll give it a go.
Comment from Erigami @ 2008-08-13T13:55:59Z
According to the RFC:
http://www.ietf.org/rfc/rfc4627.txt?number=4627
[...]
According to ECMAScript 262 (referenced in RFC 4627)
[...]
\' is a valid character in JSON.
Comment from Cowtowncoder @ 2009-02-17T07:24:07Z
Erigami: you are wrong.
Whether javascript allows it is irrelevant (except for historical interest): JSON RFC that you link to clearly lists allowed escape combinations, and this particular one is not included; hence it is invalid. And thereby it is not to be used for well-formed json content.
This is not really contradicting subsetting part; but the comment about aiming to be a subset is commentary, not definition of Json. Json != Javascript.