JSON as a Migration Format

Paul Brown @ 2007-02-22T16:11:12Z

I'm making slow progress on my personal publishing platform rewrite in Haskell (see earlier posts here and here), so herein part 3 of n, wherein I experiment with data migration and an embarrassingly simple data model. A forthcoming part 4 will be really simple Atom serialization.

Data Out, Data In

As I experiment with the new platform, I'd like a way to move the data from the typo instance into the new environment on demand; this post is my lab notebook for the export/import experiment.

One of the things that Rails has gotten 100% right is the ability to (easily) access configured environments via interactive (script/console) or scripting (script/runner) front-ends. (Using a framework like Spring in the Java space can provide similar functionality by constructing an application context, but it's more awkward to separate out the services that the runtime container would be providing.)

My first thought on exporting was to use YAML, but the significant whitespace and cryptic annotations ("|" for a free-form text block with a trailing linebreak and "|-" for a free-form text block without a trailing linebreak) just rubbed me the wrong way. JSON turns out to be a better choice because ActiveRecord supports JSON export (via ActiveSupport::JSON), and there are a couple of JSON libraries for Haskell. One is a predecessor version of the other, and I'm going to work with the earlier version because it has no dependencies other than a baseline GHC 6.6 install.

Getting an entry out to play with is a piece of cake:

./script/runner 'puts Article.find_by_state("Published").to_json' \
  > /tmp/entry.json

Parsing the JSON is similarly straightforward:

$ ghci
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
Prelude> :load /tmp/JSON.hs
[1 of 1] Compiling JSON             ( /tmp/JSON.hs, interpreted )
Ok, modules loaded: JSON.
*JSON> entry <- P.parseFromFile json "/tmp/entry.json"
Loading package parsec-2.0 ... linking ... done.
Right (Object (fromList [("attributes",Object (fromList [
[...]

(JSON.hs aliases Data.Map as M and Text.ParserCombinators.Parsec as P, so that's where the P.parseFromFile is coming from.) The map of values is wrapped up a bit, but a couple of simple functions will get it out from behind the type constructors:

*JSON> let unR = \(Right r) -> r
*JSON> let unO = \(Object o) -> o
*JSON> :t (unO.unR) entry
(unO.unR) entry :: M.Map String Value

which gets us down to the level of the first map with one entry under the key "attributes". To get the map of attributes we want out:

*JSON> let m = unO ((((M.!).unO.unR) entry) "attributes")
*JSON> m
fromList [("allow_comments",String "1"),("allow_pings",String "1"),...

(Haskell uses ! for dereferencing keys in a Data.Map.) And now the components of the entry are easy to extract:

*JSON> let atts = ((M.!) m)
*JSON> atts "allow_comments"
String "1"
*JSON> atts "updated_at"
String "2006-09-15 02:12:45"
*JSON> atts "body" 
String "<p>Although I really do like the...
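The unR, unO, and (M.!) helpers above are partial: each one fails at runtime if the shape of the data is not what was expected. For a bulk import, a total variant that returns Maybe may be more comfortable. The following is only a sketch under assumptions: the Value type is a stand-in that models just the two constructors visible in the transcript (the real type in JSON.hs is not reproduced here), and asObject, field, attribute, and sample are hypothetical names.

```haskell
import qualified Data.Map as M

-- Stand-in for the Value type in JSON.hs (assumption): only the two
-- constructors that appear in the transcript above are modeled.
data Value = String String
           | Object (M.Map String Value)
  deriving (Show, Eq)

-- Total counterpart of unO: Nothing instead of a pattern-match failure.
asObject :: Value -> Maybe (M.Map String Value)
asObject (Object o) = Just o
asObject _          = Nothing

-- Total counterpart of (M.!) on a Value that should be an object.
field :: String -> Value -> Maybe Value
field k v = asObject v >>= M.lookup k

-- Look up one attribute of an entry shaped like the export above.
attribute :: String -> Value -> Maybe Value
attribute k entry = field "attributes" entry >>= field k

-- Hypothetical sample data mirroring the values shown above.
sample :: Value
sample = Object (M.fromList
  [ ("attributes", Object (M.fromList
      [ ("allow_comments", String "1")
      , ("updated_at", String "2006-09-15 02:12:45") ])) ])

main :: IO ()
main = do
  print (attribute "allow_comments" sample)  -- Just (String "1")
  print (attribute "no_such_key" sample)     -- Nothing
```

A missing key or an unexpected constructor then shows up as Nothing at the call site rather than as an exception in the middle of an import run.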

The first-cut data model for entries looks like this:

import System.Time (CalendarTime)  -- for the date fields in PostMetadata

data BlogPost = BlogPost { p_title :: String,
               p_summary :: Maybe String,
               p_permalink :: String,
               p_metadata :: PostMetadata,
               p_body :: String,
               p_tags :: [String],
               p_uid :: String,
               p_comments :: [BlogPost]
             }
        deriving (Show)

data PostMetadata = PostMetadata { m_created :: CalendarTime,
                   m_publish :: CalendarTime,
                   m_updated :: CalendarTime,
                   m_author :: PostAuthor,
                   m_published :: Bool }
          deriving (Show)

data PostAuthor = PostAuthor { p_name :: String,
                   p_uri :: Maybe String,
                   p_email :: Maybe String,
                   p_showEmail :: Bool
                 }
          deriving (Show)

And interpolating from typo's model to the new model is just putting the fields in the right place with a little bit of date munging, since the new model has the expectation that dates are represented as Haskell CalendarTime. The reuse of the BlogPost structure for comments is intentional, both for Atom syndication and to support threaded comments.
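As a rough illustration of the date munging, here is one way to pull the numeric fields out of a timestamp in the format shown earlier ("2006-09-15 02:12:45"). This is a sketch under assumptions, not the code used for the migration: Stamp is a hypothetical, simplified stand-in for CalendarTime, and digitRuns and parseStamp are made-up names. It uses only the Prelude and Data.Char and does not validate ranges (month 13 would be accepted).

```haskell
import Data.Char (isDigit)

-- Simplified stand-in (assumption) for the CalendarTime fields that
-- a typo timestamp actually carries.
data Stamp = Stamp
  { sYear, sMonth, sDay, sHour, sMin, sSec :: Int }
  deriving (Show, Eq)

-- Split a string into its maximal runs of digits.
digitRuns :: String -> [String]
digitRuns s = case dropWhile (not . isDigit) s of
  "" -> []
  s' -> let (run, rest) = span isDigit s' in run : digitRuns rest

-- Parse "YYYY-MM-DD HH:MM:SS"; Nothing unless exactly six numeric
-- fields are present.
parseStamp :: String -> Maybe Stamp
parseStamp s = case map read (digitRuns s) of
  [y, mo, d, h, mi, se] -> Just (Stamp y mo d h mi se)
  _                     -> Nothing

main :: IO ()
main = print (parseStamp "2006-09-15 02:12:45")
```

Going from such a record to a real CalendarTime would take a further step, since CalendarTime represents the month as an enumeration rather than an Int and also carries time zone and weekday fields.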

Pulling all of the entries out is also straightforward:

$ ./script/runner 'puts (Article.find_all_by_state("Published")).to_json' \
  > /tmp/entry.json
$ ./script/runner 'puts (Article.find_all_by_state("ContentState::Published")).to_json' \
  >> /tmp/entry.json

and pulling comments and trackbacks is a similar exercise:

$ ./script/runner 'puts (Comment.find_all_by_state("ContentState::Ham")).to_json' \
  > /tmp/comments.json
$ ./script/runner 'puts (Trackback.find_all_by_state("ContentState::Ham")).to_json' \
  > /tmp/trackbacks.json

although it takes a little doing to collate comments and trackbacks with their parent posts. So far, so good — unlike any of the other migrations (Radio Userland→SnipSnap, SnipSnap→WordPress, and WordPress→typo) I've done, this looks to be neither lossy nor labor-intensive.
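The collation step can be sketched with Data.Map: group the replies by the id of the post they belong to, then look each post's replies up when building it. Everything here is hypothetical and minimal (the Reply type, its parentId field, and the collate and repliesFor names are assumptions; the real export has many more fields and its own key names).

```haskell
import qualified Data.Map as M

-- Hypothetical minimal shapes; the real export has many more fields.
type PostId = Int

data Reply = Reply { parentId :: PostId, replyBody :: String }
  deriving (Show, Eq)

-- Group replies under their parent post's id. fromListWith passes the
-- new value first, so flip (++) keeps replies in input order.
collate :: [Reply] -> M.Map PostId [Reply]
collate rs = M.fromListWith (flip (++)) [ (parentId r, [r]) | r <- rs ]

-- Replies for one post; an empty list if it has none.
repliesFor :: PostId -> M.Map PostId [Reply] -> [Reply]
repliesFor = M.findWithDefault []

main :: IO ()
main = do
  let table = collate [Reply 1 "first", Reply 2 "other", Reply 1 "second"]
  print (map replyBody (repliesFor 1 table))  -- ["first","second"]
  print (map replyBody (repliesFor 3 table))  -- []
```

Because comments and trackbacks would both end up as BlogPost values in the model above, the same grouping could in principle serve for either.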

As an aside, over about four years of blogging (2003-02-17 through the present), I've accumulated the equivalent of ~110 single-spaced pages of content.


Cure for irbarrhea

Paul Brown @ 2007-02-22T00:19:14Z

I've been working with the ActiveRecord model for my typo installation using irb (via script/console), but some expressions were producing a little too much output for convenience, e.g.:

Article.find_all.collect { |a| a.permalink }

Fortunately, there is a convenient configuration property that does the trick by suppressing the output of evaluated expressions:

conf.return_format = ""

After which, what you get is what you puts.


Toddler Version of Getting Back on the Horse

Paul Brown @ 2007-02-17T10:07:18Z

The kid took a little spill at the playground today. She was happily rocking away on one of these rocking contraptions that is a fixture on Seattle playgrounds, where the bottom portion is a heavy coil spring and the top is made out of plywood and metal to look like a motorcycle or a horse or whatever. Then she stopped to look at her shoe. I wandered around from behind her to look at her shoe, too, and as soon as I was out from behind her, she pitched off backwards and landed flat on her back on packed sand. A 30-inch drop is substantial when you're three feet tall... This was my second fatherly failing of the day, although the earlier incident (wherein I put her shoes on the wrong feet) was more minor but probably still enough to knock me out of contention.

She's made of sturdy stuff and was just fine, but she was pretty upset nonetheless. Once the wife had her calmed down, she trudged straight back, demanded to be put back on, and went back to happily rocking. She's always been an intent and intense little person, although the more common manifestation has been "I am going to jump in that puddle, and there is nothing you can do to stop me. Even if you pick me up kicking and screaming and carry me a few blocks away, I will find my way back..." We've had hopes all along that her intensity would turn out to be a positive thing, and this was the first time that it looked like it might indeed.



Ownershit

Paul Brown @ 2007-02-17T03:33:46Z

In the continuing spirit of the Devil's Dictionary, here's a definition of "ownershit":

ownershit, n. What you take when you inherit a project that someone else has tarted up for appearances but is otherwise a complete train wreck.

T's and K's

Paul Brown @ 2007-02-13T01:42:48Z

The kid's vocabulary is growing quickly, she says "please" once in a while without being prompted, and she understands ownership ("dada's chair", "mama's shoes", etc.). Just the same, she confuses "t" sounds and "k" sounds some of the time. Her version of "OK" makes me think of classic Eddie Murphy on SNL, back when it was funny. Sometimes, it makes it difficult for me to understand what she wants, but with practice, I think I've more or less got it down. The other day, she came to me and said, "Dada... titty... see!" Being an attentive father, I hustled in to see her favorite stuffed kitty tucked into her bed...


Laziness and fizzbuzz in Haskell

Paul Brown @ 2007-01-25T03:26:00Z

After peeking at Reg's solution, I can't resist posting a fizzbuzz implementation in Haskell (because Reg's looks stylistically like it should be written in an FP language of some flavor). It's also an example of the surprising effectiveness of being lazy. Some Haskell:

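module Main(main) where

-- Guards are tried in order; when none matches, evaluation falls
-- through to the equations below.
f :: Int -> String
f x | (x `mod` 15 == 0) = " fizzbuzz"
    | (x `mod` 5 == 0) = " buzz"
    | (x `mod` 3 == 0) = " fizz"
f 1 = "1"               -- no leading space on the first item
f x = ' ':(show x)

-- Build one lazy String and hand it to a single putStr.
main :: IO ()
main = (putStr.concat) (map f [1..100])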

Here's a slightly different version of the main function:

main' :: IO ()
main' = mapM_ putStr (map f [1..100])

So, which one is better? Interestingly, if you crank up the 100 to 10000000 (10^7), either one of those two versions runs in well under a minute, in <2MB of (resident) memory, and presents roughly the same heap profile:


[Heap profile graph: main]

[Heap profile graph: main']

The main' version might naively appear to be faster than the main version, but this is laziness in action: a String is [Char], i.e., a list of characters. The list of characters passed to putStr in the main version is generated lazily, as are the IO actions in the main' version. (In fact, the main version is the faster of the two, at under 7 seconds versus roughly 20 seconds for the main' version.) As you would expect based on laziness, both programs begin producing output immediately upon execution. Meanwhile, without laziness, the ruby version chews up gobs of memory and takes a long time. (I got tired of waiting for it after about 30 minutes, at which point it was using 35MB of resident memory without producing any output.) The simplest possible ruby solution (for loop, if...elsif...else) runs in around 45 seconds and consumes <2MB of resident memory.
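One small way to see the laziness at work is to feed f an infinite list and take only a prefix of the output. The sketch below repeats the f from the listing above so that it stands alone; firstChars is a hypothetical helper name, and the 40-character cutoff is arbitrary.

```haskell
-- Same f as in the listing above, repeated so this sketch stands alone.
f :: Int -> String
f x | x `mod` 15 == 0 = " fizzbuzz"
    | x `mod` 5 == 0  = " buzz"
    | x `mod` 3 == 0  = " fizz"
f 1 = "1"
f x = ' ' : show x

-- The first n characters of the (conceptually infinite) output.
-- Only as many elements of [1 ..] are evaluated as are needed.
firstChars :: Int -> String
firstChars n = take n (concatMap f [1 ..])

main :: IO ()
main = putStrLn (firstChars 40)
```

In a strict language the equivalent expression would try to build the whole list before taking anything from it; here the program finishes after evaluating only the elements it needs.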

As an aside, reproducing the precise flavor of Reg's solution would mean composing a list of functions, which is simply expressed in Haskell terms and wouldn't impact laziness. If l is of type [Int -> Int] (a list of functions that map integers to integers), then:

foldr (.) id l

is the Int -> Int that applies the composition of the elements of l on the left, i.e., last l is the first function applied. Nonetheless, precisely the same solution (repeatedly applying a function parameterized by a modulus and a substitution) isn't as simply expressed because String and Int are distinct types.
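A short sketch of that fold, with a made-up list of functions to show the order of application (composeAll is a hypothetical name for the expression above, generalized from Int to any type):

```haskell
-- Compose a list of functions into one; the last element of the
-- list is the first function applied.
composeAll :: [a -> a] -> a -> a
composeAll = foldr (.) id

main :: IO ()
main = do
  -- (+1) . (*2) . subtract 3 applied to 10:
  -- 10 - 3 = 7, then 7 * 2 = 14, then 14 + 1 = 15
  print (composeAll [(+1), (*2), subtract 3] (10 :: Int))
  -- An empty list composes to the identity function.
  print (composeAll [] (10 :: Int))
```

Since every function in the list must have the same argument and result type, this is also where the String versus Int mismatch mentioned above gets in the way.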

Update: There is a thread going on reddit that has some terse Haskell versions, along with versions for a bunch of other languages.


Useful grep Flag

Paul Brown @ 2007-01-17T02:50:04Z

GNU grep supports a -A flag (or --after-context if you're a long options kind of person) that prints lines of trailing context after each match:

-A NUM, --after-context=NUM
       Print  NUM  lines  of  trailing  context  after  matching lines.
       Places  a  line  containing  --  between  contiguous  groups  of
       matches.

which is very handy for sifting through log files where you want grep to match the "ERROR" but the useful information is in the following couple of lines.

