I'm making slow progress on my personal publishing platform rewrite in Haskell (see earlier posts here and here), so herein part 3 of n, wherein I experiment with data migration and an embarrassingly simple data model. A forthcoming part 4 will be really simple Atom serialization.
Data Out, Data In
As I experiment with the new platform, I'd like a way to move the data from the typo instance into the new environment on demand; this post is my lab notebook for the export/import experiment.
One of the things that Rails has gotten 100% right is the ability to (easily) access configured environments via interactive (script/console) or scripting (script/runner) front-ends. (Using a framework like Spring in the Java space can provide similar functionality by constructing an application context, but it's more awkward to separate out the services that the runtime container would be providing.)
My first thought on exporting was to use YAML, but the significant whitespace and cryptic annotations ("|" for a free-form text block with a trailing linebreak and "|-" for a free-form text block without a trailing linebreak) just rubbed me the wrong way. JSON turns out to be a better choice because ActiveRecord supports JSON export (via ActiveSupport::JSON), and there are a couple of JSON libraries for Haskell. One is a predecessor of the other, and I'm going to work with the earlier version because it has no dependencies other than a baseline GHC 6.6 install.
Getting an entry out to play with is a piece of cake:
$ ./script/runner 'puts Article.find_by_state("Published").to_json' \
> /tmp/entry.json
Parsing the JSON is similarly straightforward:
$ ghci
___ ___ _
/ _ \ /\ /\/ __(_)
/ /_\// /_/ / / | | GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __ / /___| | http://www.haskell.org/ghc/
\____/\/ /_/\____/|_| Type :? for help.
Loading package base ... linking ... done.
Prelude> :load /tmp/JSON.hs
[1 of 1] Compiling JSON ( /tmp/JSON.hs, interpreted )
Ok, modules loaded: JSON.
*JSON> entry <- P.parseFromFile json "/tmp/entry.json"
Loading package parsec-2.0 ... linking ... done.
Right (Object (fromList [("attributes",Object (fromList [
[...]
(JSON.hs aliases Data.Map as M and Text.ParserCombinators.Parsec as P, so that's where the P.parseFromFile is coming from.) The map of values is wrapped up a bit, but a couple of simple functions will get it out from behind the type constructors:
*JSON> let unR = \(Right r) -> r
*JSON> let unO = \(Object o) -> o
*JSON> :t (unO.unR) entry
(unO.unR) entry :: M.Map String Value
which gets us down to the level of the first map with one entry under the key "attributes". To get the map of attributes we want out:
*JSON> let m = unO ((((M.!).unO.unR) entry) "attributes")
*JSON> m
fromList [("allow_comments",String "1"),("allow_pings",String "1"),...
(Haskell uses ! for dereferencing keys in a Data.Map.) And now the components of the entry are easy to extract:
*JSON> let atts = ((M.!) m)
*JSON> atts "allow_comments"
String "1"
*JSON> atts "updated_at"
String "2006-09-15 02:12:45"
*JSON> atts "body"
String "<p>Although I really do like the...
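The unR and unO helpers above are partial: they fail with a pattern-match error if the parse returned a Left or the value is not an Object. A slightly more defensive version of the same unwrapping could look like the sketch below. The Value type here is a hypothetical stand-in that models only the two constructors visible in the session (the real type in JSON.hs presumably has more), and the names asObject, asString, attributes and attribute are made up for illustration.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for the Value type exported by JSON.hs;
-- only the constructors seen in the GHCi session are modeled.
data Value
  = String String
  | Object (M.Map String Value)
  deriving (Show, Eq)

-- Unwrap an Object, returning Nothing for any other constructor.
asObject :: Value -> Maybe (M.Map String Value)
asObject (Object o) = Just o
asObject _          = Nothing

-- Unwrap a String, returning Nothing for any other constructor.
asString :: Value -> Maybe String
asString (String s) = Just s
asString _          = Nothing

-- Descend into the "attributes" map of a parsed entry.
attributes :: Value -> Maybe (M.Map String Value)
attributes v = asObject v >>= M.lookup "attributes" >>= asObject

-- Fetch one string-valued attribute by key.
attribute :: String -> Value -> Maybe String
attribute key v = attributes v >>= M.lookup key >>= asString
```

With these, a missing key or an unexpected shape comes back as Nothing instead of an exception, which is handy when looping over a few hundred exported entries.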
The first-cut data model for entries looks like this:
import System.Time (CalendarTime)  -- provides the CalendarTime type

data BlogPost = BlogPost { p_title :: String,
                           p_summary :: Maybe String,
                           p_permalink :: String,
                           p_metadata :: PostMetadata,
                           p_body :: String,
                           p_tags :: [String],
                           p_uid :: String,
                           p_comments :: [BlogPost]
                         }
                deriving (Show)
data PostMetadata = PostMetadata { m_created :: CalendarTime,
                                   m_publish :: CalendarTime,
                                   m_updated :: CalendarTime,
                                   m_author :: PostAuthor,
                                   m_published :: Bool }
                    deriving (Show)
data PostAuthor = PostAuthor { p_name :: String,
                               p_uri :: Maybe String,
                               p_email :: Maybe String,
                               p_showEmail :: Bool
                             }
                  deriving (Show)
Translating from typo's model to the new model is just a matter of putting the fields in the right place with a little bit of date munging, since the new model expects dates to be represented as Haskell CalendarTime values. The reuse of the BlogPost structure for comments is intentional, both for Atom syndication and to support threaded comments.
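As a rough illustration of the date munging, here is one way to parse a timestamp in the format shown earlier ("2006-09-15 02:12:45"). Note the substitutions: this sketch uses UTCTime and parseTimeM from the newer time package rather than CalendarTime from System.Time, the function name parseTypoTime is invented, and it assumes the exported timestamps carry no zone and can be read as UTC.

```haskell
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)

-- Parse a typo-style timestamp such as "2006-09-15 02:12:45".
-- Assumption: the stored times have no zone and are treated as UTC.
-- The False argument rejects leading or trailing whitespace.
parseTypoTime :: String -> Maybe UTCTime
parseTypoTime = parseTimeM False defaultTimeLocale "%Y-%m-%d %H:%M:%S"
```

Returning Maybe keeps a malformed date from aborting the whole import; the caller decides whether to skip the entry or substitute a default.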
Pulling all of the entries out is also straightforward:
$ ./script/runner 'puts (Article.find_all_by_state("Published")).to_json' \
> /tmp/entry.json
$ ./script/runner 'puts (Article.find_all_by_state("ContentState::Published")).to_json' \
>> /tmp/entry.json
and pulling comments and trackbacks is a similar exercise:
$ ./script/runner 'puts (Comment.find_all_by_state("ContentState::Ham")).to_json' \
> /tmp/comments.json
$ ./script/runner 'puts (Trackback.find_all_by_state("ContentState::Ham")).to_json' \
> /tmp/trackbacks.json
although it takes a little doing to collate comments and trackbacks with their parent posts. So far, so good — unlike any of the other migrations (Radio Userland→SnipSnap, SnipSnap→WordPress, and WordPress→typo) I've done, this looks to be neither lossy nor labor-intensive.
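The collation step can be sketched as a group-by on the parent's id followed by a lookup per post. The Post and Reply records below are hypothetical, trimmed-down stand-ins for BlogPost, and the integer parent id is an assumption about what the export provides to link a comment or trackback to its article.

```haskell
import qualified Data.Map as M

-- Hypothetical, trimmed-down records standing in for BlogPost values.
data Reply = Reply { parentId :: Int, replyBody :: String }
  deriving (Show, Eq)

data Post = Post { postId :: Int, postTitle :: String, postReplies :: [Reply] }
  deriving (Show, Eq)

-- Group replies by parent id, preserving input order within each group.
-- fromListWith passes the new value first, so flip (++) appends it at the end.
groupByParent :: [Reply] -> M.Map Int [Reply]
groupByParent rs = M.fromListWith (flip (++)) [ (parentId r, [r]) | r <- rs ]

-- Attach to each post the replies whose parent id matches its id.
-- Replies pointing at an unknown post are silently dropped.
collate :: [Reply] -> [Post] -> [Post]
collate rs = map attach
  where
    grouped  = groupByParent rs
    attach p = p { postReplies = M.findWithDefault [] (postId p) grouped }
```

Comments and trackbacks can be concatenated into one list of replies before calling collate, since both end up under the parent post.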
As an aside, over about four years of blogging (2003-02-17 through the present), I've accumulated the equivalent of ~110 single-spaced pages of content.

Comment from ejboy @ 2007-02-24T05:11:35Z