STM, IO, and a Simple Persistence Model

Paul Brown @ 2007-03-04T17:57:00Z

Herein post 5 of n on my hobby project to rewrite my personal publishing software in Haskell. (In strict terms, the project is to write it, since I didn't write the current system.) This post covers persistence and concurrency using the filesystem and Haskell's software transactional memory implementation.

Exploiting Commutativity and Choosing Locking Granularity

As I imagine things working, the basic operations that I want to be able to perform against the persistent form of the blog are something like:

  • Create an entry (and by extension, a comment).
  • Change the metadata on an entry, e.g., publish/unpublish or add/remove tags.
  • Add a comment to an existing entry.

From an end-user perspective, these all commute with each other — it doesn't matter whether a comment is added before or after a tag is changed — so it's reasonable to let the system take care of ordering the operations to be performed. Moreover, because creation commutes with linking, locking granularity can be limited to an individual entry. (There is no reason to lock both the newly created comment and its parent entry simultaneously.)

Without further ado, here's a locking scheme implemented at the granularity of an entry. This would be used only for writes. First, a wrapper type to hold the lock status for an entry:

data Holder = Holder { entry :: Entry,
                       locked :: Bool }
            deriving (Show)

And then the lock/unlock code:

lock :: Holder -> Holder
lock (Holder e False) = Holder e True
lock (Holder e True) = error "Already locked."

unlock :: Holder -> Holder
unlock (Holder e True) = Holder e False
unlock (Holder e False) = error "Already unlocked."

It's worth stopping to observe a common construct in functional programming. A lock function that locks a Holder can't exist because all values are immutable. Instead, lock creates a new Holder that has locked set to True but is otherwise identical to the original, and we can use the STM mechanics to create actions to be applied to a TVar:

check_out :: TVar Holder -> STM ()
check_out h = do { h' < readTVar h 
                 ; if locked h'
                   then retry
                   else writeTVar h (lock h') }

check_in :: TVar Holder -> STM ()
check_in h = do { h' < readTVar h
                ; if locked h'
                  then writeTVar h (unlock h')
                  else error "Already unlocked." }

The retry above will cause check_out to block until the entry is checked back in, while check_in signals an error if it is asked to release an already free entry.

By the way, the following one-liner to print the showable value wrapped in a TVar is useful for experimenting with STM in ghci:
s_show :: Show a => TVar a -> IO String
s_show = atomically.(liftM show).readTVar

Operating on Entries

To integrate operations on entries, I'm going to take the minimal use case of publishing and unpublishing, so my Entry data structure is almost trivial:

data Entry = Entry { entry_id :: String,
                     published :: Bool }
             deriving (Show)

publish :: Entry -> Entry
publish (Entry i _) = Entry i True

unpublish :: Entry -> Entry
unpublish (Entry i _) = Entry i False

Add in a function to convert an Entry -> Entry function to a Holder -> Holder function:

liftH :: (Entry -> Entry) -> (TVar Holder -> STM ())
liftH f = \ h -> do { h' < readTVar h
                    ; writeTVar h ((holderize f) h')
                    ; return () }
          where holderize f = \ (Holder e l) -> Holder (f e) l

Combining a publish with check_in/check_out is straightforward in the STM monoid. Here's some scratch work in ghci that shows this in action:

$ ghci -package stm
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
Loading package stm-2.0 ... linking ... done.
:Prelude> :load experiment.hs
[1 of 1] Compiling Main             ( experiment.hs, interpreted )
Ok, modules loaded: Main.
*Main> let h = Holder (Entry "foo" False) False
*Main> th < atomically ( newTVar h )
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = False}, locked = False}"
*Main> let co = check_out th
*Main> let pub = (liftH publish) th
*Main> let ci = check_in th
*Main> atomically ( co >> pub >> ci)
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = True}, locked = False}"

Integrating Persistence via IO

One of my working design assumptions is that the data for the system will reside entirely in memory, being updated as changes are made and reloaded (lazily) in the event of a system crash or system start-up. (As I commented previously, four years of blogging has produced around 500kb of content, mark-up included, so this isn't an unreasonable assumption.) Comments from spammers could produce a lot more data, but I plan to save every item but only load published items into memory. (So spammers are just going to burn disk space.) I'm going to aim for one file per entry, for the sake of the current discussion, named by the entry_id of the Entry. Conveniently, STM includes the unsafeIOToSTM function for composing STM actions and IO actions. (The other way around is not permitted by design.)

Attention: I've gotten some public and private comments that unsafeIOToSTM is not the right thing to use in this scenario, so I've written a revision to this entry.

Writing an entry to a file is straightforward:

store :: TVar Holder -> STM ()
store h = do { h' < readTVar h
             ; let e = entry h'
             ; unsafeIOToSTM (writeFile (fname e) ((show e) ++ "\n")) }

Continuing the same ghci session from above:

*Main> let out = store th
*Main> :! cat entry-foo.hb
cat: entry-foo.hb: No such file or directory
*Main> atomically (  co >> pub >> out >> ci )
*Main> :! cat entry-foo.hb
Entry {entry_id = "foo", published = True}
*Main> let unpub = (liftH unpublish) th
*Main> atomically (  co >> unpub >> out >> ci )
*Main> :! cat entry-foo.hb
Entry {entry_id = "foo", published = False}

This could (and probably should) be made a bit fancier with regard to recovering from errors while writing the file, but I'm happy with the basic ergonomics so far.

Meta

Tags: (tag) (tag) (tag) (tag)

(comment bubbles) 1 comment
2692 direct views

Comment from @ 2007-03-13T15:25:28Z # permalink