Herein post 5 of n on my hobby project to rewrite my personal publishing software in Haskell. (In strict terms, the project is to write it, since I didn't write the current system.) This post covers persistence and concurrency using the filesystem and Haskell's software transactional memory implementation.
Exploiting Commutativity and Choosing Locking Granularity
As I imagine things working, the basic operations that I want to be able to perform against the persistent form of the blog are something like:
- Create an entry (and by extension, a comment).
- Change the metadata on an entry, e.g., publish/unpublish or add/remove tags.
- Add a comment to an existing entry.
From an end-user perspective, these all commute with each other — it doesn't matter whether a comment is added before or after a tag is changed — so it's reasonable to let the system take care of ordering the operations to be performed. Moreover, because creation commutes with linking, locking granularity can be limited to an individual entry. (There is no reason to lock both the newly created comment and its parent entry simultaneously.)
Without further ado, here's a locking scheme implemented at the granularity of an entry. This would be used only for writes. First, a wrapper type to hold the lock status for an entry:
data Holder = Holder { entry :: Entry,
locked :: Bool }
deriving (Show)
And then the lock/unlock code:
lock :: Holder -> Holder lock (Holder e False) = Holder e True lock (Holder e True) = error "Already locked." unlock :: Holder -> Holder unlock (Holder e True) = Holder e False unlock (Holder e False) = error "Already unlocked."
It's worth stopping to observe a common construct in functional programming. A lock function that locks a Holder can't exist because all values are immutable. Instead, lock creates a new Holder that has locked set to True but is otherwise identical to the original, and we can use the STM mechanics to create actions to be applied to a TVar:
check_out :: TVar Holder -> STM ()
check_out h = do { h' < readTVar h
; if locked h'
then retry
else writeTVar h (lock h') }
check_in :: TVar Holder -> STM ()
check_in h = do { h' < readTVar h
; if locked h'
then writeTVar h (unlock h')
else error "Already unlocked." }
The retry above will cause check_out to block until the entry is checked back in, while check_in signals an error if it is asked to release an already free entry.
showable value wrapped in a TVar is useful for experimenting with STM in ghci:
s_show :: Show a => TVar a -> IO String s_show = atomically.(liftM show).readTVar
Operating on Entries
To integrate operations on entries, I'm going to take the minimal use case of publishing and unpublishing, so my Entry data structure is almost trivial:
data Entry = Entry { entry_id :: String,
published :: Bool }
deriving (Show)
publish :: Entry -> Entry
publish (Entry i _) = Entry i True
unpublish :: Entry -> Entry
unpublish (Entry i _) = Entry i False
Add in a function to convert an Entry -> Entry
function to a Holder -> Holder function:
liftH :: (Entry -> Entry) -> (TVar Holder -> STM ())
liftH f = \ h -> do { h' < readTVar h
; writeTVar h ((holderize f) h')
; return () }
where holderize f = \ (Holder e l) -> Holder (f e) l
Combining a publish with
check_in/check_out is straightforward in the
STM monoid. Here's some scratch work in
ghci that shows this in action:
$ ghci -package stm
___ ___ _
/ _ \ /\ /\/ __(_)
/ /_\// /_/ / / | | GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __ / /___| | http://www.haskell.org/ghc/
\____/\/ /_/\____/|_| Type :? for help.
Loading package base ... linking ... done.
Loading package stm-2.0 ... linking ... done.
:Prelude> :load experiment.hs
[1 of 1] Compiling Main ( experiment.hs, interpreted )
Ok, modules loaded: Main.
*Main> let h = Holder (Entry "foo" False) False
*Main> th < atomically ( newTVar h )
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = False}, locked = False}"
*Main> let co = check_out th
*Main> let pub = (liftH publish) th
*Main> let ci = check_in th
*Main> atomically ( co >> pub >> ci)
*Main> s_show th
"Holder {entry = Entry {entry_id = \"foo\", published = True}, locked = False}"
Integrating Persistence via IO
One of my working design assumptions is that the data for the
system will reside entirely in memory, being updated as changes are
made and reloaded (lazily) in the event of a system crash or system
start-up. (As I commented previously, four years of blogging has
produced around 500kb of content, mark-up included, so this isn't an
unreasonable assumption.) Comments from spammers could produce a lot
more data, but I plan to save every item but only load published items
into memory. (So spammers are just going to burn disk space.) I'm
going to aim for one file per entry, for the sake of the current
discussion, named by the entry_id of the
Entry. Conveniently, STM includes the
unsafeIOToSTM function for composing STM
actions and IO actions. (The other way around is not
permitted by design.)
Attention: I've gotten some public and private comments that
unsafeIOToSTM is not the right thing to use in this
scenario, so I've written a revision
to this entry.
Writing an entry to a file is straightforward:
store :: TVar Holder -> STM ()
store h = do { h' < readTVar h
; let e = entry h'
; unsafeIOToSTM (writeFile (fname e) ((show e) ++ "\n")) }
Continuing the same ghci session from above:
*Main> let out = store th
*Main> :! cat entry-foo.hb
cat: entry-foo.hb: No such file or directory
*Main> atomically ( co >> pub >> out >> ci )
*Main> :! cat entry-foo.hb
Entry {entry_id = "foo", published = True}
*Main> let unpub = (liftH unpublish) th
*Main> atomically ( co >> unpub >> out >> ci )
*Main> :! cat entry-foo.hb
Entry {entry_id = "foo", published = False}
This could (and probably should) be made a bit fancier with regard to recovering from errors while writing the file, but I'm happy with the basic ergonomics so far.












Comment from @ 2007-03-13T15:25:28Z # permalink