This post describes a method for importing a Radio weblog to SnipSnap using XSLT and some elbow grease. I arrived at this approach by creating a couple of straw-man entries in SnipSnap, exporting them to XML using the management facilities, and then examining the format.
Thankfully, SnipSnap provides the ability to import content from an XML file in a specific format. Radio stores local backups of each post in a somewhat reasonable XML format, so importing involves massaging the Radio XML backup files into a SnipSnap-friendly format.
Radio stores the backups in the backups/weblogArchive/posts subdirectory of its installation directory. The content of each post is escaped and stored in the attribute /table/string[@name='text']/@value, so the first order of business is to run an XSLT stylesheet over the files to unescape this content and write it back out. My choice for doing this was:
<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>
<xsl:template match="string[@name='text']" priority="2">
  <content><xsl:value-of select="@value"
    disable-output-escaping="yes" /></content>
</xsl:template>
Using disable-output-escaping to generate markup is (justifiably) frowned on but necessary in this case, and it's up to you to clean up anything in the result that is not well-formed XML (e.g., undeclared entities such as &nbsp; and &mdash;, or unmatched elements like <br>).
The next step is to take the files with the cleaned-up, unescaped markup and convert them into snips formatted for SnipSnap to import. This involves some work to transform HTML markup into wiki markup, e.g.:
<xsl:template match="b|B|strong|STRONG|em|EM">__<xsl:apply-templates />__</xsl:template>
and some of this will depend on the way that the entries were marked up to begin with.
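A few more templates in the same spirit might look like the sketch below. The target syntax is an assumption on my part (~~text~~ for italics and the {link:text|url} macro for external links), so check it against the markup help of your SnipSnap version before relying on it. It also assumes lower-case element and attribute names; upper-case variants would need to be added to the match patterns as in the template above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Italics, ASSUMING ~~text~~ is the italic markup. -->
  <xsl:template match="i">~~<xsl:apply-templates />~~</xsl:template>

  <!-- External links, ASSUMING the {link:text|url} macro.
       Braces in stylesheet text are literal; only attributes are AVTs. -->
  <xsl:template match="a[@href]">{link:<xsl:value-of
    select="." />|<xsl:value-of select="@href" />}</xsl:template>

  <!-- A paragraph becomes its content followed by a blank line. -->
  <xsl:template match="p">
    <xsl:apply-templates />
    <xsl:text>&#10;&#10;</xsl:text>
  </xsl:template>

  <!-- A line break becomes a newline. -->
  <xsl:template match="br">
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

In practice these templates would be merged into the conversion stylesheet rather than run on their own.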
SnipSnap names the snip for a blog entry after its date (yyyy-MM-dd) and records the parent snip it belongs to, so the template for an entry looks like:
<xsl:template match="/">
  <snip>
    <name><xsl:call-template name="extract-date">
      <xsl:with-param name="doc" select="/" />
    </xsl:call-template></name>
    <content><xsl:apply-templates select="/table/string[@name='title']" />
      <xsl:apply-templates
        select="/table/content" /></content>
    <cTime></cTime>
    <mTime></mTime>
    <cUser>prb</cUser>
    <mUser>prb</mUser>
    <parentSnip>start</parentSnip>
    <backLinks></backLinks>
    <snipLinks></snipLinks>
    <labels></labels>
    <attachments></attachments>
    <viewCount>0</viewCount>
    <permissions>Edit:Owner</permissions>
  </snip>
</xsl:template>
<xsl:template match="string[@name='title']">1. <xsl:value-of
select="@value" /> {anchor:<xsl:value-of
select="@value" />}</xsl:template>
Note that "prb" should be replaced with your own username.
The date extraction is a little bit painful:
<xsl:template name="extract-date">
  <!-- Document to read the date from; defaults to the current document. -->
  <xsl:param name="doc" select="/" />
  <!-- Drop the leading weekday, i.e. everything up to the first ", ". -->
  <xsl:variable name="datestr"
    select="substring-after(string($doc/table/date[@name='when']/@value),', ')" />
  <xsl:variable name="day" select="substring-before($datestr,' ')" />
  <xsl:variable name="datestr2" select="substring-after($datestr,' ')" />
  <xsl:variable name="month">
    <xsl:choose>
      <xsl:when test="starts-with($datestr2,'Jan')">01</xsl:when>
      <xsl:when test="starts-with($datestr2,'Feb')">02</xsl:when>
      <xsl:when test="starts-with($datestr2,'Mar')">03</xsl:when>
      <xsl:when test="starts-with($datestr2,'Apr')">04</xsl:when>
      <xsl:when test="starts-with($datestr2,'May')">05</xsl:when>
      <xsl:when test="starts-with($datestr2,'Jun')">06</xsl:when>
      <xsl:when test="starts-with($datestr2,'Jul')">07</xsl:when>
      <xsl:when test="starts-with($datestr2,'Aug')">08</xsl:when>
      <xsl:when test="starts-with($datestr2,'Sep')">09</xsl:when>
      <xsl:when test="starts-with($datestr2,'Oct')">10</xsl:when>
      <xsl:when test="starts-with($datestr2,'Nov')">11</xsl:when>
      <xsl:when test="starts-with($datestr2,'Dec')">12</xsl:when>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="year" select="substring-before(substring-after($datestr2,' '),' ')" />
  <xsl:value-of select="$year" />-<xsl:value-of select="$month" />-<xsl:value-of select="$day" />
</xsl:template>
as is the backslash-escaping of any {, }, [, or ] characters in the posts:
<xsl:template name="escape-chars">
  <xsl:param name="txt" />
  <xsl:choose>
    <xsl:when test="contains($txt,'[')"><xsl:value-of
      select="substring-before($txt,'[')" />\[<xsl:call-template
      name="escape-chars">
        <xsl:with-param name="txt"
          select="substring-after($txt,'[')" />
      </xsl:call-template></xsl:when>
    <!-- ... And then similarly for the others ... -->
    <xsl:otherwise><xsl:value-of select="$txt" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>
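One thing to watch for: if the remaining branches have the same shape as the [ branch, only the first matching branch of the xsl:choose fires, so the text in front of that match is copied out without its other special characters being escaped. A possible way around this, sketched below, is to make one pass per character and feed each pass the output of the previous one. The names escape-one, p1, p2, and p3 are mine and purely illustrative. Chaining is safe here because the backslash that each pass inserts is not itself one of the escaped characters.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Hypothetical helper: put a backslash before every $ch in $txt. -->
  <xsl:template name="escape-one">
    <xsl:param name="txt" />
    <xsl:param name="ch" />
    <xsl:choose>
      <xsl:when test="contains($txt, $ch)">
        <xsl:value-of select="substring-before($txt, $ch)" />
        <xsl:text>\</xsl:text>
        <xsl:value-of select="$ch" />
        <!-- Recurse on whatever follows this occurrence. -->
        <xsl:call-template name="escape-one">
          <xsl:with-param name="txt" select="substring-after($txt, $ch)" />
          <xsl:with-param name="ch" select="$ch" />
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise><xsl:value-of select="$txt" /></xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Escape [, ], { and } in every text node, one pass per character. -->
  <xsl:template match="text()">
    <xsl:variable name="p1">
      <xsl:call-template name="escape-one">
        <xsl:with-param name="txt" select="." />
        <xsl:with-param name="ch" select="'['" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="p2">
      <xsl:call-template name="escape-one">
        <xsl:with-param name="txt" select="$p1" />
        <xsl:with-param name="ch" select="']'" />
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="p3">
      <xsl:call-template name="escape-one">
        <xsl:with-param name="txt" select="$p2" />
        <xsl:with-param name="ch" select="'{'" />
      </xsl:call-template>
    </xsl:variable>
    <!-- The last pass writes straight to the output. -->
    <xsl:call-template name="escape-one">
      <xsl:with-param name="txt" select="$p3" />
      <xsl:with-param name="ch" select="'}'" />
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Each pass recurses once per occurrence of its character rather than once per character of text, which keeps the recursion depth modest even for long posts.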
Once this is done, then all of the snips need to be grouped
together:
<snipspace>
... snips go here ...
</snipspace>
and the snips with the same date (and thus the same name) need to
have their content merged together.
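The merge can also be done in XSLT 1.0 with key-based (Muenchian) grouping. The sketch below assumes the snips have already been concatenated into a single file with a snipspace root and snip children laid out as in the entry template; the key name snips-by-name is my own. It joins the content of same-named snips with a blank line and takes the remaining fields from the first snip of each group, which may or may not be what you want for the timestamps.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Index every snip by its name (the yyyy-MM-dd date). -->
  <xsl:key name="snips-by-name" match="snip" use="name" />

  <xsl:template match="/snipspace">
    <snipspace>
      <!-- Visit only the first snip of each distinct name. -->
      <xsl:for-each
          select="snip[generate-id() = generate-id(key('snips-by-name', name)[1])]">
        <snip>
          <name><xsl:value-of select="name" /></name>
          <content>
            <!-- Concatenate the content of every snip sharing this name. -->
            <xsl:for-each select="key('snips-by-name', name)">
              <xsl:value-of select="content" />
              <xsl:if test="position() != last()">
                <xsl:text>&#10;&#10;</xsl:text>
              </xsl:if>
            </xsl:for-each>
          </content>
          <!-- All other fields are copied from the first snip in the group. -->
          <xsl:copy-of select="*[not(self::name or self::content)]" />
        </snip>
      </xsl:for-each>
    </snipspace>
  </xsl:template>
</xsl:stylesheet>
```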
That should do it! Just import the file through the management
facility in SnipSnap, fix any errors that are there (which you'll have
to do without any sort of debugging information), and it's done.
Addendum
I had to do some additional work with the data once I'd moved it into SnipSnap.
First, I had to do some additional search/replace to ensure that
all permalinks (anchors) were valid names, which entailed removing
commas, apostrophes, and other non-name characters. (The short
summary would be that letters, numbers, periods, colons, and
underscores are legal.)
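I did this step by search/replace after the import, but the same cleanup could probably be folded into the title template instead. The sketch below uses the nested translate() idiom to keep only an allowed set of characters. The template name anchor-name and the variable name-chars are hypothetical, and the allowed set covers ASCII letters and digits only, so accented letters would be dropped as well.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <!-- Characters kept in an anchor name (ASSUMED set, ASCII only). -->
  <xsl:variable name="name-chars"
    select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789.:_'" />

  <!-- Hypothetical helper: drop every character not in $name-chars. -->
  <xsl:template name="anchor-name">
    <xsl:param name="txt" />
    <!-- The inner translate() leaves only the unwanted characters;
         the outer translate() then deletes exactly those from $txt. -->
    <xsl:value-of
      select="translate($txt, translate($txt, $name-chars, ''), '')" />
  </xsl:template>

  <!-- Variant of the title template: the heading keeps the full title,
       only the anchor name is sanitized. -->
  <xsl:template match="string[@name='title']">1. <xsl:value-of
    select="@value" /> {anchor:<xsl:call-template name="anchor-name">
      <xsl:with-param name="txt" select="@value" />
    </xsl:call-template>}</xsl:template>
</xsl:stylesheet>
```

With this in place, a title such as "Hello, World's" would keep its heading text but get the anchor name HelloWorlds. Any links that point at the old anchor names would need the same treatment.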
Second, I modified the regular expression in the
org.snipsnap.semanticweb.rss.Rssify class (line 58 in 0.4.2a)
that chunks the posts for RSS purposes, so that it splits on top-level
headers only instead of on every header:
pattern = compiler.compile(
    "^[[:space:]]*(1)[[:space:]]+(.*?)$",
    Perl5Compiler.MULTILINE_MASK);