Migrating Radio to SnipSnap

Paul Brown @ 2003-10-04T01:00:00Z

This post describes a method for importing a Radio weblog to SnipSnap using XSLT and some elbow grease. I arrived at this approach by creating a couple of straw-man entries in SnipSnap, exporting them to XML using the management facilities, and then examining the format.

Thankfully, SnipSnap provides the ability to import content from an XML file in a specific format. Radio stores local backups of each post in a somewhat reasonable XML format, so importing involves massaging the Radio XML backup files into a SnipSnap-friendly format.

Radio stores the backups in the backups/weblogArchive/posts subdirectory of the Radio installation directory. The content of each post is escaped and stored in the attribute /table/string[@name='text']/@value, so the first order of business is to run an XSLT over the files to unescape this content and write it back out. My choice for doing this was:

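<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Copy every element and its attributes through unchanged... -->
<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>

<!-- ...except the escaped post body, which is written out as raw markup. -->
<xsl:template match="string[@name='text']" priority="2">
  <content><xsl:value-of select="@value"
    disable-output-escaping="yes" /></content>
</xsl:template>

</xsl:stylesheet>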
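If you'd rather script this step, any XML parser does the unescaping for free, since attribute values come back already unescaped. Here is a minimal Python sketch. It assumes the same <table> / <string name="text"> layout that the template above matches, and the function name post_markup is only illustrative:

```python
import xml.etree.ElementTree as ET

def post_markup(radio_xml: str) -> str:
    """Return the unescaped post body from one Radio backup file's XML.

    Assumes a <table> root whose <string name="text"> child holds the
    escaped HTML in its value attribute (the layout the XSLT matches).
    """
    root = ET.fromstring(radio_xml)
    node = root.find("string[@name='text']")
    if node is None:
        raise ValueError("no <string name='text'> element in this file")
    # The parser has already turned &lt;b&gt; back into <b>.
    return node.get("value", "")
```

For files on disk, ET.parse(path).getroot() gives the same root element.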

Using disable-output-escaping to generate markup is (justifiably) frowned upon, but it is necessary in this case. It's up to you to clean up any markup that isn't well-formed XML, such as undeclared entities like &nbsp; and &mdash; or unclosed elements like <br>.
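One way to do that cleanup is a text pass over the unescaped markup before it goes through the second stylesheet. The Python sketch below covers only the two cases named above. It rewrites named HTML entities as numeric character references, which XML accepts without a DTD, and it closes bare <br> and <hr> tags. Which tags need closing is an assumption, so adjust the list to match what your posts actually contain. The sketch also assumes Radio never wrote a matching </br>.

```python
import re
from html.entities import name2codepoint

# XML predefines only these five entities; leave them alone.
XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}

def numeric_entities(markup: str) -> str:
    """Rewrite named HTML entities such as &nbsp; as numeric references (&#160;)."""
    def repl(m: re.Match) -> str:
        name = m.group(1)
        if name in XML_ENTITIES or name not in name2codepoint:
            return m.group(0)  # built-in or unknown: keep as written
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, markup)

def close_void_tags(markup: str) -> str:
    """Turn HTML void tags like <br> or <BR> into XML empty elements (<br/>)."""
    return re.sub(r"<(br|hr)(\s[^<>]*?)?\s*/?>", r"<\1\2/>", markup,
                  flags=re.IGNORECASE)
```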

The next step is to take the files with the cleaned-up, unescaped markup and convert them into snips formatted for SnipSnap to import. This involves some work to transform HTML markup into wiki markup, e.g.:

<xsl:template match="b|B|strong|STRONG|em|EM">__<xsl:apply-templates />__</xsl:template>

and some of this will depend on the way that the entries were marked up to begin with.
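The same idea can be sketched outside XSLT as a table that maps tag names to the wiki markup wrapped around their content. In this minimal Python version, only the bold mapping comes from the template above. Treating <p> as a blank-line paragraph break is an assumption about SnipSnap's markup, and any tag not in the table is reduced to its text. Because it parses the markup, it needs the well-formed, cleaned-up output of the previous step.

```python
import xml.etree.ElementTree as ET

# (prefix, suffix) wiki markup wrapped around each element's content.
# The bold entries mirror the XSLT template; the <p> entry is an assumption.
WIKI_WRAP = {
    "b": ("__", "__"),
    "strong": ("__", "__"),
    "em": ("__", "__"),
    "p": ("", "\n\n"),
}

def to_wiki(elem: ET.Element) -> str:
    """Render an HTML element tree as wiki text; unmapped tags keep only their text."""
    prefix, suffix = WIKI_WRAP.get(elem.tag.lower(), ("", ""))
    parts = [prefix, elem.text or ""]
    for child in elem:
        parts.append(to_wiki(child))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(suffix)
    return "".join(parts)
```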

SnipSnap names the snip for a weblog entry with its yyyy-MM-dd date and records the parent snip it belongs to (start, in my case), so the entry fragment template looks like:

<xsl:template match="/">
  <snip>
    <name><xsl:call-template name="extract-date">
      <xsl:with-param name="doc" select="/" />
    </xsl:call-template></name>
    <content><xsl:apply-templates select="/table/string[@name='title']" />
    <!-- newline so the post body doesn't run into the header line -->
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates
        select="/table/content" /></content>
    <cTime></cTime>
    <mTime></mTime>
    <cUser>prb</cUser>
    <mUser>prb</mUser>
    <parentSnip>start</parentSnip>
    <backLinks></backLinks>
    <snipLinks></snipLinks>
    <labels></labels>
    <attachments></attachments>
    <viewCount>0</viewCount>
    <permissions>Edit:Owner</permissions>
  </snip>
</xsl:template>

<xsl:template match="string[@name='title']">1 <xsl:value-of
    select="@value" /> {anchor:<xsl:value-of
    select="@value" />}</xsl:template>

Replace "prb" in cUser and mUser with your own SnipSnap username.

The date extraction is a little bit painful:

<xsl:template name="extract-date">
  <xsl:variable name="datestr" 
      select="substring-after(string(/table/date[@name='when']/@value),', ')" />
  <xsl:variable name="day" select="substring-before($datestr,' ')" />
  <xsl:variable name="datestr2" select="substring-after($datestr,' ')" />
  <xsl:variable name="month">
    <xsl:choose>
      <xsl:when test="starts-with($datestr2,'Jan')">01</xsl:when>
      <xsl:when test="starts-with($datestr2,'Feb')">02</xsl:when>
      <xsl:when test="starts-with($datestr2,'Mar')">03</xsl:when>
      <xsl:when test="starts-with($datestr2,'Apr')">04</xsl:when>
      <xsl:when test="starts-with($datestr2,'May')">05</xsl:when>
      <xsl:when test="starts-with($datestr2,'Jun')">06</xsl:when>
      <xsl:when test="starts-with($datestr2,'Jul')">07</xsl:when>
      <xsl:when test="starts-with($datestr2,'Aug')">08</xsl:when>
      <xsl:when test="starts-with($datestr2,'Sep')">09</xsl:when>
      <xsl:when test="starts-with($datestr2,'Oct')">10</xsl:when>
      <xsl:when test="starts-with($datestr2,'Nov')">11</xsl:when>
      <xsl:when test="starts-with($datestr2,'Dec')">12</xsl:when>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="year" select="substring-before(substring-after($datestr2,' '),' ')" />
  <!-- pad the day to two digits in case the date reads e.g. "4 Oct" -->
  <xsl:value-of select="$year" />-<xsl:value-of select="$month" />-<xsl:value-of select="format-number($day,'00')" />
</xsl:template>
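To check the template's output, you can do the same conversion with Python's standard RFC 822 date parser. This sketch assumes the when value looks like "Sat, 04 Oct 2003 01:00:00 GMT". That is the shape the string slicing above implies: weekday, comma, day, month, year. The sketch also zero-pads a single-digit day, which the yyyy-MM-dd snip names need.

```python
from email.utils import parsedate_to_datetime

def snip_name(when: str) -> str:
    """Turn a Radio 'when' value into a yyyy-MM-dd snip name.

    Assumes an RFC 822-style date such as 'Sat, 04 Oct 2003 01:00:00 GMT'.
    The date fields are used as written; no time-zone conversion happens.
    """
    return parsedate_to_datetime(when).strftime("%Y-%m-%d")
```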

as is escaping any {, }, [, or ] characters in the posts with a backslash, since SnipSnap's wiki markup treats them as special:

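<!-- Escape SnipSnap's markup characters ([, ], {, }) with a backslash. -->
<xsl:template name="escape-chars">
  <xsl:param name="txt" />
  <!-- Map all four characters to '[' so one contains() finds the first of any. -->
  <xsl:variable name="marked" select="translate($txt,']{}','[[[')" />
  <xsl:choose>
    <xsl:when test="contains($marked,'[')">
      <xsl:variable name="pos"
          select="string-length(substring-before($marked,'[')) + 1" />
      <xsl:value-of select="substring($txt,1,$pos - 1)" />\<xsl:value-of
          select="substring($txt,$pos,1)" />
      <xsl:call-template name="escape-chars">
        <xsl:with-param name="txt" select="substring($txt,$pos + 1)" />
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise><xsl:value-of select="$txt" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- One way to apply it: route every text node in the post through it. -->
<xsl:template match="text()">
  <xsl:call-template name="escape-chars">
    <xsl:with-param name="txt" select="." />
  </xsl:call-template>
</xsl:template>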
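For spot-checking the converted snips, the same escaping fits in one regular expression in Python. Here is a minimal sketch; the function name escape_wiki is only illustrative:

```python
import re

def escape_wiki(text: str) -> str:
    """Prefix each [, ], { and } with a backslash, as the escape-chars template does."""
    return re.sub(r"([\[\]{}])", r"\\\1", text)
```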

Once this is done, all of the snips need to be grouped together in a single snipspace element:

<snipspace>
  ... snips go here ...
</snipspace>

and the snips with the same date (and thus the same name) need to have their content merged together.
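The grouping and merging can be scripted too. This minimal Python sketch uses ElementTree and rests on two assumptions. First, each <content> holds plain wiki text by this point, with no child elements. Second, posts from the same day should be concatenated in the order the snips are passed in. Joining them with a newline keeps each post's header on its own line.

```python
import xml.etree.ElementTree as ET

def build_snipspace(snips):
    """Group <snip> elements into one <snipspace>, merging same-named snips.

    Snips sharing a <name> (posts from the same day) are collapsed into the
    first one seen; later <content> text is appended to it, newline-separated.
    """
    space = ET.Element("snipspace")
    by_name = {}
    for snip in snips:
        name = snip.findtext("name")
        if name in by_name:
            content = by_name[name].find("content")
            content.text = (content.text or "") + "\n" + (snip.findtext("content") or "")
        else:
            by_name[name] = snip
            space.append(snip)
    return space
```

Feeding it ET.parse(path).getroot() for each snip file and writing the result with ET.ElementTree(space).write(...) produces the single file to import.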

That should do it! Just import the file through the management facility in SnipSnap, fix any errors that are there (which you'll have to do without any sort of debugging information), and it's done.

Addendum

I had to do some additional work with the data once I'd moved it into SnipSnap.

First, I had to search and replace to ensure that all permalinks (anchors) were valid names, which meant removing commas, apostrophes, and other characters that aren't legal in names. (In short, letters, numbers, periods, colons, and underscores are legal.)
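This cleanup can also be done before import. The minimal Python sketch below keeps only the legal characters listed above, with "letters" taken to mean ASCII letters. Converting runs of whitespace to underscores, rather than dropping them, is an assumption made to keep multi-word titles readable.

```python
import re

def anchor_name(title: str) -> str:
    """Reduce a post title to a legal anchor name.

    Keeps ASCII letters, digits, periods, colons and underscores.
    Whitespace runs become underscores (an assumption, not from the post).
    """
    name = re.sub(r"\s+", "_", title.strip())
    return re.sub(r"[^A-Za-z0-9.:_]", "", name)
```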

Second, I modified the regular expression in org.snipsnap.semanticweb.rss.Rssify (line 58 in 0.4.2a) that chunks posts for RSS, so that it splits only on top-level headers instead of on every header:

// Split only on top-level "1 Title" headers (not "1.1 Subtitle").
pattern = compiler.compile(
  "^[[:space:]]*(1)[[:space:]]+(.*?)$",
  Perl5Compiler.MULTILINE_MASK);
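To see what the modified pattern does, here is a rough Python equivalent. Python's re module has no [[:space:]] class, so \s stands in for it. The pattern picks out top-level "1 Title" headers and skips "1.1" subheaders.

```python
import re

# Python translation of the modified Rssify pattern; \s replaces [[:space:]].
TOP_LEVEL = re.compile(r"^\s*(1)\s+(.*?)$", re.MULTILINE)

def top_level_titles(text: str) -> list:
    """Return the title of every top-level header in a snip's wiki text."""
    return [m.group(2) for m in TOP_LEVEL.finditer(text)]
```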
