Kata 19 or BFS/DFS by Any Other Name Still Smells as Sweet

Paul Brown @ 2003-11-04T08:00:00Z

I ran across Dave Thomas's kata 19 yesterday and decided to give it a quick go.

Based on Dave's supplied dictionary of 45,425 words, the graph of transitions, e.g., "cat" and "cot" are connected by an undirected edge, only contains 15,402 edges. (This shouldn't be all that surprising, since there are plenty of isolated words, e.g., "prefix" and "suffix".) With a couple of minor improvements to the obvious algorithm and written using collections for convenience but not necessarily speed, a Java program can sort this out in around 45 seconds on my laptop.

The rest of the problem boils down to breadth-first search (BFS) and depth-first search (DFS) in the graph plus a little common sense or cleverness. A wandering algorithm to find a (any) path connects "ruby" and "code" in 2-3 seconds with a relatively long chain, e.g.:

ruby, rubs, dubs, tubs, tabs, cabs, caps, raps, laps, lips, lies, vies, vier, bier, pier, tier, tied, lied, lieu, lien, lion, loon, Moon, noon, nook, hook, hood, wood, word, wore, lore, lyre, lure, lurk, lark, dark, bark, bars, bats, vats, vans, vane, vine, sine, site, bite, bile, pile, pale, pole, mole, move, cove, code
length 54, 3182ms

A slightly modified BFS search, however, constructs the optimal:

ruby, rubs, robs, robe, rode, code
length 6, 1249ms

in just over a second, and given the relatively sparse nature of the graph, this shouldn't be all that surprising. (It sounds like Dave's solution isn't symmetric in the endpoints, but mine is, so turning "code" to "ruby" or "ruby" to "code" is equivalent.)

The next question is whether the whole graph is connected if insertions and deletions are allowed, e.g., "cat" is adjacent to "coat".


Update: Some comments in Dave's blog indicate that he's touched up his algorithm.

(comment bubbles) 0 comments

IT, Productivity, and the Real New Economy

Paul Brown @ 2003-11-02T02:00:00Z

The Harvard Business Review published an article from Diana Farrell called The Real New Economy that both stratifies and ties the productivity gains of the 1990's back to certain types of IT investments.

The thesis of the article, which is welcome from my perspective, is YMMV (your mileage may vary):

The success of IT investment hinges on the particular practices of different industries and the particular practices of different companies. [...] For IT to fulfill its promise, users and vendors must deploy it thoughtfully, tailoring it to individual sectors and businesses and merging it with other product and process innovations.

The most interesting quantitative observation is that retailing, securities brokerage, wholesaling, semiconductors, computer assembly, and telecommunications account for only 32% of the GDP but accounted for 76% of the net productivity gain. No surprise to those of us who poo-poo'd the bubble from start to end, securities brokerage was the only one of these six where the Internet played any role in the productivity gain. In other sectors, relative simplicity and accessibility of innovation (e.g., online banking) created a diffusion effect where any increased productivity from IT deployment was quickly matched by rivals deploying equivalent technology.

The (obvious) lesson is to target IT at improving the efficiency and agility of core business processes and to do so in the most exclusive and specific way. Anything else would be a waste of time and money. Here are a few suggested decision points:

  • Does the technical innovation either accentuate one of my competitive advantages or cancel one of my competitive disadvantages?
  • What would prevent my competitors from trumping my investment with a similar deployment?
  • Does the technical innovation measurably increase capital output per capital input?

This sort of thinking doesn't make it easy to buy technology, since there is no universally applicable return on investment computation and no way for customers and vendors to quickly identify each other, but… that's the point!

(comment bubbles) 0 comments

Getting Started on Oblique Strategies for Business

Paul Brown @ 2003-11-02T01:00:00Z

I've long been a fan of Brian Eno both as a musician (especially Ambient 4) and as a thinker, and I've recently found myself wishing for an Oblique Strategies deck for business.

The idea behind Oblique Strategies was to put 100 generally useful (for artistic purposes) suggestions on cards like "Use a different color." or "Play it backwards." and then use the cards either en masse to get started on a project or individually as a means to get out of a creative funk. Clearly, it's a good idea to spend as little time as possible being stuck or stumped.

I'm getting started on a set of cards for personal use, based more or less on the strategies I use when I'm stumped or stuck. The first set are related to time management:

  • Rank by the cost of failure instead of the value of success.
  • Rank by the value of success instead of the cost of failure.
  • Focus on the simplest tasks.
  • Focus on the most complex tasks.
  • Do the last part first.
  • Split items into smaller tasks.
  • Group items into larger tasks.
  • Delegate.
  • Eliminate the least important 50% of the tasks.

and positioning:

  • Rent it.
  • Give it away.
  • Turn your competitors into customers.
  • Break it into pieces.
  • Be different.
  • Be the same.
Now I just need 85 more and some card stock.
(comment bubbles) 0 comments

Workflow Everywhere

Paul Brown @ 2003-10-31T00:00:00Z

I just caught the latest workflow-related open source project announcement over on TheServerSide.com. There is YAJWP! (Yet Another Java Workflow Project, and that isn't necessarily bad.)

Bonita is just the tip of the iceberg in the YAJWP world. Just in the ObjectWeb universe, there is also an XPDL-based designer called JaWE and an execution engine called Shark. There is also jBPM (which doesn't use XPDL), WfMOpen (which does use XPDL), and the list goes on. (I think that Carlos has the longest list that I've seen.)

Of course, this all begs the question of what workflow actually is. The WfMC has a reasonable right to provide a definition, and they define it as the natural extension of paper-passing into the automated world, specifically where an automated system assigns, passes, and tracks the work. Most technologists take a looser definition, however, and while I don't claim to be able to answer that question conclusively, I do claim that this is a manifestation of people looking for betters ways to model and implement coarse applications. The use of the term "workflow" connotes the involvement of a human element through aspects of web applications, document management, or other applications. (Dragos Manolescu has one of my favorite perspectives on process-oriented programming in general.)

The unfortunate thing for both vendors (be they commercial entities or open source projects or both) and customers is that there is no clearer, crisper definition of workflow than as a coarsely defined application with a human element. (And what application doesn't include a human element? The presence of a user interface is actually part of the dictionary definition of a software application.)

(comment bubbles) 0 comments

Apple on a Roll: More for More

Paul Brown @ 2003-10-16T00:00:00Z

According to MacCentral, it looks like Apple is on a roll:

With refreshed PowerBook G4s across the line, including the long-awaited update to Apple's popular 15-inch PowerBook G4 model, the company saw 9 percent sequential growth and 203 percent year over year growth in PowerBook sales, totally 176,000 units for the quarter and $348 million in revenue.

This looks especially good in the face of a 15% number year-to-year number for generic PCs from IDC and Gartner. (The current situation with Windows doesn't make me doubt the switch to a PowerBook; see Made the Move to Mac.)

It's interesting to consider the reasons behind Apple's success. From my perspective, that success is due to a more for more positioning on the cartesian grid of less, same, and more product for price, and this is arguably the best way to build a niche business in an otherwise crowded space. The challenge, which Apple has met artfully, is how to convincingly deliver that more to the customer.

(comment bubbles) 0 comments

Migrating Radio to SnipSnap

Paul Brown @ 2003-10-04T01:00:00Z

This post describes a method for importing a Radio weblog to SnipSnap using XSLT and some elbow grease. I arrived at this approach by creating a couple of straw-man entries in SnipSnap, exporting them to XML using the management facilities, and then examining the format.

Thankfully, SnipSnap provides the ability to import content from an XML file in a specific format. Radio stores local backups of each post in a somewhat reasonable XML format, so importing involves massaging the Radio XML backup files into a SnipSnap-friendly format.

Radio stores the backups in the backups/weblogArchive/posts subdirectory of the Radio installation directory, and the content of each post is escaped and stored in the attribute /table/string[@name='content']/@value and the first order of business is to run an XSLT over the files to unescape this content and write it back out. My choice for doing this was:

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:apply-templates />
  </xsl:copy>
</xsl:template>
  
<xsl:template match="string[@name='text']" priority="2">
  <content><xsl:value-of select="@value"
    disable-output-escaping="yes" /></content> 
</xsl:template>

Using disable-output-escaping to generate markup is (justifiably) frowned on but necessary in this case, and it's up to you to ensure that any non-well-formed XML is cleaned up. (E.g., undeclared entities such as &nbsp; and &mdash; or unmatched elements like <br>.)

The next step is to take the files with the cleaned-up, unescaped markup and convert them into snips formatted for SnipSnap to import. This involves some work to transform HTML markup into wiki markup, e.g.:

<xsl:template match="b|B|strong|STRONG|em|EM">__<xsl:apply-templates />__</xsl:template>

and some of this will depend on the way that the entries were marked up to begin with.

SnipSnap labels the snip corresponding to a blog entry with the yyyy-MM-dd date and parent snip space, so the entry fragment template looks like:

<xsl:template match="/">
  <snip>
    <name><xsl:call-template name="extract-date">
      <xsl:with-param name="doc" select="/" />
    </xsl:call-template></name>
    <content><xsl:apply-templates select="/table/string[@name='title']" />
    <xsl:apply-templates
        select="/table/content" /></content>
    <cTime></cTime>
    <mTime></mTime>
    <cUser>prb</cUser>
    <mUser>prb</mUser>
    <parentSnip>start</parentSnip>
    <backLinks></backLinks>
    <snipLinks></snipLinks>
    <labels></labels>
    <attachments></attachments>
    <viewCount>0</viewCount>
    <permissions>Edit:Owner</permissions>
  </snip>
</xsl:template>

<xsl:template match="string[@name='title']">1. <xsl:value-of
    select="@value" /> {anchor:<xsl:value-of 
    select="@value" />}</xsl:template>

Note that the "prb" should be replaced with your username.

The date extraction is a little bit painful:

<xsl:template name="extract-date">
  <xsl:variable name="datestr" 
      select="substring-after(string(/table/date[@name='when']/@value),', ')" />
  <xsl:variable name="day" select="substring-before($datestr,' ')" />
  <xsl:variable name="datestr2" select="substring-after($datestr,' ')" />
  <xsl:variable name="month">
    <xsl:choose>
      <xsl:when test="starts-with($datestr2,'Jan')">01</xsl:when>
      <xsl:when test="starts-with($datestr2,'Feb')">02</xsl:when>
      <xsl:when test="starts-with($datestr2,'Mar')">03</xsl:when>
      <xsl:when test="starts-with($datestr2,'Apr')">04</xsl:when>
      <xsl:when test="starts-with($datestr2,'May')">05</xsl:when>
      <xsl:when test="starts-with($datestr2,'Jun')">06</xsl:when>
      <xsl:when test="starts-with($datestr2,'Jul')">07</xsl:when>
      <xsl:when test="starts-with($datestr2,'Aug')">08</xsl:when>
      <xsl:when test="starts-with($datestr2,'Sep')">09</xsl:when>
      <xsl:when test="starts-with($datestr2,'Oct')">10</xsl:when>
      <xsl:when test="starts-with($datestr2,'Nov')">11</xsl:when>
      <xsl:when test="starts-with($datestr2,'Dec')">12</xsl:when>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="year" select="substring-before(substring-after($datestr2,' '),' ')" />
  <xsl:value-of select="$year" />-<xsl:value-of select="$month" />-<xsl:value-of select="$day" />
</xsl:template>

as is the replacement of any {, }, [, or ] characters in the posts:

<xsl:template name="escape-chars">
  <xsl:param name="txt" />
  <xsl:choose>
    <xsl:when test="contains($txt,'[')"><xsl:value-of
        select="substring-before($txt,'[')" />\[<xsl:call-template
        name="escape-chars">
          <xsl:with-param name="txt"
              select="substring-after($txt,'[')" />
        </xsl:call-template></xsl:when>
    ... And then similarly for the others ...
    <xsl:otherwise><xsl:value-of select="$txt" /></xsl:otherwise>
  </xsl:choose>
</xsl:template>

Once this is done, then all of the snips need to be grouped together:

<snipspace>
  ... snips go here ...
</snipspace>

and the snips with the same date (and thus the same name) need to have their content merged together.

That should do it! Just import the file through the management facility in SnipSnap, fix any errors that are there (which you'll have to do without any sort of debugging information), and it's done.

Addendum

I had to do some additional work with the data once I'd moved it into SnipSnap.

First, I had to do some additional search/replace to ensure that all permalinks (anchors) were valid names, which entailed removing commas, apostrophes, and other non-name characters. (The short summary would be that letters, numbers, periods, colons, and underscores are legal.)

Second, I modified the regular expression in the org.snipsnap.semanticweb.rss.Rssify (line 58 in 0.4.2a) that chunks the posts for RSS purposes so that it splits on top-level headers only instead of on every header:

pattern = compiler.compile(
  "^[[:space:]]*(1)[[:space:]]+(.*?)$",
  Perl5Compiler.MULTILINE_MASK);
(comment bubbles) 0 comments

Migrated to SnipSnap

Paul Brown @ 2003-10-04T00:00:00Z

After some experimentation, I've migrated my weblog from Radio to SnipSnap. I had been considering Roller, but the additional wiki functionality in SnipSnap and relative ease of import made the decision for me. (As of November 2002, the Roller team was planning on adding adding an import function for Radio weblogs but hasn't started on it yet.)

I've posted a How-To entry on Migrating Radio to SnipSnap.

After accidently corrupting the McKoiSQL database a few times with hard shutdowns on the server, I followed these instructions to convert the backend to MySQL. It was also necessary to do:

RENAME TABLE snip TO Snip;
RENAME TABLE snipuser TO SnipUser;

after dumping the database off my laptop to migrate to our hosted servers. (By the way, there may be a way to repair the McKoiSQL database, but I didn't try it...)

(comment bubbles) 0 comments

All Posts contains 398 items in 57 pages of 7 items each:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57