Splitting XML Well with XSLT 2

Paul R. Brown @ 2009-09-30T18:25:32Z

I recently needed to split a result set from a Solr query into a collection of smaller groups of add requests for POSTing into a different core. There are some ways to make the split work with text processing tools (split and friends), but it's always an open question whether an ad hoc approach will trip over some markup; it's just better to use XML tooling. By no coincidence (based on features missing from ), XSLT 2 makes it easy to do the right thing.

First up is grouping in chunks of 2000 records:

<!-- Integer division: records 1-2000 get key 0, 2001-4000 get key 1, ... -->
<xsl:for-each-group select="/response/result/doc"
                    group-by="(position() - 1) idiv 2000">
...
</xsl:for-each-group>

Outputting each hunk to a file named for the index of the group is also a one-liner:

<xsl:result-document href="{current-grouping-key()}_out.xml">
  <add>
    <xsl:for-each select="current-group()">
      <doc>
        <xsl:apply-templates />
      </doc>
    </xsl:for-each>
  </add>
</xsl:result-document>
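
For reference, here is a sketch of how the two fragments might fit together in a complete stylesheet. It rests on a few assumptions that are not in the original snippets: the chunk-size parameter is a hypothetical convenience, and the copy template at the bottom is only a stand-in for whatever templates actually transform the children of each doc. The grouping key is computed with integer division, (position() - 1) idiv $chunk-size, so that positions 1 through 2000 share key 0, positions 2001 through 4000 share key 1, and so on.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical parameter: number of doc elements per output file. -->
  <xsl:param name="chunk-size" as="xs:integer" select="2000"/>

  <xsl:template match="/">
    <!-- position() here is the position of each doc within the selected population. -->
    <xsl:for-each-group select="/response/result/doc"
                        group-by="(position() - 1) idiv $chunk-size">
      <!-- One output file per group: 0_out.xml, 1_out.xml, ... -->
      <xsl:result-document href="{current-grouping-key()}_out.xml">
        <add>
          <xsl:for-each select="current-group()">
            <doc>
              <xsl:apply-templates/>
            </doc>
          </xsl:for-each>
        </add>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Placeholder (assumption): copy the children of each doc unchanged.
       Replace with templates that produce the fields the target core expects. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The output files are written relative to the base output location that the processor uses, so it is worth checking where that points before running this against a large result set.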

And that's it. The only trick is choosing an XSLT 2 processor, and the superlative Saxon (from Saxonica) is my default choice.


Binding nxml-complete in Aquamacs Emacs

Paul R. Brown @ 2008-11-15T06:22:35Z

The folks who bundled up nXML mode for inclusion with Aquamacs Emacs did think to include the useful schemas bundled with nXML mode, e.g., for XSLT 1.0, but I can't figure out why they didn't bind nxml-complete (context-sensitive, schema-driven completion and suggestion) to a keystroke. (If memory serves, C-Return used to work.)

It's an easy fix, but I always forget just how to do it. As a reminder to myself or a hint to others, the fix is to add the following to ~/Library/Preferences/Aquamacs Emacs/:

;; Bind nxml-complete to C-c C-c whenever nxml-mode starts.
(add-hook 'nxml-mode-hook
          (lambda ()
            (define-key nxml-mode-map (kbd "C-c C-c")
              'nxml-complete)))

My preference is for C-c C-c, but yours may be different. In any case, that's about as blissful as editing XSLT gets.


Scripting for the Cloud

Paul R. Brown @ 2007-12-18T20:50:59Z

Paul Fremantle posted a brief analogy between classic UNIX pipelines and scripting for web services, and I posted a comment about some work going on with Ode that deserves a somewhat broader audience. Matthieu, Tammo, et al. are working on a Javascripty language called SimPEL that maps to BPEL. They're making good progress, and the straw man examples are looking good, both for terseness and for legibility. It's still a way off, but it's on its way.

XML languages are known to be awful (hard on the hands, hard on the eyes, underspecified by document- or data-oriented schemata), and an orchestration DSL that maps onto BPEL has been on various wish lists for a while. (See, e.g., Brian's thought.) WSDL and XSLT should also be on the chopping block, and there are at least a few general approaches to abbreviating an XML syntax from which to draw inspiration. (See, e.g., RELAX NG compact syntax and Tom Moertel's PXSL.)
