Populating a Java Object Model from XML

Paul Brown @ 2006-02-05T01:02:00Z

This post describes an approach to populating a Java object model from an XML document. It's an approach that I came up with when working on a particular parsing problem.

Updates: A couple of people have mentioned XStream and XMLBeans, but those fail my tests below. XStream is a serialization tool (as its docs say), and XMLBeans was chubby in terms of the size of the libraries. For what it's worth, if I were willing to suffer a large dependency, XMLBeans version 2 looks pretty good in that it provides a token-oriented interface and location information (via XmlLineNumber).

A closet full of clothes, and not a thing to wear...?

My self-imposed requirements were as follows:

  • Populate a pre-existing Java object model from SAX events.
  • Support multiple XML dialects mapping directly to a single object model.
  • Both the XML dialects and the object model are specified a priori.
  • Impose zero additional dependencies beyond SAX; ideally, the implementation will be just a fancy ContentHandler.
  • Expose SAX location information from the parse (e.g., line and column) to the target object model during construction.
  • Expose namespace context from the parse so that expressions like QNames and XPaths in attribute values can be properly post-processed.
  • Use programmatic configuration, not properties files or XML or annotations to the schema.
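
The location and namespace requirements can be met with nothing beyond the standard SAX API. The following is only a minimal sketch, not the PXE code: the class name LocatingHandler, the unqualified "ref" attribute, and the record format are assumptions made for illustration. It keeps the Locator that the parser supplies, tracks prefix scopes with org.xml.sax.helpers.NamespaceSupport, and resolves a QName-valued attribute at the point where the element is seen.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

/** Hypothetical handler: records the line and the resolved QName of each "ref" attribute. */
class LocatingHandler extends DefaultHandler {
    private final NamespaceSupport ns = new NamespaceSupport();
    private boolean contextPushed = false; // true once a scope exists for the upcoming element
    private Locator locator;               // stays null if the parser supplies none
    final List<String> records = new ArrayList<>();

    @Override public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    // Prefix mappings are reported before the startElement they belong to.
    @Override public void startPrefixMapping(String prefix, String uri) {
        if (!contextPushed) { ns.pushContext(); contextPushed = true; }
        ns.declarePrefix(prefix, uri);
    }

    @Override public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!contextPushed) ns.pushContext();
        contextPushed = false;
        int line = (locator == null) ? -1 : locator.getLineNumber();
        String ref = atts.getValue("", "ref"); // assumed QName-valued attribute
        if (ref != null) records.add(local + "@" + line + " ref=" + resolve(ref));
    }

    @Override public void endElement(String uri, String local, String qName) {
        ns.popContext();
    }

    /** Expands "prefix:local" to "{namespace}local" using the prefixes currently in scope. */
    private String resolve(String qname) {
        int colon = qname.indexOf(':');
        String prefix = colon < 0 ? "" : qname.substring(0, colon);
        String nsUri = ns.getURI(prefix);
        return "{" + (nsUri == null ? "" : nsUri) + "}" + qname.substring(colon + 1);
    }

    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefix-mapping events
        LocatingHandler handler = new LocatingHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }
}
```

In a real binding, the line number and resolved name would be handed to the target object as it is constructed rather than collected as strings. Note that SAX parsers are encouraged but not required to supply a Locator, hence the null check.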

For my particular application, the XML documents would be BPEL processes in either flavor (1.1 or 2.0), and the target object model would be PXE's BPEL Object Model or “BOM”. There were additional requirements around handling extensions, but those aren't directly relevant to the approach that I settled on.

Now, surely someone else has had the same or a similar set of requirements, more or less the same sensibilities, and the altruism to post it as open source...

Of the various approaches to XML binding (the bindmark project has a good list), I didn't find any that fit the requirements. Many tools generate an object model from a schema, and while the generated models don't usually meet my taste for API ergonomics (JAXB 1.0 had a particularly rank code smell when applied to the BPEL4WS 1.1 schema...), that would be one way to go if additional dependencies were acceptable. (The idea would be to use the generated object models as data transfer objects and then maintain multiple mappings onto the internal object model as domain objects.) JiBX looked particularly interesting, but it requires XPP3 and uses bytecode enhancement, which would rule out the simultaneous support for multiple XML dialects without intermediate object models. Digester has approximately the right flavor, but the target object model wasn't particularly JavaBean-ish and location information wasn't exposed.

One of the flaws in schema-driven bindings is that XML schema rarely (if ever) encapsulates all of the semantics of the XML language that it can be used to (loosely) validate, so automated or generated bindings do at most a partial job.

The Idea and Outcomes

So I came up with a different approach. The basic idea was to construct a graph of event consumers that closely resembles the grammar for the XML document and use SAX events to walk the graph. Each edge of the graph is decorated with a function that accepts a single SAX event and returns true or false, e.g., a QName with or without an attribute mask, or a non-whitespace characters event. The edges incident to a vertex are ordered, and events are matched (or not) according to the ordering.
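
As a rough illustration of the graph idea, and not the actual bpel-parser code, the sketch below models vertices with ordered outgoing edges, each guarded by a predicate over a simplified event. The Event type, the helper names, and the tiny process/invoke grammar are all assumptions made for the example; a real implementation would be driven from ContentHandler callbacks and would attach object-building actions to the transitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class GrammarGraph {
    enum Kind { START, END, TEXT }

    /** Simplified stand-in for a SAX event (hypothetical type). */
    static final class Event {
        final Kind kind;
        final String value; // element name, or character data for TEXT
        Event(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    /** A vertex with ordered outgoing edges; the first edge whose test passes wins. */
    static final class Vertex {
        final String label;
        private final List<Predicate<Event>> tests = new ArrayList<>();
        private final List<Vertex> targets = new ArrayList<>();
        Vertex(String label) { this.label = label; }

        Vertex edge(Predicate<Event> test, Vertex target) {
            tests.add(test);
            targets.add(target);
            return this;
        }

        Vertex step(Event e) {
            for (int i = 0; i < tests.size(); i++) {
                if (tests.get(i).test(e)) return targets.get(i);
            }
            throw new IllegalStateException(
                "unexpected " + e.kind + " '" + e.value + "' at " + label);
        }
    }

    static Predicate<Event> open(String name) {
        return e -> e.kind == Kind.START && e.value.equals(name);
    }

    static Predicate<Event> close(String name) {
        return e -> e.kind == Kind.END && e.value.equals(name);
    }

    static final Predicate<Event> WHITESPACE =
        e -> e.kind == Kind.TEXT && e.value.trim().isEmpty();

    /** Toy grammar: a process element containing zero or more empty invoke elements. */
    static Vertex grammar() {
        Vertex start = new Vertex("start");
        Vertex process = new Vertex("process");
        Vertex invoke = new Vertex("invoke");
        Vertex done = new Vertex("done");
        start.edge(open("process"), process);
        process.edge(WHITESPACE, process)
               .edge(open("invoke"), invoke)
               .edge(close("process"), done);
        invoke.edge(close("invoke"), process);
        return start;
    }

    /** Walks the graph and returns the label of the vertex reached after each event. */
    static List<String> walk(List<Event> events) {
        Vertex current = grammar();
        List<String> trace = new ArrayList<>();
        for (Event e : events) {
            current = current.step(e);
            trace.add(current.label);
        }
        return trace;
    }
}
```

Because no edge out of the process vertex accepts non-whitespace text, stray character data there fails with an exception naming the vertex, which is the kind of error a grammar-shaped graph makes easy to report.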

From another perspective, this uses the XML parser like a lexer and the graph like a parser.

From yet another perspective, the idea is rather like Haskell's pattern matching, in which case the whole thing could be looked at as a collection of functions that accept a list of SAX events and return an object. Each function consumes the head event from the list, selects another function to pass the tail of the list to, and adds the result of the call to the current object. (The presumption is that objects know how to add various kinds of children or metadata to themselves.) Of course, Haskell wasn't an option. (And of the two projects named Jaskell, one has a few too many moving parts in the toolchain for my taste, and the other doesn't have pattern matching.)
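
Without pattern matching, the same shape can be approximated in Java as a set of mutually recursive functions over a queue of events. This is a sketch under stated assumptions: events are reduced to strings such as "start:invoke", and the Node class and function names are hypothetical. Each function consumes the head event, picks the function for what follows, and adds the returned object to its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class EventFunctions {
    /** Hypothetical target object that knows how to add children to itself. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
        void add(Node child) { children.add(child); }
        @Override public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    static Node process(Deque<String> events) {
        expect(events, "start:process");
        Node node = new Node("process");
        while (!"end:process".equals(events.peekFirst())) {
            node.add(activity(events));
        }
        events.removeFirst(); // consume end:process
        return node;
    }

    /** Selects the function to handle the next event, like choosing a pattern clause. */
    static Node activity(Deque<String> events) {
        String head = events.peekFirst();
        if ("start:sequence".equals(head)) return sequence(events);
        if ("start:invoke".equals(head)) return invoke(events);
        throw new IllegalStateException("unexpected event: " + head);
    }

    static Node sequence(Deque<String> events) {
        expect(events, "start:sequence");
        Node node = new Node("sequence");
        while (!"end:sequence".equals(events.peekFirst())) {
            node.add(activity(events));
        }
        events.removeFirst(); // consume end:sequence
        return node;
    }

    static Node invoke(Deque<String> events) {
        expect(events, "start:invoke");
        expect(events, "end:invoke");
        return new Node("invoke");
    }

    /** Removes the head event and fails if it is not the one wanted. */
    static void expect(Deque<String> events, String wanted) {
        String head = events.pollFirst();
        if (!wanted.equals(head)) {
            throw new IllegalStateException("expected " + wanted + " but got " + head);
        }
    }

    static Node parse(String... events) {
        return process(new ArrayDeque<>(Arrays.asList(events)));
    }
}
```

Since SAX pushes events rather than handing over a list, this pull-style formulation is only a way to think about the design; the graph of consumers is the push-style equivalent.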

My first-pass implementation in Java (PXE's bpel-parser module) did the job nicely but wasn't quite as pretty at the code level as I might have liked, as it required a good amount of boilerplate. That said, and in line with the lexer/parser observation above, the boilerplate and transition set could easily be generated from a RELAX NG grammar.

Considering that JAXB 2 looks slick, has a non-regressive license, will be part of both Java EE 5 and Java SE 6, and supports passing through some XML fragments in raw form, it would be a difficult call if I faced the same problem at present. (Like JAXB 1, JAXB 2 doesn't expose location information. It can be added to the XML document as content using some SAX tricks, but that's a hack.) That said, the need to support semantics beyond those present in the schema might very well drive me down the same path again.

Comment from Ben @ 2006-02-05T12:02:13Z

xstream.codehaus.org?

Comment from Anon @ 2006-02-05T16:38:25Z

xmlbeans.apache.org?

Comment from alexey @ 2006-02-05T21:22:49Z

Some of the requirements listed were exactly the reasons why I started a new binding tool myself some time ago. It's currently called jbossxb. The first release is scheduled for this spring. It's also going to implement JAXB 2.0, but not in the first release. There actually already is a JAXB 2.0 wrapper that is currently used by JBossWS. If you are interested in development or just in discussing ideas and requirements, you are welcome on the forums: http://www.jboss.com/index.html?module=bb&op=viewforum&f=212 http://wiki.jboss.org/wiki/Wiki.jsp?page=JBossXB

Comment from Alessandro Vernet @ 2006-02-06T07:48:31Z

Hi Paul, if you are interested in functional languages and in particular in pattern matching, have a look at Scala (http://scala.epfl.ch/). It is an actively developed functional and object-oriented language that does pattern matching and plays well with XML. Scala is compiled to Java bytecode, so your Scala code plays well with Java libraries and other Java code you might have. Alex -- Blog (XML, Web apps, Open Source): http://www.orbeon.com/blog/

Comment from Paul Brown @ 2006-02-06T07:52:26Z

Hi, Alex. I've been watching Scala, and version 2 looks like fun. I've even heard rumors of a non-thread-bound Actor implementation coming. — Paul

Comment from Santhosh Kumar T @ 2010-01-23T10:54:22Z

try http://code.google.com/p/jlibs/wiki/SAX2JavaBinding