Tim Bray wrote a post about unnecessary reinvention in XML languages, arguing that the use cases are more or less covered by the “big five”: XHTML, DocBook, OpenDocument (so, presumably, also Dublin Core, MathML, SVG, and SMIL by reference), UBL, and Atom. (I don't recall his take on the MySDL meme, e.g., NSDL.) Uche Ogbuji takes a slightly looser stance and makes the excellent point that RELAX NG plus Schematron can get pretty darn good (but still not great) portable validation in excess of what a DTD or («gag» «cough») XML Schema would provide.
That said, why create an XML language at all, particularly if humans will directly create, modify, and consume the documents? How will documents be created? What is their purpose? Who will consume them? In what form? What is the difference between a schema-valid document and a correct one? (That is, can all of the constraints be expressed by the schema?) Of course, before I throw too many stones, if I may paraphrase Barabas (and Stallings), I have created an XML language, but that was in another country; and besides, no one else knew better at the time. It might be just as much trouble to specify a non-XML language that's easier for humans to compose and avoids the various pitfalls of correctly processing XML with generally available tooling, to say nothing of versioning, differencing, or patching.
Take the RELAX NG compact syntax as a case in point for creating non-XML languages instead of XML languages. The difference between
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <choice>
        <attribute name="name"/>
        <group>
          <attribute name="givenName"/>
          <attribute name="familyName"/>
        </group>
      </choice>
      <attribute name="email"/>
    </element>
  </zeroOrMore>
</element>
and
element addressBook {
  element card {
    (attribute name { text }
     | (attribute givenName { text },
        attribute familyName { text })),
    attribute email { text }
  }*
}
should be obvious, and coding a few grammars in both the compact syntax and the verbose XML syntax will probably convince most people of the utility of the compact representation. This is purely subjective, of course, but it is precisely subjective utility and aesthetics that I'm arguing for.
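To make the earlier question about schema-valid versus correct documents concrete, here is a rough Python sketch that hand-checks the structural rules of the address book grammar. It is not a RELAX NG validator: the function names are made up for illustration, it uses only the standard library, and it ignores namespaces and text content. It just shows that the grammar accepts any string at all as an email value.

```python
import xml.etree.ElementTree as ET


def card_is_valid(card):
    """Hand-coded check of one <card> against the address book pattern.

    A card carries either a single `name` attribute or the pair
    `givenName` + `familyName`, plus `email`, and has no child elements.
    """
    attrs = set(card.attrib)
    single_name = attrs == {"name", "email"}
    split_name = attrs == {"givenName", "familyName", "email"}
    return card.tag == "card" and len(card) == 0 and (single_name or split_name)


def address_book_is_valid(xml_text):
    """Return True if xml_text has the shape the grammar describes."""
    root = ET.fromstring(xml_text)
    if root.tag != "addressBook" or root.attrib:
        return False
    # zeroOrMore: an empty address book is fine
    return all(card_is_valid(card) for card in root)


if __name__ == "__main__":
    doc = """
    <addressBook>
      <card name="John Smith" email="js@example.com"/>
      <card givenName="Fred" familyName="Bloggs" email="not an address"/>
    </addressBook>
    """
    # Structurally fine, even though the second email is clearly not correct.
    print(address_book_is_valid(doc))
```

The second card passes because the pattern only says `text`. Rejecting it needs a datatype or an extra rule layered on top, which is the gap between a valid document and a correct one.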
On the same topic, BPEL4WS 1.1 and WS-BPEL 2.0 are examples of one XML language sin committed and one in progress. Both are programming languages, and both have an XML syntax that's just painful to type, even with a decent XML editor, and impossible to get right without additional sugar on top to ensure that namespace prefixes and WSDL component names used in expressions tie back properly. (Worse, some folks use lossy visual “editors”. It's one thing to use a design tool that translates a specific visual representation of a process into BPEL, but it's quite another to try to do fine-grained visual editing of BPEL at the detail level.) Why not something along the lines of what Brian McAllister proposed, i.e., something that looks and feels like a programming language? It would be straightforward to tie an ANTLR grammar into PXE's compiler pipeline in place of an XML parser...