Gradual Epiphany

XML C14N - Grumblings

Reading through the XML C14N specification this evening, I encountered a rather strange requirement in section 2.1:

The first parameter of input to the XML canonicalization method is either an XPath node-set or an octet stream containing a well-formed XML document. Implementations MUST support the octet stream input and SHOULD also support the document subset feature via node-set input.

What is a specification doing dictating the types of parameters implementations should accept?! Further on in that same section, the spec says the “second” parameter should be a flag indicating whether comments are included in the canonical form.
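To make those two parameters concrete, here is a rough sketch using Python's standard library. One loud assumption: `xml.etree.ElementTree.canonicalize` (Python 3.8+) implements C14N 2.0, not the version of the spec quoted above, and I'm handing it a string rather than a raw octet stream. So treat it as an illustration of "document in, comments flag on or off", not as a conforming implementation of section 2.1.

```python
from xml.etree.ElementTree import canonicalize

# Sample input: odd spacing, unsorted attributes, a comment, an empty element.
doc = '<doc  b="1"   a="2"><!-- note --><item/></doc>'

# "First parameter": the document itself.
# "Second parameter": whether comments survive canonicalization.
without_comments = canonicalize(doc)                 # comments dropped (the default)
with_comments = canonicalize(doc, with_comments=True)

print(without_comments)
print(with_comments)
```

Both results should come back with the attributes sorted and `<item/>` expanded to a start/end pair; only the second keeps the comment.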

It’s exactly this sort of cruft that makes W3C specs so hard to read. Specifications are difficult enough without implementation details (particularly at the parameter level) clogging them up. The other thing that makes them hard to read is this seeming obsession with mathematical provability. Who cares if the algorithm is mathematically provable?! We’re talking about basic information processing here, not algorithmic analysis…

The other aspect of C14N as defined by the W3C that I don’t get is why default attribute values must be included. They explicitly say that you shouldn’t need the DTD/Schema to canonicalize the document, but then require you to know what the default attribute values are. How in the world does one implement that?
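A small experiment hints at one way this can work in practice. The assumption here is that the defaults get filled in by the XML parser while it reads the document, before any canonicalization logic runs. The `memo` document and its `status` attribute are made up for illustration, and the sketch again leans on Python's C14N 2.0 `canonicalize` rather than the spec version I was reading.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical document: the internal DTD subset declares a default
# value for the "status" attribute, which the element itself omits.
doc = '''<!DOCTYPE memo [
  <!ATTLIST memo status CDATA "draft">
]>
<memo>hello</memo>'''

# The underlying expat parser reports defaulted attributes along with
# the explicitly written ones, so the default ends up in the output.
result = canonicalize(doc)
print(result)
```

This only covers defaults declared in the internal subset. A default that lives in an external DTD is a different story if the parser never reads that DTD, which is exactly the case that makes me grumble.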

Another thing that bugs me about the C14N spec is that the entire process is defined in terms of the XPath data model. I don’t quite see the value they derived from forcing this dependency. If I had to guess, I would say this spec was written around a very specific implementation. What about people who don’t use XPath (maybe for performance reasons)? To implement this spec, the reader first has to figure out what each XPath term means.
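For what it's worth, here is roughly what I'd expect a no-XPath version to look like for the simplest case. This is a deliberately incomplete sketch, not the spec's algorithm: it assumes a document with no namespaces, and it ignores comments, processing instructions and document subsets entirely. The function names are my own. The escaping follows the character replacements the spec lists for text and attribute values.

```python
import xml.etree.ElementTree as ET


def _escape_text(s):
    # Text nodes: escape &, <, > and carriage return.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace("\r", "&#xD;"))


def _escape_attr(s):
    # Attribute values: escape &, <, " plus tab, newline, carriage return.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace('"', "&quot;").replace("\t", "&#x9;")
             .replace("\n", "&#xA;").replace("\r", "&#xD;"))


def serialize(elem):
    # Attributes sorted by name, always double-quoted, single space before each.
    attrs = "".join(
        f' {name}="{_escape_attr(value)}"'
        for name, value in sorted(elem.attrib.items())
    )
    # Empty elements are written as an explicit start/end pair.
    parts = [f"<{elem.tag}{attrs}>", _escape_text(elem.text or "")]
    for child in elem:
        parts.append(serialize(child))
        parts.append(_escape_text(child.tail or ""))
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


def canonical_sketch(xml_text):
    return serialize(ET.fromstring(xml_text))


print(canonical_sketch('<a b="1" a="2"><c/>x &amp; y</a>'))
```

A plain tree walk gets surprisingly far without a single XPath term. Namespaces are where it would get hairy, and I suspect that is where the spec authors found the XPath data model convenient.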