What’s µF got on XML?

Decisions, decisions. Therein lies the reason why convention over configuration is so bloody great, precisely because it requires fewer decisions. Reap, my humble build tool, outputs log files. Yea, logs are good. But the decision I have to make is, “in what format?”

Now old-school would just put out some asterisk-studded text files with plenty of precise timestamps, maybe some nice long ------------------------------ lines to divide things up. No doubt these files are easy to access and fairly readable. We like out text files. But hey wait. This is the age of magical markups. At the very least we can coax that log into a Wiki-tongue. For us Rubyists, we have rdoc, markdown and textile’s just a few gemy fingertips away. These formats can be even easier to read thanks to structural consistency from log to log. Plus, fire-up a web server and we can get some really nice looking HTML too. That’s the beauty of Wiki Wiki markup after all.

Now if beauty were all that matter, then there would be no point in looking further. But like America’s continental shelves there is untapped wealth going to waste here. There is data in them there logs! The modern buzzword is semantic. Taking logs to the next level requires us to ensure their semantic value. On that account, XML was created, and long predicted the format of the future. Certainly XML has made great strides, but it is still far from meeting its promise. There is simply too much complexity involved in both marking up the content and in generating nice output.

Then comes along the microformat. The microformat combines the semantic capabilities of XML with the layout capabilities of HTML. Microformats are relatively new and still finding their footing. Indeed, I think a Universal Uniform Microformat Specification is ultimately the necessary outcome. In that vain I make a first rough estimate of what such unifying system might look like, and why, contrary to many arguments otherwise, microformats do in fact bring more to the table than XML.

Where XML provides semantic information within a doubly linked hierarchy (via attributes and body), microformats provide the same plus a set of well defined data structures. Microformats provide semantics through a small set of tag attributes, role, rel, rev and primarily the class attribute. The class attribute alone is enough to make microformats fully equatable to XML. We can easily map the same data set in either format:

  
    
      John Jay
    
  

as opposed to

  
John Jay

Although the microformat is more verbose, the two contian the same original semantic value. But the extra verbosity isn’t just a waste of UTF bytecodes, it says something. Specifically it gives us a semantic structure. It just so happens that HTML evolved to offer these structures precisely because that is what data presenters require to do their jobs well. And for the same reason, why you are always being beat with the div-stick when you try to take the load road of laying out your pages with tables.

So what semantic structures do micorformats via HTML make available to us? The quick rundown: we have layouts using div and span, lists using ol, ul and li; definition lists using dl, dd and dt; and tables using table, tr, th and td. Those are the obvious structures. There are still other smaller structure’s like a for links, and the hefty set of form elements. All these elements provide us a way to say what type of thing our data is or partakes in, not just what the data is.

Clearly, the creators of XML saw the need for something like this and tried to achieve it through XML Namespaces. But namespaces are less effective because they are completely arbitrary, whereas HTML gives us a limited but universal structural language. Seems to me there is a principle to be found here that can be a guide for both the future of HTML and Microformats.