What good is an XML Hamlet?

Why bother with XML markup: making your data application independent.

Jorn and Sabren have been disputing the usefulness of XML on their sites and in the comp.lang.xml newsgroup.

It started when Scripting News pointed at an example of XML markup: Hamlet.

Jorn Says...

Back on Monday 24 August, Jorn said:

This XML version of Hamlet may look cool, but I think people will quickly realise it's utterly useless markup-for-the-sake-of-markup:

http://www.csclub.uwaterloo.ca/u/relander/XML/hamlet.xml [SN]

Sabren Says...

I disagree. With a file like this, and web style sheets, it becomes easy to format a custom hard copy of a script: perhaps only your scenes, or the whole play with your lines highlighted. It would also be easy to write a program to help actors rehearse.
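As a rough illustration of the custom-script idea, here is a minimal Python sketch that pulls out every line spoken by one character. It assumes the play is marked up with SPEECH elements that each contain SPEAKER and LINE children. The SAMPLE document is a hand-made three-speech excerpt, so check the real file's tag names before relying on this.

```python
import xml.etree.ElementTree as ET

# Hand-made excerpt; the PLAY/SPEECH/SPEAKER/LINE names are an assumption
# about how the Hamlet file is marked up.
SAMPLE = """<PLAY>
  <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>
  <SPEECH><SPEAKER>FRANCISCO</SPEAKER>
    <LINE>Nay, answer me: stand, and unfold yourself.</LINE></SPEECH>
  <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Long live the king!</LINE></SPEECH>
</PLAY>"""

def lines_for(play_xml: str, speaker: str) -> list[str]:
    """Return every LINE spoken by `speaker`, in script order."""
    root = ET.fromstring(play_xml)
    result = []
    for speech in root.iter("SPEECH"):
        # A speech may list more than one speaker; match any of them.
        names = [s.text.strip() for s in speech.findall("SPEAKER") if s.text]
        if speaker in names:
            result.extend(line.text or "" for line in speech.findall("LINE"))
    return result

if __name__ == "__main__":
    for line in lines_for(SAMPLE, "BERNARDO"):
        print(line)
```

Handing that list to a formatter, or highlighting the matching lines in the full script, gives the personalized hard copy described above.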

Jorn Says...

And any hacker with any shred of professional pride should be embarrassed by the wastefulness of tagging every line with <LINE></LINE>. What are you guys thinking?!?

My Turn

Jorn: Exactly! Why would any hacker (in the classical sense of the word) code their own text when they can write a tool to do it for them? Remember Larry Wall's Three Cardinal Virtues of a Programmer? One of them is Laziness. We write tools to encode text for us.
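To make the laziness concrete, a few lines of Python can add the <LINE></LINE> tags so nobody types them by hand. This is a minimal sketch; the SPEECH/SPEAKER/LINE element names are assumed to match the play's markup.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def tag_speech(speaker: str, text: str) -> str:
    """Wrap each non-blank line of `text` in LINE tags, escaping &, < and >."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    body = "\n".join(f"  <LINE>{escape(line)}</LINE>" for line in lines)
    return f"<SPEECH>\n  <SPEAKER>{escape(speaker)}</SPEAKER>\n{body}\n</SPEECH>"

if __name__ == "__main__":
    speech = tag_speech("HAMLET", "To be, or not to be:\nthat is the question:\n")
    print(speech)
    ET.fromstring(speech)  # sanity check: the generated markup is well formed
```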

For example...

Each of my "Web Log" entries has the same structure: an introduction, a URL, and my remark.

So it's easy to get an XML structure like:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<ENTRY>
	<INTRO></INTRO>
	<URL></URL>
	<REMARK></REMARK>
</ENTRY>

Imagine a listener on my computer that waits for these XML objects and then shoves them into a database, say Frontier.
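A minimal Python sketch of what the listener would do once an ENTRY document arrives. The network side (mail, HTTP, sockets) is left out, and sqlite3 stands in for Frontier's database purely for illustration; the `entry` table and its columns are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical table; sqlite3 is only a stand-in for Frontier's database.
SCHEMA = "CREATE TABLE IF NOT EXISTS entry (intro TEXT, url TEXT, remark TEXT)"

def store_entry(db: sqlite3.Connection, data: bytes) -> int:
    """Parse one ENTRY document (raw bytes, as received) and store it as a row."""
    # Passing bytes lets the parser honor the encoding declaration;
    # malformed input raises ET.ParseError.
    root = ET.fromstring(data)
    if root.tag != "ENTRY":
        raise ValueError(f"expected ENTRY, got {root.tag}")
    # findtext gives "" for an empty element and the default for a missing one.
    values = tuple(root.findtext(tag, "") for tag in ("INTRO", "URL", "REMARK"))
    cur = db.execute("INSERT INTO entry (intro, url, remark) VALUES (?, ?, ?)", values)
    db.commit()
    return cur.lastrowid

if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    sample = b"""<?xml version="1.0" encoding="ISO-8859-1" ?>
<ENTRY>
  <INTRO>An XML Hamlet</INTRO>
  <URL>http://www.csclub.uwaterloo.ca/u/relander/XML/hamlet.xml</URL>
  <REMARK>Handy for rehearsal scripts.</REMARK>
</ENTRY>"""
    store_entry(db, sample)
    print(db.execute("SELECT * FROM entry").fetchall())
```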

Yes, there's already a tool that does some of this without XML: the OCHA suite for Frontier. I use it for the Web Log. Its drawback is that I have to be at the computer it runs on in order to use it. I'd like to update my Web Log from anywhere: at lunch from the office, from my Palm Pilot, at an airport kiosk, or a friend's house.

I could build a Web form that takes the components and mails an XML file to my inbox. Eudora could then drop it into a staging folder, where Frontier looks for new files and imports them into the OCHA format. Then I can generate the HTML from inside Frontier.
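The Web-form step might look roughly like this Python sketch (3.8 or later, for `xml_declaration`), which turns submitted fields into an ENTRY document ready to be mailed. The lowercase field names `intro`, `url` and `remark` are hypothetical, and the mailing and Eudora steps are not shown.

```python
import xml.etree.ElementTree as ET

def entry_from_form(form: dict[str, str]) -> bytes:
    """Turn submitted form fields into an ENTRY document, ready to be mailed."""
    entry = ET.Element("ENTRY")
    for tag in ("INTRO", "URL", "REMARK"):
        # "intro", "url", "remark" are hypothetical form-field names.
        ET.SubElement(entry, tag).text = form.get(tag.lower(), "")
    # ElementTree escapes & and < itself and writes an ISO-8859-1 declaration.
    return ET.tostring(entry, encoding="ISO-8859-1", xml_declaration=True)

if __name__ == "__main__":
    print(entry_from_form({"intro": "XML & Hamlet", "url": "http://example.com/"}))
```

Building the document with ElementTree rather than string concatenation means a stray `&` or `<` typed into the form cannot break well-formedness.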

Throughout this whole example, the end user is never exposed to raw XML. In fact, I can render the XML using the most perverse Studio Verso 'Killer Web Site' methodology I want. I can still provide the underlying XML for people who need the data for applications, rather than just for stroking the egos down in Marketing.

It's Not a Straitjacket

Note that I don't need a DTD for this object. It only has to follow XML's well-formedness rules. That is enough for any XML-aware application, from a filter to a browser, to read it.
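A quick way to see the difference between well formed and valid: Python's ElementTree parser checks well-formedness only and never consults a DTD, so made-up tag names pass while a mismatched close tag fails. A minimal sketch:

```python
import xml.etree.ElementTree as ET

def is_well_formed(data: bytes) -> bool:
    """True if `data` parses as XML. No DTD is read; this is not validation."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True

if __name__ == "__main__":
    print(is_well_formed(b"<ENTRY><INTRO>hi</INTRO></ENTRY>"))
    print(is_well_formed(b"<ENTRY><INTRO>hi</ENTRY>"))
```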

Even if we don't share the same sets of tags, as long as my documents are well formed, I can create queries: show me every element whose name matches the regular expression "/name.*/i" and whose content matches the expression "/(^parker|^pose)/i".
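The query above translates almost directly into Python. This sketch assumes both patterns are applied with `re.search` and `re.IGNORECASE`, which mirrors Perl's `/.../i`; it works on any well-formed document, whatever tag set it uses.

```python
import re
import xml.etree.ElementTree as ET

def query(root: ET.Element, name_pattern: str, text_pattern: str) -> list[ET.Element]:
    """Elements whose tag matches name_pattern and whose text matches text_pattern."""
    name_re = re.compile(name_pattern, re.IGNORECASE)
    text_re = re.compile(text_pattern, re.IGNORECASE)
    return [el for el in root.iter()
            if name_re.search(el.tag) and text_re.search((el.text or "").strip())]

if __name__ == "__main__":
    doc = ET.fromstring("<ROLODEX><NAME>Parker</NAME><LASTNAME>Pose</LASTNAME>"
                        "<CITY>Parkersburg</CITY></ROLODEX>")
    for el in query(doc, r"name.*", r"(^parker|^pose)"):
        print(el.tag, el.text)
```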

This is the motivation behind tools such as the WIDL application proposed by WebMethods (http://www.webmethods.com/).

Ransoming Your Data

But what happens when Frontier's hit by a truck, Dave Winer tires of it and goes to Tahiti, or Microsoft buys it and makes it unusable?

Do you want to have all your data in Frontier, or in a format you can use with other tools? The BLOX suite, for instance, would let me export all my data from Frontier as XML.
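Getting data back out is the other half of avoiding a ransom. As a hedged sketch (this is not how BLOX works), here is an export from a hypothetical sqlite3 `entry` table, standing in for Frontier, into ENTRY elements under an invented LOG root:

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_entries(db: sqlite3.Connection) -> bytes:
    """Dump every row of a hypothetical `entry` table as ENTRY elements under LOG."""
    log = ET.Element("LOG")
    for intro, url, remark in db.execute(
            "SELECT intro, url, remark FROM entry ORDER BY rowid"):
        entry = ET.SubElement(log, "ENTRY")
        for tag, value in (("INTRO", intro), ("URL", url), ("REMARK", remark)):
            ET.SubElement(entry, tag).text = value or ""  # NULL becomes empty
    return ET.tostring(log, encoding="ISO-8859-1", xml_declaration=True)
```

Any XML-aware tool can then read the dump, whatever happens to the database it came from.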

My company just upgraded everyone from Word 6 to Office 97/98 across Macs and Wintel PCs. We hired a small army of high school students, and our IT staff fanned out across the corporate campus with walkie-talkies to herd the kids and fix problems. It took five days, and they are still fixing problems. The combined cost of licenses and labor is staggering.

Make Your Data Get to Work

Taking advantage of XML's insistence on well-structured documents, I can manipulate my entries with simple chunks of code. That code could live at my ISP as well as on my Mac. Then I can push the entries through an XSL stylesheet to produce HTML. The XSL stylesheet can also do some programmatic things, such as rendering only a week's worth of Web Log. In addition, I can produce other formats: translating the file into a Channel Definition Format file you can subscribe to via PointCast or IE, or turning it into text and mailing it to you.
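Python's standard library has no XSLT processor, so the sketch below is a stand-in for the stylesheet rather than XSL itself: it renders ENTRY elements from a hypothetical LOG root as HTML, or as plain text for mailing. Because the entries carry no dates, `limit` (the newest N entries, assumed to come first) substitutes for "a week's worth".

```python
import html
import xml.etree.ElementTree as ET

def render_html(log: ET.Element, limit: int = 7) -> str:
    """Render the first `limit` ENTRY elements as an HTML list."""
    items = []
    for entry in log.findall("ENTRY")[:limit]:
        intro = html.escape(entry.findtext("INTRO", ""))
        url = html.escape(entry.findtext("URL", ""), quote=True)
        remark = html.escape(entry.findtext("REMARK", ""))
        items.append(f'<li>{intro} <a href="{url}">{url}</a> {remark}</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

def render_text(log: ET.Element, limit: int = 7) -> str:
    """Render the same entries as plain text, one blank line between entries."""
    blocks = []
    for entry in log.findall("ENTRY")[:limit]:
        blocks.append("\n".join(entry.findtext(t, "") for t in ("INTRO", "URL", "REMARK")))
    return "\n\n".join(blocks)
```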

I could add another element to the XML, a subject:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<ENTRY>
	<SUBJECT VALUE=""/>
	<INTRO></INTRO>
	<URL></URL>
	<REMARK></REMARK>
</ENTRY>

Now I can have bibliographic tools manipulate the data. I know Jorn and Raphael have been categorizing the links they find and repurposing them in other parts of their sites. This could be automated if they put their data in a rich XML format.
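For example, filing links by subject becomes a short loop once each entry carries a SUBJECT element. This Python sketch assumes entries sit under a hypothetical LOG root and treats a missing or empty VALUE as uncategorized.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def links_by_subject(log: ET.Element) -> dict[str, list[str]]:
    """Map each SUBJECT VALUE to the URLs filed under it, in document order."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in log.iter("ENTRY"):
        subject = entry.find("SUBJECT")
        key = subject.get("VALUE", "") if subject is not None else ""
        groups[key or "(uncategorized)"].append(entry.findtext("URL", ""))
    return dict(groups)
```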

They could also do that by storing the links in Oracle or MySQL, but XML is designed to handle this sort of structured data (imagine the SQL statements you'd need otherwise).

One kind of database is well suited to XML/HTML content: the object databases sold by companies like Object Design, Poet and Objectivity. Pulling data out of these systems as XML (for transport or display) is trivial, and at least two of these companies already have XML support in their products.