Using a Glossary to Unwind Comments from Links

Automating a WebLog that is more than a list of links makes it difficult to represent in XML. One way to solve the problem is to disentangle the links from the narrative in the XML representation.

I got a note from Matt Haughey this evening:

"I'd like to wrap up my MetaFilter (metafilter.com) weblog in XML for syndicated version a few people have been asking about. Your DTD looks pretty complete. But I often have several links per entry, what would you suggest for that? A new element called "extralink" or something like that?"

The problem Haughey's describing is an entry like this:

Sunday, November 14, 1999

Accipe sacrificium confessionum mearum de manu linguae meae, quam formasti et excitasti, ut confiteatur nomini tuo, et sana omnia ossa mea, et dicant: domine, quis similis tibi? neque enim docet te, quid in se agatur, qui tibi confitetur; quia oculum tuum non excludit cor clausum, nec manum tuam repellit duritia hominum: sed solvis eam, cum voles, aut miserans aut vindicans, et non est qui se abscondat a calore tuo.*

There are multiple links within what might be construed as a single entry. How should we represent this in XML?


<?xml version="1.0"?>
<weblog>
	<entry>
		<date>1999-11-14</date>
		<title>lorem ipsum</title>
		<url>http://www.foo.bar/baz</url>
		<description>Accipe sacrificium confessionum mearum de manu
linguae meae, quam formasti et excitasti, ut confiteatur nomini tuo, et sana
omnia ossa mea, et dicant: domine, quis similis tibi? neque enim docet te, quid
in se agatur, qui tibi confitetur; <a href="http://sim.sala.bim/">quia
oculum tuum</a> non excludit cor clausum, nec manum tuam repellit duritia
hominum: sed solvis eam, cum voles, aut miserans aut vindicans, et non est qui
se abscondat a calore tuo.</description>
		<linktext>ut confiteatur nomini tuo</linktext>
	</entry>
	...
</weblog>

This is what I currently do, and I don't like it: the XML format is non-standard, and embedding a link inside the description buries it in markup, so the link is no longer available as structured data.

I'd rather use a standard format such as RSS (which I already use for syndication), but RSS is not amenable to the writerly style of commentary many WebLogs use. Instead of extending an existing DTD, the right approach may be to split the WebLog into two XML documents: one with the narrative, another with the links.

Example

Think of the WebLog as two documents: a narrative, and a list of links.

Narrative

This is the narration. I would edit this by hand, and embed links in here as I need them.

Sunday, November 14, 1999

Accipe sacrificium confessionum mearum de manu linguae meae, quam formasti et excitasti, ut confiteatur nomini tuo [ link one ], et sana omnia ossa mea, et dicant: domine, quis similis tibi [ link two ]? neque enim docet te, quid in se agatur, qui tibi confitetur; quia oculum tuum non excludit cor clausum, nec manum tuam repellit duritia hominum: sed solvis eam, cum voles, aut miserans aut vindicans, et non est qui se abscondat a calore tuo.

Friday, November 12, 1999

Eant et fugiant a te inquieti iniqui [ link three ]. et tu vides eos et distinguis umbras, et ecce pulchra sunt cum eis omnia, et ipsi turpes sunt. et quid nocuerunt tibi? aut in quo imperium tuum dehonestaverunt, a caelis usque in novissima iustum et integrum? quo enim fugerunt, cum fugerent a facie tua? aut ubi tu non invenis eos? sed fugerunt, ut non viderent te videntem se, atque excaecati in te offenderent -- quia non deseris aliquid eorum, quae fecisti -- in te offenderent iniusti et iuste vexarentur, subtrahentes se lenitati tuae, et offendentes in rectitudinem tuam, et cadentes in asperitatem tuam. videlicet nesciunt, quod ubique sis, quem nullus circuminscribit locus, et solus es praesens etiam his, qui longe fiunt a te. convertantur ergo et quaerant te, quia non, sicut ipsi deseruerunt creatorem suum, ita tu deseruisti creaturam tuam.

Linkage

These are the links themselves; I edit them through a Web interface.

Link One
Sed tamen sine me loqui apud misericordiam tuam, me terram et cinerem, sine tamen loqui, quoniam ecce misericordia tua est, non homo, inrisor meus, cui loquor.

Link Two
Deus, deus meus, quas ibi miserias expertus sum et ludificationes, quandoquidem recte mihi vivere puero id proponebatur, obtemperare monentibus, ut in hoc saeculo florerem, et excellerem linguosis artibus, ad honorem hominum et falsas divitias famulantibus. inde in scholam datus sum, ut discerem litteras, in quibus quid utilitatis esset ignorabam miser. et tamen, si segnis in discendo essem, vapulabam.

Link Three
laudabatur enim hoc a maioribus, et multi ante nos vitam istam agentes praestruxerant aerumnosas vias, per quas transire cogebamur multiplicato labore et dolore filiis Adam.

Link Four
Et tamen peccabam, domine deus meus, ordinator et creator rerum omnium naturalium, peccatorum autem tantum ordinator, domine deus meus, peccabam faciendo contra praecepta parentum et magistrorum illorum.

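To join the two documents, we can borrow Frontier's concept of a glossary: a table that maps a short name to the text, usually a link, that replaces it when a page is rendered. Here the catalog of links kept by the WebLog's content manager serves as the glossary. Rather than embedding a URL directly, the narrative refers to a glossary entry, and a level of indirection resolves that entry to its URL when a reader follows the link.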
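A glossary expansion pass over the narrative might be sketched as follows. This is an illustration, not Frontier's own mechanism: it assumes the bracketed markers used in the narrative above (such as "[ link one ]") are the glossary references, and the GLOSSARY table is hypothetical. Only id 749 for Link One comes from the example that follows; 750 and 751 are made up. The generated anchors use the redirect.php3?id= form described in the next paragraph.

```python
import re

# Hypothetical glossary: name used in the narrative -> id in the link database.
# 749 matches the article's example; 750 and 751 are invented for illustration.
GLOSSARY = {"link one": 749, "link two": 750, "link three": 751}

# Matches a bracketed glossary reference such as "[ link one ]".
REF = re.compile(r"\[\s*([^\]]+?)\s*\]")


def expand_glossary(narrative):
    """Replace each known glossary reference with an anchor to the redirector."""

    def substitute(match):
        name = match.group(1)
        link_id = GLOSSARY.get(name.lower())
        if link_id is None:
            return match.group(0)  # unknown term: leave the text untouched
        return f'[ <a href="redirect.php3?id={link_id}">{name}</a> ]'

    return REF.sub(substitute, narrative)
```

Unknown names pass through unchanged, so a typo in the narrative shows up on the page as plain bracketed text rather than as a broken link.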

Suppose Link One above is id 749 in the link database. Then in the narrative, we could represent the link as: <a href="redirect.php3?id=749">. The redirect.php3 script takes id=749 as an argument, looks up the corresponding URL in the database and redirects the browser to that location.
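The redirect step could look something like this. It is a minimal sketch in Python rather than PHP: a WSGI application stands in for redirect.php3, and the LINKS dictionary (with made-up example.com URLs) stands in for the link database. Unknown or malformed ids get a 404 instead of a redirect.

```python
from urllib.parse import parse_qs

# Hypothetical link database: glossary id -> URL. In the article this lives in
# the WebLog's content manager; a dict stands in for it here.
LINKS = {
    749: "http://www.example.com/link-one",
    750: "http://www.example.com/link-two",
}


def redirect_app(environ, start_response):
    """WSGI stand-in for redirect.php3: ?id=NNN answers with a redirect."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    raw_id = params.get("id", [""])[0]
    url = LINKS.get(int(raw_id)) if raw_id.isdigit() else None
    if url is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown link id\n"]
    # 302 (temporary) rather than 301, since a link's URL may be edited later.
    start_response("302 Found", [("Location", url)])
    return [b""]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever() serves the app, and a request for /?id=749 should come back as a 302 pointing at the first URL. Because every narrative link goes through this one lookup, correcting a URL in the link database fixes it everywhere it is referenced.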

The XML version of the narrative could be:


<?xml version="1.0"?>
<weblog>
<entry>
	<date>1999-11-14</date>
	<text><p>Accipe sacrificium confessionum mearum de manu linguae meae, quam
formasti et excitasti, <link id="749">ut confiteatur nomini tuo</link>, et sana omnia ossa mea, et
dicant: domine, <link id="750">quis similis tibi</link>? neque enim docet te, quid in se agatur, qui
tibi confitetur; quia oculum tuum non excludit cor clausum, nec manum tuam
repellit duritia hominum: sed solvis eam, cum voles, aut miserans aut
vindicans, et non est qui se abscondat a calore tuo.</p></text>
</entry>
...
</weblog>

The content management system would parse the link element and construct the elements <a href="redirect.php3?id=749">...</a> and <a href="redirect.php3?id=750">...</a> when rendering as HTML.
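That rendering step can be sketched with Python's standard xml.etree.ElementTree module. This is one possible approach, not the content manager's actual code: render_text is a hypothetical helper that copies a <text> element, rewrites each <link id="N"> into <a href="redirect.php3?id=N">, and returns the inner HTML without the <text> wrapper.

```python
import copy
import xml.etree.ElementTree as ET


def render_text(text_elem, script="redirect.php3"):
    """Render a <text> element as HTML, turning each <link id="N"> into
    <a href="SCRIPT?id=N">. The stored narrative element is not modified."""
    html = copy.deepcopy(text_elem)
    # Collect the matches first so the tree is not altered mid-iteration.
    for link in list(html.iter("link")):
        link_id = link.get("id")
        link.tag = "a"
        link.attrib.clear()
        link.set("href", f"{script}?id={link_id}")
    # Serialize the contents of <text>: its leading text plus each child
    # (ET.tostring includes a child's tail text) without the wrapper itself.
    parts = [html.text or ""]
    parts += [ET.tostring(child, encoding="unicode") for child in html]
    return "".join(parts)
```

For a whole file, something like iterating over ET.parse("weblog.xml").getroot().iter("entry") and calling render_text(entry.find("text")) would produce one HTML fragment per entry; the filename here is only an example.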

A separate interface could produce the links in an RSS file independent of the WebLog's XML representation.
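As a rough illustration, such an interface might emit a minimal RSS 0.91-style feed straight from the link catalog. The function name, the channel values, the "en-us" language, and the choice to publish each link's real URL instead of its redirect.php3 address are all assumptions for this sketch; a strict validator may expect additional channel elements.

```python
import xml.etree.ElementTree as ET


def links_to_rss(channel_title, channel_link, links):
    """Build a minimal RSS 0.91-style document from (title, url, description)
    tuples taken from the link catalog. Returns the XML as a string."""
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Links from " + channel_title
    ET.SubElement(channel, "language").text = "en-us"  # assumed value
    for title, url, description in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")
```

Since the feed is generated from the link catalog alone, it never needs to parse the narrative document at all.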

If you weren't concerned with an XML representation of the narrative, you could manage it with a tool like Blogger and pull it into the weblog's page with a server-side include. You would then be responsible for hand-coding the redirect.php3 references.

The intermingling of links and narrative makes a simple XML representation of a WebLog's content difficult. I've simplified the problem by refactoring the WebLog's XML representation into two documents, and I'll be experimenting with this approach in future WebLog automation.

© 1999, Bill Humphries


Thanks to LemonYellow for the idea of using "Augustine's Confessions" instead of lorem ipsum for filler text.