More on XSLT and PHP

Michal Wallace set up the new "Sablotron" libraries in PHP running on his "Cornerhost" service. This gave me a chance to experiment with the new syntax and put together a "Cocoon-style" XML handler.

Under development: think about security holes and tainted GET/POST parameters before using this in production.

At the same time, I learned about "Krysalis", which implements the Cocoon II approach in PHP. So if you want a more complete widget....

PHP's latest XSLT handler

The major improvement in PHP's XSLT support is that you can pass it file names instead of having to read files into strings to pass to the xslt_process function:

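// Create an XSLT processor handle.
$xh = xslt_create();
// Transform source.xml with style.xsl. With no result container (NULL),
// the transformed document is returned as a string.
$result = xslt_process($xh,"source.xml","style.xsl",NULL);
if ($result)
{
 print $result;
} else {
 print "Error: ".xslt_error($xh);
}
// Release the processor handle.
xslt_free($xh);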

The "Direct Approach" in PHP

In an earlier talk I described what I called the "direct method" set of applications for transforming XML with XSLT. These are server-side applications such as Cocoon, AXKit, and the XSLT processor module for IIS, which automatically intercept requests for XML documents, determine the appropriate XSLT transform to apply, transform the requested XML document, and return the transformed document.

I want to do the same thing using PHP. So to do that:

  1. I need to know that I'm processing an XML document.
  2. And I need to know what stylesheet to apply.

I'm going to let Apache handle item one. It's trivial to set up mod_rewrite to catch requests for XML documents and route them to a PHP handler.

There are two standard ways of doing the second item. The most common is to specify the style sheet in the XML document itself. An XML processing instruction of the form: <?xml-stylesheet type="text/xsl" href="somestylesheet.xsl"?> is the canonical method for associating an XSLT style sheet with a document. All the server-side direct translation approaches use this 'hook.'

http://web3.w3.org/TR/xml-stylesheet/

Interface

Let's specify the URL interface to the handler as:

http://some.server/xslt/handler.php?source=/path/to/document.xml

We can use mod_rewrite to disguise that as:

http://some.server/path/to/document.xml
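
A rewrite rule along these lines could do the mapping. This is only a sketch: it assumes the rule lives in an .htaccess file at the document root, that mod_rewrite is enabled, and that the handler is installed at /xslt/handler.php as in the URL above. The pattern itself is an illustration, not a tested configuration.

```
RewriteEngine On
# Route any request ending in .xml to the PHP handler, passing the
# requested path along in the "source" query parameter.
RewriteRule ^(.+\.xml)$ /xslt/handler.php?source=/$1 [L]
```

Because the handler's own URL does not end in .xml, the rewritten request does not match the rule a second time.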

Reading which XSLT style sheet to use

Since the name of the style sheet to use is stored in an XML processing instruction, we need to use PHP's XML parser to extract it.

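// Set to true by pi_handler() once the stylesheet has been found.
$stylesheetFound = false;
$xp = xml_parser_create();
xml_set_processing_instruction_handler($xp,"pi_handler");
if (!($fp = fopen($source,"r")))
{
  die ("Could not read XML file: ".$source);
}
// Feed the file to the parser in 4K chunks; stop as soon as the
// xml-stylesheet processing instruction has been seen.
while ($data = fread($fp,4096) and !($stylesheetFound))
{
  if (!xml_parse($xp,$data,feof($fp)))
  {
    die(sprintf("XML parsing error: %s at line %d",
       xml_error_string(xml_get_error_code($xp)),
       xml_get_current_line_number($xp)));
  }
}
fclose($fp);
xml_parser_free($xp);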

Since we're only interested in the xml-stylesheet processing instruction, we just define that callback, and abandon parsing after finding it.

The callback takes three arguments: the handle of the parser, the target (xml-stylesheet), and the processing instruction data (type="text/xsl" href="somestylesheet.xsl"). Parse the latter to get the name of the stylesheet.

Define the callback function pi_handler() as:

function pi_handler($xp,$target,$data)
{
  if ($target == "xml-stylesheet")
  {
    $regs = array();
    // Only accept simple file names: letters, digits, underscore and dot.
    if (ereg("href=\"([a-zA-Z0-9_.]+)\"",$data,$regs))
    {
      $GLOBALS["stylesheet"] = $regs[1];
      $GLOBALS["stylesheetFound"] = true;
    }
  }
}
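
If you need more than the href (the type, say), the processing instruction data can be split into all of its pseudo-attributes. The following is a minimal sketch, not part of the handler above: parse_pi_pseudo_attributes() is a hypothetical helper name, and it uses preg_match_all rather than ereg. It accepts either quote style and does no validation of the values.

```php
<?php
// Hypothetical helper (not part of the original handler): split the data
// of a processing instruction into an array of name => value pairs.
function parse_pi_pseudo_attributes($data)
{
    $attrs = array();
    $matches = array();
    // name = "value" or name = 'value'; \2 matches the same quote that opened.
    $pattern = '/([A-Za-z_][A-Za-z0-9_.-]*)\s*=\s*(["\'])(.*?)\2/';
    if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $attrs[$match[1]] = $match[3];
        }
    }
    return $attrs;
}

$attrs = parse_pi_pseudo_attributes('type="text/xsl" href="somestylesheet.xsl"');
print $attrs["href"]; // somestylesheet.xsl
```

Inside pi_handler() you would still want to restrict which href values are accepted, as the character class in the ereg pattern does.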

Document location

Because every request goes through the handler, you don't have to store your XML documents in the Apache server's document path. That's useful if you don't want users reading the raw XML. You can take the path in $source and construct a path to another location on the server that Apache can read.

Or you could just combine the value of $source with $HTTP_SERVER_VARS["DOCUMENT_ROOT"] to locate the document to transform.
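
Whichever base directory you choose, $source comes straight from the request, so it is worth checking that the combined path does not escape that directory (through "../", for example). Here is one possible sketch. resolve_source() is a hypothetical helper name, and the check relies on realpath(), which only resolves files that actually exist.

```php
<?php
// Hypothetical helper: map a requested path onto a base directory and
// return false for anything that resolves to a location outside it.
function resolve_source($baseDir, $source)
{
    $base = realpath($baseDir);
    if ($base === false) {
        return false; // base directory does not exist
    }
    // realpath() collapses "." and ".." and follows symbolic links.
    $path = realpath($base . "/" . ltrim($source, "/"));
    if ($path === false) {
        return false; // requested file does not exist
    }
    // Accept the file only if its resolved path starts with the base directory.
    $prefix = $base . DIRECTORY_SEPARATOR;
    if (strncmp($path, $prefix, strlen($prefix)) !== 0) {
        return false;
    }
    return $path;
}
```

In the handler, the first argument would be the document root or your alternative location, and a false return value would be answered with an error page instead of a transformation.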

Sitemaps

To be added.

Passing Parameters

To be added.