A Php userland Dom Contest

The competitors

After a quick search with Google we've come up with five implementations that promise to provide the behaviour we need and don't rely on php's native dom/xml support.

Here they are (including our first impression of the code):

  • ActiveLink - a nice implementation with very clean code, having (at least for an xml dom) a somewhat unusual tree api that differentiates between "leafs" and "branches".
  • DomIt - a promising, feature-rich and well documented implementation
  • MyXml - a threefold library (containing myDom, myXPath and myXslt); at first glance it looks mostly compliant with php's own dom api
  • MiniXml - mentioned somewhere in the php Dom manual, seems to be a php port of CPAN's XML::Mini perl implementation (or possibly a parallel branch).
  • PhpDomXml - a very small and basic implementation, 13kb are all we need?

All of these are for php4 :(, as far as we can see. So we turn off error_reporting for E_NOTICE and E_STRICT to run them under php5.

Test Code

We've used the following code to test the native php5 DomDocument version. It consists of a very simple html snippet, a loop that parses and re-serializes it ten times, and a simple output of the results.

<?php
error_reporting(E_ALL ^ E_NOTICE);
$iterations = 10;
$start = getMicrotime();
$source =
'<html>
        <style>
                body { font: 12px normal verdana, arial, serif }
                h2.test { color: #c00000; }
        </style>
        <title>Let\'s check some dom xml implementations</title>
        <body>
                <h2 class="test">Let\'s check some
                        dom xml implementations</h2>
                <p><b>This is the native php5 DomDocument
                        implementation.</b></p>
                <p>What about the <i>execution time</i>?</p>
        </body>
</html>';
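// Parse and re-serialize the snippet $iterations times.
for ($i = 0; $i < $iterations; $i++) {
        $doc = new DomDocument();
        $doc->loadXML($source);
        $result = $doc->saveXml();
}
// Show the last result as escaped source and as rendered html.
echo '<pre>' . htmlentities($result) . '</pre>';
echo $result;
// Note: the measured time includes the two echo calls above.
$time = (getMicrotime() - $start);
echo "$iterations Iterations done.<br/>";
echo "We needed $time seconds.<br/>";
echo "The average execution time was: " . ($time/$iterations);
// Returns the current unix timestamp with microseconds as a float.
function getMicrotime () {
        $microtime = explode(" ", microtime());
        return $microtime[0] + $microtime[1];
}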
?>

We like it our own way ...

Of course, for every userland implementation we needed to include its library files and swap in different function calls, because each of them has its own api.

ActiveLink seems to want the following:

$doc = new XMLDocument();
$doc->parseFromString($source);
$result = $doc->getXMLString(true);

But DomIt prefers:

$doc = new DOMIT_Document();
$doc->parseXML($source);
$result = $doc->toNormalizedString();

Different from that, MyXml will be happy with:

$doc = new Document();
$doc->parse($source);
$doc->setOption('indent', true);
$result = $doc->toString();

And MiniXml likes to get called with:

$doc = new MiniXMLDoc();
$doc->fromString($source);
$result = $doc->toString();

And as if that weren't enough, for PhpDomXml we have to use:

$doc = new XML();
$doc->parseXML($source);
$result = $doc->toString();

Funny, hm? Not one of these apis is really aligned with php's own domxml api.
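Since the candidates differ only in these three calls, the rest of the test code can be shared. What follows is a minimal sketch, assuming a newer php version than the one we tested on (closures need 5.3 or later). The benchmark() function and the $roundTrip callback are illustrative names of our own and not part of any of the libraries. Only the native DomDocument round trip is shown; the other libraries would plug in their own three calls after including their files. Unlike the test code above, this variant times only the round trip and leaves the output out of the measurement.

```php
<?php
// Hypothetical helper: runs $roundTrip($source) $iterations times and
// returns the last result together with the average time per iteration.
function benchmark($roundTrip, $source, $iterations = 10) {
        $result = null;
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
                $result = call_user_func($roundTrip, $source);
        }
        $time = microtime(true) - $start;
        return array('result' => $result, 'average' => $time / $iterations);
}

// Round trip for the native php5 Dom extension.
$native = function ($source) {
        $doc = new DOMDocument();
        $doc->loadXML($source);
        return $doc->saveXML();
};

$source = '<html><body><p>What about the <i>execution time</i>?</p></body></html>';
$stats = benchmark($native, $source);
echo 'The average execution time was: ' . $stats['average'] . "\n";
```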

Ok, now for the interesting part ;)

The results - by speed ...

The tests ran on Windows XP with Apache 2 and php 5.0.3, on a somewhat outdated but usable Athlon 1800+.

library      avg exec time   output format   comment
php5 Dom     0.0001s         great           fastest, of course
PhpDomXml    0.0022s         awful           missing text!
ActiveLink   0.0061s         not the best    strange api
MyXml        0.0088s         looks ok?
DomIt        0.0140s         ok
MiniXml      0.5075s         extra spaces!
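To put these averages in relation to each other, the slowdown of each userland library against the native Dom can be computed directly from the figures above. This is only a rough sketch: the native average of 0.0001s is rounded to a single significant digit, so the factors are ballpark numbers at best.

```php
<?php
// Average execution times in seconds, copied from the results above.
$averages = array(
        'PhpDomXml'  => 0.0022,
        'ActiveLink' => 0.0061,
        'MyXml'      => 0.0088,
        'DomIt'      => 0.0140,
        'MiniXml'    => 0.5075,
);
$native = 0.0001; // php5 Dom

// Slowdown factor of each userland library relative to the native Dom.
$factors = array();
foreach ($averages as $name => $seconds) {
        $factors[$name] = (int) round($seconds / $native);
        echo $name . ': ' . $factors[$name] . "x\n";
}
```

With the rounded averages from our run this gives roughly 22x, 61x, 88x, 140x and 5075x.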

Native php DomDocument - fastest, of course

Here are some screenshots to show the output formatting. php5 DomDocument, unbeaten in terms of performance as well as output formatting (and included only for comparison), looks like this:

screenshot 1

PhpDomXml - fast and unusable

Next, in terms of speed, comes the very lean and feature-poor PhpDomXml. But apart from its poor output format it seems to be unusable, as it eats up some of the cdata text from the html:

screenshot 2

ActiveLink - fast and strange

ActiveLink is fast and has an output format that's probably acceptable. But - as mentioned - ActiveLink uses a strange tree/branches/leaf api. We can't call anything like getChildren() or first/nextChild because of that.

screenshot 3

MyXml - ok to go?

MyXml adds some extra whitespace to the second p tag. That's bad for a template engine because of design problems with IE's whitespace handling. Besides that, it looks very nice and is still comparatively fast (fast enough?):

screenshot 4

DomIt - somewhat slow, but good

With DomIt, there's an extra indentation in the styles section, but no extra whitespace inside the tags:

screenshot 5

MiniXml - unusably slow

Argh! What's that?

MiniXml needs half a second to parse 11 tags, format and print them out again?

We can't believe that ... maybe there's some extra option turned on by default, so that our drive c:/ gets crawled looking for a tmp directory? Or maybe an e-mail gets sent to the developers? Or some rating added to their freshmeat project? ;)

We didn't have the time to investigate further what's going on here.

But MiniXml (like MyXml, but even worse) does something very bad besides eating up our cpu. It adds an extra space to the left and right of each and every piece of cdata content in our tags. A web designer who knows about IE's whitespace bug will strongly insist that this is a clear no-go for using it with html templating stuff ...

screenshot 6
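Missing or padded text like this can also be detected without screenshots, by comparing the text nodes of the source with those of a library's output. The following is a minimal sketch that uses the native Dom as the reference parser. The textNodes() helper is a hypothetical name, and the $padded string is a made-up stand-in for the kind of output described above, not actual MiniXml output.

```php
<?php
// Returns the values of all text nodes of an xml string, in document order.
function textNodes($xml) {
        $doc = new DOMDocument();
        $doc->loadXML($xml);
        $xpath = new DOMXPath($doc);
        $texts = array();
        foreach ($xpath->query('//text()') as $node) {
                $texts[] = $node->nodeValue;
        }
        return $texts;
}

$source = '<p>What about the <i>execution time</i>?</p>';

// Reference: the native round trip should leave the text untouched.
$native = new DOMDocument();
$native->loadXML($source);
var_dump(textNodes($source) === textNodes($native->saveXML()));

// Made-up library output with a space added around every text.
$padded = '<p> What about the <i> execution time </i> ? </p>';
var_dump(textNodes($source) === textNodes($padded));
```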

Summary

We haven't had a closer look at PhpDomXml's features, but with its hunger for our cdata text it won't make it.

ActiveLink is fast but not really an option (is it?) because of its api - unless we'd create an extra wrapper to get around that.

MyXml would be the one to go with, if it weren't for that extra whitespace. Maybe there's a switch to turn that off? Or an e-mail to the developers could fix it?

DomIt looks great, clean, and feature-complete. But it's somewhat too slow to be used in a template engine. (Is it?)

We can't believe that what we've seen so far is MiniXml's intended behaviour. But time was short, and we had no opportunity to get it to run faster.

php5's DomDocument

ActiveLink, probably the fastest usable library (in our case), needs 0.0061s to parse and output a template. That's about 60 times slower than php5's native Dom implementation, which takes 0.0001s for the same job - of course it's faster, it's written in C. Furthermore, php5's Dom is the only one of them all that generates output exactly the way we'd expect it.

But sad to say, we can't use it in our case: it can't be un/serialize()d.
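One conceivable workaround is to serialize the xml string instead of the object and to re-parse it on wakeup. The following is only a sketch under that assumption; SerializableDom is a hypothetical class name and not part of php. Whether the extra parse on every unserialize() is acceptable for a template engine is a separate question.

```php
<?php
// Hypothetical wrapper: keeps a DOMDocument, but serializes only its xml string.
class SerializableDom {
        private $doc;
        private $xml;

        public function __construct($xml) {
                $this->doc = new DOMDocument();
                $this->doc->loadXML($xml);
        }

        public function getDocument() {
                return $this->doc;
        }

        // Called by serialize(): dump the tree and store only the string.
        public function __sleep() {
                $this->xml = $this->doc->saveXML();
                return array('xml');
        }

        // Called by unserialize(): rebuild the tree from the stored string.
        public function __wakeup() {
                $this->doc = new DOMDocument();
                $this->doc->loadXML($this->xml);
        }
}

$template = new SerializableDom('<p>What about the <i>execution time</i>?</p>');
$copy = unserialize(serialize($template));
echo $copy->getDocument()->saveXML();
```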