DOMIndentor - Make your XML/HTML Neat!

Posted: Thu May 29, 2008 2:40 am
by Verminox
Just wrote this for an application I'm making so I thought I'd share / get opinions...

DOMIndentor is a class that takes a DOMDocument and indents it neatly. Any element that directly contains text is treated as inline content and is not broken up onto new lines (e.g. an <a> tag within your paragraph will not be newlined/indented). See the example below...

DOMIndentor.php

<?php
/**
 * Formats an XML DOMDocument with neat indentation
 */
class DOMIndentor
{
    /**
     * Level of indentation
     */
    private $indent;
    
    /**
     * The DOMDocument to indent
     */
    private $document;
    
    /**
     * Formats the DOMDocument with neat indentation
     */
    public function indent($document)
    {
        // Prepare the document
        $this->document = $document;
        // Get the root node
        $rootNode = $this->document->documentElement;
        // First strip all whitespace text nodes
        $this->stripWhitespace($rootNode);
        // Initialize indent level
        $this->indent = 0;
        // Indent all nodes
        $this->indentNode($rootNode);
    }
    
    /**
     * Strips text nodes that only contain whitespace
     */
    private function stripWhitespace($node)
    {
        // Make sure node is not a leaf node
        if($node->hasChildNodes())
        {
            // Iterate through children
            for($i=0; $i<$node->childNodes->length;$i++)
            {
                $childNode = $node->childNodes->item($i);
                // If whitespace node found, remove it
                if($childNode->nodeType == XML_TEXT_NODE)
                {
                    if(trim($childNode->nodeValue) == '')
                    {
                        $node->removeChild($childNode);
                        $i--;
                    }
                }
                // Recurse
                else
                {
                    $this->stripWhitespace($childNode);
                }
            }
        }
    }
    
    /**
     * Provide indentation to a DOMNode
     */
    private function indentNode($node)
    {
        // Make sure it is not a leaf node
        if($node->hasChildNodes())
        {
            // Count number of text nodes as children
            $textNodes = 0;
            foreach($node->childNodes as $childNode)
            {
                if($childNode->nodeType == XML_TEXT_NODE)
                {
                    $textNodes++;
                }
            }
            // If there are any child text nodes, don't recurse because everything inside is considered inline
            if($textNodes==0)
            {
                // Increase level of indentation
                $this->indent++;
                // Add newline and indent
                $before = "\n";
                $before .= str_repeat("\t",$this->indent);
                foreach($node->childNodes as $childNode)
                {
                    // Insert indentation before the node
                    $node->insertBefore(new DOMText($before), $childNode);                  
                    // Recursive
                    $this->indentNode($childNode);
                }               
                // Decrease level of indentation
                $this->indent--;
                // Add newline and closing indent
                $after = "\n";
                $after .= str_repeat("\t",$this->indent);
                $node->appendChild(new DOMText($after));
            }
        }
    }
}
 
$xml = new DOMDocument();
$xml->load('example.html');
$indentor = new DOMIndentor();
$indentor->indent($xml);
echo $xml->saveXML();
?>
example.html:

<html>
<head><title>This is a title</title></head>
<body>
<ul><li>A list item</li><li><em>Another</em> List item</li><li><ul><li><strong>Nested</strong> List Item</li><li>Again?</li></ul></li></ul>
<table>
<tr><th>Name</th><td>Bob the <strong>Man</strong></td></tr>
<tr><th>Foo</th><td>Bar</td></tr>
</table>
</body>
</html>
Output:

<?xml version="1.0"?>
<html>
    <head>
        <title>This is a title</title>
    </head>
    <body>
        <ul>
            <li>A list item</li>
            <li><em>Another</em> List item</li>
            <li>
                <ul>
                    <li><strong>Nested</strong> List Item</li>
                    <li>Again?</li>
                </ul>
            </li>
        </ul>
        <table>
            <tr>
                <th>Name</th>
                <td>Bob the <strong>Man</strong></td>
            </tr>
            <tr>
                <th>Foo</th>
                <td>Bar</td>
            </tr>
        </table>
    </body>
</html>

The only problem is that the source has to be treated as XML rather than HTML: if you call DOMDocument::loadHTML() or DOMDocument::saveHTML(), it adds its own weird whitespace and disturbs this script.

I'm no PHP or XML wizard, so critique is welcome...

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Thu May 29, 2008 6:04 am
by Eran
What's the point of wasting CPU cycles to indent HTML through PHP? Just do it by hand... it will also make the source more readable.

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Thu May 29, 2008 7:13 am
by Verminox
pytrin wrote: What's the point of wasting CPU cycles to indent HTML through PHP? Just do it by hand... it will also make the source more readable.
What's the point of doing anything in PHP if it could be done by hand in the same amount of time, maybe thousands of times, without any hassle? ;)

Well, you might say that PHP is dynamic, and even if you were a superhuman who could perform tasks at the speed of a computer, you probably couldn't sit at a web server every day and respond to different requests with variable parameters. You might just be right. :P

And that's just when these little, apparently pointless pieces of code come into use: to turn some random request into a desirable output.

Take Firefox, for example. Create an XML file without any indentation (or with too MUCH indentation) and open it in Firefox; it will be displayed neatly anyway... This is the PHP equivalent. :) I'm using it to indent markup coming from user posts and a WYSIWYG editor.

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Thu Jun 19, 2008 5:36 pm
by nowaydown1
I think that's pretty slick personally. Nice job. :o

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Thu Jun 19, 2008 5:42 pm
by Eran
Verminox wrote: I'm using it to indent markup coming from user posts and a WYSIWYG editor.
Well, that is a good point actually; I somehow missed this reply. I thought you might be using this class to avoid indenting your HTML views by hand...

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Fri Jun 20, 2008 1:17 pm
by JAB Creations
It's cool, though I use single-space indents instead of an entire tab. That drives a lot of other developers crazy, but a dozen or so levels of tab indentation drives me crazy. If I were to use a script like this (though I indent everything myself by hand in my setup), I'd appreciate a variable at the beginning where I could easily define the size or type of indent the script generates. Regardless of this missing, and I suppose subjective, feature: great job! :mrgreen:
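
A configurable indent like the one requested here could be sketched roughly as follows. This is a minimal standalone sketch, not the author's class: the function name indentWith() and its $unit and $level parameters are hypothetical, the type declarations assume PHP 7.1 or later, and the input is assumed to have had its whitespace-only text nodes stripped already (as stripWhitespace() does in the original post).

```php
<?php
/**
 * Hypothetical variant of indentNode() that takes the indent string as a
 * parameter instead of hard-coding a tab.
 *
 * Assumes whitespace-only text nodes were already removed from the tree.
 */
function indentWith(DOMNode $node, string $unit = "\t", int $level = 0): void
{
    // Leaf nodes need no indentation
    if (!$node->hasChildNodes()) {
        return;
    }

    // Snapshot the children, because childNodes is a live list
    // and we are about to insert new text nodes into it
    $children = iterator_to_array($node->childNodes);

    // Any direct text child means the content is considered inline
    foreach ($children as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            return;
        }
    }

    $doc = $node->ownerDocument;
    foreach ($children as $child) {
        // Newline plus one more level of indent before each child
        $before = "\n" . str_repeat($unit, $level + 1);
        $node->insertBefore($doc->createTextNode($before), $child);
        indentWith($child, $unit, $level + 1);
    }

    // Newline plus the parent's own indent before the closing tag
    $node->appendChild($doc->createTextNode("\n" . str_repeat($unit, $level)));
}

// Usage: a single space per level instead of a tab
$doc = new DOMDocument();
$doc->loadXML('<ul><li>One</li><li><em>Two</em> items</li></ul>');
indentWith($doc->documentElement, ' ');
echo $doc->saveXML($doc->documentElement), "\n";
```

Passing "\t" (the default) should reproduce the tab-based layout of the original class, while ' ' or '    ' gives one-space or four-space indents.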

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Wed Jul 02, 2008 4:12 pm
by John Cartwright
Thanks for sharing!

Re: DOMIndentor - Make your XML/HTML Neat!

Posted: Tue Jul 15, 2008 5:38 am
by alex.barylski
Sweet. I could use this when users elect to edit HTML by hand. I hate WYSIWYG-mangled code. Ugh...