DOMIndentor - Make your XML/HTML Neat!

Verminox
Forum Contributor
Posts: 101
Joined: Sun May 07, 2006 5:19 am

Post by Verminox »

Just wrote this for an application I'm making so I thought I'd share / get opinions...

DOMIndentor is a class that takes a DOMDocument and indents it neatly. It treats any element that directly contains text as inline content and does not break it onto new lines (e.g. an <a> tag within your paragraph will not be newlined/indented). See the example below...

DOMIndentor.php

Code: Select all

<?php
/**
 * Formats an XML DOMDocument with neat indentation
 */
class DOMIndentor
{
    /**
     * Level of indentation
     */
    private $indent;
    
    /**
     * The DOMDocument to indent
     */
    private $document;
    
    /**
     * Formats the DOMDocument with neat indentation
     */
    public function indent($document)
    {
        // Prepare the document
        $this->document = $document;
        // Get the root node
        $rootNode = $this->document->documentElement;
        // First strip all whitespace text nodes
        $this->stripWhitespace($rootNode);
        // Initialize indent level
        $this->indent = 0;
        // Indent all nodes
        $this->indentNode($rootNode);
    }
    
    /**
     * Strips text nodes that only contain whitespace
     */
    private function stripWhitespace($node)
    {
            // Make sure node is not a leaf node
        if($node->hasChildNodes())
        {
            // Iterate through children
            for($i=0; $i<$node->childNodes->length;$i++)
            {
                $childNode = $node->childNodes->item($i);
                // If whitespace node found, remove it
                if($childNode->nodeType == XML_TEXT_NODE)
                {
                    if(trim($childNode->nodeValue) == '')
                    {
                        $node->removeChild($childNode);
                        $i--;
                    }
                }
                // Recurse
                else
                {
                    $this->stripWhitespace($childNode);
                }
            }
        }
    }
    
    /**
     * Provide indentation to a DOMNode
     */
    private function indentNode($node)
    {
        // Make sure it is not a leaf node
        if($node->hasChildNodes())
        {
            // Count number of text nodes as children
            $textNodes = 0;
            foreach($node->childNodes as $childNode)
            {
                if($childNode->nodeType == XML_TEXT_NODE)
                {
                    $textNodes++;
                }
            }
            // If there are any child text nodes, don't recurse because everything inside is considered inline
            if($textNodes==0)
            {
                // Increase level of indentation
                $this->indent++;
                // Add newline and indent
                $before = "\n";
                $before .= str_repeat("\t",$this->indent);
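                // Snapshot the children first: inserting into a live childNodes list while iterating it is fragile
                foreach(iterator_to_array($node->childNodes) as $childNode)
                {
                    // Insert indentation before the node
                    $node->insertBefore(new DOMText($before), $childNode);
                    // Recurse
                    $this->indentNode($childNode);
                }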
                // Decrease level of indentation
                $this->indent--;
                // Add newline and closing indent
                $after = "\n";
                $after .= str_repeat("\t",$this->indent);
                $node->appendChild(new DOMText($after));
            }
        }
    }
}
 
$xml = new DOMDocument();
$xml->load('example.html');
$indentor = new DOMIndentor();
$indentor->indent($xml);
echo $xml->saveXML();
?>

example.html:

Code: Select all

<html>
<head><title>This is a title</title></head>
<body>
<ul><li>A list item</li><li><em>Another</em> List item</li><li><ul><li><strong>Nested</strong> List Item</li><li>Again?</li></ul></li></ul>
<table>
<tr><th>Name</th><td>Bob the <strong>Man</strong></td></tr>
<tr><th>Foo</th><td>Bar</td></tr>
</table>
</body>
</html>

Output:

Code: Select all

<?xml version="1.0"?>
<html>
    <head>
        <title>This is a title</title>
    </head>
    <body>
        <ul>
            <li>A list item</li>
            <li><em>Another</em> List item</li>
            <li>
                <ul>
                    <li><strong>Nested</strong> List Item</li>
                    <li>Again?</li>
                </ul>
            </li>
        </ul>
        <table>
            <tr>
                <th>Name</th>
                <td>Bob the <strong>Man</strong></td>
            </tr>
            <tr>
                <th>Foo</th>
                <td>Bar</td>
            </tr>
        </table>
    </body>
</html>

The only problem with this is that the source has to be treated as XML, not HTML: if you call DOMDocument::loadHTML() or DOMDocument::saveHTML(), it adds its own weird whitespace and disturbs this script.
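
One possible workaround for the HTML limitation, as a sketch rather than a tested fix: parse with loadHTML() but serialise with saveXML() on the root element, so that saveHTML()'s own formatting never runs. The input string is made up, and the assumption is that the indentation text nodes added by DOMIndentor would be written out unchanged in the same way. The two LIBXML_HTML_* flags need PHP 5.4+ built against libxml 2.7.8+; they stop libxml from adding a doctype and <html>/<body> wrappers.

```php
<?php
// Hypothetical input: an HTML fragment with a single root element
$html = '<ul><li>One</li><li>Two</li></ul>';

$doc = new DOMDocument();
// Parse as HTML, without an implied doctype or <html>/<body> wrappers
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// An indenting pass over $doc->documentElement would go here

// Serialise only the root element, using the XML serialiser instead of saveHTML()
$asXml = $doc->saveXML($doc->documentElement);
echo $asXml, "\n";
```

The catch is that saveXML() writes XML syntax, so empty elements can come out self-closed (<br/>, but also <script/>), which may not be acceptable for pages served as text/html.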

I'm no PHP or XML Wizard, so critique is welcome...
Last edited by Verminox on Thu May 29, 2008 7:25 am, edited 1 time in total.
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: DOMIndentor - Make your XML/HTML Neat!

Post by Eran »

What's the point of wasting CPU cycles to indent HTML through PHP? Just do it by hand... it will also make the source more readable
Verminox
Forum Contributor
Posts: 101
Joined: Sun May 07, 2006 5:19 am

Re: DOMIndentor - Make your XML/HTML Neat!

Post by Verminox »

pytrin wrote: What's the point of wasting CPU cycles to indent HTML through PHP? Just do it by hand... it will also make the source more readable
What's the point of doing anything in PHP if it could be done by hand in the same amount of time, maybe thousands of times, without any hassle? ;)

Well, you might say that PHP is dynamic, and that even if you were a superhuman who could perform tasks at the speed of a computer, you probably couldn't sit at a web server every day responding to different requests with variable parameters. You might just be right. :P

And that's just when these little apparently pointless pieces of code come into use. To turn some random request into a desirable output.

Take Firefox, for example. Create an XML file without any indentation (or too MUCH indentation) and open it in Firefox; it will be displayed neatly anyway... This is the equivalent of that in PHP. :) I'm using it to indent markup coming from user posts and a WYSIWYG editor.
nowaydown1
Forum Contributor
Posts: 169
Joined: Sun Apr 27, 2008 1:22 am

Re: DOMIndentor - Make your XML/HTML Neat!

Post by nowaydown1 »

I think that's pretty slick personally. Nice job. :o
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: DOMIndentor - Make your XML/HTML Neat!

Post by Eran »

Verminox wrote: I'm using it to indent markup coming from user posts and a WYSIWYG editor.
Well, that is a fair point actually; I somehow missed this reply. I thought you might be using this class to avoid indenting your HTML views by hand...
JAB Creations
DevNet Resident
Posts: 2341
Joined: Thu Jan 13, 2005 6:44 pm
Location: Sarasota Florida

Re: DOMIndentor - Make your XML/HTML Neat!

Post by JAB Creations »

It's cool, though I use single-space indents instead of a full tab. That drives a lot of other developers crazy, but a dozen or so levels of tab indentation drives me crazy. If I were to use a script like this (though I indent everything by hand in my setup), I'd appreciate a variable at the beginning where I could easily define the size or type of indent the script generates. Regardless of this missing and, I suppose, subjective feature, great job! :mrgreen:
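
A configurable indent along those lines could look roughly like the sketch below. It is not part of the posted class: the function name indentWithUnit() and its $unit and $level parameters are made up for illustration, and it uses preserveWhiteSpace = false at load time as a rough stand-in for stripWhitespace().

```php
<?php
/**
 * Hypothetical variant of the indenting step with a configurable indent unit.
 * Assumes whitespace-only text nodes between elements are already gone.
 */
function indentWithUnit(DOMNode $node, string $unit = "\t", int $level = 0): void
{
    // Snapshot the children so that inserting nodes does not affect iteration
    $children = iterator_to_array($node->childNodes);
    if (count($children) === 0) {
        return; // leaf node, nothing to indent
    }
    // Any direct text child means the content is treated as inline and left alone
    foreach ($children as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            return;
        }
    }
    $doc = $node->ownerDocument;
    foreach ($children as $child) {
        // Newline plus one more indent unit than the parent, before each child
        $pad = "\n" . str_repeat($unit, $level + 1);
        $node->insertBefore($doc->createTextNode($pad), $child);
        indentWithUnit($child, $unit, $level + 1);
    }
    // Newline plus the parent's own indent, before the closing tag
    $node->appendChild($doc->createTextNode("\n" . str_repeat($unit, $level)));
}

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // asks libxml to drop blank text nodes on load
$doc->loadXML('<ul><li>One</li><li><em>Two</em> items</li></ul>');

indentWithUnit($doc->documentElement, ' '); // single-space indent
$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

Passing "\t" (the default) should match the tab behaviour of the posted class, while ' ' or '  ' gives one- or two-space indents.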
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto

Re: DOMIndentor - Make your XML/HTML Neat!

Post by John Cartwright »

Thanks for sharing!
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: DOMIndentor - Make your XML/HTML Neat!

Post by alex.barylski »

Sweet. I could use this when users elect to edit HTML by hand. I hate WYSIWYG mangled code. Ugh...