DOMIndentor - Make your XML/HTML Neat!
Posted: Thu May 29, 2008 2:40 am
Just wrote this for an application I'm making so I thought I'd share / get opinions...
DOMIndentor is a class that takes a DOMDocument and indents it neatly. It recognizes inline tags and does not break them onto new lines (e.g. an <a> tag within your paragraph will not be newlined/indented): any element that directly contains text is left on one line, along with everything inside it. See the example below...
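To make the inline rule concrete, here is a minimal sketch of the check it boils down to. `hasDirectText()` is a hypothetical helper written only for this illustration (it is not part of DOMIndentor), the sample markup is made up, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of DOMIndentor: reports whether a node
// has at least one direct text-node child.
function hasDirectText(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            return true;
        }
    }
    return false;
}

// Made-up sample markup for the illustration
$doc = new DOMDocument();
$doc->loadXML('<div><p>Some <a href="#">link</a> text</p><ul><li>Item</li></ul></div>');

$p  = $doc->getElementsByTagName('p')->item(0);
$ul = $doc->getElementsByTagName('ul')->item(0);

var_dump(hasDirectText($p));  // bool(true): the paragraph is kept on one line
var_dump(hasDirectText($ul)); // bool(false): each child goes on its own line
```

One consequence of this rule: an element whose only child is an inline tag, such as <p><em>text</em></p>, contains no direct text, so its child would still be moved onto its own line.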
DOMIndentor.php
<?php
/**
 * Formats an XML DOMDocument with neat indentation
 */
class DOMIndentor
{
    /**
     * Level of indentation
     */
    private $indent;

    /**
     * The DOMDocument to indent
     */
    private $document;

    /**
     * Formats the DOMDocument with neat indentation
     */
    public function indent($document)
    {
        // Prepare the document
        $this->document = $document;
        // Get the root node; an empty document has none, so there is nothing to do
        $rootNode = $this->document->documentElement;
        if ($rootNode === null)
        {
            return;
        }
        // First strip all whitespace text nodes
        $this->stripWhitespace($rootNode);
        // Initialize indent level
        $this->indent = 0;
        // Indent all nodes
        $this->indentNode($rootNode);
    }

    /**
     * Strips text nodes that only contain whitespace
     * Note: this also drops a lone space between two inline tags
     */
    private function stripWhitespace($node)
    {
        // Make sure node is not a leaf node
        if ($node->hasChildNodes())
        {
            // Iterate through children
            for ($i = 0; $i < $node->childNodes->length; $i++)
            {
                $childNode = $node->childNodes->item($i);
                if ($childNode->nodeType == XML_TEXT_NODE)
                {
                    // If whitespace node found, remove it
                    if (trim($childNode->nodeValue) === '')
                    {
                        $node->removeChild($childNode);
                        // childNodes is a live list, so step back to avoid skipping a node
                        $i--;
                    }
                }
                // Recurse
                else
                {
                    $this->stripWhitespace($childNode);
                }
            }
        }
    }

    /**
     * Provide indentation to a DOMNode
     */
    private function indentNode($node)
    {
        // Make sure it is not a leaf node
        if ($node->hasChildNodes())
        {
            // Count number of text nodes as children
            $textNodes = 0;
            foreach ($node->childNodes as $childNode)
            {
                if ($childNode->nodeType == XML_TEXT_NODE)
                {
                    $textNodes++;
                }
            }
            // If there are any child text nodes, don't recurse because everything inside is considered inline
            if ($textNodes == 0)
            {
                // Increase level of indentation
                $this->indent++;
                // Add newline and indent
                $before = "\n" . str_repeat("\t", $this->indent);
                // Copy the children first: childNodes is a live list and we insert into it below
                foreach (iterator_to_array($node->childNodes) as $childNode)
                {
                    // Insert indentation before the node
                    $node->insertBefore($this->document->createTextNode($before), $childNode);
                    // Recurse
                    $this->indentNode($childNode);
                }
                // Decrease level of indentation
                $this->indent--;
                // Add newline and closing indent
                $after = "\n" . str_repeat("\t", $this->indent);
                $node->appendChild($this->document->createTextNode($after));
            }
        }
    }
}

// Usage: load the example file as XML, indent it, and print the result
$xml = new DOMDocument();
$xml->load('example.html');
$indentor = new DOMIndentor();
$indentor->indent($xml);
echo $xml->saveXML();
?>

Example (example.html):
<html>
<head><title>This is a title</title></head>
<body>
<ul><li>A list item</li><li><em>Another</em> List item</li><li><ul><li><strong>Nested</strong> List Item</li><li>Again?</li></ul></li></ul>
<table>
<tr><th>Name</th><td>Bob the <strong>Man</strong></td></tr>
<tr><th>Foo</th><td>Bar</td></tr>
</table>
</body>
</html>

Output:
<?xml version="1.0"?>
<html>
	<head>
		<title>This is a title</title>
	</head>
	<body>
		<ul>
			<li>A list item</li>
			<li><em>Another</em> List item</li>
			<li>
				<ul>
					<li><strong>Nested</strong> List Item</li>
					<li>Again?</li>
				</ul>
			</li>
		</ul>
		<table>
			<tr>
				<th>Name</th>
				<td>Bob the <strong>Man</strong></td>
			</tr>
			<tr>
				<th>Foo</th>
				<td>Bar</td>
			</tr>
		</table>
	</body>
</html>

The only problem with this is that the file/source has to be treated as XML, and not HTML: if you call DOMDocument::loadHTML() or DOMDocument::saveHTML(), it adds its own weird whitespace and disturbs this script.
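A possible workaround, offered only as a sketch under assumptions: parse with loadHTML() but serialize with saveXML() on the root element, so the HTML serializer never runs. The markup string is made up for the illustration, and the `class_exists()` guard is there only so the snippet runs on its own; it assumes the DOMIndentor class from this post is loaded when you actually want indentation.

```php
<?php
// Sketch: parse as HTML, but serialize the root element as XML so that
// saveHTML() never gets a chance to add its own line breaks.
$html = '<html><head><title>T</title></head><body><p>Hi</p></body></html>'; // made-up sample markup

$doc = new DOMDocument();
$doc->loadHTML($html);

// Assumption: the DOMIndentor class from this post has been loaded.
// The guard only keeps this snippet runnable on its own.
if (class_exists('DOMIndentor')) {
    $indentor = new DOMIndentor();
    $indentor->indent($doc);
}

// Passing a node to saveXML() serializes just that subtree, so the
// XML declaration and the DOCTYPE added by loadHTML() are left out.
$output = $doc->saveXML($doc->documentElement);
echo $output, "\n";
```

The trade-off is that the XML serializer writes empty elements in self-closing form (for example <br/>), which may not suit every HTML consumer.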
I'm no PHP or XML Wizard, so critique is welcome...