PHP-based table of contents generator

Posted: Fri Mar 16, 2007 6:03 pm
by Ambush Commander
This bit of code generates a table of contents for a DOMDocument you pass it: it collects the headings (h2 through h6) and then inserts a nested list into a marker element, <div id="toc" />. The original isn't really procedural, but I removed the abstraction for readability's sake.

Any comments?

Code: Select all

<?php
 
function generate_toc(DOMDocument $dom) {
        
        // setup xpath, this can be factored out
        $xpath = new DOMXPath($dom);
        $xpath->registerNamespace('html', "http://www.w3.org/1999/xhtml");
        
        // test for ToC container, if not present don't bother
        $container = $xpath->query("//html:div[@id='toc']")->item(0);
        if (!$container) return;
        
        // grab all headings h2 and down from the document
        $headings = array('h2', 'h3', 'h4', 'h5', 'h6');
        foreach ($headings as $k => $v) $headings[$k] = "self::html:$v";
        $query_headings = implode(' or ', $headings);
        $query = "//*[$query_headings]"; // looks like "//*[self::html:h2 or ...]"
        $headings = $xpath->query($query);
        
        // setup the table of contents element
        $toc = $dom->createElement('ul');
        $container->appendChild($dom->createElement('h2', 'Table of Contents'));
        $container->appendChild($toc);
        
        // iterate through headings and build the table of contents
        $current_level = 2;
        $parents = array(false, $toc);
        $indexes = array(0);
        $i = 0;
        foreach ($headings as $node) {
            $level = (int) $node->tagName[1];
            $name  = $node->textContent; // no support for formatting
            
            while ($level > $current_level) {
                if (!$parents[$current_level-1]->lastChild) {
                    $parents[$current_level-1]->appendChild(
                        $dom->createElement('li')
                    );
                }
                $sublist = $dom->createElement('ul');
                $parents[$current_level - 1]->lastChild->appendChild($sublist);
                $parents[$current_level] = $sublist;
                $current_level++;
                $indexes[$current_level - 2] = 0;
            }
            
            while ($level < $current_level) {
                unset($indexes[$current_level - 2]);
                $current_level--;
            }
            
            $indexes[$current_level - 2]++;
            
            
            $line = $dom->createElement('li');
            $label = $dom->createElement('span', implode('.', $indexes) . '.');
            $label->setAttribute('class', 'toc-label');
            $line->appendChild($label);
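            $link = $dom->createElement('a');
            // createElement() does not escape its value, so add the heading
            // text as a text node to keep "&" and "<" in titles intact
            $link->appendChild($dom->createTextNode($name));
            $line->appendChild($link);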
            $parents[$current_level-1]->appendChild($line);
            
            // setup the anchors
            $header_id = $node->getAttribute('id');
            if (!$header_id) {
                $header_id = 'toclink' . $i;
                $node->setAttribute('id', $header_id);
                $i++; // advance the counter so each generated id is unique
            }
            $link->setAttribute('href', '#' . $header_id);
            
        }
}
 
 
?>
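
For anyone trying to follow the numbering logic without the DOM calls in the way, here is a minimal sketch of the same counter idea on plain arrays. The function name toc_labels and its input format (a list of [level, title] pairs) are made up for illustration and are not part of the function above; it also assumes no heading is shallower than the top level.

```php
<?php

/**
 * Hypothetical helper (not part of generate_toc): turns a flat list of
 * [level, title] pairs into dotted labels such as "1.", "1.1.", "2.".
 * Level 2 is treated as the top level, as in the function above.
 * Assumes every level is >= $top_level.
 */
function toc_labels(array $headings, int $top_level = 2): array
{
    $current_level = $top_level;
    $indexes = array(0);   // one counter per open nesting level
    $labels  = array();

    foreach ($headings as $heading) {
        list($level, $title) = $heading;

        // going deeper: open a fresh counter for each level we descend
        while ($level > $current_level) {
            $current_level++;
            $indexes[$current_level - $top_level] = 0;
        }

        // coming back up: drop the counters of the levels we leave
        while ($level < $current_level) {
            unset($indexes[$current_level - $top_level]);
            $current_level--;
        }

        $indexes[$current_level - $top_level]++;
        $labels[] = implode('.', $indexes) . '. ' . $title;
    }

    return $labels;
}

// Invented sample outline: h2, h3, h3, h2
// Gives "1. Intro", "1.1. Background", "1.2. Scope", "2. Usage"
print_r(toc_labels(array(
    array(2, 'Intro'),
    array(3, 'Background'),
    array(3, 'Scope'),
    array(2, 'Usage'),
)));
```

The real function does the same bookkeeping, but also keeps a $parents array so each label lands in the right nested <ul>.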

Posted: Thu Mar 29, 2007 8:50 am
by Benjamin
At first glance the code looks really condensed. Can you provide a sample of what it does and how it would be beneficial to use?

Posted: Thu Mar 29, 2007 11:36 am
by Maugrim_The_Reaper
Maybe in a blog post where you want to automatically add a contents listing to a post? :). Looks simple and useful.

Posted: Thu Mar 29, 2007 3:21 pm
by Ambush Commander
Usage:

Code: Select all

$dom = new DOMDocument();
$dom->loadHTML($html_page);
generate_toc($dom);
echo $dom->saveHTML();

Here's an example of the table of contents generator in action. Otherwise, you'd have to generate it by hand.

Benjamin wrote: At first glance the code looks really condensed.

Hmm... looks like some more comments are in order then.
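
One thing to watch with the usage snippet: generate_toc() queries for html:div in the XHTML namespace. As far as I can tell, DOMDocument::loadHTML() creates elements with no namespace, so that query finds nothing and the function returns without doing anything, whereas well-formed XHTML loaded with loadXML() does match. A minimal sketch to check which case you are in (the helper name has_toc_container and the sample markup are invented for illustration):

```php
<?php

// Hypothetical helper: does the namespaced query used by generate_toc()
// find a <div id="toc"> in this document?
function has_toc_container(DOMDocument $dom): bool
{
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');
    return $xpath->query("//html:div[@id='toc']")->length > 0;
}

// Invented sample page; note the xmlns declaration on <html>.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
       . '<body><div id="toc"></div><h2>First</h2></body></html>';

// Parsed as XML: elements end up in the XHTML namespace.
$as_xml = new DOMDocument();
$as_xml->loadXML($xhtml);

// Parsed as HTML: elements are expected to carry no namespace.
$as_html = new DOMDocument();
@$as_html->loadHTML($xhtml); // @ only silences parser warnings

var_dump(has_toc_container($as_xml));
var_dump(has_toc_container($as_html));
```

If the second check comes back false for your pages, either load them with loadXML() or drop the html: prefix from the queries.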

Posted: Thu Mar 29, 2007 3:29 pm
by Benjamin
That's the nature of the beast though, sometimes you can't get around it. This should probably be moved into snippets, I'm sure it will come in handy for a lot of programmers.

Posted: Mon Feb 04, 2008 9:50 am
by promethean
This code looks really useful... does it depend on any classes that need to be included?

Posted: Mon Feb 04, 2008 12:51 pm
by Christopher
I think it led to HTML Purifier, you should check that out -- though it does not do HTML generation (I don't think). There are a number of threads in these forums that have covered generating HTML with PHP.

Posted: Mon Aug 10, 2009 7:49 am
by iaincollins
Thanks for this, it was VERY handy. I was pleasantly surprised to find it (mostly ;-)) works!

I am generating XHTML content using TinyMCE, and this was exactly what I was looking for. I'd written something to do this in PHP before using regexes, but something DOM-based like this is far better. It works really well on my existing (database-driven) content, so I am a very happy bunny.

Anyway, I made a couple of tweaks:

Firstly, I set $i = 1 at the start. Starting the counter at 1 instead of 0 makes for better URLs when linking to anchors, as the numbers in the URLs then correspond more closely to what is on the page, which might help avoid confusion (even though the headings / heading order could still change).

Secondly, I slightly modified the code which sets up the anchors and incremented $i in the same block ... which seems to be a minor bug which slipped through in your post ;) :

Code: Select all

 
// setup the anchors
$header_id = $node->getAttribute('id');
if (!$header_id) {
    $header_id = 'heading' . $i;
    $node->setAttribute('id', $header_id);
    // prefix the heading text with its ToC number, e.g. "1.1) "
    $node->nodeValue = implode('.', $indexes) . ") " . $node->nodeValue;
    $i++;
}
 
This assigns numbering to the headings, which makes it much easier to navigate content as the headings are then numbered in a way that corresponds to the numbers in the Table of Contents.

e.g.

Code: Select all

Table Of Contents
 
1. Heading
 1.1. Sub-Heading
 
1) Heading
text
 
1.1) Sub-Heading
 
text
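
A side note on the anchor snippet: assigning to nodeValue on an element replaces all of its children with a single text node, so a heading like <h2>Hello <em>world</em></h2> would lose its <em>. A minimal sketch of prefixing the number with a text node instead (the helper name prefix_heading and the sample markup are invented for illustration, not taken from the posts above):

```php
<?php

// Hypothetical helper: put "$label) " in front of a heading's existing
// content without touching its child elements.
function prefix_heading(DOMElement $heading, string $label): void
{
    $text = $heading->ownerDocument->createTextNode($label . ') ');
    // insertBefore() with a null reference node appends, so an empty
    // heading is handled too.
    $heading->insertBefore($text, $heading->firstChild);
}

// Invented sample heading with inline markup
$dom = new DOMDocument();
$dom->loadXML('<h2>Hello <em>world</em></h2>');
$heading = $dom->documentElement;

prefix_heading($heading, '1.1');

echo $dom->saveXML($heading), "\n";
```

Inside the loop this would be called as prefix_heading($node, implode('.', $indexes)) in place of the nodeValue assignment.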
 
Lastly, I used the following CSS to style the TOC (might be handy for anyone not so familiar with CSS and list taming and too lazy/busy to figure it out):

Code: Select all

 
#toc li {
    list-style-type: none;  
    padding-bottom: 2pt;        
}
 
#toc ul {
    padding-top: 2pt;
    margin-top: 0px;    
    padding-left: 10pt;
}
 
#toc .toc-label {
    font-weight: bold;
}