PHP-based table of contents generator

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

PHP-based table of contents generator

Post by Ambush Commander »

This bit of code generates a table of contents for the DOMDocument you pass it: it scans the headings (h2 through h6) and inserts a nested list into a marker element, <div id="toc" />. The original isn't really procedural, but I removed the abstraction for readability's sake.

Any comments?

Code: Select all

<?php
 
function generate_toc(DOMDocument $dom) {
        
        // setup xpath, this can be factored out
        $xpath = new DOMXPath($dom);
        $xpath->registerNamespace('html', "http://www.w3.org/1999/xhtml");
        
        // test for ToC container, if not present don't bother
        $container = $xpath->query("//html:div[@id='toc']")->item(0);
        if (!$container) return;
        
        // grab all headings h2 and down from the document
        $headings = array('h2', 'h3', 'h4', 'h5', 'h6');
        foreach ($headings as $k => $v) $headings[$k] = "self::html:$v";
        $query_headings = implode(' or ', $headings);
        $query = "//*[$query_headings]"; // looks like "//*[self::html:h2 or ...]"
        $headings = $xpath->query($query);
        
        // setup the table of contents element
        $toc = $dom->createElement('ul');
        $container->appendChild($dom->createElement('h2', 'Table of Contents'));
        $container->appendChild($toc);
        
        // iterate through headings and build the table of contents
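        $current_level = 2;            // nesting starts at h2
        $parents = array(false, $toc); // $parents[$level - 1] is the <ul> for that level
        $indexes = array(0);           // $indexes[$level - 2] is the running count at that level
        $i = 0;                        // counter for generated anchor ids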
        foreach ($headings as $node) {
            $level = (int) $node->tagName[1];
            $name  = $node->textContent; // no support for formatting
            
            while ($level > $current_level) {
                if (!$parents[$current_level-1]->lastChild) {
                    $parents[$current_level-1]->appendChild(
                        $dom->createElement('li')
                    );
                }
                $sublist = $dom->createElement('ul');
                $parents[$current_level - 1]->lastChild->appendChild($sublist);
                $parents[$current_level] = $sublist;
                $current_level++;
                $indexes[$current_level - 2] = 0;
            }
            
            while ($level < $current_level) {
                unset($indexes[$current_level - 2]);
                $current_level--;
            }
            
            $indexes[$current_level - 2]++;
            
            
            $line = $dom->createElement('li');
            $label = $dom->createElement('span', implode('.', $indexes) . '.');
            $label->setAttribute('class', 'toc-label');
            $line->appendChild($label);
            $link = $dom->createElement('a', $name);
            $line->appendChild($link);
            $parents[$current_level-1]->appendChild($line);
            
            // setup the anchors
            $header_id = $node->getAttribute('id');
            if (!$header_id) {
                $header_id = 'toclink' . $i;
                $node->setAttribute('id', $header_id);
            }
            $link->setAttribute('href', '#' . $header_id);
            
        }
}
 
 
?>
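
The numbering is probably the least obvious part of the function. Here is a minimal sketch of just that bookkeeping with the DOM work stripped out. The helper name toc_labels() and the sample heading levels are made up for illustration and are not part of the posted code.

```php
<?php
// Hypothetical helper: mirrors the $current_level / $indexes bookkeeping
// from generate_toc(), but takes plain heading levels (2 = h2, 3 = h3, ...).
function toc_labels(array $levels) {
    $current_level = 2;
    $indexes = array(0);
    $labels = array();
    foreach ($levels as $level) {
        // going deeper: start a fresh counter for each new nesting level
        while ($level > $current_level) {
            $current_level++;
            $indexes[$current_level - 2] = 0;
        }
        // coming back up: drop the counters of the deeper levels
        while ($level < $current_level) {
            unset($indexes[$current_level - 2]);
            $current_level--;
        }
        $indexes[$current_level - 2]++;
        $labels[] = implode('.', $indexes) . '.';
    }
    return $labels;
}

// h2, h3, h3, h2, h3
print_r(toc_labels(array(2, 3, 3, 2, 3)));
// labels: "1.", "1.1.", "1.2.", "2.", "2.1."
```

With this logic, a heading that skips a level (an h2 followed directly by an h4) gets a 0 in the skipped slot, so its label comes out as "1.0.1.".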
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

At first glance the code looks really condensed. Can you provide a sample of what it does and how it would be beneficial to use?
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

Maybe in a blog post where you want to automatically add a contents listing to a post? :). Looks simple and useful.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Usage:

Code: Select all

$dom = new DOMDocument();
$dom->loadHTML($html_page);
generate_toc($dom);
echo $dom->saveHTML();

Here's an example of the table of contents generator in action. Otherwise, you'd have to generate it by hand.

Benjamin wrote: At first glance the code looks really condensed.

Hmm... looks like some more comments are in order then.
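
One thing to watch with the usage: generate_toc() queries with the html: prefix, which only matches elements that are actually in the XHTML namespace. As far as I can tell, elements parsed by loadHTML() carry no namespace, while loadXML() on XHTML source keeps it. The sketch below checks this. The helper count_toc_containers() and the sample markup are assumptions made for illustration.

```php
<?php
// Hypothetical helper: same XPath setup as generate_toc(), with the
// 'html' prefix bound to the XHTML namespace.
function count_toc_containers(DOMDocument $dom) {
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('html', 'http://www.w3.org/1999/xhtml');
    return $xpath->query("//html:div[@id='toc']")->length;
}

// keep any parser warnings out of the output
libxml_use_internal_errors(true);

$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
       . '<body><div id="toc"></div><h2>First</h2></body></html>';

// Parsed as XML: the elements are in the XHTML namespace, so the query matches.
$as_xml = new DOMDocument();
$as_xml->loadXML($xhtml);
echo count_toc_containers($as_xml), "\n"; // prints 1

// Parsed with the HTML parser: the elements have no namespace URI,
// so the prefixed query finds nothing and generate_toc() would return early.
$as_html = new DOMDocument();
$as_html->loadHTML($xhtml);
echo count_toc_containers($as_html), "\n"; // prints 0
```

If the input is not well-formed XHTML, one option is to drop the html: prefix from the queries rather than switch parsers. That would be a change to the posted function, not something it does today.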
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

That's the nature of the beast, though; sometimes you can't get around it. This should probably be moved into Code Snippets. I'm sure it will come in handy for a lot of programmers.
promethean
Forum Newbie
Posts: 1
Joined: Mon Feb 04, 2008 9:49 am

Re: PHP-based table of contents generator

Post by promethean »

This code looks really useful... does it depend on any classes that need to be included?
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: PHP-based table of contents generator

Post by Christopher »

I think it led to HTML Purifier; you should check that out, though I don't think it does HTML generation. There are a number of threads in these forums that have covered generating HTML with PHP.
iaincollins
Forum Newbie
Posts: 1
Joined: Mon Aug 10, 2009 7:25 am

Re: PHP-based table of contents generator

Post by iaincollins »

Thanks for this, it was VERY handy. I was pleasantly surprised to find it (mostly ;-)) works!

I am generating XHTML content using TinyMCE, and this was exactly what I was looking for. I'd written something to do this in PHP before using regexes, but something DOM-based like this is far better. It works really well on my existing (database-driven) content, so I am a very happy bunny.

Anyway, I made a couple of tweaks:

Firstly, I set $i = 1 at the start. Starting the counter at 1 instead of 0 makes for better URLs when linking to anchors, as the numbers in the URLs then correspond more closely to what is on the page, which might help avoid confusion (even though the headings or their order could still change).

Secondly, I slightly modified the code that sets up the anchors so that $i is incremented in the same block; the missing increment seems to be a minor bug that slipped through in your post ;) :

Code: Select all

 
// setup the anchors
$header_id = $node->getAttribute('id');
if (!$header_id) {
    $header_id = 'heading' . $i;
    $node->setAttribute('id', $header_id);
    // prefix the heading text with its section number, e.g. "1.1) "
    $node->nodeValue = implode('.', $indexes) . ') ' . $node->nodeValue;
    $i++; // keep the generated ids unique
}
 
This also prepends the section number to each heading, which makes the content much easier to navigate, as the heading numbers then correspond to the numbers in the Table of Contents.

e.g.

Code: Select all

Table Of Contents

1. Heading
 1.1. Sub-Heading

1) Heading
text

1.1) Sub-Heading
text
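
Going back to the anchor tweak: here is a small sketch of what the $i++ buys you, namely that every heading without an id gets its own generated one while existing ids are left alone. The sample markup and the plain, un-namespaced XPath query are assumptions made to keep the example short.

```php
<?php
// Sample document (made up for illustration); the second heading already has an id.
$dom = new DOMDocument();
$dom->loadXML('<body><h2>Intro</h2><h2 id="setup">Setup</h2><h3>Details</h3></body>');

$xpath = new DOMXPath($dom);
$i = 1; // start at 1, as suggested above
$ids = array();

foreach ($xpath->query('//*[self::h2 or self::h3]') as $node) {
    $header_id = $node->getAttribute('id');
    if (!$header_id) {
        $header_id = 'heading' . $i;
        $node->setAttribute('id', $header_id);
        $i++; // without this, every generated id would be the same
    }
    $ids[] = $header_id;
}

print_r($ids); // heading1, setup, heading2
```

Note that in the tweak above the number prefix is added inside the same if block, so a heading that already has an id keeps its original, unnumbered text. Moving that line outside the block would number every heading, if that is what you want.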
 
Lastly, I used the following CSS to style the TOC (might be handy for anyone not so familiar with CSS and list taming and too lazy/busy to figure it out):

Code: Select all

 
#toc li {
    list-style-type: none;  
    padding-bottom: 2pt;        
}
 
#toc ul {
    padding-top: 2pt;
    margin-top: 0px;    
    padding-left: 10pt;
}
 
#toc .toc-label {
    font-weight: bold;
}
 