Manipulating h1, h2, h3 content dynamically

Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Manipulating h1, h2, h3 content dynamically

Post by Walid »

Assuming that I've got the content of a web-page loaded into a buffer, I'd like to do the following:

1. Check the page for any h1 content (i.e. that which is encapsulated between <h1 and </h1>).
2. Grab the inner text (i.e. that which is encapsulated between > and <).
3. Manipulate the inner text and then replace the original text with the modified text.

Anyone know how to achieve this?
papa
Forum Regular
Posts: 958
Joined: Wed Aug 27, 2008 3:36 am
Location: Sweden/Sthlm

Re: Manipulating h1, h2, h3 content dynamically

Post by papa »

Code: Select all

preg_replace()
Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Re: Manipulating h1, h2, h3 content dynamically

Post by Walid »

Lol! I thought it might have something to do with that... actually doing it is the problem.
papa
Forum Regular
Posts: 958
Joined: Wed Aug 27, 2008 3:36 am
Location: Sweden/Sthlm

Re: Manipulating h1, h2, h3 content dynamically

Post by papa »

We've got some really good regexp guys on the forum. I can give it a shot in a few minutes, not saying I'm good though... ;)
Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Re: Manipulating h1, h2, h3 content dynamically

Post by Walid »

That would be great. Thanks.
papa
Forum Regular
Posts: 958
Joined: Wed Aug 27, 2008 3:36 am
Location: Sweden/Sthlm

Re: Manipulating h1, h2, h3 content dynamically

Post by papa »

Code: Select all

 
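<?php
$file = "h1.php";

// Load the page into a string
$content = file_get_contents($file);

// Lazy .+? stops at the first </h1>; $1 and $2 put the tags back around the new text
$changed_val = preg_replace('#(<h1>).+?(</h1>)#is', '$1Papa$2', $content);

// Escape the result so the markup is shown as text in the browser
echo nl2br(htmlspecialchars($changed_val));
?>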
Seems to work on a file h1.php:

Code: Select all

<html>
<head>
<title>test</title>
</head>
<body>
<h1>Test</h1>
Some text
</body>
</html>
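
A side note on the quantifier in patterns like the one above: a greedy `.+` runs to the last `</h1>` it can reach on the line, so two headings on one line get swallowed as a single match, while a lazy `.+?` stops at the first closing tag. A minimal sketch with a made-up input string (the variable names are only for illustration):

```php
<?php
// Two headings on the same line (made-up sample input)
$content = '<h1>First</h1><p>text</p><h1>Second</h1>';

// Greedy: .+ extends to the LAST </h1>, so the whole line is one match
$greedy = preg_replace('#<h1>.+</h1>#i', '<h1>Papa</h1>', $content);

// Lazy: .+? stops at the FIRST </h1>, so each heading is replaced separately
$lazy = preg_replace('#<h1>.+?</h1>#i', '<h1>Papa</h1>', $content);

echo $greedy, "\n"; // <h1>Papa</h1>
echo $lazy, "\n";   // <h1>Papa</h1><p>text</p><h1>Papa</h1>
```

By default `.` does not match a newline, so headings on separate lines hide the problem; it shows up as soon as the markup is minified or the `s` modifier is added.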
Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Re: Manipulating h1, h2, h3 content dynamically

Post by Walid »

Although I haven't yet tried it.... I need Papa to be the content returned by a function which processes the inner text that exists within <h1></h1>.

Also, how about those cases when the h1 tag has some attributes (e.g. class name, id, inline style etc)?

Thanks again.
papa
Forum Regular
Posts: 958
Joined: Wed Aug 27, 2008 3:36 am
Location: Sweden/Sthlm

Re: Manipulating h1, h2, h3 content dynamically

Post by papa »

Code: Select all

 
<?php

$file = "h1.php";

$content = file_get_contents($file);

// [^>]* allows attributes on the opening tag; preg_match_all() returns the number of matches
$count = preg_match_all('#(<h1\b[^>]*>)(.+?)(</h1>)#is', $content, $matches);

print_r($matches);
?>
 
This one stores your matches in an array.
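
To make the shape of that array concrete, here is a small sketch on an inline sample string rather than a file (the sample markup is made up). With the default flag, `PREG_PATTERN_ORDER`, `$matches[0]` holds the full matches and `$matches[1]` to `$matches[3]` hold the captured groups; passing `PREG_SET_ORDER` groups the results per match instead, which is often easier to loop over.

```php
<?php
// Made-up sample: one heading with attributes, one without
$content = '<h1 class="title">Test</h1>' . "\n" . '<h1>Other</h1>';

// \b keeps the pattern from matching longer tag names; .+? is lazy
$pattern = '#(<h1\b[^>]*>)(.+?)(</h1>)#is';

// PREG_SET_ORDER: one sub-array per heading found
$count = preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[0] = whole element, $m[1] = opening tag, $m[2] = inner text, $m[3] = closing tag
    echo $m[1], ' => ', $m[2], "\n";
}
```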

What kind of processing?

You can use preg_replace to change the content for example.

Code: Select all

$changed_val = preg_replace('#(<h1\b[^>]*>)(.+?)(</h1>)#is', '$1Papa$3', $content); // keeps the h1 tags ($1 and $3) but changes the text
Chewbacca
Forum Newbie
Posts: 5
Joined: Fri Jan 23, 2009 1:57 pm

Re: Manipulating h1, h2, h3 content dynamically

Post by Chewbacca »

Another way to go about this is to use one of the XML extensions (in this case SimpleXML) to traverse and edit the document tree. Note that SimpleXML needs well-formed markup. You might want to run some benchmark tests to see whether this route offers an acceptable level of performance for your use case.

Code: Select all

<?php
 
$htmlBufferString = <<<HTML
<html>
    <head>
        <title>Blah</title>
    </head>
    <body>
        <div>
            <div>
                <h1 attribute="data">H1 string</h1>
            </div>
        </div>
        <h1 id="another">Another H1</h1>
        <h2>Header 2</h2>
        <h3>Header 3</h3>
        <h4>Header 4</h4>
    </body>
</html>
HTML;
 
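// If Tidy is installed, clean the HTML before traversing the tree.
// The repaired string has to be assigned back, otherwise it is never used.
// 'output-xml' and 'doctype' => 'omit' are intended to keep the result parseable as XML.
if (extension_loaded('tidy')) {
    $htmlBufferString = tidy_repair_string(
        $htmlBufferString,
        array('output-xml' => true, 'doctype' => 'omit')
    );
}
 
// SimpleXML only accepts well-formed XML and throws an Exception otherwise
$html = new SimpleXMLElement($htmlBufferString);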
foreach ($html->xpath('//h1|//h2|//h3') as $tag) {    
    switch ($tag->getName()) {
        case 'h1':
            $tag[0] = $tag . ' | edited h1';
            break;
        case 'h2':
            $tag[0] = $tag . ' | edited h2';
            break;
        case 'h3':
            $tag[0] = $tag . ' | edited h3';
            break;
        default:
            $tag[0] = $tag . ' | edited header tag';    
    }   
}
 
echo '<pre>' . htmlentities($html->asXML()) . '</pre>';
Thanks,
Chewy
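
One caveat with SimpleXML is that it expects well-formed XML, which real-world HTML often is not. A possible alternative is PHP's DOM extension, whose `DOMDocument::loadHTML()` is more forgiving. The sketch below is only an illustration: the sample markup and the hypothetical `editHeading()` helper are assumptions, not part of the code above.

```php
<?php
// Hypothetical stand-in for whatever processing you need
function editHeading(string $text, string $tagName): string
{
    return $text . ' | edited ' . $tagName;
}

// Made-up sample; note the unclosed <p>, which an XML parser would reject
$htmlBufferString = '<html><body><h1 id="a">H1 string</h1><p>Some text<h2>Header 2</h2></body></html>';

$doc = new DOMDocument();
// Collect parser warnings about sloppy markup instead of printing them
libxml_use_internal_errors(true);
$doc->loadHTML($htmlBufferString);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1|//h2|//h3') as $node) {
    // Assigning textContent replaces the node's children with plain text
    $node->textContent = editHeading($node->textContent, $node->nodeName);
}

$result = $doc->saveHTML();
echo $result;
```

Attributes such as `id` stay on the element because only the text inside the heading is replaced. As with the SimpleXML version, it is worth benchmarking against the regex approach if the pages are large.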
Walid
Forum Commoner
Posts: 33
Joined: Mon Mar 17, 2008 8:43 am

Re: Manipulating h1, h2, h3 content dynamically

Post by Walid »

Thanks to both... papa's solution pushed me in the right direction and this is what I ended up with...

Code: Select all

 
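function Replace($matches)
{
    // $matches[1] = opening tag, $matches[2] = inner content, $matches[3] = closing tag
    $OldText = strip_tags($matches[2]);
    $NewText = process($OldText); // process() is defined elsewhere
    return $matches[1] . $NewText . $matches[3];
}
// \b[^>]* matches <h2> with or without attributes; the s modifier lets . span newlines
$pattern = '/(<h2\b[^>]*>)(.*?)(<\/h2>)/is';
$this->Document = preg_replace_callback($pattern, "Replace", $this->Document);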
 
Thanks.
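
For anyone landing here later, the same callback idea as a self-contained sketch covering h1 to h3, with or without attributes. The `process()` body below is only a placeholder (it uppercases the text), and the `replaceHeading()` name and sample markup are made up; swap in your own processing.

```php
<?php
// Placeholder for the real processing; here it just uppercases the text
function process(string $text): string
{
    return strtoupper($text);
}

// Rebuild each heading from its parts so tags and attributes stay untouched
function replaceHeading(array $matches): string
{
    // $matches[1] = opening tag, $matches[2] = tag name,
    // $matches[3] = inner content, $matches[4] = closing tag
    return $matches[1] . process(strip_tags($matches[3])) . $matches[4];
}

// Made-up sample: the h4 is outside the h1-h3 range and should be left alone
$document = '<h1>Title</h1><h2 class="sub">Sub <em>title</em></h2><h4>Left alone</h4>';

// \2 makes the closing tag use the same name as the opening tag
$pattern = '/(<(h[1-3])\b[^>]*>)(.*?)(<\/\2>)/is';

$document = preg_replace_callback($pattern, 'replaceHeading', $document);
echo $document; // <h1>TITLE</h1><h2 class="sub">SUB TITLE</h2><h4>Left alone</h4>
```

Rebuilding the element from the captured parts avoids a `str_replace()` on the whole match, which could also change an attribute value that happens to contain the same text as the heading.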