Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 4:40 am
by Walid
Assuming that I've got the content of a web-page loaded into a buffer, I'd like to do the following:

1. Check the page for any h1 content (i.e. that which is encapsulated between <h1 and </h1>).
2. Grab the inner text (i.e. that which is encapsulated between > and <).
3. Manipulate the inner text and then replace the original text with the modified text.

Anyone know how to achieve this?

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 4:52 am
by papa

Code: Select all

preg_replace()

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 4:55 am
by Walid
Lol! I thought that function might have something to do with it... actually doing it is the problem.

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 5:41 am
by papa
We've got some really good regexp guys on the forum. I can give it a shot in a few minutes, not saying I'm good though... ;)

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 5:44 am
by Walid
That would be great. Thanks.

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 5:55 am
by papa

Code: Select all

 
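<?php
$file = "h1.php";

// Read the whole page into a string
$content = file_get_contents($file);

// Replace each <h1>...</h1> element (tags included) with "Papa".
// .+? is lazy, so two headings on one line are matched separately;
// the s flag lets a heading span several lines.
$changed_val = preg_replace('#<h1>.+?</h1>#is', "Papa", $content);

// Escape the result so the markup is shown as text
echo nl2br(htmlspecialchars($changed_val));
?>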
Seems to work on a file h1.php:

Code: Select all

<html>
<head>
<title>test</title>
</head>
<body>
<h1>Test</h1>
Some text
</body>
</html>

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 5:59 am
by Walid
I haven't tried it yet, but I need "Papa" to be the value returned by a function that processes the inner text found within <h1></h1>.

Also, how about those cases when the h1 tag has some attributes (e.g. class name, id, inline style etc)?

Thanks again.

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 8:26 am
by papa

Code: Select all

 
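<?php

$file = "h1.php";

$content = file_get_contents($file);

// [^>]* allows attributes on the opening tag.
// $matches[1] holds the opening tags, $matches[2] the inner text,
// $matches[3] the closing tags.
// preg_match_all() returns the number of matches, not the changed text.
$count = preg_match_all('#(<h1\b[^>]*>)(.+?)(</h1>)#is', $content, $matches);

print_r($matches);
?>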
 
This one stores your matches in an array.

What kind of processing?

You can use preg_replace to change the content for example.

Code: Select all

$changed_val = preg_replace('#(<h1\b[^>]*>)(.+?)(</h1>)#is', '$1Papa$3', $content); // keeps the h1 tags (and attributes) but replaces the text
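
To have the replacement text come from a function rather than a fixed string, preg_replace_callback() can be used instead. The following is only a minimal sketch: process_heading() and replace_h1() are hypothetical names, and strtoupper() stands in for whatever processing is actually needed.

```php
<?php
// Hypothetical helper: stands in for the real processing of the inner text.
function process_heading($text)
{
    return strtoupper($text);
}

// Callback: $m[1] is the opening tag (with any attributes),
// $m[2] the inner text, $m[3] the closing tag.
function replace_h1($m)
{
    return $m[1] . process_heading($m[2]) . $m[3];
}

// Sample input, made up for this sketch.
$content = '<h1 class="title" id="top">Test</h1> Some text <h1>Second</h1>';

// \b stops the pattern from matching a longer tag name that starts with "h1";
// .+? is lazy so each heading is matched separately;
// the s flag lets the inner text span several lines.
$result = preg_replace_callback('#(<h1\b[^>]*>)(.+?)(</h1>)#is', 'replace_h1', $content);

echo $result;
```

Because the opening tag is captured whole, any class, id, or inline style on it is passed through unchanged.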

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 10:43 am
by Chewbacca
Another way to go about this could be using one of the various XML implementations (in this case SimpleXML) to traverse and edit the DOM. Note that SimpleXML only accepts well-formed XML, so the markup may need cleaning first. You might want to run some benchmark tests to see if this route offers an acceptable level of performance for your use case.

Code: Select all

<?php
 
$htmlBufferString = <<<HTML
<html>
    <head>
        <title>Blah</title>
    </head>
    <body>
        <div>
            <div>
                <h1 attribute="data">H1 string</h1>
            </div>
        </div>
        <h1 id="another">Another H1</h1>
        <h2>Header 2</h2>
        <h3>Header 3</h3>
        <h4>Header 4</h4>
    </body>
</html>
HTML;
 
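// If Tidy is installed, clean the HTML before traversing the DOM.
// output-xml asks Tidy for well-formed XML, which SimpleXML requires.
if (extension_loaded('tidy')) {
    $htmlBufferString = tidy_repair_string($htmlBufferString, array('output-xml' => true));
}
 
$html = new SimpleXMLElement($htmlBufferString);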
foreach ($html->xpath('//h1|//h2|//h3') as $tag) {    
    switch ($tag->getName()) {
        case 'h1':
            $tag[0] = $tag . ' | edited h1';
            break;
        case 'h2':
            $tag[0] = $tag . ' | edited h2';
            break;
        case 'h3':
            $tag[0] = $tag . ' | edited h3';
            break;
        default:
            $tag[0] = $tag . ' | edited header tag';    
    }   
}
 
echo '<pre>' . htmlentities($html->asXML()) . '</pre>';
Thanks,
Chewy
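
If the page is ordinary HTML rather than well-formed XML (unclosed <br> tags and the like), PHP's DOMDocument may be an easier fit, since its loadHTML() method tolerates such markup. This is a rough sketch, assuming the DOM extension is enabled; the sample markup and the ' | edited' suffix are made up for illustration.

```php
<?php
// Sample input with an unclosed <br>, which strict XML parsing would reject.
$htmlBufferString = '<html><body><h1 id="a">First</h1><p>Text<br>more</p>'
    . '<h2>Second</h2><h4>Fourth</h4></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($htmlBufferString);
libxml_clear_errors();

// Select every h1, h2 and h3 element, wherever it sits in the tree.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1|//h2|//h3') as $node) {
    // Assigning nodeValue replaces the element's content with plain text;
    // attributes on the element are left alone.
    $node->nodeValue = $node->textContent . ' | edited ' . $node->nodeName;
}

$result = $doc->saveHTML();
echo '<pre>' . htmlentities($result) . '</pre>';
```

One thing to keep in mind: saveHTML() serializes the whole parsed document, so the output can differ from the input in whitespace and may gain a DOCTYPE line.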

Re: Manipulating h1, h2, h3 content dynamically

Posted: Tue Mar 03, 2009 12:44 pm
by Walid
Thanks to both... papa's solution pushed me in the right direction and this is what I ended up with...

Code: Select all

 
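function Replace($matches)
{
    // $matches[1] = opening tag, $matches[2] = inner text, $matches[3] = closing tag
    $OldText = $matches[2];
    // process() is my own function that transforms the inner text
    $NewText = process($OldText);
    return $matches[1] . $NewText . $matches[3];
}
// \b[^>]* matches <h2> both with and without attributes
$pattern = '/(<h2\b[^>]*>)(.*?)(<\/h2>)/is';
$this->Document = preg_replace_callback($pattern, "Replace", $this->Document);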
 
Thanks.
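
For anyone who wants h1, h2 and h3 handled in one pass, the same callback idea can probably be stretched with a backreference. This is just a sketch under assumptions: process_text() and replace_heading() are hypothetical names, and the sample document is made up.

```php
<?php
// Hypothetical stand-in for the real processing function;
// $level is the heading level ("1", "2" or "3").
function process_text($text, $level)
{
    return trim($text) . ' (h' . $level . ')';
}

// $m[1] = opening tag, $m[2] = level digit, $m[3] = inner text, $m[4] = closing tag
function replace_heading($m)
{
    return $m[1] . process_text($m[3], $m[2]) . $m[4];
}

$document = "<h1>Title</h1>\n<h2 class=\"sub\">Sub\ntitle</h2>\n<h4>Untouched</h4>";

// h([1-3]) captures the level, \b keeps "h1" from matching a longer tag name,
// and </h\2> requires the closing tag to have the same level as the opening one.
$pattern = '#(<h([1-3])\b[^>]*>)(.*?)(</h\2>)#is';
$document = preg_replace_callback($pattern, 'replace_heading', $document);

echo $document;
```

Passing the level to the processing function makes it possible to treat each heading level differently without writing three separate patterns.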