using something like preg_match for subject of str_replace

oboedrew
Forum Commoner
Posts: 78
Joined: Fri Feb 20, 2009 1:17 pm

using something like preg_match for subject of str_replace

Post by oboedrew »

Is it possible to use preg_match or some similar function to determine the subject of str_replace, in order to change some html tags? For example:

$new_string=str_replace('<p>', '<p class="whatever">', preg_match('|<p class="whatever">(.+)<p class="whatever">|', $original_string));
Basically, I need a function similar to preg_match that returns the matched string instead of the number of matches. I need to take a text file that includes HTML tags, read it into a string using file_get_contents, locate a section that begins and ends with paragraphs of class="whatever", and give all the paragraphs between those two class="whatever" as well.

Any ideas?

Thanks,
Drew
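
For reference, a minimal sketch of one way to get the matched text itself: preg_match() still returns 1 or 0, but it fills an optional third argument with the whole match and its captured groups. The sample string below is an assumption made for illustration, and the s modifier is added so that the dot can also cross newlines.

```php
<?php
// Hypothetical sample input; in the post it would come from file_get_contents().
$original_string = '<p class="whatever">start</p><p>middle</p><p class="whatever">end</p>';

// $matches[0] receives the whole match, $matches[1] the first
// parenthesised group (the text between the two marker tags).
$pattern = '|<p class="whatever">(.+)<p class="whatever">|s';

if (preg_match($pattern, $original_string, $matches) === 1) {
    $between = $matches[1];
    echo $between, "\n";
}
```

The captured text can then be passed to str_replace or preg_replace like any other string.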
McInfo
DevNet Resident
Posts: 1532
Joined: Wed Apr 01, 2009 1:31 pm

Re: using something like preg_match for subject of str_repla

Post by McInfo »

Depending on how your HTML is structured, the pattern in this example might work for you. Read the code comments, including the ones in the HTML.

<?php
// Makes debugging easier (no need for htmlentities())
header('Content-Type: text/plain');
 
// The source document
$html = <<<HTML
<html>
    <head>
        <title>Sample</title>
    </head>
    <body>
        <div class="whatever"></div>
        <div class="nonewline"><p class="sameline"><div>div in p in div</div></p></div>
        <div class="whatever">
            <p class="whatever">Gets this (1)
                <div>Div 1</div>
                <p>embedded paragraphs break the pattern; stops at this end tag</p>
                <div>Div 2</div>
            </p>
            <p>Do not get</p>
            <p class="whatever">Gets this (2)</p>
        </div>
        <p class="whatever"></p><!-- Gets the previous paragraph (3) -->
        <p>Do not get this either</p>
        <p class="whatever">
            <div>Gets this (4)</div>
        </p>
    </body>
</html>
HTML;
 
/*
 *                      # = pattern delimiter
 * (<p class="whatever">) = subpattern to match opening tag
 *             ((\v*.*)*) = subpattern to match between tags
 *                     \v = vertical whitespace (newline) (Since PHP 5.2.4)
 *                      . = any character except newline
 *                      * = match zero or more
 *                 (</p>) = subpattern to match closing tag
 *                      U = ungreedy
 */
$pattern = '#(<p class="whatever">)((\v*.*)*)(</p>)#U';
 
// Finds all matches, stores them in $matches
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
 
// Equivalent to <hr /> for plain text
$separator = "\n" . str_repeat('-', 80) . "\n";
 
// Displays the results for each match
foreach ($matches as $m)
{
    echo $separator . "\n"
       . 'WHOLE MATCH: ' . $m[0] . "\n"
       . 'OPENING TAG: ' . $m[1] . "\n"
       . 'PARAGRAPH  : ' . $m[2] . "\n"
       . 'LAST LETTER: ' . $m[3] . "\n"
       . 'CLOSING TAG: ' . $m[4] . "\n"
       ;
}
echo $separator . "\n";
?>
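
As a point of comparison, the same kind of match can be sketched with a lazy quantifier and the s modifier (which lets the dot match newlines) instead of the \v construction. This is an alternative offered under the assumption that the target paragraphs are not nested; the sample HTML is made up for the example.

```php
<?php
// Hypothetical input with a paragraph that spans two lines.
$html = "<p class=\"whatever\">line one\nline two</p>\n<p>skip</p>\n<p class=\"whatever\">three</p>";

// s   = dot also matches newlines
// .*? = lazy: stop at the first </p> that follows
$pattern = '#<p class="whatever">(.*?)</p>#s';

preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$paragraphs = [];
foreach ($matches as $m) {
    $paragraphs[] = $m[1]; // text between the opening and closing tag
}
print_r($paragraphs);
```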
Edit: This post was recovered from search engine cache.
Last edited by McInfo on Mon Jun 14, 2010 3:46 pm, edited 1 time in total.

Re: using something like preg_match for subject of str_replace

Post by oboedrew »

I've read through that a few times, and I think I understand some (though not all) of it, but I'm not following how it solves my problem. I need to do a preg_replace on a string, but I don't know exactly what that string will be, so I need to use a regex to define the subject of the preg_replace. Or, put another way, I need to define the starting and ending tags of a substring, and then extract that substring into a new variable on which I can do a preg_replace.

Thanks,
Drew
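
One way to read this request, sketched under assumptions: capture the section together with its position, change the captured copy, then splice it back into the original string. The sample markup and variable names are hypothetical; PREG_OFFSET_CAPTURE and substr_replace() are standard PHP.

```php
<?php
// Hypothetical input: the region of interest sits between two marker paragraphs.
$html = '<p>a</p><p class="whatever">b</p><p>c</p><p>d</p><p class="whatever">e</p><p>f</p>';

// Capture everything between the first marker paragraph and the last one.
// PREG_OFFSET_CAPTURE turns each entry into array(text, byte offset).
$pattern = '#<p class="whatever">.*?</p>(.*)<p class="whatever">#s';

if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE) === 1) {
    list($section, $offset) = $m[1];

    // Work on the extracted substring only.
    $changed = str_replace('<p>', '<p class="whatever">', $section);

    // Put the modified substring back where the original was.
    $html = substr_replace($html, $changed, $offset, strlen($section));
}
echo $html, "\n";
```

Paragraphs outside the two markers (a and f here) are left alone because only the captured section is passed to str_replace.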

Re: using something like preg_match for subject of str_repla

Post by McInfo »

oboedrew wrote:... locate a section that begins and ends with paragraphs of class="whatever," and turn all the paragraphs between those two into class="whatever" also. ...
I need to get a better idea of what you are trying to match and what you are trying to replace it with.

Are you trying to match

<p class="whatever">
    <p></p>
    <p></p>
    <p></p>
</p>
and replace it with

<p class="whatever">
    <p class="whatever"></p>
    <p class="whatever"></p>
    <p class="whatever"></p>
</p>
Or are you trying to match

<p class="whatever"></p>
<p></p>
<p></p>
<p></p>
<p class="whatever"></p>
and replace it with

<p class="whatever"></p>
<p class="whatever"></p>
<p class="whatever"></p>
<p class="whatever"></p>
<p class="whatever"></p>
Or have I missed the objective completely?

Edit: This post was recovered from search engine cache.
Last edited by McInfo on Mon Jun 14, 2010 3:49 pm, edited 1 time in total.

Re: using something like preg_match for subject of str_replace

Post by oboedrew »

I will try to explain in more detail. The project is a blogging platform using a flatfile database. Entries are typed into a form, and the blogger is able to use select BBCode tags for formatting. The entry is saved to a text file and then redisplayed on other pages with the BBCode tags converted to html.


So, for instance, a text file for an entry might read:

[title]Entry Title[/title]
2009-04-25

[subtitle]Entry Subtitle[/subtitle]

This is a paragraph with [i]some italics[/i] and a [url=http://www.whatever.com]link[/url] to [url]http://www.whatever.com[/url].

[quote]This is a block quotation.[/quote]


But then the php script that pulls up the text file and creates the visible blog entry includes these lines:

$entry=file_get_contents('text file displayed above');
$entry=htmlentities($entry, ENT_QUOTES, 'UTF-8');
$entry=str_replace("\r", '', $entry);
$entry="<p>$entry</p>";
$entry=str_replace("\n", "</p>\n<p>", $entry);
$entry=str_replace('<p></p>', '<p>&nbsp;</p>', $entry);
$entry=preg_replace('|<p>\[title](.+)\[/title]</p>|', '<p class="title">$1</p>', $entry);
$entry=preg_replace('|<p>\[subtitle](.+)\[/subtitle]</p>|', '<p class="subtitle">$1</p>', $entry);
$entry=preg_replace('|<p>\[quote](.+)\[/quote]</p>|', '<p class="quote">$1</p>', $entry);
$entry=preg_replace('|\[i](.+)\[/i]|', '<span class="italics">$1</span>', $entry);
$entry=preg_replace('|\[url](.+)\[/url]|', '<a href="$1">$1</a>', $entry);
$entry=preg_replace('|\[url=(.+)](.+)\[/url]|', '<a href="$1">$2</a>', $entry);


So the resulting html looks like this:

<p class="title">Entry Title</p>
<p>2009-04-25</p>
<p>&nbsp;</p>
<p class="subtitle">Entry Subtitle</p>
<p>&nbsp;</p>
<p>This is a paragraph with <span class="italics">some italics</span> and a <a href="http://www.whatever.com">link</a> to <a href="http://www.whatever.com">http://www.whatever.com</a>.</p>
<p>&nbsp;</p>
<p class="quote">This is a block quotation.</p>


This is all good for block quotations that contain prose, but it does not allow for poetry and such, where there will be newline characters within the quote.


[quote]Lord, make me an instrument of Thy peace;
where there is hatred, let me sow love;
where there is injury, pardon;
where there is doubt, faith;
where there is despair, hope;
where there is darkness, light;
and where there is sadness, joy.[/quote]

Becomes

<p class="quote">Lord, make me an instrument of Thy peace;</p>
<p>where there is hatred, let me sow love;</p>
<p>where there is injury, pardon;</p>
<p>where there is doubt, faith;</p>
<p>where there is despair, hope;</p>
<p>where there is darkness, light;</p>
<p>and where there is sadness, joy.</p>


So, instead of this...

$entry=preg_replace('|<p>\[quote](.+)\[/quote]</p>|', '<p class="quote">$1</p>', $entry);

... I need something that will locate <p>[quote] and [/quote]</p> and then change all <p> tags between into <p class="quote">.


Hope that clarifies. Any ideas?

Thanks,
Drew

Re: using something like preg_match for subject of str_repla

Post by McInfo »

Try this

<?php
$t = file_get_contents('example.txt');
$t = str_replace("\r", '', $t);
$t = htmlentities($t);
$t = "<p>$t</p>";
$t = str_replace("\n\n", '</p><p>', $t);
$t = str_replace("\n", '<br />', $t);
$t = preg_replace('#\[title\](.*?)\[/title\]#i', '<p class="title">\1</p>', $t);
$t = preg_replace('#\[subtitle\](.*?)\[/subtitle\]#i', '<p class="subtitle">\1</p>', $t);
$t = preg_replace('#\[i\](.*?)\[/i\]#i', '<i>\1</i>', $t);
$t = preg_replace('#\[b\](.*?)\[/b\]#i', '<b>\1</b>', $t);
$t = preg_replace('#\[quote\](.*?)\[/quote\]#i', '<p class="quote">\1</p>', $t);
$t = preg_replace('#\[url\](.*?)\[/url\]#i', '<a href="\1">\1</a>', $t);
$t = preg_replace('#\[url=(.*?)\](.*?)\[/url\]#i', '<a href="\1">\2</a>', $t);
$t = str_replace('</p><br />', '</p><p>', $t);
$t = str_replace('<p><p', '<p', $t);
$t = str_replace('</p></p>', '</p>', $t);
$t = str_replace('</p><p', "</p>\n<p", $t);
echo $t;
?>
2009-04-27 Edit: Changed the </quote> on line 12 to </p>.

If you want to see what is going on, put this header at the top of the script

header('Content-Type: text/plain');
and this echo statement after each "$t = ..." statement

echo $t . "\n\n-----\n\n";
Edit: This post was recovered from search engine cache.
Last edited by McInfo on Mon Jun 14, 2010 3:50 pm, edited 1 time in total.

Re: using something like preg_match for subject of str_replace

Post by oboedrew »

McInfo, I'm studying your solution right now to better understand it. I have two questions about it.

First, I understand the need to escape the opening square bracket, but why escape the closing square bracket too?

Second, what is the meaning of the ? in (.*?), and why allow for 0 or more characters (.*) instead of one or more (.+) when BBCode tags should never be left empty?


I've also come up with an alternative solution that uses paragraphs exclusively for formatting entries (as in my original attempt), instead of mixing paragraphs and line breaks:

while(preg_match('#<p>\[quote](.+)\[/quote]</p>#s', $entry, $match)){
    $match[1]=str_replace('<p>', "\t<p>", $match[1]);
    $entry=str_replace($match[0], "<div class=\"quote\">\n\t<p>$match[1]</p>\n</div>", $entry);
}
 
Then I just add a rule to the style sheet to determine formatting of <p></p> within <div class="quote"></div>.

However, I'm puzzled that this similar version does not work:

preg_match_all('#<p>\[quote](.+)\[/quote]</p>#s', $entry, $matches);
foreach($matches as $match){
    $match[1]=str_replace('<p>', "\t<p>", $match[1]);
    $entry=str_replace($match[0], "<div class=\"quote\">\n\t<p>$match[1]</p>\n</div>", $entry);
}
 
I think this is the first time I've used preg_match_all, so maybe I'm doing something wrong with it. Any ideas?

Thanks,
Drew
Last edited by Benjamin on Mon Apr 27, 2009 8:26 pm, edited 2 times in total.
Reason: Added code tags
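
One likely reason the foreach version misbehaves, sketched on the assumption that the flags argument was left at its default: preg_match_all() defaults to PREG_PATTERN_ORDER, so $matches[0] is the list of all whole matches and $matches[1] the list of all first-group captures. Each $match in the loop is then a list of results, not a single match. The greedy (.+) with the s modifier can also swallow several quotes into one match. The sample string is invented.

```php
<?php
// Hypothetical input with two quotes.
$entry = '<p>[quote]one[/quote]</p><p>[quote]two[/quote]</p>';
$pattern = '#<p>\[quote](.+?)\[/quote]</p>#s';

// Default (PREG_PATTERN_ORDER): results are grouped by capture group.
// $byGroup[0] holds every whole match, $byGroup[1] every first group.
preg_match_all($pattern, $entry, $byGroup);

// PREG_SET_ORDER: results are grouped by match, which suits foreach.
// $byMatch[0] is the first match: array(whole match, first group).
preg_match_all($pattern, $entry, $byMatch, PREG_SET_ORDER);

var_dump($byGroup[1]);
var_dump($byMatch[0]);
```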

Re: using something like preg_match for subject of str_repla

Post by McInfo »

oboedrew wrote:why escape the closing square bracket too?
It is not necessary. I just like to escape special characters even if they will not be treated as special characters in the current context (in case the pattern grows).
oboedrew wrote:what is the meaning of the ? in (.*?)
The question mark prevents the .* pattern from being greedy and consuming the [/bbtag].
oboedrew wrote:why allow for 0 or more characters (.*) instead of one or more (.+) when BBCode tags should never be left empty?
If you don't want to match empty BBCode tags, then use the plus sign. If there are empty BBCode tags, they will be shown literally.
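
The two answers above can be illustrated with a short sketch. The input string is hypothetical; the patterns follow the [i] rule from the earlier script.

```php
<?php
// Hypothetical input with two italic runs and one empty tag pair.
$text = '[i]one[/i] and [i]two[/i] and [i][/i]';

// Greedy: .* runs to the LAST [/i], so everything becomes one match.
$greedy = preg_replace('#\[i\](.*)\[/i\]#', '<i>\1</i>', $text);

// Lazy: .*? stops at the FIRST [/i]; the empty pair is converted too.
$lazyStar = preg_replace('#\[i\](.*?)\[/i\]#', '<i>\1</i>', $text);

// Lazy with +: at least one character is required between the tags,
// so the empty pair is left in the output as literal text.
$lazyPlus = preg_replace('#\[i\](.+?)\[/i\]#', '<i>\1</i>', $text);

echo $greedy, "\n", $lazyStar, "\n", $lazyPlus, "\n";
```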
oboedrew wrote:I've also come up with an alternative solution...
I'll get back to you on that.

Edit: This post was recovered from search engine cache.
Last edited by McInfo on Mon Jun 14, 2010 3:50 pm, edited 1 time in total.

Re: using something like preg_match for subject of str_replace

Post by oboedrew »

Ah, that ? to prevent greediness is most helpful. That just helped me solve another problem in a script unrelated to this one.

Never mind my last question. I just figured it out:

preg_match_all('#<p>\[quote](.+?)\[/quote]</p>#s', $entry, $matches, PREG_SET_ORDER);
foreach($matches as $match){
    $match[1]=str_replace('<p>', "\t<p>", $match[1]);
    $entry=str_replace($match[0], "<div class=\"quote\">\n<p>$match[1]</p>\n</div>", $entry);
}
 
Thanks again!

Cheers,
Drew
Last edited by Benjamin on Mon Apr 27, 2009 8:26 pm, edited 2 times in total.
Reason: Added code tags.
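
The loop above can also be written as a single pass with preg_replace_callback(), which hands each match to a function and substitutes whatever that function returns. This is a sketch of an alternative, not the approach the thread settled on; the sample $entry is invented and mirrors the paragraph-per-line markup described earlier.

```php
<?php
// Hypothetical entry after the newline-to-paragraph step.
$entry = "<p>Intro</p>\n<p>[quote]line one</p>\n<p>line two[/quote]</p>\n<p>Outro</p>";

$entry = preg_replace_callback(
    '#<p>\[quote](.+?)\[/quote]</p>#s',
    function ($match) {
        // Indent the inner paragraphs, then wrap the whole quote in a div.
        $inner = str_replace('<p>', "\t<p>", $match[1]);
        return "<div class=\"quote\">\n\t<p>" . $inner . "</p>\n</div>";
    },
    $entry
);
echo $entry, "\n";
```

Because the replacement happens at the position of each match, two identical quotes in one entry are each handled once, without a second str_replace pass over text that was already converted.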