
With regex, replacing certain text within matched occurrence

Posted: Sun Jul 05, 2009 6:48 pm
by DJNed
Note recent edit at bottom of post

Ok, this is going to be tricky to explain so please bear with me. Basically, I have a string; let's call it $text. Within $text there is a variable number of substrings of the form "[xxx]...[yyy]", where '...' is a string of characters of arbitrary length. I can write a regex that matches every occurrence of "[xxx]...[yyy]" in $text. However, I want an expression that, for each occurrence of "[xxx]...[yyy]", also replaces certain characters in the '...' section.

Though this wasn't the exact code (I have replaced a few variable names), here was one of my attempts:

Code:

preg_replace("/\[xxx\](.*)\[yyy\]/ie", "'<img src=\"/cgi-bin/script.cgi?'.rawurlencode(str_replace('abc', '123', '$1')", $text);
So in the example above, I'm trying to replace any occurrences of 'abc' with '123' when they occur between [xxx] and [yyy]. I'm using str_replace() because the string I'm trying to replace contains a newline character, which as far as I'm aware can't be matched in a regex.

There's probably a ridiculously simple solution that I'll kick myself for not seeing, but at the moment I'm drawing a blank and driving myself insane trying to figure it out. I hope someone will understand what I'm asking and know what to do.


Edit: I believe that the reason I'm unable to get this to work is because the text between [xxx] and [yyy] contains at least one newline character, which isn't matched by the (.*) expression. Consequently, no regex match is found at all.

My temporary solution has been to do a str_replace() on the whole string, $text, before the regex, replacing all occurrences of '\n' with something the regex can interpret. However, this replaces every '\n' in the whole text, not just those between [xxx] and [yyy], so I'm back to square one: trying to replace only the '\n' strings that are contained between [xxx] and [yyy].

Edit 2: I'm an idiot and forgot the PCRE 's' modifier that allows '.' in the regex to match newline characters. However, my expression still doesn't parse properly, giving a vague "fatal error" saying that the replacement code, i.e. from "<img src=" onwards, fails to evaluate. The expression is now as follows:

Code:

$text = preg_replace("/\[xxx\](.*)\[yyy\]/ies", "'<img src=\"/cgi-bin/script.cgi?'.rawurlencode(str_replace('\n', '\\', '$1'))", $text);
Obviously I'm trying to replace '\n' characters between [xxx] and [yyy] with '\' characters.
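
The effect of the 's' modifier mentioned in Edit 2 can be checked in isolation. The following is a minimal sketch, assuming a made-up two-line sample string ($sample is not from the original post); it only tests whether the pattern matches at all and does not use the 'e' modifier.

```php
<?php
// Sample input (hypothetical): the text between the tags spans two lines.
$sample = "[xxx]line one\nline two[yyy]";

// Without 's', '.' stops at "\n", so the pattern cannot reach [yyy].
$withoutS = preg_match('/\[xxx\](.*)\[yyy\]/', $sample);

// With 's' (PCRE_DOTALL), '.' also matches "\n".
$withS = preg_match('/\[xxx\](.*)\[yyy\]/s', $sample, $m);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```

With the modifier in place, $m[1] holds both lines, including the newline between them.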

Re: With regex, replacing certain text within matched occurrence

Posted: Sun Jul 05, 2009 8:35 pm
by requinix
The '\\' you have in your replacement string (which is showing up as '\' because the forum software is buggy) will get converted into a '\' after the first round of string parsing is done.
However there's a second round coming up, and '\' gets interpreted as the start of a single-quoted string plus an escaped apostrophe. That string doesn't end until the '$1', which means the $1 gets treated as PHP code.
But it's invalid (variable names can't start with a digit) so you get an error.

You have to escape the backslash twice: once for the first string (the preg_replace replacement text) and once for the second string (the str_replace replacement text).

Code:

'\\\\'
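
The two rounds of parsing described above can be sketched without the 'e' modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0. This sketch uses eval() as a stand-in for the second round and assumes PHP 7 or later, where eval() throws a ParseError on invalid code. The variable names are illustrative only.

```php
<?php
// Round 1: PHP parses the double-quoted literal; every "\\" becomes one backslash.
$twoSlashes  = "return '\\';";    // code seen by round 2: return '\';
$fourSlashes = "return '\\\\';";  // code seen by round 2: return '\\';

// Round 2: the resulting text is parsed again as PHP code.
try {
    eval($twoSlashes);            // \' escapes the closing quote: unterminated string
    $twoSlashesParsed = true;
} catch (ParseError $e) {
    $twoSlashesParsed = false;
}

$result = eval($fourSlashes);     // '\\' in single quotes is one backslash

var_dump($twoSlashesParsed);      // bool(false)
var_dump(strlen($result));        // int(1)
```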

Re: With regex, replacing certain text within matched occurrence

Posted: Sun Jul 05, 2009 8:57 pm
by DJNed
I get what you're saying, but with the following code, I'm still getting the same fatal parse error:

Code:

$text = preg_replace("/\[xxx\](.*)\[yyy\]/ies", "'<img src=\"/cgi-bin/script.cgi?'.rawurlencode(str_replace('\n', '\\\\', '$1'))", $text);
Also, I'm not sure how relevant it is, but the actual text I'm wanting to replace is '<br />\n', not just '\n'. I've tried escaping the '\n' character, and various combinations of single or double quotes, but nothing wants to parse properly.

Edit: Ahh, something else I forgot to mention: the desired replacement string should be '\\', not just '\', so I'm assuming I'll want 8 slashes to compensate for the two rounds of string parsing, no? I've tried with 8 but still no luck. I've also noticed in the error details that the special HTML characters in the replacement argument of the preg_replace (i.e. '<img src=...') are all being converted to their ampersand equivalents, &lt;, &gt;, etc. Consequently, at least according to the error details, it seems the '<br />' that I want to replace is being converted to entities and so won't match what's in $text anymore. Could this be happening?

I've just looked back at what I've said and I'm confused myself, sorry! It doesn't help that it's gone 3am here...

Re: With regex, replacing certain text within matched occurrence

Posted: Sun Jul 05, 2009 9:08 pm
by requinix
I don't quite see why, yet, but I guess you have to escape it three times.

Code:

'\\\\\\\\'
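
A likely explanation for the extra level of escaping is that preg_replace applies its own backslash handling to the replacement string, in addition to the PHP string literal and the evaluated code. The sketch below isolates just that layer, without the 'e' modifier; the pattern and subject are placeholders.

```php
<?php
// PHP literal: '\\\\' in single quotes is a 2-character string (two backslashes).
$replacement = '\\\\';

// preg_replace's own replacement syntax then collapses that pair into one backslash.
$out = preg_replace('/x/', $replacement, 'x');

var_dump(strlen($replacement)); // int(2)
var_dump(strlen($out));         // int(1)
```

Counting this layer, each backslash that should survive into the final string would have to be doubled three times in the source.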

Re: With regex, replacing certain text within matched occurrence

Posted: Sun Jul 05, 2009 9:12 pm
by DJNed
(Have edited above post)

Re: With regex, replacing certain text within matched occurrence

Posted: Sun Jul 05, 2009 9:49 pm
by requinix
Okay...

What text are you trying this replacement on and what are you trying to get in return?

Re: With regex, replacing certain text within matched occurrence

Posted: Mon Jul 06, 2009 6:48 am
by DJNed
Ok, here's a specific example of what I'm trying to do. Let's say I have the following $text variable:

Code:

$text = "abc def ghi<br />\nabc def[xxx]bla bla bla<br />\nbla bla bla[yyy]abc def ghi";
I wish to convert it into the following text:

Code:

abc def ghi<br />\nabc def<img src='/cgi-bin/script.cgi?bla%20bla%20bla%5C%5Cbla%20bla%20bla'>abc def ghi
You'll notice that all text outside of the [xxx] and [yyy] brackets is left untouched, including any occurrences of "<br />\n". The [xxx] and [yyy] brackets, along with the content between them, have been replaced with an <img>. The <img> has as its source "/cgi-bin/script.cgi?" + the content that was between [xxx] and [yyy]. However, that content has had all occurrences of "<br />\n" replaced with just "\\", and has then been rawurlencode-d.

In truth, it's a relatively simple thing I want to do: basically, to run a replacement on the matched result of another regex. It's just hard to explain, thank goodness you have patience!

Re: With regex, replacing certain text within matched occurrence

Posted: Mon Jul 06, 2009 9:32 am
by prometheuzz
Something like this:

Code:

$text = "abc def ghi<br />\nabc def[xxx]bla bla bla<br />\nbla bla bla[yyy]abc def ghi";
$text = preg_replace('#(<br\s*/>|\n)(?=((?!\[xxx]).)*\[yyy])#is', '%5C', $text);
$text = preg_replace('# (?=((?!\[xxx]).)*\[yyy])#is', '%20', $text);
$text = preg_replace('#\[xxx](.*?)\[yyy]#is', "<img src='/cgi-bin/script.cgi?$1'>", $text);
?

Re: With regex, replacing certain text within matched occurrence

Posted: Mon Jul 06, 2009 9:52 am
by DJNed
I'll test your code when I get my server up again sometime later, thanks. I do however have one issue with it, in that the only special character replacements it appears to make are for %5C ('\') and %20 (' '), whereas I'd like for all special characters between [xxx] and [yyy] to be url-encoded, hence my original attempts using rawurlencode().

Re: With regex, replacing certain text within matched occurrence

Posted: Mon Jul 06, 2009 11:07 am
by prometheuzz
Ah, I see. I must confess I only looked at your example input and output and didn't read the entire thread with enough attention.
I guess this would do the trick if I'm not mistaken:

Code:

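// Callback: $match[1] holds the text captured between [xxx] and [yyy].
function foo($match) {
  // "<br />" and "\n" are matched separately, so "<br />\n" yields two backslashes.
  $temp = preg_replace('#<br\s*/>|\n#s', "\\", $match[1]);
  return "<img src='/cgi-bin/script.cgi?" . rawurlencode($temp) . "'>";
}
 
$text = "abc def ghi<br />\nabc def[xxx]bla bla bla<br />\nbla bla bla[yyy]abc def ghi";
// Each [xxx]...[yyy] block is passed to foo() and replaced with its return value.
echo preg_replace_callback('#\[xxx](.*?)\[yyy]#is', 'foo', $text);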
which produces the following output:

Code:

abc def ghi<br />
abc def<img src='/cgi-bin/script.cgi?bla%20bla%20bla%5C%5Cbla%20bla%20bla'>abc def ghi
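
A variant of the same idea, sketched with an anonymous function and str_replace() so that only the exact sequence "<br />\n" is replaced, as described earlier in the thread. The function name render_tags is hypothetical; the tag names and script URL are taken from the posts above. This is one possible arrangement, not the code the poster ended up using.

```php
<?php
// Hypothetical helper: replaces each [xxx]...[yyy] block with an <img> tag.
function render_tags(string $text): string
{
    return preg_replace_callback(
        '#\[xxx\](.*?)\[yyy\]#s',
        function (array $m): string {
            // Replace the literal sequence "<br />\n" with two backslashes.
            $inner = str_replace("<br />\n", '\\\\', $m[1]);
            return "<img src='/cgi-bin/script.cgi?" . rawurlencode($inner) . "'>";
        },
        $text
    );
}

$text = "abc def ghi<br />\nabc def[xxx]bla bla bla<br />\nbla bla bla[yyy]abc def ghi";
echo render_tags($text);
```

Unlike the regex version, a lone "\n" or "<br />" inside the tags is left for rawurlencode() to encode as-is.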

Re: With regex, replacing certain text within matched occurrence

Posted: Mon Jul 06, 2009 11:25 am
by DJNed
Thank you to you both, tasairis and prometheuzz. Your code worked perfectly, prometheuzz, and I've managed to adapt it to allow for an optional "float=left/right" attribute in the [xxx] tag, which works without any hiccup. I wasn't even aware that preg_replace_callback existed, so thank you for introducing me to something which looks like it will prove damn useful in the future.
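
The thread does not show how the optional float attribute was written, so the following is only a guess at one way to do it. It assumes a tag syntax of [xxx float=left] or [xxx float=right] and a style attribute on the <img>; both are assumptions, as is the function name render_float_tags.

```php
<?php
// Hypothetical syntax: [xxx float=left]...[yyy]; the thread does not show the real one.
function render_float_tags(string $text): string
{
    return preg_replace_callback(
        '#\[xxx(?:\s+float=(left|right))?\](.*?)\[yyy\]#s',
        function (array $m): string {
            // $m[1] is an empty string when the optional attribute is absent.
            $style = $m[1] !== '' ? " style='float:{$m[1]}'" : '';
            return "<img src='/cgi-bin/script.cgi?" . rawurlencode($m[2]) . "'" . $style . ">";
        },
        $text
    );
}

echo render_float_tags("[xxx float=left]a b[yyy] and [xxx]c[yyy]");
```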

Re: With regex, replacing certain text within matched occurrence

Posted: Mon Jul 06, 2009 12:03 pm
by prometheuzz
Good to hear that, and you're welcome.