
Regex WTF???

Posted: Fri Feb 01, 2008 11:58 pm
by alex.barylski
I have the following simple regex:

Code:

$content = preg_replace('/<\/?((b|strong)\W[^>]*)>/is', '*', $content);
It actually works after some struggle...

Problem is...I don't understand how it's working...I get that it's matching two tags...but why does a non-alphanumeric character (\W) have to follow the tag name? Shouldn't it be whitespace - if anything?

Secondly, I have tried to adapt this regex so that both opening and closing tags are replaced with asterisks using the following:

Code:

$content = preg_replace('/<(\/?((b|strong)\W[^>]*))>/is', '*', $content);
It works - sorta!!! But it chops off everything until end of line on the closing tag...

Help :)

Cheers :)

Re: Regex WTF???

Posted: Sat Feb 02, 2008 2:53 am
by Chris Corbyn
It matches (in ABNF notation):

"<" [ "/" ] ( "b" / "strong" ) NON-ALPHA *< ANY CHAR except ">" (%d62) > ">"

Yes, that non-alpha seems silly (and wrong). Maybe use a word boundary instead:

Code:

$content = preg_replace('/<\/?(b|strong)\b[^>]*>/is', '*', $content);
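To see why the \W version misbehaves, here is a small sketch comparing the two patterns. The sample string is made up for illustration and is not from the original post. The likely cause is that \W consumes one character, so on a bare <b> or </b> it swallows the closing ">" and [^>]* then runs on to the next ">" in the text. \b is zero-width, so it only checks that the tag name ends there.

```php
<?php
// Sample input is an assumption, chosen to show both behaviours.
$content = 'a <b>bold</b> and <strong class="x">strong</strong> word, <br> kept';

// Original pattern: \W must consume a character after the tag name.
// On "<b>" it eats the ">" and [^>]* keeps going to the next ">".
$withW = preg_replace('/<\/?((b|strong)\W[^>]*)>/is', '*', $content);

// Suggested pattern: \b is a zero-width word boundary, so the first ">"
// after the tag name still closes the match. "<br>" is left alone
// because there is no boundary between "b" and "r".
$withB = preg_replace('/<\/?(b|strong)\b[^>]*>/is', '*', $content);

echo $withW, "\n"; // a * and *strong* kept
echo $withB, "\n"; // a *bold* and *strong* word, <br> kept
```

With the \W pattern the text between one tag's ">" and the next ">" disappears, which matches the "chops off everything" symptom described above.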

Re: Regex WTF???

Posted: Sat Feb 09, 2008 4:24 pm
by GeertDD
Optimization tips:

Code:

 
// Before:
$content = preg_replace('/<\/?(b|strong)\b[^>]*>/is', '*', $content);
 
// After:
$content = preg_replace('#</?(?:b|strong)\b[^>]*+>#i', '*', $content);
 
  • You don't need the s modifier. No dot to be found in the regex. Throw it away.
  • You don't need to store b|strong, so use non-capturing parentheses.
  • Kill useless backtracking on [^>]*> by making it possessive.
  • Change delimiters to some char you don't have in your regex to improve readability a bit.
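A quick sanity check of the tips above: the sketch below runs both patterns on a made-up sample string (an assumption, including a tag that spans a line break) and compares the results. It only shows that the output is the same and that the capture group is gone. It does not measure speed.

```php
<?php
// Sample input is an assumption; the second tag contains a newline to show
// that [^>] matches line breaks even without the s modifier.
$content = "Some <B>bold</B> and <strong\n id=\"s\">strong</strong> text, <br> kept";

$before = preg_replace('/<\/?(b|strong)\b[^>]*>/is', '*', $content);
$after  = preg_replace('#</?(?:b|strong)\b[^>]*+>#i', '*', $content);

// Both patterns produce the same replacement result on this input.
var_dump($before === $after); // bool(true)

// The capturing version stores the tag name; the non-capturing one does not.
preg_match('/<\/?(b|strong)\b[^>]*>/is', $content, $m1);
preg_match('#</?(?:b|strong)\b[^>]*+>#i', $content, $m2);
echo count($m1), ' ', count($m2), "\n"; // 2 1
```

The possessive [^>]*+ is safe here because [^>]* can never match the ">" that follows it, so there is nothing useful to backtrack into.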