Regular Expression Help

Posted: Thu Jul 03, 2003 6:51 pm
by Antitrust
I am trying to remove tags from a string of HTML in the following format:
if{ranks}This is the ranking system!endif{ranks}
Now, in that scenario I would want to remove everything between if{ranks} and endif{ranks}, including the tags themselves. If I have multiple tags, however, it gets trickier. Here's an example:
if{games}games1endif{games}Testing this!if{games}endif{games}
Here I want to remove each if{games} ... endif{games} block, but keep the text that sits between the blocks ("Testing this!"). What regular expression will not jump to the end and take out everything from the first if{games} to the second endif{games}, leaving nothing in the middle?

I have the following right now:

$template = ereg_replace('if{' . $statement . '}+(.*)+endif{' . $statement . '}','',$template);
But that removes everything between the two farthest tags. Thanks for your help!

Posted: Thu Jul 03, 2003 11:36 pm
by Stoker
You should never use ereg (the PHP team should remove it from the default PHP config); it is very inefficient and resource-hungry. Use preg instead if you really need regular expression power.

The quick answer to your question is "greedy by default": a regex will grab as much as it can and does not stop at the first hit. To avoid this, add a question mark (?) after the quantifier.

I am not all that familiar with POSIX syntax; what is that }+ for? I wonder if you have mistaken + for a concatenation operator. + is one or more, * is zero or more. The one below looks for one or more of anything between } and endif{, matched non-greedily (the ?):

$template = preg_replace('/if\{' . $statement . '\}.+?endif\{' . $statement . '\}/', '', $template);
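
The greedy versus non-greedy behaviour described in this post can be checked with a small sketch. It reuses the sample string from the first post and assumes $statement is 'games'; the variable names $open, $close, $greedy and $lazy are only for illustration.

```php
<?php
// Sample input from the question; 'games' stands in for $statement.
$statement = 'games';
$template  = 'if{games}games1endif{games}Testing this!if{games}endif{games}';

// Escaped opening and closing tags, built the same way as in the thread.
$open  = 'if\{' . $statement . '\}';
$close = 'endif\{' . $statement . '\}';

// Greedy: .* runs on to the last endif{games}, so the whole string is removed.
$greedy = preg_replace('/' . $open . '.*' . $close . '/', '', $template);

// Lazy: .*? stops at the nearest endif{games}, so each block is removed on its own.
$lazy = preg_replace('/' . $open . '.*?' . $close . '/', '', $template);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(13) "Testing this!"
```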

Posted: Fri Jul 04, 2003 3:51 am
by twigletmac
Moved to PHP - Normal.

Mac

Posted: Fri Jul 04, 2003 9:18 am
by Antitrust
Solved the problem which was caused mainly by these things:

- I used ereg_replace instead of preg_replace
- There were newlines in the text
- The .* was greedy

Here's the good code:

$template = preg_replace('^if\!\{' . $statement . '\}(.*?)endif\!\{' . $statement . '\}^s','',$template);
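
The newline point from the list above can be illustrated with a short sketch. The template text here is made up for the example, and the tags are written as in the first post (if{ranks}, without the ! that appears in the posted pattern), so treat it as an illustration rather than the exact code that was used.

```php
<?php
// Hypothetical template with a line break inside the block.
$statement = 'ranks';
$template  = "Header\nif{ranks}line one\nline two endif{ranks}\nFooter";

$pattern = 'if\{' . $statement . '\}.*?endif\{' . $statement . '\}';

// Without the s modifier, . does not match "\n", so nothing matches
// and the text comes back unchanged.
$without = preg_replace('/' . $pattern . '/', '', $template);

// With the s modifier (PCRE_DOTALL), . also matches "\n" and the block is removed.
$with = preg_replace('/' . $pattern . '/s', '', $template);

var_dump($without === $template);       // bool(true)
var_dump($with === "Header\n\nFooter"); // bool(true)
```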

Posted: Fri Jul 04, 2003 1:49 pm
by m3rajk
eregi uses posix. posix is greedy. it has no non-greedy ability. perl does.

for perl, to make non greedy over a stretch, i found out this trick should be used: (?U)

for more information, there is the thread i learned that in: viewtopic.php?t=10272
i don't need the thread anymore, so you may hijack it with my blessing (and point to this if anyone complains)
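
A minimal sketch of the (?U) trick mentioned above, assuming the sample string from the first post. In PCRE, (?U) inverts the default greediness for the rest of the pattern; the U modifier after the closing delimiter does the same for the whole pattern.

```php
<?php
$template = 'if{games}games1endif{games}Testing this!if{games}endif{games}';

// (?U) flips the default: a plain .* now behaves lazily.
$inline = preg_replace('/(?U)if\{games\}.*endif\{games\}/', '', $template);

// The U modifier has the same effect, applied to the whole pattern.
$flag = preg_replace('/if\{games\}.*endif\{games\}/U', '', $template);

var_dump($inline); // string(13) "Testing this!"
var_dump($flag);   // string(13) "Testing this!"
```

Note that under (?U) a trailing ? makes a quantifier greedy again, so mixing it with .*? gives the opposite of the usual meaning.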

Posted: Fri Jul 04, 2003 10:43 pm
by Stoker
Just a couple of optimizing questions, Antitrust: the (.*?) will accept nothing as well (use + instead of * if you want it to be at least something). If you want to accept newlines you may want to change it to (.|\\r|\\n)+? instead. Are you using the captured wildcard data for anything later? Otherwise there is no point grouping with ( ). If you use the version with the | alternation you do need grouping for that, and if you also want to capture the matched data, add another set of ( ) around it.
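
The points above can be sketched as follows. The sample strings are assumptions for the example, and the inner group is written as a non-capturing (?: ) group, which is a substitution for the plain ( ) in the post so that only the outer group captures.

```php
<?php
$template = 'if{games}games1endif{games}Testing this!if{games}endif{games}';

// .+? needs at least one character, so the empty second block is left in place.
$plus = preg_replace('/if\{games\}.+?endif\{games\}/', '', $template);
var_dump($plus); // string(34) "Testing this!if{games}endif{games}"

// Hypothetical multi-line block.
$multi = "if{games}a\nb endif{games}kept";

// The alternation accepts newlines without the s modifier; it must be grouped.
$alt = preg_replace('/if\{games\}(?:.|\r|\n)+?endif\{games\}/', '', $multi);
var_dump($alt); // string(4) "kept"

// An extra set of ( ) around the group captures the matched text in $m[1].
preg_match('/if\{games\}((?:.|\r|\n)+?)endif\{games\}/', $multi, $m);
var_dump($m[1] === "a\nb "); // bool(true)
```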

The ^s at the end, is that a literal? ^ means beginning of string, $ means end. A ^ inside a character class [] means negated.
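
On the ^s question: PHP's preg functions accept any non-alphanumeric, non-backslash, non-whitespace character as the pattern delimiter, so in the posted code the two ^ characters appear to act as delimiters and the trailing s as the PCRE_DOTALL modifier. A minimal sketch, assuming a made-up input and the tag format from the first post:

```php
<?php
// Hypothetical input with a newline inside the block.
$template = "if{ranks}This is\nthe ranking system!endif{ranks}done";

// ^ used as the delimiter, s as the modifier after the closing delimiter.
$caret = preg_replace('^if\{ranks\}(.*?)endif\{ranks\}^s', '', $template);

// The same pattern with the more common / delimiter.
$slash = preg_replace('/if\{ranks\}(.*?)endif\{ranks\}/s', '', $template);

var_dump($caret === $slash); // bool(true)
var_dump($caret);            // string(4) "done"
```

A side effect of choosing ^ as the delimiter is that it can no longer be used unescaped as an anchor inside the pattern, which is one reason / or # is the more usual choice.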