Regular Expression Help

Antitrust
Forum Newbie
Posts: 11
Joined: Tue Jan 28, 2003 10:15 am
Location: Canada

Post by Antitrust »

I am trying to remove tags from a string of HTML in the following format:
if{ranks}This is the ranking system!endif{ranks}
Now, in that scenario I would want to remove everything between if{ranks} and endif{ranks}, including the tags themselves. If I have multiple tags, however, it gets trickier. Here's an example:
if{games}games1endif{games}Testing this!if{games}endif{games}
I want to remove each if{games}...endif{games} block, tags included, but keep what lies between the blocks (here, "Testing this!"). What regular expression will not jump to the end and take out everything from the first if{games} to the second endif{games}, leaving nothing in the middle?

I have the following right now:

$template = ereg_replace('if{' . $statement . '}+(.*)+endif{' . $statement . '}','',$template);
But that removes everything between the two farthest tags. Thanks for your help!
Stoker
Forum Regular
Posts: 782
Joined: Thu Jan 23, 2003 9:45 pm
Location: SWNY

Post by Stoker »

You should never use ereg (the PHP team should remove it from the default PHP config); it is very inefficient and resource-hungry. Use preg instead if you really need regular-expression power.

The quick answer to your question is "greedy by default": a quantifier grabs as much as it can and does not stop at the first possible match. To avoid this, add a question mark (?) after the quantifier.
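A minimal PHP sketch of the greedy versus lazy difference, run on the example string from the question. The tag name games is hard-coded here purely for illustration:

```php
<?php
// Example string from the question.
$template = 'if{games}games1endif{games}Testing this!if{games}endif{games}';

// Greedy: ".*" runs to the LAST endif{games}, swallowing "Testing this!" too.
$greedy = preg_replace('/if\{games\}.*endif\{games\}/', '', $template);

// Lazy: ".*?" stops at the FIRST endif{games}, so each block is removed separately.
$lazy = preg_replace('/if\{games\}.*?endif\{games\}/', '', $template);

var_dump($greedy); // string(0) ""
var_dump($lazy);   // string(13) "Testing this!"
```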

I am not all that familiar with POSIX syntax; what is that }+ for? Have you mistaken + for a concatenation operator? + means one or more and * means zero or more. The pattern below looks for one or more of any character between } and endif{, non-greedily (that is what the ? does).

$template = preg_replace('/if\{' . $statement . '\}.+?endif\{' . $statement . '\}/', '', $template); // preg_replace already replaces every match; there is no preg_replace_all
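The practical difference between + and * in that pattern shows up on the empty if{games}endif{games} block from the question. A small sketch, assuming $statement holds 'games':

```php
<?php
// Example string from the question; $statement is assumed to be "games".
$template = 'if{games}games1endif{games}Testing this!if{games}endif{games}';
$statement = 'games';

// ".+?" needs at least one character between the tags,
// so the empty block at the end survives.
$plus = preg_replace('/if\{' . $statement . '\}.+?endif\{' . $statement . '\}/', '', $template);

// ".*?" also accepts zero characters, so the empty block is removed too.
$star = preg_replace('/if\{' . $statement . '\}.*?endif\{' . $statement . '\}/', '', $template);

echo $plus, "\n"; // Testing this!if{games}endif{games}
echo $star, "\n"; // Testing this!
```

So .+? leaves empty blocks in place, while .*? removes them as well.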
twigletmac
Her Royal Site Adminness
Posts: 5371
Joined: Tue Apr 23, 2002 2:21 am
Location: Essex, UK

Post by twigletmac »

Moved to PHP - Normal.

Mac

Post by Antitrust »

I solved the problem, which was caused mainly by these things:

- I used ereg_replace instead of preg_replace
- There were newlines in the text
- The .* was greedy

Here's the good code:

$template = preg_replace('^if\!\{' . $statement . '\}(.*?)endif\!\{' . $statement . '\}^s','',$template);
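The newline cause from the list above can be checked with a short sketch. It uses a hypothetical two-line ranks block and the more common / delimiter instead of ^; the s modifier is what lets . match a newline:

```php
<?php
// Hypothetical template: the ranks block spans two lines.
$template = "if{ranks}This is\nthe ranking system!endif{ranks}Kept text";

// Without s, "." does not match "\n", so the two-line block is left untouched.
$withoutS = preg_replace('/if\{ranks\}(.*?)endif\{ranks\}/', '', $template);

// With s (dotall), "." also matches "\n", so the whole block is removed.
$withS = preg_replace('/if\{ranks\}(.*?)endif\{ranks\}/s', '', $template);

var_dump($withoutS === $template); // bool(true)
echo $withS, "\n";                 // Kept text
```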
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

eregi (like ereg) uses POSIX regular expressions, which are always greedy; POSIX has no non-greedy quantifiers. Perl-compatible (preg) expressions do.

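For Perl-compatible (preg) patterns, I found that the inline option (?U) makes the quantifiers that follow it non-greedy, which saves adding ? to each one over a long stretch of the pattern.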
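A sketch of (?U) next to the equivalent U pattern modifier, on the example string from the question. Under ungreedy mode the meaning of a trailing ? flips, so .*? becomes greedy again:

```php
<?php
// Example string from the question.
$template = 'if{games}games1endif{games}Testing this!if{games}endif{games}';

// (?U) at the start of the pattern makes ".*" lazy for the rest of the pattern.
$inline = preg_replace('/(?U)if\{games\}.*endif\{games\}/', '', $template);

// The U modifier after the closing delimiter does the same for the whole pattern.
$modifier = preg_replace('/if\{games\}.*endif\{games\}/U', '', $template);

// In ungreedy mode, a trailing "?" makes the quantifier greedy again.
$greedyAgain = preg_replace('/if\{games\}.*?endif\{games\}/U', '', $template);

var_dump($inline);      // string(13) "Testing this!"
var_dump($modifier);    // string(13) "Testing this!"
var_dump($greedyAgain); // string(0) ""
```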

for more information, there is the thread i learned that in: viewtopic.php?t=10272
i don't need the thread anymore, so you may hijack it with my blessing (and point to this if anyone complains)

Post by Stoker »

Just a couple of optimization questions, Antitrust: (.*?) will also accept nothing at all (use + instead of * if you want at least one character). If you want to accept newlines, you may want to change it to (.|\\r|\\n)+? instead. Are you using the captured data for anything later? If not, there is no point in grouping with ( ). The alternation version with | does need its group, though, and if you need both the alternation and the captured text, add another set of ( ) around it.
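A sketch of the grouping point, using a hypothetical two-line ranks block. With a single group around (.|\r|\n), the group is repeated and keeps only the last character it matched; an extra outer pair of parentheses captures the whole text. The escapes are written as \r and \n, which inside single quotes reach PCRE the same way as \\r and \\n do:

```php
<?php
// Hypothetical two-line ranks block.
$text = "if{ranks}line one\nline two endif{ranks}";

// One group: (.|\r|\n) is repeated, so group 1 keeps only the LAST character it matched.
preg_match('/if\{ranks\}(.|\r|\n)+?endif\{ranks\}/', $text, $single);

// Extra outer group: group 1 now holds everything between the tags.
preg_match('/if\{ranks\}((.|\r|\n)+?)endif\{ranks\}/', $text, $outer);

// json_encode shows the newline as \n so each result stays on one line.
echo json_encode($single[1]), "\n"; // " "
echo json_encode($outer[1]), "\n";  // "line one\nline two "
```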

Is the ^s at the end a literal? ^ means the beginning of the string and $ means the end; a ^ at the start of a character class [] negates it.
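For the preg functions, the first character of the pattern is the delimiter, so in Antitrust's pattern the ^ characters are delimiters and the trailing s is the s (dotall) modifier rather than a literal. A sketch comparing a simplified version of that pattern (without the \! escapes) against the same pattern using /, with the test text and tag name assumed for illustration:

```php
<?php
// Hypothetical template text: a two-line ranks block followed by text to keep.
$template = "if{ranks}line one\nline two endif{ranks}Kept text";

// "^" is the delimiter here; the "s" after the closing "^" is the dotall modifier.
$caret = preg_replace('^if\{ranks\}(.*?)endif\{ranks\}^s', '', $template);

// The same pattern with the more common "/" delimiter.
$slash = preg_replace('/if\{ranks\}(.*?)endif\{ranks\}/s', '', $template);

// Inside "/" delimiters, "^" anchors at the start of the string and "$" at the end.
$anchored = preg_match('/^if\{ranks\}.*Kept text$/s', $template);

// A "^" at the start of a character class negates it: [^a-z] means "not a lowercase letter".
$hasNonLower = preg_match('/[^a-z]/', 'abc');

echo $caret, "\n";       // Kept text
echo $slash, "\n";       // Kept text
echo $anchored, "\n";    // 1
echo $hasNonLower, "\n"; // 0
```

Because ^ is the delimiter in that style, it cannot also be used as an anchor inside the pattern without escaping, which is one reason / is the usual choice.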