Matching nested tag (help on an already made regex)

Posted: Tue Feb 03, 2009 5:09 pm
by zenhop
Hello,
My project is a little complicated, so I'll simplify it with a small example.

Let's take this code:
<ftl:loop name="c1">
<ftl:loop name="c2">
<ftl:loop name="c3">
blah
</ftl:loop>
</ftl:loop>
</ftl:loop>
My goal is to match and replace those tags, one by one.
But if I want to match and replace each of those tags in the classical way (matching and replacing <ftl:loop [attributes]>(.*?)</ftl:loop>), I will never be able to parse nested tags like this, since the regex would pair the first opening tag with the first closing tag, matching only the first five lines of:
<ftl:loop name="c1">
<ftl:loop name="c2">
<ftl:loop name="c3">
blah
</ftl:loop>
</ftl:loop>
</ftl:loop>

Code: Select all

Regex used: (Matching normal and singleton tags):
<ftl:loop\s+(.*?)\s*(>(.*?)</ftl:loop>|/>)
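
To make the problem concrete, here is a minimal PHP sketch. The `#` delimiters, the variable names and the `s` modifier are assumptions for illustration (without `s`, `.` would not cross the line breaks in the sample). It suggests that the lazy `(.*?)` stops at the first closing tag, so the match starts at the outermost opening tag and its "body" still contains nested opening tags:

```php
<?php
// Sample input from the post above.
$text = <<<BLOCK
<ftl:loop name="c1">
<ftl:loop name="c2">
<ftl:loop name="c3">
blah
</ftl:loop>
</ftl:loop>
</ftl:loop>
BLOCK;

// The regex from the post; the "s" modifier is an assumption so that "." also matches newlines.
$naive = '#<ftl:loop\s+(.*?)\s*(>(.*?)</ftl:loop>|/>)#s';

preg_match($naive, $text, $m);

$attributes = $m[1]; // attributes of the tag where the match starts
$body = $m[3];       // everything up to the FIRST closing tag

echo "attributes: " . $attributes . "\n";
echo "body contains a nested opening tag: "
    . (strpos($body, '<ftl:loop') !== false ? 'yes' : 'no') . "\n";
```
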
So the only way I see to match and parse those nested tags is to start matching them from the innermost tag, which means the <ftl:loop name="c3"> ... </ftl:loop> pair in:
<ftl:loop name="c1">
<ftl:loop name="c2">
<ftl:loop name="c3">
blah
</ftl:loop>
</ftl:loop>
</ftl:loop>
Then, once it is replaced, I can run the same regex to replace <ftl:loop name="c2">, then <ftl:loop name="c1">, and all my tags will have been matched correctly.

Here is my theory:
Instead of matching <ftl:loop [attributes]>(.*?)</ftl:loop> I should match <ftl:loop [attributes]>(anything except "<ftl:loop")</ftl:loop>
This way, only the innermost tag (<ftl:loop name="c3">) will match, and once it is replaced, I can run the regex again to match the enclosing tag (<ftl:loop name="c2">), then the outermost one (<ftl:loop name="c1">).

I tried a few things using The Regex Coach (amazing software!) but nothing works...

How can I express "anything except <ftl:loop" in a regex?
I tried (.*?)(^(<ftl:loop)*) among many other patterns, but without any result.

Thank you in advance!

Re: Matching nested tag (help on an already made regex)

Posted: Wed Feb 04, 2009 12:08 am
by zenhop
I just spent the whole afternoon and night on Regex Coach and on many regex tutorials and articles, and I found it!

So here is my regex:
<ftl:([a-zA-Z0-9_-]+)\s*((?:(?!>).)*)\s*(>((?:(?!ftl:\1).)+)</ftl:\1>|/>)

This will match any tag looking like <ftl:tagname></ftl:tagname> or <ftl:tagname/>, with or without attributes.
If a tag is nested in another tag of the same name, only the innermost tag (the most deeply nested one) will be matched; once it is replaced, the regex can be run again to match the other tags.
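
As a quick check of that claim, the following PHP sketch applies the regex to the nested example from the first post. The `s` modifier and the `#` delimiters are assumptions (the post does not say which flags were used); without `s`, `.` would not cross the line breaks inside the tags. The variable names are illustrative only.

```php
<?php
$text = <<<BLOCK
<ftl:loop name="c1">
<ftl:loop name="c2">
<ftl:loop name="c3">
blah
</ftl:loop>
</ftl:loop>
</ftl:loop>
BLOCK;

// Regex from the post, wrapped in "#" delimiters; "s" lets "." match newlines (assumed).
// (?:(?!ftl:\1).)+ consumes one character at a time, but only while the text
// ahead does not start with "ftl:" followed by the captured tag name.
$regex = '#<ftl:([a-zA-Z0-9_-]+)\s*((?:(?!>).)*)\s*(>((?:(?!ftl:\1).)+)</ftl:\1>|/>)#s';

preg_match($regex, $text, $m);

$tagName = $m[1];     // tag name after "ftl:"
$attributes = $m[2];  // raw attribute string
$body = trim($m[4]);  // content between the opening and closing tag

printf("%s [%s] => %s\n", $tagName, $attributes, $body);
```

On this input the first (and only) match reported by preg_match() should be the c3 tag, since every outer tag contains another "ftl:loop" before its own closing tag.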

Thx for all the messages on the forum! It has been really useful!

I hope this message and solution will be useful for someone one day ;)

Re: Matching nested tag (help on an already made regex)

Posted: Wed Feb 04, 2009 3:45 am
by prometheuzz
zenhop wrote:I just spent the whole afternoon and night on Regex Coach and on many regex tutorials and articles, and I found it!

So here is my regex:
<ftl:([a-zA-Z0-9_-]+)\s*((?:(?!>).)*)\s*(>((?:(?!ftl:\1).)+)</ftl:\1>|/>)

This will match any tag looking like <ftl:tagname></ftl:tagname> or <ftl:tagname/>, with or without attributes.
Note that "<ftl:tagname></ftl:tagname>" will not be matched by it, but "<ftl:tagname> </ftl:tagname>" will (there must be one or more characters between the opening and closing tags).
This will correct that "bug":

Code: Select all

<ftl:([a-zA-Z0-9_-]+)\s*((?:(?!>).)*)\s*(>((?:(?!ftl:\1).)*)</ftl:\1>|/>) // changed the last "+" into a "*"
A couple more small "readability enhancements":

- ((?:(?!>).)*) can be replaced by ([^>]*)
- ([a-zA-Z0-9_-]+) can be replaced by ([\w-]+)
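
The effect of changing "+" to "*", together with the two shorthand replacements, can be checked with a short PHP sketch. The `#...#s` wrapping, the variable names and the sample strings are assumptions for illustration. One subtle difference worth keeping in mind: `[^>]` matches line breaks regardless of flags, whereas `.` only does so under the `s` modifier.

```php
<?php
// Original pattern (last quantifier is "+") ...
$plus = '#<ftl:([a-zA-Z0-9_-]+)\s*((?:(?!>).)*)\s*(>((?:(?!ftl:\1).)+)</ftl:\1>|/>)#s';
// ... and the corrected pattern ("*") using the two shorthand forms from the list above.
$star = '#<ftl:([\w-]+)\s*([^>]*)\s*(>((?:(?!ftl:\1).)*)</ftl:\1>|/>)#s';

$samples = [
    '<ftl:tagname></ftl:tagname>',   // empty element
    '<ftl:tagname> </ftl:tagname>',  // one space between the tags
    '<ftl:tagname/>',                // singleton
];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$sample] = [
        'plus' => preg_match($plus, $sample),
        'star' => preg_match($star, $sample),
    ];
    printf("%-30s plus=%d star=%d\n", $sample, $results[$sample]['plus'], $results[$sample]['star']);
}
```
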

But besides those minor remarks, all I can say is: well done!
; )

Re: Matching nested tag (help on an already made regex)

Posted: Wed Feb 04, 2009 7:05 am
by zenhop
Thx for the update :)
I'm having a really really weird bug now, using PHP and preg_replace_callback()...

<ftl:box> will match, but <ftl:boox> won't.
In fact, if I use the same letter twice in the tag name, it doesn't match, and I really don't see any logical explanation for this behavior...
All my tags are matched and parsed perfectly, except those with a letter used at least twice. I'm going crazy, I've been working on it for like 4h now!!

Do you see any reason for this behavior?

thx!

(I'll post the complete parser once this bug is fixed, since it could be useful for people looking to parse nested tags)

Re: Matching nested tag (help on an already made regex)

Posted: Wed Feb 04, 2009 7:46 am
by prometheuzz
Try something like this:

Code: Select all

<?php
$text = <<<BLOCK
<ftl:loop name="c1">
<ftl:boox>
<ftl:loop name="c2">
<ftl:loop name="c3">
blah
<ftl:loop1 />
<ftl:loop2 />
</ftl:loop>
</ftl:loop>
</ftl:boox>
</ftl:loop>
BLOCK;
 
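$regex = '#<ftl:([\w-]+)[^>]*(/>|>((?!</?ftl:).)*</ftl:\1>)#is';

// Repeatedly find the innermost tag, report it, and remove it from the text.
while (preg_match($regex, $text, $match)) {
  echo "Found: " . preg_replace('#[\r\n]#', '...', $match[0]) . "\n";
  // Escape the matched text before reusing it as a pattern, so regex
  // metacharacters in attributes or content are treated literally.
  $text = preg_replace('#' . preg_quote($match[0], '#') . '#', '', $text, 1);
}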
?>
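
Since preg_replace_callback() came up earlier in the thread, here is one way the same inside-out idea might look with it: keep replacing innermost tags until a pass makes no replacement. This is only a sketch. The renderTag() helper and its bracketed output format are hypothetical placeholders, and the regex is a variant of the one above with the body captured as group 2.

```php
<?php
// Hypothetical renderer: turns a matched tag into a bracketed marker
// so the nesting order of the replacements is easy to inspect.
function renderTag(array $m): string
{
    $name = $m[1];
    // Group 2 is absent for singleton tags (<ftl:x />), which have no body.
    $body = isset($m[2]) ? trim($m[2]) : '';
    return sprintf('[%s:%s]', $name, $body);
}

// Matches only tags whose body contains no other ftl: tag, i.e. innermost tags.
$regex = '#<ftl:([\w-]+)[^>]*(?:/>|>((?:(?!</?ftl:).)*)</ftl:\1>)#is';

$text = '<ftl:loop name="c1"><ftl:boox><ftl:loop name="c2">blah<ftl:item /></ftl:loop></ftl:boox></ftl:loop>';

// Each pass replaces the current innermost tags; stop when nothing matched.
do {
    $text = preg_replace_callback($regex, 'renderTag', $text, -1, $count);
} while ($count > 0);

echo $text, "\n";
```

The loop only terminates if the callback's output contains no new ftl: tags, so a real renderer would need to guarantee that.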

Re: Matching nested tag (help on an already made regex)

Posted: Wed Feb 04, 2009 8:06 am
by zenhop
Yeah, it's working. When I try the parser with a new simple file instead of testing on my website, it works... and now I just commented out a few lines to debug, tried it, uncommented the same lines without changing a single piece of code, and it's working again...
I'm lost...
Anyway, I'm finishing my tests and I'll post the source.

Thx for your help :)

Re: Matching nested tag (help on an already made regex)

Posted: Wed Feb 04, 2009 8:08 am
by prometheuzz
zenhop wrote:...
Thx for your help :)
You're welcome.