lookbehinds problem
Posted: Mon May 18, 2009 3:19 pm
by michalmas
Hello,
I am seeing strange behaviour in my test with a negative lookbehind.
I have the text:
Code:
some text is here ble bla blu
now there is some
elem xsxsx
struc something
now we are inside and next element:
elem value
something els, and again
elem val2
end struc
and reg expr:
I want it to match all elem elements that are preceded by struc. So, I want to get:
Now I get:
Code:
elem xsxsx
elem value
elem val2
elem valXXX
Thanks!
Re: lookbehinds problem
Posted: Thu May 21, 2009 2:12 am
by prometheuzz
That can't be true since there is no "elem valXXX" substring in your example input string.
Re: lookbehinds problem
Posted: Thu May 21, 2009 3:17 am
by michalmas
Oh, that's right. I haven't copied everything.
So, the text is:
Code:
some text is here ble bla blu
now there is some
elem xsxsx
struc something
now we are inside and next element:
elem value
something els, and again
elem val2
end struc
this is not inside
elem valXXX
struc
asasas
end struc
sdffdsd
I believe that the main error in the reg exp is the *, since it matches any number of struc occurrences before, including zero. But if I replace it with +, I don't get any results.
I also tried to nest the expression and say that after struc there needs to be any character. This didn't work either.
Re: lookbehinds problem
Posted: Thu May 21, 2009 4:02 am
by Weirdan
So you need to match 'elem something' given there was a 'struc' earlier in the text? I guess you need to drop second parenthesis around the lookbehind, like this (?<=struc\s.*)elem\s\S+
Also make sure you're using a multiline regexp and that dot matches newlines as well (by specifying the /ms flag combo).
Re: lookbehinds problem
Posted: Thu May 21, 2009 4:26 am
by michalmas
@Weirdan:
Your solution doesn't return any results (dot matching newlines is on).
The main purpose of this is to get all elem that are INSIDE the struc-endstruc structure. So, the requirement is that there was a struc before, but simultaneously there was no end struc after it.
Re: lookbehinds problem
Posted: Thu May 21, 2009 4:58 am
by prometheuzz
Do it in two steps:
1 - get everything in between struc and end struc
2 - for every match in step 1, find all elem's
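The two steps above could be sketched roughly as follows. This is only an illustration using Python's re module, not the PHP or PowerGrep setup discussed in this thread; the function name elems_inside_struc and the block pattern are assumptions made for the example, and the sample text is the one posted earlier. It assumes the struc blocks are not nested.

```python
import re

# Sample input as posted earlier in the thread.
TEXT = """some text is here ble bla blu
now there is some
elem xsxsx
struc something
now we are inside and next element:
elem value
something els, and again
elem val2
end struc
this is not inside
elem valXXX
struc
asasas
end struc
sdffdsd
"""

# Step 1: a non-nested block runs from "struc" to the nearest "end struc".
BLOCK = re.compile(r"struc\b.*?end\sstruc", re.DOTALL)
# Step 2: inside a block, an element is "elem", one whitespace char, one token.
ELEM = re.compile(r"elem\s\S+")


def elems_inside_struc(text):
    """Collect every elem found inside a struc ... end struc block."""
    found = []
    for block in BLOCK.finditer(text):
        found.extend(ELEM.findall(block.group(0)))
    return found


print(elems_inside_struc(TEXT))  # -> ['elem value', 'elem val2']
```

Splitting the work this way keeps each pattern simple, at the cost of not being a single expression.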
And if your struc/end struc's are nested, then regex is not the right tool for the job. You need a true recursive descent parser. In which case, Google is your friend.
If they're not nested, this might work (untested!):
Code:
'/elem\s\S+(?=(?:(?!struc).)*end\sstruc)/s'
Btw, this looks like the same thing as in your other thread. Perhaps it's better to keep the discussion in one thread?
Good luck.
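For the nested case mentioned above, a very small line-based scanner may already be enough. The sketch below is not a recursive descent parser, just a depth counter, and it rests on assumptions that are not confirmed anywhere in this thread: that struc, end struc and elem each start their own line, and that the hypothetical function name elems_by_depth and the NESTED sample are only there for illustration.

```python
def elems_by_depth(text):
    """Return (depth, element) pairs for every elem line inside at least one struc."""
    depth = 0
    found = []
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue  # skip blank lines
        if words[:2] == ["end", "struc"]:
            depth = max(depth - 1, 0)  # close the innermost open block
        elif words[0] == "struc":
            depth += 1  # open a new block
        elif words[0] == "elem" and len(words) > 1 and depth > 0:
            found.append((depth, " ".join(words[:2])))
    return found


# Hypothetical nested input, made up for this example.
NESTED = """elem outside
struc outer
elem a
struc inner
elem b
end struc
elem c
end struc
elem after
"""

print(elems_by_depth(NESTED))  # -> [(1, 'elem a'), (2, 'elem b'), (1, 'elem c')]
```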
Re: lookbehinds problem
Posted: Thu May 21, 2009 5:01 am
by prometheuzz
Weirdan wrote:So you need to match 'elem something' given there was a 'struc' earlier in the text? I guess you need to drop second parenthesis around the lookbehind, like this (?<=struc\s.*)elem\s\S+ ...
"Variable length look behinds" are not supported by PHP's preg-methods.
Re: lookbehinds problem
Posted: Thu May 21, 2009 12:15 pm
by Weirdan
prometheuzz wrote:Weirdan wrote:like this (?<=struc\s.*)elem\s\S+ ...
"Variable length look behinds" are not supported by PHP's preg-methods.
You are right, I forgot that.
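The fixed-length restriction is easy to check. The sketch below uses Python's re module as a stand-in (an assumption; this thread is about PHP's preg functions and PowerGrep), which likewise rejects a variable-width lookbehind when the pattern is compiled. The helper name compiles is made up for the example.

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Variable-width lookbehind: the .* makes the lookbehind length unbounded.
print(compiles(r"(?<=struc\s.*)elem\s\S+"))  # -> False
# Fixed-width lookbehind: exactly six characters, so it is accepted.
print(compiles(r"(?<=struc\s)elem\s\S+"))    # -> True
```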

Re: lookbehinds problem
Posted: Thu May 21, 2009 5:16 pm
by michalmas
prometheuzz wrote:Do it in two steps:
1 - get everything in between struc and end struc
2 - for every match in step 1, find all elem's
The requirement is that it has to be done in one expression. I realize that the problem could easily be solved by writing a dedicated program.
prometheuzz wrote:And if your struc/end struc's are nested, then regex is not the right tool for the job. You need a true recursive descent parser. In which case, Google is your friend.
I agree - nesting can't be expressed in reg exps. And you are right - the alternative solution was the parser.
prometheuzz wrote:Btw, this looks like the same thing as in your other thread. Perhaps it's better to keep the discussion in one thread?
The problem is exactly the same, but I wanted to approach it from two different angles (the most intuitive ones). Later I will merge them, though.
prometheuzz wrote:"Variable length look behinds" are not supported by PHP's preg-methods.
I am using PowerGrep for this
prometheuzz wrote:elem\s\S+(?=(?:(?!struc).)*end\sstruc)
And now I am lost. It is exactly what you suggested to me some time ago, but back then I couldn't make it work. Now it does. And the hack you used works even for nested elements.
But just to be sure: you check that elem is followed by an end struc with no struc in between (the hack)?
AND:
does anyone know why neither of the
or
works? Neither in Perl nor in PowerGrep...
Thanks!
Re: lookbehinds problem
Posted: Fri May 22, 2009 7:40 am
by prometheuzz
michalmas wrote:"Variable length look behinds" are not supported by PHP's preg-methods.
I am using PowerGrep for this
I've never used PowerGrep, but I am pretty sure it also does not support "look behinds" without a fixed length. Very few regex engines do (not even Perl's regex engine does!).
michalmas wrote:elem\s\S+(?=(?:(?!struc).)*end\sstruc)
And now I am lost. It is exactly what you suggested to me some time ago, but back then I couldn't make it work. Now it does. And the hack you used works even for nested elements.
But just to be sure: you check that elem is followed by an end struc with no struc in between (the hack)?
A short explanation is in order:
Code:
elem             // match "elem"
\s               // match any white space char
\S+              // match one or more characters other than white space chars
(?=              // start positive look ahead
  (?:            // start non capturing group 1
    (?!struc).   // if the string "struc" does not start at this position, match any single character
  )              // end non capturing group 1
  *              // repeat non capturing group 1 zero or more times
  end\sstruc     // match "end", a white space char, followed by "struc"
)                // end positive look ahead
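The same breakdown can be written as a commented pattern and run against the sample text from earlier in the thread. This is a sketch using Python's re module in verbose mode (an assumption; the thread itself concerns PHP and PowerGrep), with DOTALL standing in for the /s flag. The names TEXT and PATTERN are made up for the example.

```python
import re

# Sample input as posted earlier in the thread.
TEXT = """some text is here ble bla blu
now there is some
elem xsxsx
struc something
now we are inside and next element:
elem value
something els, and again
elem val2
end struc
this is not inside
elem valXXX
struc
asasas
end struc
sdffdsd
"""

PATTERN = re.compile(
    r"""
    elem            # the literal text elem
    \s              # one whitespace character
    \S+             # one or more non-whitespace characters
    (?=             # start positive lookahead
      (?:           # start non-capturing group
        (?!struc).  # if struc does not start here, consume one character
      )*            # repeat the group zero or more times
      end\sstruc    # then end, one whitespace character, struc
    )               # end positive lookahead
    """,
    re.VERBOSE | re.DOTALL,  # DOTALL lets the dot cross line breaks
)

print(PATTERN.findall(TEXT))  # -> ['elem value', 'elem val2']
```

The elem before the first struc and the elem valXXX after the first end struc are both skipped, because a struc is reached before any end struc when looking ahead from them.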
michalmas wrote:AND:
does anyone know why neither of the
or
works? Neither in Perl nor in PowerGrep...
Thanks!
Just for clarity, could you post this question with the target text? Thanks.