lookbehinds problem

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Moderator: General Moderators

michalmas
Forum Newbie
Posts: 14
Joined: Wed Apr 29, 2009 9:04 am

lookbehinds problem

Post by michalmas »

Hello,

I'm seeing strange behaviour in a test with a positive lookbehind.

I have the text:

Code: Select all

some text is here ble bla blu
now there is some
elem xsxsx
struc something
now we are inside and next element:
elem value
something els, and again
elem val2
end struc
and the regular expression:

Code: Select all

((?<=struc\s).)*elem\s\S+
I want it to match all elem entries that are preceded by struc. So I want to get:

Code: Select all

elem value
elem val2
elem valXXX
Instead I get:

Code: Select all

elem xsxsx
elem value
elem val2
elem valXXX
Thanks!
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: lookbehinds problem

Post by prometheuzz »

That can't be true since there is no "elem valXXX" substring in your example input string.
michalmas
Forum Newbie
Posts: 14
Joined: Wed Apr 29, 2009 9:04 am

Re: lookbehinds problem

Post by michalmas »

Oh, that's right. I didn't copy everything.

So, the text is:

Code: Select all

some text is here ble bla blu
now there is some 
elem xsxsx
struc something
now we are inside and next element:
elem value
something els, and again
elem val2
end struc
this is not inside
elem valXXX
struc 
asasas
end struc
sdffdsd
 
I believe the main error in the regex is the *: it lets the lookbehind group repeat any number of times, including zero, so struc does not have to be there at all. But if I replace it with +, I don't get any results.

I also tried nesting the expression to say that struc must be followed by any characters. That didn't work either.
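
The guess about * can be checked by running both variants side by side. This is only a sketch in Python's re module, used here as a stand-in (an assumption: the thread itself is about PowerGrep and PHP's preg functions, which may differ in details). The sample text is rebuilt by hand with trailing spaces dropped, and the names TEXT, STAR, PLUS and all_matches are made up for the example.

```python
import re

# Sample text from the post above, rebuilt line by line.
TEXT = "\n".join([
    "some text is here ble bla blu",
    "now there is some",
    "elem xsxsx",
    "struc something",
    "now we are inside and next element:",
    "elem value",
    "something els, and again",
    "elem val2",
    "end struc",
    "this is not inside",
    "elem valXXX",
    "struc",
    "asasas",
    "end struc",
    "sdffdsd",
])

STAR = r"((?<=struc\s).)*elem\s\S+"  # the original pattern
PLUS = r"((?<=struc\s).)+elem\s\S+"  # the same pattern with + instead of *


def all_matches(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]


star_hits = all_matches(STAR, TEXT)
plus_hits = all_matches(PLUS, TEXT)

# With *, the group may repeat zero times, so the lookbehind is never required.
print(star_hits)  # ['elem xsxsx', 'elem value', 'elem val2', 'elem valXXX']

# With +, one character sitting right after "struc" + whitespace must come
# immediately before "elem", which never happens in this text.
print(plus_hits)  # []
```

So the group only ever constrains the single character directly in front of elem. It does not say "struc occurred somewhere earlier".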
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: lookbehinds problem

Post by Weirdan »

So you need to match 'elem something' given there was a 'struc' earlier in the text? I guess you need to drop second parenthesis around the lookbehind, like this (?<=struc\s.*)elem\s\S+
Also make sure the dot matches newlines as well (that is the /s flag; /m only changes how ^ and $ behave, and /i only makes the match case-insensitive).
michalmas
Forum Newbie
Posts: 14
Joined: Wed Apr 29, 2009 9:04 am

Re: lookbehinds problem

Post by michalmas »

@Weirdan:

Your solution doesn't return any results (dot matching newlines is on).

The main purpose of this is to get all elem entries that are INSIDE a struc ... end struc block. So the requirement is that there was a struc before the elem, with no end struc between that struc and the elem.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: lookbehinds problem

Post by prometheuzz »

Do it in two steps:

1 - get everything in between struc and end struc
2 - for every match in step 1, find all elem's
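
The two steps above can be sketched as follows. This is Python rather than PHP (an assumption made so the snippet is easy to run), it assumes the blocks are not nested, and the names TEXT, blocks and elems are invented for the example.

```python
import re

# Sample text from earlier in the thread (trailing spaces dropped).
TEXT = "\n".join([
    "some text is here ble bla blu",
    "now there is some",
    "elem xsxsx",
    "struc something",
    "now we are inside and next element:",
    "elem value",
    "something els, and again",
    "elem val2",
    "end struc",
    "this is not inside",
    "elem valXXX",
    "struc",
    "asasas",
    "end struc",
    "sdffdsd",
])

# Step 1: grab each block from "struc" up to the nearest "end struc".
# The lazy .*? stops at the first "end struc"; DOTALL lets . cross newlines.
blocks = re.findall(r"struc\s.*?end\sstruc", TEXT, flags=re.DOTALL)

# Step 2: collect every elem entry inside each block.
elems = [hit for block in blocks for hit in re.findall(r"elem\s\S+", block)]

print(len(blocks))  # 2
print(elems)        # ['elem value', 'elem val2']
```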

And if your struc/end struc's are nested, then regex is not the right tool for the job. You need a true recursive descent parser. In which case, Google is your friend.

If they're not nested, this might work (untested!):

Code: Select all

'/elem\s\S+(?=(?:(?!struc).)*end\sstruc)/s'
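
For what it's worth, here is the same pattern tried on the sample text from earlier in the thread. It is a sketch in Python's re module rather than PHP (an assumption; re.DOTALL plays the role of the /s flag), and TEXT, PATTERN and hits are names invented for the example.

```python
import re

# Sample text from earlier in the thread (trailing spaces dropped).
TEXT = "\n".join([
    "some text is here ble bla blu",
    "now there is some",
    "elem xsxsx",
    "struc something",
    "now we are inside and next element:",
    "elem value",
    "something els, and again",
    "elem val2",
    "end struc",
    "this is not inside",
    "elem valXXX",
    "struc",
    "asasas",
    "end struc",
    "sdffdsd",
])

# An elem entry counts only if "end struc" can be reached from it
# without passing the start of another "struc" first.
PATTERN = re.compile(r"elem\s\S+(?=(?:(?!struc).)*end\sstruc)", re.DOTALL)

hits = PATTERN.findall(TEXT)
print(hits)  # ['elem value', 'elem val2']
```

"elem xsxsx" and "elem valXXX" are skipped because the next thing the lookahead runs into is an opening struc, not an end struc.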
Btw, this looks like the same thing as in your other thread. Perhaps it's better to keep the discussion in one thread?

Good luck.
Last edited by prometheuzz on Thu May 21, 2009 5:03 am, edited 1 time in total.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: lookbehinds problem

Post by prometheuzz »

Weirdan wrote:So you need to match 'elem something' given there was a 'struc' earlier in the text? I guess you need to drop second parenthesis around the lookbehind, like this (?<=struc\s.*)elem\s\S+ ...

"Variable length look behinds" are not supported by PHP's preg-methods.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: lookbehinds problem

Post by Weirdan »

prometheuzz wrote:
Weirdan wrote:like this (?<=struc\s.*)elem\s\S+ ...
"Variable length look behinds" are not supported by PHP's preg-methods.
You are right, I forgot that. :oops:
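
The same restriction is easy to see in Python's re module, which also rejects lookbehinds of variable length. This is offered only as an analogy (an assumption: PHP's PCRE reports the problem with its own error message), and compiles is a helper invented for the example.

```python
import re


def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised for a lookbehind whose length is not fixed.
        return False


# Fixed length: "struc" plus exactly one whitespace character.
fixed_ok = compiles(r"(?<=struc\s)elem\s\S+")

# Variable length: the .* inside the lookbehind can match any number of characters.
variable_ok = compiles(r"(?<=struc\s.*)elem\s\S+")

print(fixed_ok)     # True
print(variable_ok)  # False
```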
michalmas
Forum Newbie
Posts: 14
Joined: Wed Apr 29, 2009 9:04 am

Re: lookbehinds problem

Post by michalmas »

prometheuzz wrote:
Do it in two steps:

1 - get everything in between struc and end struc
2 - for every match in step 1, find all elem's
The requirement is that it has to be done in one expression. I realize the problem could be solved easily by writing a dedicated program.
prometheuzz wrote:
And if your struc/end struc's are nested, then regex is not the right tool for the job. You need a true recursive descent parser. In which case, Google is your friend.
I agree: nesting can't be expressed in regular expressions. And you are right, the alternative solution is a parser.
prometheuzz wrote:
Btw, this looks like the same thing as in your other thread. Perhaps it's better to keep the discussion in one thread?
The problem is exactly the same, but I wanted to approach it from two different angles (the most intuitive ones). I will merge the threads later, though.
prometheuzz wrote:
"Variable length look behinds" are not supported by PHP's preg-methods.
I am using PowerGrep for this :oops:
prometheuzz wrote:
elem\s\S+(?=(?:(?!struc).)*end\sstruc)
And now I am lost. It is exactly what you suggested to me some time ago, but back then I couldn't get it working. Now it works. And the hack you used works even for nested elements.
But to be sure: you check whether elem is followed by an end struc that is not preceded by a struc (the hack)?

AND:
does anyone know why neither

Code: Select all

((?<=struc\s).)*elem\s\S+
nor

Code: Select all

(?<=struc\s.*)elem\s\S+
works? Not in Perl, and not in PowerGrep...

Thanks!
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: lookbehinds problem

Post by prometheuzz »

michalmas wrote:
"Variable length look behinds" are not supported by PHP's preg-methods.
I am using PowerGrep for this :oops:
I've never used PowerGrep, but I am pretty sure it also does not support "look behinds" without a fixed length. Very few regex engines do (not even Perl's regex engine does!).
michalmas wrote:
elem\s\S+(?=(?:(?!struc).)*end\sstruc)
And now i am lost. It is exactly what you suggested me some time ago, but then i couldn't make it working. But now - it is. And the hack you used - it works even for nested elements.
But to be sure - you check if elem is followed by end-struc which is not preceded by struc (the hack)?
A short explanation is in order:

Code: Select all

elem              // match "elem"
\s                // match any white space char
\S+               // match one or more characters other than white space chars
(?=               // start positive look ahead
  (?:             //   start non capturing group 1
    (?!struc).    //     when looking ahead there's no string "struc", then match any character
  )               //   end non capturing group 1
  *               //   non capturing group 1 zero or more times
  end\sstruc      //   match "end", a white space char followed by "struc"
)                 // end positive look ahead
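
The breakdown above can also be written as a commented pattern and run. The sketch below uses Python's re.VERBOSE mode (an assumption; the thread targets PHP and PowerGrep), and the nested input is a made-up example, not text from this thread.

```python
import re

# The same pattern, spread out with comments (re.VERBOSE ignores the layout).
PATTERN = re.compile(r"""
    elem            # literal "elem"
    \s              # one whitespace character
    \S+             # one or more non-whitespace characters
    (?=             # positive lookahead: what follows must be ...
      (?:           #   non-capturing group
        (?!struc).  #     any character that does not start "struc"
      )*            #   ... repeated zero or more times
      end\sstruc    #   then "end", one whitespace character, "struc"
    )
""", re.VERBOSE | re.DOTALL)

# Hypothetical nested input, invented for this example.
NESTED = "\n".join([
    "struc outer",
    "elem a",
    "struc inner",
    "elem b",
    "end struc",
    "elem c",
    "end struc",
])

nested_hits = PATTERN.findall(NESTED)
print(nested_hits)  # ['elem b', 'elem c']
```

Note that "elem a" is inside the outer block but is not returned, because the lookahead meets the inner struc before it reaches any end struc. So with nested blocks the pattern only catches entries that are not followed by another opening struc inside their block.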
michalmas wrote:AND:
does anyone knows why neither of the

Code: Select all

((?<=struc\s).)*elem\s\S+
or

Code: Select all

(?<=struc\s.*)elem\s\S+
works? Neither in Perl or PowerGrep...

Thanks!
Just for clarity, could you post this question with the target text? Thanks.