Capture overwrite?!

hymy
Forum Newbie
Posts: 1
Joined: Wed Sep 03, 2008 4:41 am


Post by hymy »

I'm kind of baffled by these results... or am I not thinking of something?


 
if ( preg_match( '/(?:(a|b)(.)){2}/', 'aAbB', $aMatch ) )
{
  var_dump( $aMatch );
  exit( 0 );
}
 
exit( 1 );
 
I expected it to return both pairs, since the group really does match twice, ...
array(5) {
[0]=>
string(4) "aAbB"
[1]=>
string(1) "a"
[2]=>
string(1) "A"
[3]=>
string(1) "b"
[4]=>
string(1) "B"
}
... yet this is what it yields...
array(3) {
[0]=>
string(4) "aAbB"
[1]=>
string(1) "b"
[2]=>
string(1) "B"
}
Can anyone shed some light on this for me?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Capture overwrite?!

Post by prometheuzz »

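Nope. When a capturing group sits inside a quantifier such as "{n}", each repetition overwrites the group's previous capture, so only the captures from the last (n-th) repetition are remembered.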
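
A minimal sketch of the difference, using the pattern and subject from the first post. The unrolled pattern and the variable names are illustrative assumptions, not something posted in the thread:

```php
<?php
// Quantified group: each repetition overwrites groups 1 and 2,
// so only the last pair survives.
preg_match('/(?:(a|b)(.)){2}/', 'aAbB', $quantified);
var_dump($quantified); // whole match, then "b" and "B"

// Writing the repetition out by hand gives every pair its own groups.
preg_match('/(a|b)(.)(a|b)(.)/', 'aAbB', $unrolled);
var_dump($unrolled); // whole match, then "a", "A", "b", "B"
```

Unrolling only works when the number of repetitions is fixed and small. For an open-ended quantifier such as * there is no fixed number of groups to write out.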
tempAccount
Forum Newbie
Posts: 5
Joined: Wed Sep 03, 2008 8:30 am

Re: Capture overwrite?!

Post by tempAccount »

Does anyone know a way to get all the matches and not just the last one (even when using * instead of {2}) with only one regex?
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: Capture overwrite?!

Post by GeertDD »

Drop the {2} part: /(a|b)(.)/, and use preg_match_all().
tempAccount
Forum Newbie
Posts: 5
Joined: Wed Sep 03, 2008 8:30 am

Re: Capture overwrite?!

Post by tempAccount »

Thanks, GeertDD.
It seems our question was incomplete, though.
Can someone also find a way to get a, A and b, B out of this one?


 
preg_match('/X(?:(a|b)(.))*(.)/', 'XaAbBY', $aMatch);
var_dump($aMatch);
 
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Capture overwrite?!

Post by prometheuzz »

Did you try what Geert suggested (preg_match_all(...))?
tempAccount
Forum Newbie
Posts: 5
Joined: Wed Sep 03, 2008 8:30 am

Re: Capture overwrite?!

Post by tempAccount »

Yes, it works as expected and finds multiple matches, but it does not work for my latter example, neither with preg_match() nor with preg_match_all().
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Capture overwrite?!

Post by prometheuzz »

Err, as Geert suggested:


preg_match_all('/(a|b)(.)/', 'XaAbBY', $result);
This will match a and b (group 1) and A and B (group 2).
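
To make the shape of the result concrete, here is a small sketch of the same call in both result orders. The variable names $byGroup and $byMatch are made up for this example:

```php
<?php
$subject = 'XaAbBY';

// Default PREG_PATTERN_ORDER: one array per capture group.
preg_match_all('/(a|b)(.)/', $subject, $byGroup);

// PREG_SET_ORDER: one array per match, which keeps each pair together.
preg_match_all('/(a|b)(.)/', $subject, $byMatch, PREG_SET_ORDER);

foreach ($byMatch as [$whole, $key, $value]) {
    echo "$key => $value\n";
}
```

PREG_SET_ORDER is usually the more convenient layout when the pairs are processed one at a time. Note that neither call says anything about the X or the Y, since the pattern no longer mentions them.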
tempAccount
Forum Newbie
Posts: 5
Joined: Wed Sep 03, 2008 8:30 am

Re: Capture overwrite?!

Post by tempAccount »

Thanks for helping on this.
Note that in the second regex we're also capturing the Y:
That's why the non-capturing parentheses are needed.


preg_match_all('/X(?:(a|b)(.))*(.)/', 'XaAbBY', $aMatch);
The difficulty in this regex is not getting the aA and bB, but getting those -and- the Y.
I initially left this out since I thought it was irrelevant to what I was trying to do, but GeertDD made me see I needed to add this 'complexity'.

Hope I'm not forgetting anything this time :)
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Capture overwrite?!

Post by prometheuzz »

Sorry, you lost me.
Perhaps you could give a few example input strings and make them a bit more elaborate and explain which part(s) of it you need to extract: IMHO, one 6 character long example isn't sufficient in this case unfortunately.
tempAccount
Forum Newbie
Posts: 5
Joined: Wed Sep 03, 2008 8:30 am

Re: Capture overwrite?!

Post by tempAccount »

What we're trying to do is extract all property/value pairs out of something like this, and the img tag, by using only one regex statement.
<a property1="value1" property2="value2"><img src="imgURL"/></a>

I boiled it down to the 6-letter string so as not to get distracted by the countless other methods of extracting this.
The essence of the question is whether a regex can do this kind of thing in one statement.

Still, here are some example strings and the values I'd like to extract. Maybe they make more sense now:
Y => Y
aAY => (a, A) and Y
aBZ => (a, B) and Z
bCX => (b, C) and X
aAbAX => (a, A) (b, A) and X
aAaAaAY => (a, A) (a, A) (a, A) and Y
bAcAaAZ => (b, A) (c, A) (a, A) and Z

so in the real world example:
<a property1="value1" property2="value2"><img src="imgURL"/></a>
find:
(property1, value1) (property2, value2) and imgURL
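
One possible way to handle the short example strings above in a single preg_match_all() call is to anchor every match with \G, so that each pair becomes its own match instead of a repetition of one group. This is only a sketch: the helper name splitPairsAndTail and the pattern are assumptions, not something suggested in the thread. The (a|b) key class is taken from the earlier pattern, so the "c" key in the last example string would not be accepted.

```php
<?php
// Hypothetical helper: split a string such as "aAbAX" into its
// (key, value) pairs plus the single trailing character.
function splitPairsAndTail(string $subject): array
{
    // \G ties each match to the end of the previous one, so the pairs must
    // be contiguous from the start of the string. The second alternative
    // only matches the very last character (\z is the end of the subject).
    $pattern = '/\G(?:(a|b)(.)|(.)\z)/';
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

    $pairs = [];
    $tail = null;
    foreach ($matches as $m) {
        if (isset($m[3]) && $m[3] !== '') {
            $tail = $m[3];              // the trailing character
        } else {
            $pairs[] = [$m[1], $m[2]];  // one (key, value) pair
        }
    }
    return ['pairs' => $pairs, 'tail' => $tail];
}

var_dump(splitPairsAndTail('aAbAX'));
```

Unlike the original X(?:(a|b)(.))*(.) pattern, this does not validate the string as a whole. If the input is malformed, matching simply stops early, so the caller would have to check that a tail was actually found.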
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: Capture overwrite?!

Post by prometheuzz »

tempAccount wrote:What we're trying to do is extract all property/value pairs out of something like this, and the img tag, by using only one regex statement.
<a property1="value1" property2="value2"><img src="imgURL"/></a>

I boiled it down to the 6 letter string to not get distracted with countless other methods of extracting this.
But by doing so, you leave out so many details that it's impossible to give a proper answer (all IMHO, of course!).

tempAccount wrote:The essence of the question is if regex can do this kind of thing in one statement.

Still, here are some example strings, and the values i'd like to extract. Maybe they make more sense now:
Y => Y
aAY => (a, A) and Y
aBZ => (a, B) and Z
bCX => (b, C) and X
aAbAX => (a, A) (b, A) and X
aAaAaAY => (a, A) (a, A) (a, A) and Y
bAcAaAZ => (b, A) (c, A) (a, A) and Z
No, such small examples tell nothing about the real problem (as I can see from your next example). Sorry.

tempAccount wrote:so in the real world example:
<a property1="value1" property2="value2"><img src="imgURL"/></a>
find:
(property1, value1) (property2, value2) and imgURL
Well, there's something to work with!
Perhaps you could explain this in a bit more detail? Looking at this one example, one might think that you want to match all key-value pairs except when the key is "src", in which case you only want to match the value (and not the key), which this regex will take care of:


'/\b((?:(?!src|\s).)*)="([^"]++)"/i' // not properly tested, but should do what I described above
It could also mean matching all key-value pairs except for the last one, where only the value should be matched. That could be accomplished by this one:


'/([^\s=]++(?=.*?[^\s=]++="[^"]++"))?="([^"]++)"/i'
But it might mean something entirely different. That's why I asked for a couple of examples: from those I might be able to deduce the rules myself. Of course, it would be better if you could explain it in detail yourself. Being able to properly explain your problem to someone with no prior knowledge of it is the first step in solving it.
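
As a rough illustration of why one example is not enough, the sketch below runs both patterns above through preg_match_all(). The helper extractPairs and the second input string are made up for this example. On the original sample the two patterns give the same result, but on the second string they disagree:

```php
<?php
// Hypothetical helper: return one [key, value] pair per match.
// The key is an empty string when the pattern chose to skip it.
function extractPairs(string $pattern, string $html): array
{
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    $pairs = [];
    foreach ($matches as $m) {
        $pairs[] = [$m[1], $m[2]];
    }
    return $pairs;
}

// The two patterns from this post, unchanged.
$skipSrcKey  = '/\b((?:(?!src|\s).)*)="([^"]++)"/i';
$skipLastKey = '/([^\s=]++(?=.*?[^\s=]++="[^"]++"))?="([^"]++)"/i';

// The sample from the thread: both patterns return the same pairs here.
$sample = '<a property1="value1" property2="value2"><img src="imgURL"/></a>';
var_dump(extractPairs($skipSrcKey, $sample));
var_dump(extractPairs($skipLastKey, $sample));

// An invented second input where "src" is not the last attribute.
$other = '<img src="x" alt="y">';
var_dump(extractPairs($skipSrcKey, $other));  // key dropped for src
var_dump(extractPairs($skipLastKey, $other)); // key dropped for alt
```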

HTH.