Capture overwrite?!
Posted: Wed Sep 03, 2008 4:49 am
by hymy
I'm kind of baffled by these results... or am I not thinking of something?
Code:
if ( preg_match( '/(?:(a|b)(.)){2}/', 'aAbB', $aMatch ) )
{
var_dump( $aMatch );
exit( 0 );
}
exit( 1 );
I expected it to return two results - as it really does match - ...
array(5) {
[0]=>
string(4) "aAbB"
[1]=>
string(1) "a"
[2]=>
string(1) "A"
[3]=>
string(1) "b"
[4]=>
string(1) "B"
}
... yet this is what it yields...
array(3) {
[0]=>
string(4) "aAbB"
[1]=>
string(1) "b"
[2]=>
string(1) "B"
}
Can anyone shed some light on this for me?
Re: Capture overwrite?!
Posted: Wed Sep 03, 2008 5:05 am
by prometheuzz
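Nope, you're not missing anything: when a capturing group sits inside a quantifier such as "{n}", each iteration overwrites the group's previous capture, so only the last (n-th) iteration's match is remembered.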
Re: Capture overwrite?!
Posted: Wed Sep 03, 2008 8:34 am
by tempAccount
Does anyone know a way to get all the matches, and not just the last one (even when using * instead of {2}), with only one regex?
Re: Capture overwrite?!
Posted: Wed Sep 03, 2008 12:28 pm
by GeertDD
Drop the {2} part: /(a|b)(.)/, and use preg_match_all().
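A minimal sketch of that suggestion, for comparison with the original call. It assumes the 'aAbB' subject from the first post; the variable names `$single` and `$all` are made up for illustration, and PREG_SET_ORDER is chosen here only so that each pair arrives as its own row.

```php
<?php
// Subject string taken from the first post.
$subject = 'aAbB';

// With {2}, groups 1 and 2 keep only the capture from the final iteration.
preg_match('/(?:(a|b)(.)){2}/', $subject, $single);

// Without the quantifier, preg_match_all() reports every pair as a separate match.
preg_match_all('/(a|b)(.)/', $subject, $all, PREG_SET_ORDER);

print_r($single); // whole match plus the last pair only
print_r($all);    // one row per pair: [whole match, key, value]
```

Each row of `$all` holds the whole match first, then the two captures, so nothing is overwritten.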
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 2:23 am
by tempAccount
thanks GeertDD,
seems our question was incomplete tho,
can someone also find a way to get a,A and b,B out of this one?
Code:
preg_match('/X(?:(a|b)(.))*(.)/', 'XaAbBY', $aMatch);
var_dump($aMatch);
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 2:37 am
by prometheuzz
tempAccount wrote:thanks GeertDD,
seems our question was incomplete tho,
can someone also find a way to get a,A and b,B out of this one?
Code:
preg_match('/X(?:(a|b)(.))*(.)/', 'XaAbBY', $aMatch);
var_dump($aMatch);
Did you try what Geert suggested (preg_match_all(...))?
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 6:50 am
by tempAccount
yes, it works as expected, it finds multiple matches, but it does not work in my latter example, not with preg_match and not with preg_match_all
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 8:47 am
by prometheuzz
tempAccount wrote:yes, it works as expected, it finds multiple matches, but it does not work in my latter example, not with preg_match and not with preg_match_all
Err, as Geert suggested:
Code:
preg_match_all('/(a|b)(.)/', 'XaAbBY', $result);
will result in a, b (first group) and A, B (second group) being matched.
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 9:07 am
by tempAccount
Thanks for helping on this.
Note that in the second regex we're also capturing the Y:
That's why the non-capturing parentheses are needed.
Code:
preg_match_all('/X(?:(a|b)(.))*(.)/', 'XaAbBY', $aMatch);
The difficulty in this regex is not getting the aA and bB, but getting those -and- the Y.
I initially left this out since i thought it was irrelevant to what i was trying to do, but GeertDD made me see i needed to add this 'complexity'.
Hope i'm not forgetting anything this time

Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 10:05 am
by prometheuzz
tempAccount wrote:Thanks for helping on this.
Note that in the second regex we're also capturing the Y:
That's why the non-capturing parentheses are needed.
Code:
preg_match_all('/X(?:(a|b)(.))*(.)/', 'XaAbBY', $aMatch);
The difficulty in this regex is not getting the aA and bB, but getting those -and- the Y.
I initially left this out since i thought it was irrelevant to what i was trying to do, but GeertDD made me see i needed to add this 'complexity'.
Hope i'm not forgetting anything this time

Sorry, you lost me.
Perhaps you could give a few example input strings
and make them a bit more elaborate and explain which part(s) of it you need to extract: IMHO, one 6 character long example isn't sufficient in this case unfortunately.
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 10:29 am
by tempAccount
What we're trying to do is extract all property/value pairs out of something like this, and the img tag, by using only one regex statement.
<a property1="value1" property2="value2"><img src="imgURL"/></a>
I boiled it down to the 6 letter string to not get distracted with countless other methods of extracting this.
The essence of the question is if regex can do this kind of thing in one statement.
Still, here are some example strings, and the values i'd like to extract. Maybe they make more sense now:
Y => Y
aAY => (a, A) and Y
aBZ => (a, B) and Z
bCX => (b, C) and X
aAbAX => (a, A) (b, A) and X
aAaAaAY => (a, A) (a, A) (a, A) and Y
bAcAaAZ => (b, A) (c, A) (a, A) and Z
so in the real world example:
<a property1="value1" property2="value2"><img src="imgURL"/></a>
find:
(property1, value1) (property2, value2) and imgURL
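One possible way to get the pairs and the trailing character from a single pattern is to anchor each match to the end of the previous one with \G. This is only a sketch, not something proposed in the thread. It assumes the simplified 'XaAbBY' form, with a leading X, keys limited to a or b, and the last character of the subject as the trailing item. The names `$pattern`, `$pairs` and `$tail` are hypothetical.

```php
<?php
// Assumed input shape: 'X', then zero or more (a|b)(.) pairs, then one final character.
$subject = 'XaAbBY';

// \G matches where the previous match ended, so every pair becomes its own match.
// (?!\A) keeps \G from matching at the very start, which forces the first match to begin with X.
// The second alternative captures the last character of the subject in group 3.
$pattern = '/(?:X|\G(?!\A))(?:(a|b)(.)|(.)\z)/';

preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$pairs = [];
$tail  = null;
foreach ($matches as $m) {
    if (($m[3] ?? '') !== '') {
        $tail = $m[3];              // trailing character
    } else {
        $pairs[] = [$m[1], $m[2]];  // one (key, value) pair
    }
}

print_r($pairs);
echo $tail, "\n";
```

The pattern still cannot tell a final pair from a final single character in every input, so the rule for the trailing item has to be pinned down before relying on it.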
Re: Capture overwrite?!
Posted: Thu Sep 04, 2008 10:43 am
by prometheuzz
tempAccount wrote:What we're trying to do is extract all property/value pairs out of something like this, and the img tag, by using only one regex statement.
<a property1="value1" property2="value2"><img src="imgURL"/></a>
I boiled it down to the 6 letter string to not get distracted with countless other methods of extracting this.
But by doing so, you leave out so many details that it's impossible to give a proper answer (all IMHO, of course!).
tempAccount wrote:The essence of the question is if regex can do this kind of thing in one statement.
Still, here are some example strings, and the values i'd like to extract. Maybe they make more sense now:
Y => Y
aAY => (a, A) and Y
aBZ => (a, B) and Z
bCX => (b, C) and X
aAbAX => (a, A) (b, A) and X
aAaAaAY => (a, A) (a, A) (a, A) and Y
bAcAaAZ => (b, A) (c, A) (a, A) and Z
No, such small examples tell nothing about the real problem (as I can see from your next example). Sorry.
tempAccount wrote:so in the real world example:
<a property1="value1" property2="value2"><img src="imgURL"/></a>
find:
(property1, value1) (property2, value2) and imgURL
Well, there's something to work with!
Perhaps you could explain this in a bit more detail? Looking at this one example, one might think that you want to match all key-value pairs except when the key is "src", in which case you only want to match the value (and not the key), which this regex will take care of:
Code:
'/\b((?:(?!src|\s).)*)="([^"]++)"/i' // not properly tested, but should do what I described above
It could also mean that you want to match all key-value pairs except for the last pair, in which case you only want to match the value. That could be accomplished by this one:
Code:
'/([^\s=]++(?=.*?[^\s=]++="[^"]++"))?="([^"]++)"/i'
But it might mean something entirely different. That's why I asked for a couple of examples: then I might deduce the rules from them myself. Of course, it would be better if you could explain it in detail yourself. That's the first step in solving your problem: being able to explain it properly to someone who has no prior knowledge of it.
HTH.
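For anyone who wants to try the two patterns above, here is a small sketch that runs both against the sample markup from the thread. The array labels, the variable names and the choice of PREG_SET_ORDER are assumptions made for illustration. The patterns themselves are copied unchanged.

```php
<?php
// Sample markup quoted in the thread.
$html = '<a property1="value1" property2="value2"><img src="imgURL"/></a>';

// Both patterns exactly as posted; the labels are only for readability.
$patterns = [
    'no key when the key is src' => '/\b((?:(?!src|\s).)*)="([^"]++)"/i',
    'no key for the last pair'   => '/([^\s=]++(?=.*?[^\s=]++="[^"]++"))?="([^"]++)"/i',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    // Keep group 1 (the key, empty when it was skipped) and group 2 (the value).
    $results[$label] = array_map(function ($m) {
        return [$m[1], $m[2]];
    }, $matches);
}

print_r($results);
```

On this one sample both patterns give the same rows, with an empty key next to imgURL, so more varied input would be needed to tell the two interpretations apart.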