combining two regexes

Posted: Tue Sep 18, 2007 2:59 pm
by John Cartwright
What I'm trying to do here is capture all the text within the brackets, as well as the text before the first bracket. An example would be

Code: Select all

foo[bar][bar2]

Code: Select all

preg_match('#^([a-zA-Z0-9_]+)\[#', $name, $outter);

Code: Select all

preg_match_all('#\[([a-zA-Z0-9_]+)\]#', $name, $inner);
But how on earth can I grab the initial text while using preg_match_all()? I'm simply stumped on this one :(

Posted: Tue Sep 18, 2007 3:35 pm
by mrkite

Code: Select all

$code="pre[one][two]";
preg_match('{^(\w*)\[(\w*)\]\[(\w*)\]}',$code,$matches);

//$matches[1] = pre
//$matches[2] = one
//$matches[3] = two

Posted: Tue Sep 18, 2007 3:39 pm
by feyd

Code: Select all

feyd:~ feyd$ cat foo2.php
<?php

preg_match_all('#[a-zA-Z0-9_]+(?:\[[a-zA-Z0-9_]+\])*#','foo1[bar1][bar2] foo2 foo3[bar3]', $match);
print_r($match);
feyd:~ feyd$ php -f foo2.php
Array
(
    [0] => Array
        (
            [0] => foo1[bar1][bar2]
            [1] => foo2
            [2] => foo3[bar3]
        )

)

Posted: Tue Sep 18, 2007 5:21 pm
by John Cartwright
Thanks for the replies :)

Maybe I'm misunderstanding or I didn't quite explain correctly, apologies.
feyd wrote:

Code: Select all

#[a-zA-Z0-9_]+(?:\[[a-zA-Z0-9_]+\])*#
Looking at this, I can see that the (?: ) groups the bracket segments of the subject together (I had to read d11's regex crash course ;)). However, I'm trying to capture the text inside each bracket as well.

I took a shot at adapting feyd's regex, with only limited success.

Code: Select all

#([a-zA-Z0-9_]+)(?:\[([a-zA-Z0-9_]+)\])*#
Using the subject "foobar2[foo][bar]" my results have been:

Code: Select all

Array
(
    [0] => foobar2[foo][bar]
    [1] => foobar2
    [2] => bar
)

Posted: Tue Sep 18, 2007 5:29 pm
by feyd
A repeated group only remembers its last match, so it can only keep the last bracketed reference unless you add more copies of the subpattern. Accurately capturing all of them without knowing how many there are requires two patterns: one to capture the entire variable reference, and a second to capture the contents.

If it's only working with the fully captured variable, using preg_split() could work better.
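
The two-pattern approach described above could look roughly like the following. This is only a sketch: the function name parseReferences() and the returned array layout are made up for illustration, and \w* is assumed for the bracket contents so that empty brackets are accepted too.

```php
<?php
// Hypothetical helper (not from the thread): returns each reference's
// leading name together with the contents of all its bracket pairs.
function parseReferences(string $subject): array
{
    $result = [];
    // Pattern 1: the leading name, then the whole run of bracket segments.
    preg_match_all('#(\w+)((?:\[\w*\])*)#', $subject, $refs, PREG_SET_ORDER);
    foreach ($refs as $ref) {
        // Pattern 2: the contents of every bracket pair in that run.
        preg_match_all('#\[(\w*)\]#', $ref[2], $keys);
        $result[] = ['name' => $ref[1], 'keys' => $keys[1]];
    }
    return $result;
}

print_r(parseReferences('foo1[bar1][bar2] foo2 foo3[bar3]'));
```

For the sample subject this should give foo1 with keys bar1 and bar2, foo2 with no keys, and foo3 with the key bar3.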

Posted: Tue Sep 18, 2007 5:31 pm
by John Cartwright
What I needed to know. Thanks feyd. :)

Posted: Wed Sep 19, 2007 4:00 am
by stereofrog

Code: Select all

$re = '~\w+(?=\[)|(?<=\[)\w+(?=\])~';

$subj = "foo1[bar1][bar2] foo2 foo3[bar3]";

preg_match_all($re, $subj, $m);
print_r($m[0]);
outputs

Code: Select all

Array
(
    [0] => foo1
    [1] => bar1
    [2] => bar2
    [3] => foo3
    [4] => bar3
)
Is this what you're looking for?

Posted: Wed Sep 19, 2007 4:52 am
by GeertDD
stereofrog wrote:

Code: Select all

$re = '~\w+(?=\[)|(?<=\[)\w+(?=\])~';
Is this what you're looking for?
If it is, I think you're needlessly complicating that regex.

Try this pattern:

Code: Select all

/[^\[\]]+/
And if you don't want the spaces:

Code: Select all

/[^\[\]\s]+/

Posted: Wed Sep 19, 2007 10:52 am
by John Cartwright
Thanks for the follow ups stereofrog and GeertDD.
GeertDD wrote:
stereofrog wrote:

Code: Select all

$re = '~\w+(?=\[)|(?<=\[)[\w]+(?=\])~';
Is this what you're looking for?
If it is, I think you're needlessly complicating that regex.

Try this pattern:

Code: Select all

/[^\[\]]+/
And if you don't want the spaces:

Code: Select all

/[^\[\]\s]+/

I've tried the patterns above, but I'm not quite getting my desired results. If an element has an empty key, e.g. foo[bar][], the empty bracket is ignored by the regex and only foo and bar are returned. Any idea how to modify '/[^\[\]]+/' for blank values?

Again thanks, my regex skills are merely mediocre

Posted: Wed Sep 19, 2007 10:55 am
by feyd
The patterns provided by ~GeertDD are intended for preg_split().

Posted: Wed Sep 19, 2007 12:22 pm
by John Cartwright
I ended up slightly modifying stereofrog's regex,

Code: Select all

'~\w+(?=\[)|(?<=\[).*?(?=\])~'
to allow for empty keys. However, I am still interested in pursuing the preg_split() option. So far I haven't been successful with it, since passing the string

foobar[f1][f2][] to preg_split() returns the following:

Code: Select all

Array
(
    [0] => 
    [1] => [
    [2] => ][
    [3] => ][]
)
I'm feeling a little bit helpless here, unfortunately :(
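
For reference, a quick check of how the modified pattern above behaves with preg_match_all(). This is a small sketch; the variable names and the second test string are only illustrative.

```php
<?php
// Modified pattern from the post above: either a name followed by "[",
// or the (possibly empty) text between "[" and "]".
$re = '~\w+(?=\[)|(?<=\[).*?(?=\])~';

preg_match_all($re, 'foobar[f1][f2][]', $m);
print_r($m[0]); // foobar, f1, f2 and an empty string for "[]"

preg_match_all($re, 'foobar', $plain);
print_r($plain[0]); // empty array: a name without any brackets is not matched
```

Note the second case: because the first alternative requires a following "[", a plain name with no brackets yields no matches at all, so it would need to be handled separately.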

Posted: Wed Sep 19, 2007 12:39 pm
by feyd

Code: Select all

#\]\[|\[|\]#
would be more for preg_split.

Posted: Wed Sep 19, 2007 12:53 pm
by John Cartwright
feyd wrote:

Code: Select all

#\]\[|\[|\]#
would be more for preg_split.
Okay, so this regex will split the input on any of ][, [ or ]. Nice :)

Still one tiny issue though,

Code: Select all

$this->_inputIndices = preg_split('#\]\[|\[|\]#', 'foobar2f[f1][f][]');
		
echo '<pre>';
print_r($this->_inputIndices);
Returns:

Code: Select all

Array
(
    [0] => foobar2f
    [1] => f1
    [2] => f
    [3] => 
    [4] => 
)
When there should only be 4 array elements :( I guess this is because it is splitting on the last bracket at the end of the string. Any ideas?

I'm thrilled that I (you guys ;)) have gotten this process down to a single line of code though :)

Thanks again.

Posted: Wed Sep 19, 2007 1:36 pm
by GeertDD
feyd wrote:The patterns provided by ~GeertDD are intended for preg_split().
Nope, they aren't. Have a closer look.

Code: Select all

/[^\[\]\s]+/
Selects every substring that is separated by square brackets or whitespace. It works fine except for when empty square brackets pop up. In that case I agree preg_split() will be a better solution.

Jcart wrote:there should only be 4 array elements :( I guess this is because it is splitting the last bracket at the end of the string.. Any ideas?
You'll always end up with one final empty element because the last char of your string is ']'. I tried to cook something up with a lookahead construction to prevent this, but had no success. I suggest just using a function like array_pop() to always chop the last element off the array.

Also feyd's split pattern can be optimized a bit.
Before:

Code: Select all

#\]\[|\[|\]#
After:

Code: Select all

#\]\[?|\[#

Posted: Wed Sep 19, 2007 1:57 pm
by John Cartwright
GeertDD wrote:No success however. I suggest to just use a function like array_pop() to always chop the last element off the array.
Yeah, I figured as much, and that was exactly what I wanted to avoid, since not all strings supplied will have brackets at all and those won't have the empty element.
I think I'll stick to the preg_match_all then...

Thanks for all the input guys.
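
One possible way around the trailing empty element with preg_split(), without an unconditional array_pop(), is to drop a single closing "]" from the end of the string before splitting. This is a minimal sketch and not something posted in the thread; the helper name splitIndices() is made up.

```php
<?php
// Hypothetical helper (not from the thread): splits "name[k1][k2][]"
// into the name and its keys, keeping empty keys.
function splitIndices(string $name): array
{
    // Remove one trailing "]" if present, so the split cannot
    // produce an extra empty element at the end.
    if (substr($name, -1) === ']') {
        $name = substr($name, 0, -1);
    }
    // With the final "]" gone, only "][" and "[" remain as delimiters.
    return preg_split('#\]\[|\[#', $name);
}

print_r(splitIndices('foobar2f[f1][f][]')); // foobar2f, f1, f, ""
print_r(splitIndices('foobar'));            // foobar
```

Strings without brackets pass through unchanged as a single element, which is the case that made a blanket array_pop() unattractive.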