combining two regexes

Post by John Cartwright »

What I'm trying to do here is capture all the text within each pair of brackets, as well as the text before the first bracket. An example would be:

Code: Select all

foo[bar][bar2]

Code: Select all

preg_match('#^([a-zA-Z0-9_]+)\[#', $name, $outer);

Code: Select all

preg_match_all('#\[([a-zA-Z0-9_]+)\]#', $name, $inner);
But how on earth can I grab the initial text while using preg_match_all()? I'm simply stumped on this one :(

Post by mrkite »

Code: Select all

$code = "pre[one][two]";
preg_match('{^(\w*)\[(\w*)\]\[(\w*)\]}', $code, $matches);

// $matches[1] = pre
// $matches[2] = one
// $matches[3] = two

Post by feyd »

Code: Select all

feyd:~ feyd$ cat foo2.php
<?php

preg_match_all('#[a-zA-Z0-9_]+(?:\[[a-zA-Z0-9_]+\])*#','foo1[bar1][bar2] foo2 foo3[bar3]', $match);
print_r($match);
feyd:~ feyd$ php -f foo2.php
Array
(
    [0] => Array
        (
            [0] => foo1[bar1][bar2]
            [1] => foo2
            [2] => foo3[bar3]
        )

)

Post by John Cartwright »

Thanks for the replies :)

Maybe I'm misunderstanding or I didn't quite explain correctly, apologies.
feyd wrote:

Code: Select all

#[a-zA-Z0-9_]+(?:\[[a-zA-Z0-9_]+\])*#
Looking at this, I can see that the (?: ) groups the bracketed segments of the subject together (had to read d11's regex crash course ;)), however I'm trying to capture the text inside each bracket as well.

I took a shot at adapting feyd's regex, with only limited success.

Code: Select all

#([a-zA-Z0-9_]+)(?:\[([a-zA-Z0-9_]+)\])*#
Using the subject "foobar2[foo][bar]" my results have been:

Code: Select all

Array
(
    [0] => foobar2[foo][bar]
    [1] => foobar2
    [2] => bar
)

Post by feyd »

A repeated capturing group only remembers its last match, so you get only the last bracketed reference unless you repeat the subpattern once per bracket. To accurately capture all of them without knowing how many there are requires two patterns: one to capture the entire variable reference, the second to capture the contents.

If it's only working with the fully captured variable, using preg_split() could work better.
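
For anyone following along, here is a minimal sketch of the two-pattern approach described above. The helper name parseReference() and the returned array layout are made up for illustration; only the character classes come from the thread.

```php
<?php
// Hypothetical helper, not from the thread.
function parseReference(string $ref): ?array
{
    // Pattern 1: match the whole reference, capturing the leading name
    // and the complete run of bracketed segments as one string.
    if (!preg_match('#^([a-zA-Z0-9_]+)((?:\[[a-zA-Z0-9_]+\])*)$#', $ref, $outer)) {
        return null; // not a well-formed reference
    }

    // Pattern 2: pull the contents out of every bracket pair in that run.
    preg_match_all('#\[([a-zA-Z0-9_]+)\]#', $outer[2], $inner);

    return ['name' => $outer[1], 'keys' => $inner[1]];
}

$result = parseReference('foobar2[foo][bar]');
print_r($result);
```

Because the second pattern runs only on the bracketed part, the number of brackets does not need to be known in advance.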

Post by John Cartwright »

What I needed to know. Thanks feyd. :)

Post by stereofrog »

Code: Select all

$re = '~\w+(?=\[)|(?<=\[)\w+(?=\])~';

$subj = "foo1[bar1][bar2] foo2 foo3[bar3]";

preg_match_all($re, $subj, $m);
print_r($m[0]);
outputs

Code: Select all

Array
(
    [0] => foo1
    [1] => bar1
    [2] => bar2
    [3] => foo3
    [4] => bar3
)
Is this what you're looking for?

Post by GeertDD »

stereofrog wrote:

Code: Select all

$re = '~\w+(?=\[)|(?<=\[)\w+(?=\])~';
Is this what you're looking for?
If it is, I think you're needlessly complicating that regex.

Try this pattern:

Code: Select all

/[^\[\]]+/
And if you don't want the spaces:

Code: Select all

/[^\[\]\s]+/

Post by John Cartwright »

Thanks for the follow ups stereofrog and GeertDD.

I've tried the patterns above, but I'm not quite getting my desired results. If an element has no name, e.g. foo[bar][], the empty brackets are ignored by the regex and it only returns foo, bar. Any idea how to modify '/[^\[\]]+/' for blank values?

Again thanks, my regex skills are merely mediocre.

Post by feyd »

The patterns provided by ~GeertDD are intended for preg_split().

Post by John Cartwright »

I ended up slightly modifying stereofrog's regex,

Code: Select all

'~\w+(?=\[)|(?<=\[).*?(?=\])~'
to allow for empty keys. However, I am still interested in pursuing the preg_split() option. So far I haven't been successful with it, since passing the string foobar[f1][f2][] to preg_split() yields the following:

Code: Select all

Array
(
    [0] => 
    [1] => [
    [2] => ][
    [3] => ][]
)
I'm feeling a little bit helpless here, unfortunately :(
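
As a quick check, the modified preg_match_all() pattern from this post can be run against the same subject. This is only a sketch; the variable names are arbitrary.

```php
<?php
// Pattern as posted above; .*? lets a bracket pair match with nothing inside it.
$re = '~\w+(?=\[)|(?<=\[).*?(?=\])~';

preg_match_all($re, 'foobar[f1][f2][]', $m);

// The empty key comes back as an empty string in the last slot.
print_r($m[0]);
```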

Post by feyd »

Code: Select all

#\]\[|\[|\]#
would be more for preg_split.

Post by John Cartwright »

Okay, so this regex will split the input on any of ][, [ or ].. nice :)

Still one tiny issue though,

Code: Select all

$this->_inputIndices = preg_split('#\]\[|\[|\]#', 'foobar2f[f1][f][]');
		
echo '<pre>';
print_r($this->_inputIndices);
Returns:

Code: Select all

Array
(
    [0] => foobar2f
    [1] => f1
    [2] => f
    [3] => 
    [4] => 
)
When there should only be 4 array elements :( I guess this is because it is splitting the last bracket at the end of the string.. Any ideas?

I'm thrilled that I (you guys ;)) have gotten this process down to a single line of code though :)

Thanks again.

Post by GeertDD »

feyd wrote:The patterns provided by ~GeertDD are intended for preg_split().
Nope, they aren't. Have a closer look.

Code: Select all

/[^\[\]\s]+/
Selects every substring that is separated by square brackets or whitespace. It works fine except for when empty square brackets pop up. In that case I agree preg_split() will be a better solution.

Jcart wrote:there should only be 4 array elements :( I guess this is because it is splitting the last bracket at the end of the string.. Any ideas?
You'll always end up with one final empty element because the last char of your string is ']'. I tried to cook something up with a lookahead construction to prevent this. No success however. I suggest to just use a function like array_pop() to always chop the last element off the array.
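
One possible way around the trailing element, sketched here as an assumption rather than anything proposed in the thread: strip a single closing bracket from the end of the string before splitting, so preg_split() has nothing left to split on after the last key. The function name splitIndices() is hypothetical.

```php
<?php
// Hypothetical helper, not from the thread.
function splitIndices(string $name): array
{
    // Remove one "]" at the very end of the string, if present.
    // The D modifier makes $ match only at the true end of the subject.
    $trimmed = preg_replace('#\]$#D', '', $name);

    // Every remaining "]" is followed by "[", so splitting on an
    // optional "]" plus "[" covers both "][" and the first "[".
    return preg_split('#\]?\[#', $trimmed);
}

print_r(splitIndices('foobar2f[f1][f][]'));
print_r(splitIndices('foobar'));
```

A string without brackets is left untouched and comes back as a one-element array, so no array_pop() is needed in either case.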

Also feyd's split pattern can be optimized a bit.
Before:

Code: Select all

#\]\[|\[|\]#
After:

Code: Select all

#\]\[?|\[#
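
A small sanity check that the two split patterns behave the same, at least on the sample subjects below (the subjects are picked for illustration and do not prove equivalence in general):

```php
<?php
$before = '#\]\[|\[|\]#';
$after  = '#\]\[?|\[#';

$same = true;
foreach (['foobar2f[f1][f][]', 'foo[bar]', 'foo'] as $subject) {
    // Compare the resulting arrays element by element, including order.
    if (preg_split($before, $subject) !== preg_split($after, $subject)) {
        $same = false;
    }
}

var_dump($same);
```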

Post by John Cartwright »

GeertDD wrote:No success however. I suggest to just use a function like array_pop() to always chop the last element off the array.
Yeah, I figured as much, and that was exactly what I wanted to avoid, since not all of the strings supplied will have brackets, and those that don't won't have the empty element.
I think I'll stick with preg_match_all() then...

Thanks for all the input guys.
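
For readers landing here later, a hedged sketch of a preg_match_all() variant that also copes with names that have no brackets at all. This pattern is an assumption, not something posted in the thread: ^\w+ takes the leading name whether or not a bracket follows, and [^\]]* (instead of .*?) matches a possibly empty key without being able to run past a closing bracket. The function name extractParts() is made up.

```php
<?php
// Hypothetical helper, not from the thread.
function extractParts(string $name): array
{
    // First alternative: the leading name at the start of the string.
    // Second alternative: whatever sits between "[" and "]", including nothing.
    preg_match_all('~^\w+|(?<=\[)[^\]]*(?=\])~', $name, $m);

    return $m[0];
}

print_r(extractParts('foobar2f[f1][f][]'));
print_r(extractParts('foobar'));
```

The negated class is a deliberate choice: a lazy .*? is allowed to match "]" and "[", so it could in principle stretch across a "][" boundary, while [^\]]* cannot.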