
preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 12:07 am
by s.dot
I'm using the following pattern, which is meant to catch certain strings between different pairs of delimiters:

Code:

 
$pattern = '/(' . $char1Start . '(.+?)' . $char1End . '|' . $char2Start . '(.+?)' . $char2End . ')/ism';
Which would give me something like this:

Code:

/(\[#\](.+?)\[\/#\]|\[\*\](.+?)\[\/\*\])/ism
Then I use preg_match_all($pattern, $text, $matches);

The problem is that I'm using the | (alternation) operator, and if the second alternative matches, I get an empty array value in $matches for the (.+?) group of the alternative that comes before the |.

Do I have to live with this and just array_filter() $matches when I'm done? I have a lot of empty values in my $matches array, since I'm chaining about 30 alternatives with |.

Re: preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 12:17 am
by prometheuzz
Can you also post the string that produces these empty entries?

Re: preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 12:53 am
by s.dot
Sure. I'm making a BBCode parser.

My full pattern ends up being:

Code:

/(\[b\](.+?)\[\/b\]|\[u\](.+?)\[\/u\]|\[i\](.+?)\[\/i\]|\[s\](.+?)\[\/s\]|\[img\](.+?)\[\/img\]|\[center\](.+?)\[\/center\]|\[marquee\](.+?)\[\/marquee\]|\[blink\](.+?)\[\/blink\]|\[size=(.+?)\](.+?)\[\/size\]|\[color=(.+?)\](.+?)\[\/color\]|\[url(=.+?)?\](.+?)\[\/url\]|\[quote(=.+?)?\](.+?)\[\/quote\])/ism
The text I'm using to match upon:

Code:

[ b ]hi![ /b ] what\'s up with [ u ]you[ /u ], [ blink ]dude[ /blink ]? [ size=3 ]ok write me back[ /size ] [ quote ]something[ /quote ] [ quote=scott ]something else[ /quote ]
 
And the results:

Code:

Array
(
    [0] => Array
        (
            [0] => [ b ]hi![ /b ]
            [1] => [ b ]hi![ /b ]
            [2] => hi!
        )
 
    [1] => Array
        (
            [0] => [ u ]you[ /u ]
            [1] => [ u ]you[ /u ]
            [2] => 
            [3] => you
        )
 
    [2] => Array
        (
            [0] => [ blink ]dude[ /blink ]
            [1] => [ blink ]dude[ /blink ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => dude
        )
 
    [3] => Array
        (
            [0] => [ size=3 ]ok write me back[ /size ]
            [1] => [ size=3 ]ok write me back[ /size ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 3
            [11] => ok write me back
        )
 
    [4] => Array
        (
            [0] => [ quote ]something[ /quote ]
            [1] => [ quote ]something[ /quote ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 
            [13] => 
            [14] => 
            [15] => 
            [16] => 
            [17] => something
        )
 
    [5] => Array
        (
            [0] => [ quote=scott ]something else[ /quote ]
            [1] => [ quote=scott ]something else[ /quote ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 
            [13] => 
            [14] => 
            [15] => 
            [16] => =scott
            [17] => something else
        )
 
)
I am using PREG_SET_ORDER.

EDIT: I had to space out the BBCode, or else the forum would parse it.

Re: preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 3:42 am
by prometheuzz
scottayy wrote: Sure. I'm making a bbcode parser

My full pattern ends up being:

Code:

/(\[b\](.+?)\[\/b\]|\[u\](.+?)\[\/u\]|\[i\](.+?)\[\/i\]|\[s\](.+?)\[\/s\]|\[img\](.+?)\[\/img\]|\[center\](.+?)\[\/center\]|\[marquee\](.+?)\[\/marquee\]|\[blink\](.+?)\[\/blink\]|\[size=(.+?)\](.+?)\[\/size\]|\[color=(.+?)\](.+?)\[\/color\]|\[url(=.+?)?\](.+?)\[\/url\]|\[quote(=.+?)?\](.+?)\[\/quote\])/ism
...
Okay, the reason you're getting empty strings in your $matches is sub-regexes like this one: (=.+?)?
Since you make them optional (note the trailing ?), there can be times when that specific sub-regex does not match any part of your string. When that happens, you end up with an empty string in your $matches. There's no way around that.
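A minimal sketch of the optional-group case described above, using only the [quote] alternative and made-up input strings: when (=.+?)? sits out but a later group still matches, PHP reports the skipped group as an empty string instead of leaving its key out.

```php
<?php
// Optional group (=.+?)? followed by a group that always matches.
$pattern = '@\[quote(=.+?)?\](.+?)\[/quote\]@is';

// No parameter: group 1 does not take part in the match, but group 2 does,
// so group 1 is reported as an empty string.
preg_match($pattern, '[quote]something[/quote]', $plain);
var_dump($plain[1]); // string(0) ""

// With a parameter: group 1 captures it, including the '=' sign.
preg_match($pattern, '[quote=scott]something else[/quote]', $named);
var_dump($named[1]); // string(6) "=scott"
```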

A couple of observations about your current approach:
- creating a parser solely with regex is going to be hard because of the recursive (nested) nature of many languages/grammars;
- there's no need to wrap your entire regex in parentheses;
- cramming your entire regex pattern into one huge string is going to be a maintenance nightmare; at least use the x-modifier, put each sub-regex on its own line and indent it nicely;
- since you're also matching slashes in your pattern, use a different delimiter for your regex, like the character '@'.

Something like this:

Code:

$regex = '@
     \[b\]             (.+?)  \[/b\]
  |  \[u\]             (.+?)  \[/u\]
  |  \[i\]             (.+?)  \[/i\]
  |  \[s\]             (.+?)  \[/s\]
  |  \[img\]           (.+?)  \[/img\]
  |  \[center\]        (.+?)  \[/center\]
  |  \[marquee\]       (.+?)  \[/marquee\]
  |  \[blink\]         (.+?)  \[/blink\]
  |  \[size=(.+?)\]    (.+?)  \[/size\]
  |  \[color=(.+?)\]   (.+?)  \[/color\]
  |  \[url(=.+?)?\]    (.+?)  \[/url\]
  |  \[quote(=.+?)?\]  (.+?)  \[/quote\]
@isx'; // no need for the m-modifier
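To make the first observation in the list concrete, here is a small sketch (only the [quote] alternative, made-up input) of what happens with nested tags: the reluctant (.+?) stops at the first closing tag it meets, and that tag belongs to the inner quote.

```php
<?php
// Only the [quote] alternative, written with '@' as the delimiter.
$pattern = '@\[quote\](.+?)\[/quote\]@is';

// An outer quote that contains an inner quote.
$text = '[quote]outer [quote]inner[/quote] rest[/quote]';

preg_match($pattern, $text, $m);

// The lazy (.+?) stops at the FIRST [/quote], which closes the inner tag,
// so the captured content is unbalanced and " rest[/quote]" is left over.
echo $m[0], "\n"; // [quote]outer [quote]inner[/quote]
echo $m[1], "\n"; // outer [quote]inner
```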

Re: preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 11:07 am
by s.dot
The pattern is dynamically generated, so maintenance isn't an issue.
So basically, with this approach there's no way to avoid the empty matches. I use array_map('array_filter', $matches); to remove the empty entries, but the keys aren't renumbered. Is there an easy way to renumber array keys?
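array_values() is the usual way to renumber keys. A minimal sketch, assuming $matches comes from preg_match_all() with PREG_SET_ORDER as in the earlier post; compact_match_sets() is a hypothetical helper name. It passes an explicit callback because array_filter() without one also throws away the string "0" (think [size=0]).

```php
<?php
// Hypothetical helper: drop empty captures from each match set and renumber.
function compact_match_sets(array $matches): array
{
    return array_map(
        function (array $set): array {
            // Keep everything except empty strings. A bare array_filter($set)
            // would also discard "0", which can be a legitimate capture.
            $kept = array_filter($set, function ($value) {
                return $value !== '';
            });
            // array_values() renumbers the keys as 0, 1, 2, ...
            return array_values($kept);
        },
        $matches
    );
}

$matches = [
    ['[u]you[/u]', '[u]you[/u]', '', 'you'],
    ['[size=0]tiny[/size]', '[size=0]tiny[/size]', '', '', '0', 'tiny'],
];

print_r(compact_match_sets($matches));
```

Keep in mind that once the empties are gone, a value's position no longer tells you which tag it came from; you would have to look at [0] for that.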

Re: preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 11:52 am
by prometheuzz
You could match the two "types" of matches in two steps: http://pastebin.com/f424fa913 (externally posted because of the forum eating up the tags)

Re: preg_match_all().. empty values in $matches

Posted: Thu Jan 22, 2009 4:19 pm
by s.dot
There's actually 3 types.

[ tag ]
[ tag=neededvaluehere ]
[ tag=optionalvaluehere ]

Still, your regex example is very helpful! I had tried using $1 and it didn't work for me; I guess \1 is what I was looking for.
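The pastebin code isn't reproduced in this thread, so the following is only a sketch of the backreference idea, not prometheuzz's actual pattern. Inside a pattern, a backreference is written \1; $1 only has that meaning in the replacement string of preg_replace(), which is likely why it didn't work here. The tag list is an assumption taken from the tags in this thread. Because all tags share one alternative, every match set has the same layout: [1] tag name, [2] value (empty if absent), [3] content.

```php
<?php
// Hypothetical generic pattern: one alternative for all tags, with the
// closing tag tied to the opening one by the backreference \1.
// The tag list below is an assumption based on the tags in this thread.
$regex = '@
    \[ (b|u|i|s|img|center|marquee|blink|size|color|url|quote)   # [1] tag name
       (?: = (.+?) )?                                            # [2] optional value
    \]
    (.+?)                                                        # [3] content
    \[/ \1 \]                                                    # closing tag, same name
@isx';

$text = '[b]hi![/b] and [u]you[/u], [size=3]ok write me back[/size] [quote=scott]something else[/quote]';

preg_match_all($regex, $text, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // Every set has the same layout; $set[2] is '' when there was no =value.
    echo $set[1], ' | ', $set[2], ' | ', $set[3], "\n";
}
```

This treats every =value as optional, so unlike the original pattern it won't insist on one for [size] or [color], and nested tags still have the problem described earlier.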

Re: preg_match_all().. empty values in $matches

Posted: Fri Jan 23, 2009 1:12 am
by prometheuzz
scottayy wrote: There's actually 3 types.

[ tag ]
[ tag=neededvaluehere ]
[ tag=optionalvaluehere ]
Ah, yes, didn't notice that...
scottayy wrote: But looking at your regex example is very helpful! I had tried using $1 and it didn't work for me.. i guess \1 is what I was looking for.
Good. You realise what went wrong with your original idea, right? When matching a string with the regex:

Code:

'/(a)|(b)|(c)/'
and 'c' is matched, groups 1 and 2 will be empty. This, together with my earlier observation about the optional groups, is what causes your empty matches.
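One PCRE feature not raised in this thread is the branch reset group, (?|...): inside it, each alternative numbers its capturing groups starting from the same number, so whichever branch matches fills the same slots. A minimal sketch, first on the (a)|(b)|(c) example and then on an assumed subset of the BBCode tags (optional value always in group 1, content always in group 2):

```php
<?php
// Plain alternation: each branch has its own group number.
preg_match('/(a)|(b)|(c)/', 'c', $plain);
// $plain is ['c', '', '', 'c']: groups 1 and 2 did not take part.

// Branch reset: the first group of every branch is group 1.
preg_match('/(?|(a)|(b)|(c))/', 'c', $reset);
// $reset is ['c', 'c'].

// The same idea for a few BBCode tags (assumed subset). Every branch has
// exactly two groups: [1] optional value, [2] content.
$regex = '@(?|
      \[b\]                 ()       (.+?)  \[/b\]
   |  \[size=               (.+?)\]  (.+?)  \[/size\]
   |  \[quote (?:=(.+?))?   \]       (.+?)  \[/quote\]
)@isx';

preg_match_all($regex, '[b]hi![/b] [size=3]ok[/size] [quote]x[/quote]', $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    echo $set[0], ' -> value: "', $set[1], '", content: "', $set[2], "\"\n";
}
```

The empty group () in the [b] branch is just a placeholder that keeps the content in group 2 for every tag.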

Good luck.