preg_match_all().. empty values in $matches

s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I'm using the following pattern, which is meant to catch certain strings between different delimiters.

Code: Select all

 
$pattern = '/(' . $char1Start . '(.+?)' . $char1End . '|' . $char2Start . '(.+?)' . $char2End . ')/ism';
Which would give me something like this:

Code: Select all

/(\[#\](.+?)\[\/#\]|\[\*\](.+?)\[\/\*\])/ism
Then I use preg_match_all($pattern, $text, $matches);

The problem is that I'm using the | (or) character, and if the second condition is met, I get empty array values in $matches for the first () and the (.+?) that come before the |.

Do I have to live with this and just array_filter() $matches when I'm done? I have a lot of empty values in my $matches array since I'm doing about 30 different |'s.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Post by prometheuzz »

Can you also post the string that produces these empty entries?

Post by s.dot »

Sure. I'm making a BBCode parser.

My full pattern ends up being:

Code: Select all

/(\[b\](.+?)\[\/b\]|\[u\](.+?)\[\/u\]|\[i\](.+?)\[\/i\]|\[s\](.+?)\[\/s\]|\[img\](.+?)\[\/img\]|\[center\](.+?)\[\/center\]|\[marquee\](.+?)\[\/marquee\]|\[blink\](.+?)\[\/blink\]|\[size=(.+?)\](.+?)\[\/size\]|\[color=(.+?)\](.+?)\[\/color\]|\[url(=.+?)?\](.+?)\[\/url\]|\[quote(=.+?)?\](.+?)\[\/quote\])/ism
The text I'm using to match upon:

Code: Select all

[ b ]hi![ /b ] what\'s up with [ u ]you[ /u ], [ blink ]dude[ /blink ]? [ size=3 ]ok write me back[ /size ] [ quote ]something[ /quote ] [ quote=scott ]something else[ /quote ]
 
And the results

Code: Select all

Array
(
    [0] => Array
        (
            [0] => [ b ]hi![ /b ]
            [1] => [ b ]hi![ /b ]
            [2] => hi!
        )
 
    [1] => Array
        (
            [0] => [ u ]you[ /u ]
            [1] => [ u ]you[ /u ]
            [2] => 
            [3] => you
        )
 
    [2] => Array
        (
            [0] => [ blink ]dude[ /blink ]
            [1] => [ blink ]dude[ /blink ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => dude
        )
 
    [3] => Array
        (
            [0] => [ size=3 ]ok write me back[ /size ]
            [1] => [ size=3 ]ok write me back[ /size ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 3
            [11] => ok write me back
        )
 
    [4] => Array
        (
            [0] => [ quote ]something[ /quote ]
            [1] => [ quote ]something[ /quote ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 
            [13] => 
            [14] => 
            [15] => 
            [16] => 
            [17] => something
        )
 
    [5] => Array
        (
            [0] => [ quote=scott ]something else[ /quote ]
            [1] => [ quote=scott ]something else[ /quote ]
            [2] => 
            [3] => 
            [4] => 
            [5] => 
            [6] => 
            [7] => 
            [8] => 
            [9] => 
            [10] => 
            [11] => 
            [12] => 
            [13] => 
            [14] => 
            [15] => 
            [16] => =scott
            [17] => something else
        )
 
)
I am using PREG_SET_ORDER.

EDIT| I had to space the bbcode out or else the forum would parse it.

Post by prometheuzz »

scottayy wrote:Sure. I'm making a bbcode parser

My full pattern ends up being:

Code: Select all

/(\[b\](.+?)\[\/b\]|\[u\](.+?)\[\/u\]|\[i\](.+?)\[\/i\]|\[s\](.+?)\[\/s\]|\[img\](.+?)\[\/img\]|\[center\](.+?)\[\/center\]|\[marquee\](.+?)\[\/marquee\]|\[blink\](.+?)\[\/blink\]|\[size=(.+?)\](.+?)\[\/size\]|\[color=(.+?)\](.+?)\[\/color\]|\[url(=.+?)?\](.+?)\[\/url\]|\[quote(=.+?)?\](.+?)\[\/quote\])/ism
...
Okay, the reason you're getting empty strings in your $matches is sub-regexes like this one: (=.+?)?
Because the trailing ? makes the whole group optional, there are times when that sub-regex does not match any part of your string. When that happens, you end up with an empty string in your $matches. With the pattern as written, there's no way around that.

A couple of observations about your current approach:
- creating a parser using regex alone is going to be hard because of the recursive nature of many languages/grammars;
- there's no need to start and end your regex with parentheses;
- cramming your entire regex pattern into one huge string is going to be a maintenance nightmare; at least use the x-modifier, put your sub-regexes on separate lines, and indent them nicely;
- since you're also matching slashes in your pattern, use a different delimiter for your regex, such as the character '@'.

Something like this:

Code: Select all

$regex = '@
     \[b\]             (.+?)  \[/b\]
  |  \[u\]             (.+?)  \[/u\]
  |  \[i\]             (.+?)  \[/i\]
  |  \[s\]             (.+?)  \[/s\]
  |  \[img\]           (.+?)  \[/img\]
  |  \[center\]        (.+?)  \[/center\]
  |  \[marquee\]       (.+?)  \[/marquee\]
  |  \[blink\]         (.+?)  \[/blink\]
  |  \[size=(.+?)\]    (.+?)  \[/size\]
  |  \[color=(.+?)\]   (.+?)  \[/color\]
  |  \[url(=.+?)?\]    (.+?)  \[/url\]
  |  \[quote(=.+?)?\]  (.+?)  \[/quote\]
@isx'; // no need for the m-modifier

Post by s.dot »

The pattern is dynamically generated so maintenance isn't an issue.
So basically, using this approach there's no way to avoid the empty matches. I use array_map('array_filter', $matches); to remove the empty entries but the keys aren't renumbered. Is there an easy way to renumber array keys?
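On renumbering: array_values() returns the values with fresh keys starting at 0, so it can be chained after the filter. A minimal sketch, where the sample rows and the compactRow() helper name are made up for illustration. Note that a bare array_filter() also drops a captured "0", so this version compares against the empty string instead:

```php
<?php
// Hypothetical sample rows in PREG_SET_ORDER shape; the empty strings stand
// for groups that did not take part in the match.
$matches = [
    ['[u]you[/u]', '[u]you[/u]', '', 'you'],
    ['[size=0]zero[/size]', '[size=0]zero[/size]', '', '', '0', 'zero'],
];

// compactRow() is a hypothetical helper name, not a PHP built-in.
function compactRow(array $row): array
{
    // Drop only the empty string; array_filter($row) without a callback
    // would also remove "0", which can be a legitimate captured value.
    $kept = array_filter($row, static function ($value) {
        return $value !== '';
    });

    // array_values() discards the old keys and renumbers from 0.
    return array_values($kept);
}

$compact = array_map('compactRow', $matches);
print_r($compact);
```

Keep in mind that after renumbering, an entry's position no longer tells you which group of the pattern produced it.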

Post by prometheuzz »

You could match the two "types" of matches in two steps: http://pastebin.com/f424fa913 (externally posted because of the forum eating up the tags)

Post by s.dot »

There's actually 3 types.

[ tag ]
[ tag=neededvaluehere ]
[ tag=optionalvaluehere ]

But looking at your regex example was very helpful! I had tried using $1 and it didn't work for me. I guess \1 is what I was looking for.
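For reference, a rough sketch of how a \1 backreference can tie the closing tag to the opening one, so a single alternative covers all three tag types. This is not the code from the pastebin link; the tag list and sample text are assumptions for illustration:

```php
<?php
// Sketch only: one generic pattern instead of one alternative per tag.
// The tag list below is an assumed subset, not the full set from the thread.
$regex = '@
    \[ (b|u|i|s|blink|size|quote)   # group 1: tag name
       (?: = ([^\]]+) )?            # group 2: optional value after the equals sign
    \]
    (.+?)                           # group 3: tag content
    \[/ \1 \]                       # closing tag must repeat group 1
@isx';

// Assumed sample text, written without the spaces the forum required.
$text = "[b]hi![/b] what's up? [size=3]ok write me back[/size] "
      . "[quote=scott]something else[/quote]";

preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
print_r($matches);
```

Every row then has the same four entries: the whole match, the tag name, the value (an empty string when the tag has none), and the content.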

Post by prometheuzz »

scottayy wrote:There's actually 3 types.

[ tag ]
[ tag=neededvaluehere ]
[ tag=optionalvaluehere ]
Ah, yes, didn't notice that...
scottayy wrote:But looking at your regex example was very helpful! I had tried using $1 and it didn't work for me. I guess \1 is what I was looking for.
Good. You realise what went wrong with your original idea, right? When matching a string with the regex:

Code: Select all

'/(a)|(b)|(c)/'
and the 'c' is matched, groups 1 and 2 will be empty. This, together with my earlier observation about groups like (=.+?)?, causes your empty matches.
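A minimal sketch of that behaviour, plus two options that were not mentioned in this thread and are offered only as possibilities: the PREG_UNMATCHED_AS_NULL flag (assuming PHP 7.2 or later) and PCRE's branch-reset group (?|...). The variable names are made up for illustration:

```php
<?php
$regex = '/(a)|(b)|(c)/';

// Default behaviour: groups 1 and 2 did not take part in the match,
// so they come back as empty strings ahead of group 3.
preg_match_all($regex, 'c', $default, PREG_SET_ORDER);
print_r($default[0]);

// Assumes PHP 7.2 or later: unmatched groups are reported as null
// instead of '', which makes them distinguishable from a real empty capture.
preg_match_all($regex, 'c', $withNull, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);
var_dump($withNull[0]);

// Branch reset (?|...): each alternative numbers its groups from 1 again,
// so whichever branch matches, the capture lands in group 1.
preg_match_all('/(?|(a)|(b)|(c))/', 'c', $reset, PREG_SET_ORDER);
print_r($reset[0]);
```

The trade-off with branch reset is that the group number no longer tells you which alternative matched, so it fits best when you only need the captured text.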

Good luck.