scottayy wrote: Sure. I'm making a bbcode parser.
My full pattern ends up being:
Code: Select all
/(\[b\](.+?)\[\/b\]|\[u\](.+?)\[\/u\]|\[i\](.+?)\[\/i\]|\[s\](.+?)\[\/s\]|\[img\](.+?)\[\/img\]|\[center\](.+?)\[\/center\]|\[marquee\](.+?)\[\/marquee\]|\[blink\](.+?)\[\/blink\]|\[size=(.+?)\](.+?)\[\/size\]|\[color=(.+?)\](.+?)\[\/color\]|\[url(=.+?)?\](.+?)\[\/url\]|\[quote(=.+?)?\](.+?)\[\/quote\])/ism
...
Okay, the reason you're getting empty strings in your $matches is (sub) regex-es like this one:
(=.+?)?
Since you make that group optional (the trailing ?), there will be times when that specific (sub) regex does not match any part of your string. When that occurs, you will end up with an empty string at that group's position in your $matches. There's no way around that.
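To make that concrete, here is a minimal sketch using only the [url] part of the pattern. The two input strings are made up for illustration and are not from the original post. The second half shows a related effect: groups that belong to an alternative that did not match, and that sit before a group that did, also come back as empty strings.

```php
<?php
// Only the [url] part of the full pattern, with '@' as the delimiter.
$pattern = '@\[url(=.+?)?\](.+?)\[/url\]@is';

// Hypothetical sample inputs (assumptions, not from the original post).
preg_match($pattern, '[url=http://example.com]Example[/url]', $with);
preg_match($pattern, '[url]http://example.com[/url]', $without);

var_dump($with[1]);    // the optional group matched: "=http://example.com"
var_dump($without[1]); // the optional group did not take part: ""

// Same effect with alternation: group 1 belongs to the [b] branch,
// which is not the branch that matches here.
preg_match('@\[b\](.+?)\[/b\]|\[u\](.+?)\[/u\]@is', '[u]text[/u]', $alt);

var_dump($alt[1]); // ""
var_dump($alt[2]); // "text"
```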
A couple of observations about your current approach:
- creating a parser solely using regex is going to be hard because of the recursive nature of many languages/grammars (bbcode tags can be nested, for example);
- there's no need to start and end your regex with parentheses: the entire match is already available in $matches[0];
- cramming your entire regex pattern into one huge string is going to be a maintenance nightmare. At least use the x-modifier, put your sub-regex-es on separate lines, and indent them nicely;
- since you're also matching the slashes in your pattern, use a different delimiter for your regex, like the character '@', so the slashes no longer need escaping.
Something like this:
Code: Select all
$regex = '@
\[b\] (.+?) \[/b\]
| \[u\] (.+?) \[/u\]
| \[i\] (.+?) \[/i\]
| \[s\] (.+?) \[/s\]
| \[img\] (.+?) \[/img\]
| \[center\] (.+?) \[/center\]
| \[marquee\] (.+?) \[/marquee\]
| \[blink\] (.+?) \[/blink\]
| \[size=(.+?)\] (.+?) \[/size\]
| \[color=(.+?)\] (.+?) \[/color\]
| \[url(=.+?)?\] (.+?) \[/url\]
| \[quote(=.+?)?\] (.+?) \[/quote\]
@isx'; // no need for the m-modifier: the pattern contains no ^ or $ anchors
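As a rough usage sketch, this is how a pattern laid out like that could be run with preg_match_all(). To keep it short it uses only three of the tags, and the input text is a made-up example, so treat it as an illustration and not as the finished parser.

```php
<?php
// Shortened copy of the pattern above (three tags only), for illustration.
$regex = '@
    \[b\] (.+?) \[/b\]
  | \[u\] (.+?) \[/u\]
  | \[url(=.+?)?\] (.+?) \[/url\]
@isx';

// Hypothetical input text (an assumption, not from the original post).
$text = '[b]bold[/b] and [url=http://example.com]a link[/url]';

// PREG_SET_ORDER gives one array per match instead of one array per group.
preg_match_all($regex, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[0], "\n"; // the full text of each matched tag
}
```

Keep in mind that with the x-modifier, unescaped whitespace in the pattern is ignored and an unescaped # starts a comment that runs to the end of the line, so a literal space or # has to be escaped.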