Split on strings possibly containing parentheses (quick-fix)
Posted: Wed Mar 07, 2007 3:01 am
by jamiew
Working with some text like:
Code:
Blah blah blah, some more goes here (thanks, Bob), here is some more (via Boing Boing)
I'm trying to split only on the "real" commas, e.g. the '(thanks, Bob)' should remain part of the 2nd match.
I've pieced together a regex that matches everything, but excludes the parenthesized substrings:
Any ideas? Sure I'm just missing something simple! Bonus points if it can accommodate brackets [] as well as parentheses.
Posted: Wed Mar 07, 2007 3:56 am
by Xoligy
You're going to have a hard time getting a regexp to do that. The easiest solutions I can think of are either replacing anything inside brackets with temporary markers before splitting (and restoring it afterwards), or looping through each character and keeping a stack: if the character is an opening bracket, push it onto the stack; if it is a closing bracket, pop the last item off the stack; if it is a comma, check whether the stack is empty, and split there if it is. Otherwise, keep going.
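A minimal sketch of the stack idea, in case it helps. The function name split_outside_brackets is made up for illustration, the type hints assume PHP 7 or later, and trimming each piece is my own choice rather than something the question asked for:

```php
<?php
// Hypothetical helper (not from the thread): split on commas that are not
// inside () or [] by tracking the currently open brackets on a stack.
function split_outside_brackets(string $text): array
{
    $pairs  = [')' => '(', ']' => '['];   // closing bracket => matching opener
    $stack  = [];                          // brackets that are currently open
    $parts  = [];
    $buffer = '';

    // str_split() works byte by byte; that is safe here because the commas
    // and brackets are ASCII and all other bytes are copied through unchanged.
    foreach (str_split($text) as $ch) {
        if ($ch === '(' || $ch === '[') {
            $stack[] = $ch;                                   // push the opener
        } elseif (isset($pairs[$ch]) && end($stack) === $pairs[$ch]) {
            array_pop($stack);                                // matching closer: pop
        } elseif ($ch === ',' && !$stack) {
            $parts[] = trim($buffer);                         // top-level comma: split
            $buffer  = '';
            continue;                                         // drop the comma itself
        }
        $buffer .= $ch;
    }
    $parts[] = trim($buffer);                                 // last piece
    return $parts;
}

print_r(split_outside_brackets(
    'Blah blah blah, some more goes here (thanks, Bob), here is some more (via Boing Boing)'
));
```

On the sample text this gives three pieces, with '(thanks, Bob)' still attached to the second one. A closing bracket that doesn't match the top of the stack is simply kept as an ordinary character; whether that is the right behaviour for malformed input is a judgment call.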
Posted: Wed Mar 07, 2007 8:53 am
by feyd
I'll have to agree with Xoligy here. Because of the limitations of zero-width assertions (namely that lookbehinds can't be variable width), this isn't simple to parse with a single regular expression. I would agree with the suggestion of a string parser.
Posted: Wed Mar 07, 2007 12:54 pm
by jamiew
Thanks for the input guys, gonna go the parser route. Got so close with my regex, was convinced there must be some little trick!
Posted: Fri Mar 09, 2007 8:11 am
by stereofrog
hi jamiew
you can also try the following
Code:
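$str = "your text...";
// Each match is a run of ordinary characters and/or whole "(...)" groups;
// a comma outside parentheses ends the match.
preg_match_all('/([^(,]|\(.*?\))+/', $str, $m);
print_r($m); // $m[0] holds the full matches, i.e. the pieces between the "real" commas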
This works as long as there are no nested parentheses.
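For nested parentheses, and for the [] brackets asked about in the first post, PCRE's recursive subpattern calls are one option. The following is only a sketch: the function name pieces_between_commas is hypothetical, the type hints assume PHP 7 or later, and the trim() on each piece is my own addition.

```php
<?php
// Hypothetical helper (not from the thread): collect the pieces between
// top-level commas with a single recursive PCRE pattern.
function pieces_between_commas(string $text): array
{
    $pattern = '/(?:[^,()\[\]]'              // any ordinary character
             . '|(\((?:[^()]|(?1))*\))'      // group 1: balanced (...); (?1) re-enters group 1
             . '|(\[(?:[^\[\]]|(?2))*\])'    // group 2: balanced [...]; (?2) re-enters group 2
             . ')+/';
    preg_match_all($pattern, $text, $m);
    return array_map('trim', $m[0]);         // $m[0] holds the full matches
}

print_r(pieces_between_commas('a (b, (c, d)), e [f, g], h'));
```

Because this collects matches rather than splitting, empty fields between two adjacent commas are dropped, and an unbalanced bracket is skipped rather than reported. If either of those matters, the character-by-character parser is probably the safer route.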