Split on strings possibly containing parentheses (quick-fix)

jamiew
Forum Newbie
Posts: 2
Joined: Wed Mar 07, 2007 2:49 am

Post by jamiew »

Working with some text like:

Code:

Blah blah blah, some more goes here (thanks, Bob), here is some more (via Boing Boing)
I'm trying to break on the "real" commas; e.g., the '(thanks, Bob)' should remain part of the 2nd match.

I've pieced together a regex that matches everything, but excludes the parenthesized substrings:

Code:

/(\(.+?\))*?\,/
Any ideas? I'm sure I'm just missing something simple! Bonus points if it can accommodate brackets [] as well as parentheses.
Xoligy
Forum Commoner
Posts: 53
Joined: Sun Mar 04, 2007 5:35 am

Post by Xoligy »

You're going to have a hard time getting a regexp to do that. The easiest solutions I can think of are either replacing anything inside brackets with temporary markers, or looping through each character and keeping a stack: if the character is an opening bracket, push it onto the stack; if it's a closing bracket, pop the last item off the stack; if it's a comma, check whether the stack is empty. If it is, split there; if not, keep going.
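For reference, here is a minimal PHP sketch of the stack approach described above, also covering the [] bonus. The function name `split_top_level` is hypothetical, and the sketch assumes an unmatched or mismatched closing bracket should simply be ignored rather than treated as an error.

```php
<?php
// Hypothetical helper: split $str on commas that are not inside () or [].
function split_top_level(string $str): array
{
    $pairs = [')' => '(', ']' => '['];   // closing bracket => its opening bracket
    $stack = [];                         // currently open brackets, innermost last
    $parts = [];
    $current = '';
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === '(' || $ch === '[') {
            $stack[] = $ch;                        // opening bracket: push
        } elseif (isset($pairs[$ch])) {
            // closing bracket: pop only if it matches the innermost opener
            if (!empty($stack) && end($stack) === $pairs[$ch]) {
                array_pop($stack);
            }
        } elseif ($ch === ',' && empty($stack)) {
            $parts[] = trim($current);             // top-level comma: split here
            $current = '';
            continue;                              // drop the comma itself
        }
        $current .= $ch;
    }
    $parts[] = trim($current);                     // text after the last split
    return $parts;
}

$str = "Blah blah blah, some more goes here (thanks, Bob), here is some more (via Boing Boing)";
print_r(split_top_level($str));
```

Walking the string byte by byte is safe for UTF-8 input here, because `(`, `)`, `[`, `]` and `,` are ASCII and never appear inside a multi-byte sequence.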
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I'll have to agree with Xoligy here. Because of the limitations of zero-width assertions (namely, lookbehinds can't be variable width), this isn't simple to parse with a single regular expression. I'd go with the string parser suggestion.
jamiew
Forum Newbie
Posts: 2
Joined: Wed Mar 07, 2007 2:49 am

Post by jamiew »

Thanks for the input, guys; gonna go the parser route. Got so close with my regex, I was convinced there must be some little trick!
stereofrog
Forum Contributor
Posts: 386
Joined: Mon Dec 04, 2006 6:10 am

Post by stereofrog »

hi jamiew

you can also try the following

Code:

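$str = "Blah blah blah, some more goes here (thanks, Bob), here is some more (via Boing Boing)";
// each match is a run of characters other than "(" and ",", or a whole (...) group
preg_match_all('/([^(,]|\(.*?\))+/', $str, $m);
print_r($m[0]); // full matches; $m[1] only holds the last repetition of the group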
This works as long as there are no nested parentheses.
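If you do want a single pattern, PCRE (the engine behind PHP's `preg_*` functions) supports recursive subpatterns and backtracking-control verbs, so nesting and the [] bonus can be handled as well. This is a sketch, assuming balanced groups should be kept whole and an unbalanced opener is treated as ordinary text: group 1 matches one balanced `(...)` or `[...]` group, recursing with `(?1)` for nested ones, and `(*SKIP)(*FAIL)` discards that match and resumes scanning after it, so `preg_split` only splits on commas outside every group.

```php
<?php
// Group 1: one balanced (...) or [...] group; (?1) recurses for nested groups.
// (*SKIP)(*FAIL): throw that match away and resume after the group,
// so the "," alternative only ever matches top-level commas.
$pattern = '/
    (
        \( (?: [^()\[\]]++ | (?1) )* \)
      | \[ (?: [^()\[\]]++ | (?1) )* \]
    )
    (*SKIP)(*FAIL)
  | ,
/x';

$str = "a (b (c, d), e), f [g, (h, i)], j";
$parts = array_map('trim', preg_split($pattern, $str));  // trim leading spaces
print_r($parts);
```

The character-by-character parser is still easier to read and debug; this just shows the regex route isn't a dead end once recursion is available.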