match until character, requiring a different character

Posted: Wed Jan 12, 2011 12:45 pm
by dmikester1
I have the basics of regex down fairly well, but this problem is too complicated for me to figure out. I need one regex that matches everything up to a '<' character and requires that a ',' appear in the match. Then I need a second regex that matches everything up to a '<' and requires that no ',' appear in the match. Can someone help me? Those characters are a single left angle bracket and a single comma.
Thanks
Mike

Re: match until character, requiring a different character

Posted: Wed Jan 12, 2011 2:16 pm
by McInfo
Can you give some sample inputs and outputs?

Re: match until character, requiring a different character

Posted: Wed Jan 12, 2011 2:24 pm
by dmikester1
Sure.

input*: <td height="17" class="table_borders_left"><p class="phone_table_body">BOARDROOM</p></td>
output: BOARDROOM *should require that no comma be present

input*: <td height="17" class="table_borders_left"><p class="phone_table_body">MAIN PLANT - East Conf. Room</p></td>
output: MAIN PLANT - East Conf. Room *should require that no comma be present

input*: <td height="17" class="table_borders_left"><p class="phone_table_body">Smith, Brent</p></td>
output: Smith, Brent *should require a comma be present

So I need two regexes. One requiring a comma be present as in the last example, and one requiring a comma not be present as in the first two.

Thanks!
Mike

Re: match until character, requiring a different character

Posted: Wed Jan 12, 2011 3:34 pm
by McInfo
If the examples you posted actually are three individual inputs, the best way to isolate the output of each might be to use strip_tags() to remove the HTML and strpos() to test for the presence of a comma.
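As a rough sketch of that first approach, assuming each input really is a separate string: the helper name extract_cell() and the returned array keys are made up for illustration, while strip_tags(), strpos(), and trim() are the standard PHP functions.

```php
<?php
// Hypothetical helper (not from the thread): strips the HTML from one cell
// and reports whether the remaining text contains a comma.
function extract_cell(string $html): array
{
    $text = trim(strip_tags($html));

    return [
        'text'      => $text,
        // strpos() returns false when there is no match and may return 0
        // for a match at the start, so compare strictly against false.
        'has_comma' => strpos($text, ',') !== false,
    ];
}

$inputs = [
    '<td height="17" class="table_borders_left"><p class="phone_table_body">BOARDROOM</p></td>',
    '<td height="17" class="table_borders_left"><p class="phone_table_body">Smith, Brent</p></td>',
];

foreach ($inputs as $html) {
    $cell = extract_cell($html);
    echo $cell['text'], ': ', $cell['has_comma'] ? 'comma' : 'no comma', PHP_EOL;
}
```

This avoids regex entirely, but it only works when the inputs are already split up one cell per string.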

If all three inputs are part of a larger, single body of HTML, or if you just prefer using regex, the rest of this post might be helpful.

A pattern like the one shown below will grab content from between the tags. Here, <tags></tags> represents the literal HTML that the target strings have in common. I've just condensed the tags to make illustrating the pattern easier. Between the tags is a subpattern that captures one or more characters that are not less-than signs. The subpattern stops capturing when the less-than sign of the first closing tag is encountered.

~<tags>([^<]+)</tags>~

To match only strings that do not contain commas, add a comma to the negated character class of the subpattern.

([^<,]+)

To match only strings that do contain commas, use this next subpattern. It matches strings that consist of zero or more characters that are not less-than signs, followed by one or more commas, followed by zero or more characters that are not less-than signs.

([^<]*,+[^<]*)
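
Here is a minimal sketch of both subpatterns run against the sample inputs with preg_match_all(), assuming all three cells sit in one body of HTML. The build_pattern() helper and the variable names are hypothetical. The literal <p class="phone_table_body"> and </p> stand in for <tags> and </tags>.

```php
<?php
// Literal tags shared by the sample inputs (stand-ins for <tags> and </tags>).
$open  = '<p class="phone_table_body">';
$close = '</p>';

// Hypothetical helper: wraps a capturing subpattern in the escaped literal
// tags, using ~ as the pattern delimiter.
function build_pattern(string $open, string $sub, string $close): string
{
    return '~' . preg_quote($open, '~') . $sub . preg_quote($close, '~') . '~';
}

$noComma   = build_pattern($open, '([^<,]+)', $close);       // no comma allowed
$withComma = build_pattern($open, '([^<]*,+[^<]*)', $close); // comma required

$html = <<<'HTML'
<td height="17" class="table_borders_left"><p class="phone_table_body">BOARDROOM</p></td>
<td height="17" class="table_borders_left"><p class="phone_table_body">MAIN PLANT - East Conf. Room</p></td>
<td height="17" class="table_borders_left"><p class="phone_table_body">Smith, Brent</p></td>
HTML;

// Index 1 of the matches array holds the first capture group of every match.
preg_match_all($noComma, $html, $matches);
$rooms = $matches[1];

preg_match_all($withComma, $html, $matches);
$people = $matches[1];

print_r($rooms);
print_r($people);
```

The surrounding literal tags matter for the no-comma case. On its own, ([^<,]+) would happily match just "Smith" in "Smith, Brent". Requiring the closing tag right after the capture is what rules that cell out.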

Re: match until character, requiring a different character

Posted: Wed Jan 12, 2011 4:05 pm
by dmikester1
Ah yes, much simpler than I was making it out to be.
Thank you very much McInfo!
Mike


p.s. I can't find anywhere to mark "solved"