match until character, requiring a different character
Posted: Wed Jan 12, 2011 12:45 pm
by dmikester1
I have the basics of regex down fairly well, but this problem is too complicated for me to figure out. I need one regex that matches up to a '<' character and requires that a ',' appears somewhere in the match. I also need a second regex that matches up to a '<' and requires that no ',' appears in the match. Can someone help me? Those characters are a single left angle bracket and a single comma.
Thanks
Mike
Re: match until character, requiring a different character
Posted: Wed Jan 12, 2011 2:16 pm
by McInfo
Can you give some sample inputs and outputs?
Re: match until character, requiring a different character
Posted: Wed Jan 12, 2011 2:24 pm
by dmikester1
Sure.
input*: <td height="17" class="table_borders_left"><p class="phone_table_body">BOARDROOM</p></td>
output: BOARDROOM *should require that no comma be present
input*: <td height="17" class="table_borders_left"><p class="phone_table_body">MAIN PLANT - East Conf. Room</p></td>
output: MAIN PLANT - East Conf. Room *should require that no comma be present
input*: <td height="17" class="table_borders_left"><p class="phone_table_body">Smith, Brent</p></td>
output: Smith, Brent *should require a comma be present
So I need two regexes. One requiring a comma be present as in the last example, and one requiring a comma not be present as in the first two.
Thanks!
Mike
Re: match until character, requiring a different character
Posted: Wed Jan 12, 2011 3:34 pm
by McInfo
If the examples you posted actually are three individual inputs, the best way to isolate the output of each might be to use strip_tags() to remove the HTML and strpos() to test for the presence of a comma.
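A minimal sketch of that first approach, written in Python for illustration. strip_tags() and strpos() are PHP functions; here a crude regex-based helper (hypothetical, not a full HTML parser) stands in for strip_tags(), and the `in` operator stands in for the strpos() check.

```python
import re

def strip_tags(html: str) -> str:
    """Crude stand-in for PHP's strip_tags(): remove anything shaped like <...>."""
    return re.sub(r"<[^>]*>", "", html)

def classify(html: str):
    """Return the text between the tags and whether it contains a comma."""
    text = strip_tags(html).strip()
    return text, "," in text

# The three sample inputs from the thread, each treated as a separate string.
inputs = [
    '<td height="17" class="table_borders_left"><p class="phone_table_body">BOARDROOM</p></td>',
    '<td height="17" class="table_borders_left"><p class="phone_table_body">MAIN PLANT - East Conf. Room</p></td>',
    '<td height="17" class="table_borders_left"><p class="phone_table_body">Smith, Brent</p></td>',
]

results = [classify(h) for h in inputs]
for text, has_comma in results:
    print(repr(text), "comma" if has_comma else "no comma")
```

In PHP itself, remember that strpos() can return 0 for a match at the start of the string, so its result should be compared against false with !== rather than tested loosely.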
If all three inputs are part of a larger, single body of HTML, or if you just prefer using regex, the rest of this post might be helpful.
A pattern like <tags>([^<]+)</tags> will grab content from between the tags. Here, <tags></tags> represents the literal HTML that the target strings have in common. I've just condensed the tags to make illustrating the pattern easier. Between the tags is a subpattern, ([^<]+), that captures one or more characters that are not less-than signs. The subpattern stops capturing when the less-than sign of the first closing tag is encountered.
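As a rough illustration of the capture described above, here is a sketch in Python. It assumes the literal opening and closing tags are the ones from the sample inputs; the names OPEN, CLOSE, and ANY_TEXT are made up for this example. The same pattern should carry over to PHP's preg_match_all() with suitable delimiters.

```python
import re

# Literal HTML shared by every target string (taken from the sample inputs).
OPEN = '<td height="17" class="table_borders_left"><p class="phone_table_body">'
CLOSE = "</p></td>"

# ([^<]+) captures one or more characters that are not '<',
# so the capture stops at the '<' of the first closing tag.
ANY_TEXT = re.compile(re.escape(OPEN) + r"([^<]+)" + re.escape(CLOSE))

html = "\n".join([
    OPEN + "BOARDROOM" + CLOSE,
    OPEN + "MAIN PLANT - East Conf. Room" + CLOSE,
    OPEN + "Smith, Brent" + CLOSE,
])

all_cells = ANY_TEXT.findall(html)
print(all_cells)
```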
To match only strings that do not contain commas, include a comma in the (negated) character range of the subpattern: ([^<,]+).
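A small sketch of the comma-free variant, again in Python and again assuming the tags from the sample inputs (OPEN, CLOSE, and NO_COMMA are illustrative names). Because the closing tag must follow the capture immediately, a cell such as "Smith, Brent" cannot match: the capture would have to stop at the comma, where no closing tag is found.

```python
import re

OPEN = '<td height="17" class="table_borders_left"><p class="phone_table_body">'
CLOSE = "</p></td>"

# [^<,] excludes both '<' and ',', so any cell containing a comma fails to match.
NO_COMMA = re.compile(re.escape(OPEN) + r"([^<,]+)" + re.escape(CLOSE))

html = "\n".join([
    OPEN + "BOARDROOM" + CLOSE,
    OPEN + "MAIN PLANT - East Conf. Room" + CLOSE,
    OPEN + "Smith, Brent" + CLOSE,
])

rooms = NO_COMMA.findall(html)
print(rooms)
```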
To match only strings that do contain commas, use the subpattern ([^<]*,+[^<]*). It matches strings that consist of zero or more characters that are not less-than signs, followed by one or more commas, followed by zero or more characters that are not less-than signs.
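The comma-required variant can be sketched the same way. This is an illustration in Python under the same assumptions as before (tags copied from the sample inputs; OPEN, CLOSE, and HAS_COMMA are names invented here).

```python
import re

OPEN = '<td height="17" class="table_borders_left"><p class="phone_table_body">'
CLOSE = "</p></td>"

# [^<]* then ,+ then [^<]*: at least one comma must occur before the closing tag.
HAS_COMMA = re.compile(re.escape(OPEN) + r"([^<]*,+[^<]*)" + re.escape(CLOSE))

html = "\n".join([
    OPEN + "BOARDROOM" + CLOSE,
    OPEN + "MAIN PLANT - East Conf. Room" + CLOSE,
    OPEN + "Smith, Brent" + CLOSE,
])

people = HAS_COMMA.findall(html)
print(people)
```

Cells without a comma are skipped because the ,+ part has nothing to match before the '<' of the closing tag.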
Re: match until character, requiring a different character
Posted: Wed Jan 12, 2011 4:05 pm
by dmikester1
Ah yes, much simpler than I was making it out to be.
Thank you very much McInfo!
Mike
p.s. I can't find anywhere to mark "solved"