Check if string is contained by two other strings?
Posted: Mon Nov 16, 2009 4:35 pm
Hi everyone - I'm new here, so please bear with me. Wish I'd found this forum years ago, though.
My question is a bit of a strange one, and I'm guessing it'll be some kind of regex, but regex isn't my strong point and there may be another easier way of doing this that I'm missing!
I'm using a regex that somebody else wrote for me to detect a particular pattern in a string, and need to know whether the matched string is contained within another pair of strings.
For argument's sake, let's say I'm looking for an e-mail address amongst (properly formatted) HTML code. Once I've found the match, I'd like to know whether it's contained within a <p> ... </p> pair. I.e., I need to treat these two cases differently:
some@address.com
<p>some@address.com</p>
That would be easy enough, but the problem is that the HTML tags can be nested in any fashion, and the match may be found anywhere. So, I need to identify the following cases:
<p>My address is some@address.com</p>
<p><b>You can e-mail me at</b> <u>some@address.com</u></p>
but not:
<p>hello</p><u>some@address.com</u><p>another hello</p>
Hope this is making sense so far.
I was thinking I could find the <p> and </p> tags and compare their positions with the position of the regex match. But that would also accept the last example, because the match is (textually) somewhere between a <p> and a </p>, even though it clearly isn't inside a paragraph: the first </p> closes before the match and the second <p> only opens after it.
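To make the positional idea concrete, here is a rough sketch of a variant that might avoid that false positive: instead of comparing against the first <p> and the last </p>, it counts how many opening markers are still unclosed at the point where the match starts. The names `is_enclosed`, `first_match_enclosed` and `EMAIL_RE` are made up for illustration, the e-mail pattern is a simplified stand-in for the regex I was given, and it assumes the markers are literal strings (so a tag with attributes such as <p class="x"> would not be recognised).

```python
import re

# Simplified stand-in pattern (an assumption, not the regex I was actually given).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def is_enclosed(text, pos, open_tok, close_tok):
    """Return True if an open_tok seen before index pos is still unclosed at pos."""
    token_re = re.compile(f"{re.escape(open_tok)}|{re.escape(close_tok)}")
    depth = 0
    # Only scan the part of the text that comes before the match.
    for tok in token_re.finditer(text, 0, pos):
        if tok.group() == open_tok:
            depth += 1
        elif depth > 0:
            # A closing marker cancels the most recent opening marker.
            depth -= 1
    return depth > 0


def first_match_enclosed(text, open_tok="<p>", close_tok="</p>", pattern=EMAIL_RE):
    """Check the first pattern match in text; return None if nothing matches."""
    m = pattern.search(text)
    if m is None:
        return None
    return is_enclosed(text, m.start(), open_tok, close_tok)


if __name__ == "__main__":
    samples = [
        "some@address.com",
        "<p>some@address.com</p>",
        "<p><b>You can e-mail me at</b> <u>some@address.com</u></p>",
        "<p>hello</p><u>some@address.com</u><p>another hello</p>",
    ]
    for s in samples:
        print(first_match_enclosed(s), s)
```

Because it only looks at literal markers, the same function could in principle be called with other delimiters, e.g. `is_enclosed(text, pos, "!", "#")`, but it has no idea about comments, attributes or anything else a real HTML parser would handle.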
Any suggestions on the best way of doing this?
Thanks in advance
Ian
edit:
After writing all that, it occurred to me that converting it to an XML document, finding the match, and checking its parent nodes for the tags might work. But that wouldn't work if the document wasn't HTML/XML, and it would fail if the start/end markers weren't tags at all (e.g., matching ABC between ! and #).
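For what it's worth, this is roughly what I imagine the XML route looking like, as a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the input is well-formed XML (so something like an unclosed <br> would make the parser raise an error), it wraps the fragment in a made-up <root> element so that there is a single top-level node, and it only finds matches that sit entirely within one text node. The function names and the simplified e-mail pattern are again just placeholders.

```python
import re
import xml.etree.ElementTree as ET

# Simplified stand-in pattern (an assumption, not the regex I was actually given).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def matches_with_ancestors(fragment, pattern=EMAIL_RE):
    """Return (matched_text, enclosing_tag_names) pairs in document order."""
    # Wrap in a hypothetical <root> so fragments with several top-level nodes parse.
    root = ET.fromstring(f"<root>{fragment}</root>")
    results = []

    def walk(elem, ancestors):
        path = ancestors + [elem.tag]
        # elem.text is the text directly after elem's opening tag.
        for m in pattern.finditer(elem.text or ""):
            results.append((m.group(), path))
        for child in elem:
            walk(child, path)
            # child.tail is the text after the child's closing tag,
            # so it belongs to elem, not to the child.
            for m in pattern.finditer(child.tail or ""):
                results.append((m.group(), path))

    walk(root, [])
    return results


def inside_tag(fragment, tag="p"):
    """Return (matched_text, is_inside_tag) for every match in the fragment."""
    return [(text, tag in path) for text, path in matches_with_ancestors(fragment)]


if __name__ == "__main__":
    print(inside_tag("<p><b>You can e-mail me at</b> <u>some@address.com</u></p>"))
    print(inside_tag("<p>hello</p><u>some@address.com</u><p>another hello</p>"))
```

ElementTree doesn't keep parent pointers, which is why the sketch carries the list of enclosing tag names down the recursion rather than walking upwards from the match.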