Modifying a regexp.
Posted: Thu Jul 05, 2007 5:06 am
Hi,
I have been searching the forums for a regular-expression solution to extract links from a page, and I found this great expression in this forum:
Code: Select all
#<\s*a\s+(?:[A-Za-z0-9_]+(?:\s*=\s*(["\']?)(?:[^\\1]*?)\\1)?)*?\s*href\s*=\s*(["\']?)([^\\2]*?)\\2.*?>#
This returns all the links in a page and works without a problem. However, I need to tweak it slightly so that it returns only links that start with "/product_url?q="
So, for example, a link such as
<a href="/product_url?q=something">Link Title</a>
would be extracted.
I have tried several things, none of which produce any matches, and the above expression doesn't really mean much to me; I'm only just comfortable with much simpler expressions. Any help with this would be greatly appreciated.
Muzz
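For anyone reading along, here is a rough sketch of the kind of tweak being asked for. It is not the original expression: it is a simplified pattern that only accepts an href whose value begins with the literal prefix, and it assumes the href value is always quoted. The `#` delimiters in the original look like PCRE-style delimiters, but that is a guess; Python's `re` module is used here purely for illustration, and the names `PRODUCT_LINK` and `extract_product_links` are made up for this example.

```python
import re

# Hypothetical, simplified pattern (an assumption, not the forum's original).
# The "?" in the prefix is escaped because it is a regex metacharacter.
PRODUCT_LINK = re.compile(
    r"<\s*a\s+(?:[^>]*?\s)?"      # opening <a, then any attributes before href
    r"href\s*=\s*"                # the href attribute name and the equals sign
    r"([\"'])"                    # group 1: the opening quote (assumed present)
    r"(/product_url\?q=.*?)"      # group 2: a value starting with the prefix
    r"\1",                        # the same quote character closes the value
    re.IGNORECASE,
)


def extract_product_links(html):
    """Return the href values that start with /product_url?q=."""
    return [match.group(2) for match in PRODUCT_LINK.finditer(html)]


sample = (
    '<a href="/product_url?q=something">Link Title</a> '
    '<a href="/about">About</a>'
)
print(extract_product_links(sample))  # ['/product_url?q=something']
```

The main design choice is to put the required prefix directly inside the capturing group, so links with any other href simply fail to match instead of being filtered out afterwards.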