Modifying a regexp.

Posted: Thu Jul 05, 2007 5:06 am
by mzfp2
Hi,

I have been searching the forums for a regular expressions solution to extract links from a page, and I have found this great expression in this forum:

#<\s*a\s+(?:[A-Za-z0-9_]+(?:\s*=\s*(["\']?)(?:[^\\1]*?)\\1)?)*?\s*href\s*=\s*(["\']?)([^\\2]*?)\\2.*?>#

This returns all the links in a page and works without a problem. However, I need to tweak it slightly so that it returns only links that start with "/product_url?q=".

So, for example, a link such as

<a href="/product_url?q=something">Link Title</a>

would be extracted.

I have tried several things, none of which produced any matches. The expression above doesn't really mean much to me; I'm only just comfortable with much simpler expressions. Any help with this would be greatly appreciated.

Muzz

Posted: Thu Jul 05, 2007 7:26 am
by feyd
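Add "/product_url\?q=" just before "[^\\2]", so that the captured href value has to begin with that prefix. The "?" is escaped because an unescaped "?" is a regex quantifier.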
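
To illustrate the idea, here is a minimal sketch in Python rather than the thread's own environment. It uses a simplified pattern, not the original expression, and the names PRODUCT_LINK, extract_product_links and sample are hypothetical. It assumes an href value never contains quotes, whitespace or ">".

```python
import re

# Simplified pattern (an assumption, not the expression quoted in the thread):
# match an <a ...> tag whose href value starts with the literal /product_url?q=
# The "?" is escaped so it is read as a literal character, not a quantifier.
PRODUCT_LINK = re.compile(
    r'''<a\s+[^>]*?\bhref\s*=\s*(["']?)(/product_url\?q=[^"'\s>]*)\1[^>]*>''',
    re.IGNORECASE,
)


def extract_product_links(html):
    """Return the href values that start with /product_url?q=."""
    # Group 1 is the optional opening quote, group 2 is the href value.
    return [m.group(2) for m in PRODUCT_LINK.finditer(html)]


sample = (
    '<a href="/product_url?q=something">Link Title</a>'
    '<a href="/other?q=1">Other</a>'
    "<a class='x' href='/product_url?q=abc'>Another</a>"
)

print(extract_product_links(sample))
# ['/product_url?q=something', '/product_url?q=abc']
```

The prefix sits inside the capturing group, so the returned value includes "/product_url?q=". Moving it in front of the group would capture only the part after the prefix.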

Posted: Thu Jul 05, 2007 10:32 am
by superdezign
Also, is it valid HTML to have a space between the opening less-than sign and the tag name?