Modifying a regexp.

mzfp2
Forum Contributor
Posts: 137
Joined: Mon Nov 11, 2002 9:44 am
Location: UK

Post by mzfp2 »

Hi,

I have been searching the forums for a regular expression that extracts links from a page, and I found this great expression in this forum:

#<\s*a\s+(?:[A-Za-z0-9_]+(?:\s*=\s*(["\']?)(?:[^\\1]*?)\\1)?)*?\s*href\s*=\s*(["\']?)([^\\2]*?)\\2.*?>#
This returns all the links in a page and works without a problem. However, I need to tweak it slightly so that it returns only links that start with "/product_url?q=".

So, for example, a link such as

<a href="/product_url?q=something">Link Title</a>

would be extracted.

I have tried several things, none of which produced any matches. The expression above doesn't really mean much to me, as I'm only just comfortable with much simpler expressions. Any help with this would be greatly appreciated.

Muzz
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Add "/product_url\?q=" just before "[^\\2]". The "?" is escaped so that it matches a literal question mark instead of acting as a quantifier.
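
For anyone who wants to try that change out, here is a rough sketch in Python rather than PHP. It assumes the pattern carries over to Python's `re` module once the `#` delimiters are dropped and the doubled backslashes from the PHP string are collapsed to single ones. The function name `extract_product_links` and the sample HTML are made up for illustration.

```python
import re

# The pattern from the first post with "/product_url\?q=" inserted just
# before the [^\2] part, inside the third capturing group.
# Assumption: PHP's "\\1" / "\\2" become "\1" / "\2" here, and the "#"
# delimiters are not needed in Python.
# Note: inside a character class, Python's re treats \1 and \2 as character
# escapes, not backreferences, so [^\2]*? matches almost any character. The
# lazy quantifier plus the real backreference \2 that follows is what stops
# the capture at the closing quote.
PRODUCT_LINK = re.compile(
    r'''<\s*a\s+(?:[A-Za-z0-9_]+(?:\s*=\s*(["']?)(?:[^\1]*?)\1)?)*?'''
    r'''\s*href\s*=\s*(["']?)(/product_url\?q=[^\2]*?)\2.*?>'''
)


def extract_product_links(html):
    """Return the href values that start with /product_url?q= (hypothetical helper)."""
    # Group 3 holds the URL itself; groups 1 and 2 only hold the quote characters.
    return [match.group(3) for match in PRODUCT_LINK.finditer(html)]


if __name__ == "__main__":
    sample = (
        '<a href="/about">About</a>\n'
        '<a href="/product_url?q=something">Link Title</a>\n'
    )
    print(extract_product_links(sample))
```

Because the prefix sits inside the third group, the captured value is the whole URL, prefix included. One caveat: a single match can begin at an earlier, unrelated `<a` tag and run through to the product link, so only the captured group should be relied on, not the full matched text.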
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Also, is it valid HTML to have a space between the opening less-than sign and the tag name?