Help with an html link regex <a href>

Kadanis
Forum Contributor
Posts: 180
Joined: Tue Jun 20, 2006 8:55 am
Location: Dorset, UK
Post by Kadanis »

Hi all

First off here's the current regex

Code:

$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/mi';
Basically, it's used with PHP's preg_match_all function to find all the links in a piece of HTML and do something with them. Up until now it has worked perfectly.

However, today a new test was suggested (one stupidly overlooked) and it failed miserably. I was hoping you guys could help suggest a fix.

If

Code:

<a href="http://your/link/here">Click</a>
or a variant is used, everything is fine, but (and it's a big but)

if

Code:

<a href="http://your/link/here">
<any other html tag>
Click
</close any tag>
</a>
is used (that is, with any tags inside the anchor tag), the whole thing is skipped by the regex.

I have a basic grasp of regex, but to be honest I don't think this is one of mine. I'm now the only dev on the project, so I'm looking forward to suggestions. Many thanks in advance to anyone who can help.
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Post by GeertDD »

The dot metacharacter matches any character. However, by default it does not match newlines. I think that's what causes the problem in the second example. Try adding the s modifier (PCRE_DOTALL), which makes the dot match newlines as well.

http://php.net/manual/en/reference.pcre ... ifiers.php
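
To see the dot/newline behaviour in isolation, here is a minimal sketch. It uses a deliberately simplified pattern (not the one from the first post) and assumes a <b> tag stands in for "any other html tag":

```php
<?php
// Multi-line anchor, similar in shape to the failing example in the first post.
$html = "<a href=\"http://your/link/here\">\n<b>\nClick\n</b>\n</a>";

// Without the s modifier the dot stops at every newline,
// so .*? cannot reach the closing </a> on a later line.
$withoutS = preg_match('/<a\b[^>]*>(.*?)<\/a>/i', $html);

// With the s modifier the dot also matches newlines,
// so the same pattern spans the whole anchor.
$withS = preg_match('/<a\b[^>]*>(.*?)<\/a>/is', $html, $m);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```

The only difference between the two calls is the trailing s, which is why the single-line links kept working while the multi-line one was skipped.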
regexpert
Forum Newbie
Posts: 7
Joined: Sat Oct 20, 2007 3:41 am

/s modifier is missing

Post by regexpert »

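The pattern should look like this, with the s modifier added. The m modifier is dropped, since it only affects the ^ and $ anchors and the pattern uses neither.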

Code:

$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/is';
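
As a rough check, the sketch below runs both versions of the pattern from this thread over the two cases described in the first post. The sample HTML is an assumption (a <b> tag stands in for the nested tag), and the variable names are only for illustration:

```php
<?php
// Sample input: one single-line link and one link with a nested tag.
$html = <<<HTML
<a href="http://your/link/here">Click</a>
<a href="http://your/link/here">
<b>
Click
</b>
</a>
HTML;

// Original pattern (m modifier) and the suggested one (s modifier).
$oldPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/mi';
$newPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/is';

$oldCount = preg_match_all($oldPattern, $html, $oldMatches, PREG_SET_ORDER);
$newCount = preg_match_all($newPattern, $html, $newMatches, PREG_SET_ORDER);

echo "m modifier: $oldCount link(s), s modifier: $newCount link(s)\n";

foreach ($newMatches as $match) {
    // $match[1] is the href value; $match[2] is everything between <a ...> and </a>.
    $text = trim(strip_tags($match[2]));
    echo $match[1], ' => ', $text, "\n";
}
```

With this input the original pattern finds only the single-line link, while the s version finds both. Note that the second capture group now contains the nested markup, so strip_tags is used here to get at the link text.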