Help with an HTML link regex <a href>

Posted: Tue Sep 04, 2007 12:34 pm
by Kadanis
Hi all

First off, here's the current regex:

$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/mi';
Basically, it's used with PHP's preg_match_all() function to find all the links in a piece of HTML and do something with them. Until now it has worked perfectly.
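For context, here is a minimal sketch of how a pattern like this is usually fed to preg_match_all(). The sample HTML and the echo loop are hypothetical stand-ins, not the poster's actual code. With PREG_SET_ORDER, each entry holds the whole anchor in [0], the href in [1] and the link text in [2].

```php
<?php
// Sketch only: the HTML below is a hypothetical sample, not the poster's data.
$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/mi';

$html = '<p>See <a href="http://example.com/one">One</a> and '
      . '<a href=\'http://example.com/two\'>Two</a>.</p>';

// PREG_SET_ORDER groups each match's captures together:
// [0] = whole anchor, [1] = href value, [2] = link text.
$count = preg_match_all($linkPattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

Run as-is, this should print both URLs with their link text, one per line.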

However, today a new test case was suggested (one that was stupidly overlooked), and the regex failed it miserably. I was hoping you guys could help suggest a fix.

If

<a href="http://your/link/here">Click</a>
or a variant of it is used, everything is fine, but (and it's a big but)

If

<a href="http://your/link/here">
<any other html tag>
Click
</close any tag>
</a>
is used (basically, any tags nested inside the anchor tag), then the whole link is skipped by the regex.
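A minimal reproduction of the report, assuming a <span> stands in for "any other html tag" (both inputs are hypothetical). Worth noting: the same nested tags written on a single line still match, which suggests it is the line breaks, not the nested tags themselves, that trip the pattern up.

```php
<?php
// Reproduces the reported failure with the original pattern (note the /mi flags).
$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/mi';

// Nested tags spread over several lines, as in the second example.
$multiLine = "<a href=\"http://your/link/here\">\n<span>\nClick\n</span>\n</a>";

// The same nested tags kept on a single line.
$oneLine = '<a href="http://your/link/here"><span>Click</span></a>';

// preg_match_all() returns the number of full matches found.
$multiCount = preg_match_all($linkPattern, $multiLine);
$oneCount   = preg_match_all($linkPattern, $oneLine);

echo "multi-line: $multiCount match(es)\n"; // 0
echo "one-line:   $oneCount match(es)\n";   // 1
```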

I have a basic grasp of regex, but to be honest I don't think this is one of mine. I'm now the only developer on the project, so I'm looking forward to suggestions. Many thanks in advance to anyone who can help.

Posted: Fri Sep 07, 2007 6:49 am
by GeertDD
The dot metacharacter matches any character except, by default, a newline. I think that's what causes the problem in your second example, where the content of the anchor spans several lines. Try adding the s (PCRE_DOTALL) modifier.
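A stripped-down illustration of the point about the dot, using a deliberately simplified pattern (not the full link regex) on a hypothetical anchor that spans three lines:

```php
<?php
// Minimal illustration of the dot and the s (PCRE_DOTALL) modifier.
$subject = "<a href=\"x\">\nClick\n</a>";

// Without /s, '.' refuses to cross the line breaks, so this fails.
$withoutS = preg_match('/<a.*<\/a>/', $subject);

// With /s, '.' also matches "\n", so the whole anchor is matched.
$withS = preg_match('/<a.*<\/a>/s', $subject);

var_dump($withoutS, $withS); // int(0), then int(1)
```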

http://php.net/manual/en/reference.pcre ... ifiers.php

The /s modifier is missing.

Posted: Sat Oct 20, 2007 4:03 am
by regexpert
The corrected pattern should look like this:

$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/is';
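A sketch of this corrected pattern applied to both cases from the original question. Besides adding /s, it drops /m, which only changes how ^ and $ behave, and the pattern uses neither. The input string is hypothetical, and strip_tags()/trim() are just one assumed way to turn the captured inner HTML into plain text:

```php
<?php
// Corrected flags: /i for case-insensitivity, /s so '.' can span line breaks.
$linkPattern = '/<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>/is';

// Hypothetical input: a plain link plus a multi-line link with a nested tag.
$html = "<a href=\"http://example.com/plain\">Plain</a>\n"
      . "<a href=\"http://example.com/nested\">\n<span>\nClick\n</span>\n</a>";

$count = preg_match_all($linkPattern, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Reduce the captured inner HTML (group 2) to its visible text.
    $text = trim(strip_tags($m[2] ?? ''));
    echo $m[1], ' => ', $text, "\n";
}
```

This should print both URLs, with "Click" as the text of the nested link. Because [^<]+ is tried before .*? in the second group, plain-text links are captured by the first alternative, and the lazy .*? only takes over when a tag appears inside the anchor.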