Hello,
I am having trouble with a regex. I need it to figure out whether an <a> tag contains a link that points to a website outside the current one. For example,
<a href="http://www.example.com/login">link</a>
I need the regex to check whether "http://www.example.com" exists in the tag. If it doesn't, I need to replace the whole tag with "" as a way of deleting it. I have been trying to figure out how to do that but failed; please help me.
Edit: The tags all sit inside a single string, so the regex needs to find the <a> tags as well.
Regex help with html tags
Re: Regex help with html tags
Whee.
This only replaces links (i.e., <a> tags with an href). It would be an excellent opportunity to use an anonymous function, if you have PHP 5.3+.
<?php
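// Callback for preg_replace_callback(): keeps or removes one matched <a> element.
function replacelink(array $matches) {
    $allowed = array(
        // order from longest first...
        "http://www.example.com"
        // ...to shortest last
    );
    // $matches: 0 = whole element, 2 = "href" if the tag has one, 3 = the href value
    list($text, , $islink, $href) = $matches;
    $href = trim($href, "\"'"); // strip the surrounding quotes
    // Not a link (no href): leave it alone
    if (!$islink) return $text;
    // Keep the link if its href starts with an allowed URL (case-insensitive)
    foreach ($allowed as $url) {
        if (strncasecmp($href, $url, strlen($url)) == 0) return $text;
    }
    // Anything else points off-site: remove the whole element
    return "";
}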
$string = <<<HTML
Y <a>a</a>
Y <a name="anchor">anchor</a>
Y <a href="http://www.example.com">link</a>
N <a href="http://www.example.net">link</a>
Y <a href="http://www.example.com/login">link</a>
N <a href="http://www.example.net/login">link</a>
Y <a class="foo" href="http://www.example.com">link</a>
N <a class="foo" href="http://www.example.net">link</a>
Y <a class="foo" href="http://www.example.com/login">link</a>
N <a class="foo" href="http://www.example.net/login">link</a>
Y <a class="foo" href="http://www.example.com" id="bar">link</a>
N <a class="foo" href="http://www.example.net" id="bar">link</a>
Y <a class="foo" href="http://www.example.com/login" id="bar">link</a>
N <a class="foo" href="http://www.example.net/login" id="bar">link</a>
HTML;
$regex = '~<a\s*((href)\s*=\s*(["\'][^"\']*["\']|[^"\'\s]\S*)|(?!\s*href\s*=\s*)[^>"\']+|["\'][^"\']*["\']|\s+)*>(.*?)</a>~i';
echo preg_replace_callback($regex, "replacelink", $string);
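For PHP 5.3+, the same idea can be sketched with an anonymous function. This is only a rough variant under a few assumptions: the names stripExternalLinks() and $allowedHosts are made up for illustration, and instead of the prefix comparison it compares the host returned by parse_url(), so that an address such as http://www.example.com.evil.net is not mistaken for the current site. The pattern is the one from the post, unchanged.

```php
<?php
// Sketch only: stripExternalLinks() and $allowedHosts are hypothetical names,
// not part of the original post.
function stripExternalLinks($html, array $allowedHosts)
{
    // Same pattern as in the post above.
    $regex = '~<a\s*((href)\s*=\s*(["\'][^"\']*["\']|[^"\'\s]\S*)|(?!\s*href\s*=\s*)[^>"\']+|["\'][^"\']*["\']|\s+)*>(.*?)</a>~i';

    // The closure receives $allowedHosts through "use" (PHP 5.3+).
    return preg_replace_callback($regex, function (array $matches) use ($allowedHosts) {
        list($text, , $islink, $href) = $matches;
        if (!$islink) {
            return $text; // no href: leave the tag alone
        }
        // Compare the host only. Relative hrefs have no host and are removed,
        // just as the prefix check in the post removes them.
        $host = parse_url(trim($href, "\"'"), PHP_URL_HOST);
        if (is_string($host) && in_array(strtolower($host), $allowedHosts, true)) {
            return $text;
        }
        return ""; // off-site link: drop the whole element
    }, $html);
}
```

It would be called as stripExternalLinks($string, array("www.example.com")). Hosts in the list are assumed to be lowercase; add more entries if the site answers on several host names.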