Regex help with html tags

hhappak
Forum Newbie
Posts: 1
Joined: Wed Sep 22, 2010 3:22 am

Regex help with html tags

Post by hhappak »

Hello,
I am having trouble with a regex. I need it to determine whether an <a> tag contains a link that points to a website outside the current one. For example,

<a href="http://www.example.com/login">link</a>

I need the regex to check whether "http://www.example.com" appears in the tag. If it doesn't, the whole tag should be replaced with "" as a way of deleting it. I have been trying to work out how to do this but have failed, so please help me.

Edit: The tags all sit inside a single string, so the regex also needs to find the <a> tags.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Regex help with html tags

Post by requinix »

Whee.

<?php

function replacelink(array $matches) {
    $allowed = array(
        // order from longest first...
        "http://www.example.com"
        // ...to shortest last
    );

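    // $islink is "href" when the tag has an href attribute; $href is its value, possibly still quoted
    list($text, , $islink, $href) = $matches;
    if (!$islink) return $text; // no href: not a link, keep the tag as-is
    $href = trim($href, "\"'");

    // Keep the link if its href starts with any allowed URL (case-insensitive)
    foreach ($allowed as $url) {
        if (strncasecmp($href, $url, strlen($url)) == 0) return $text;
    }
    return ""; // external link: delete the whole tag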
}

$string = <<<HTML
Y <a>a</a>
Y <a name="anchor">anchor</a>
Y <a href="http://www.example.com">link</a>
N <a href="http://www.example.net">link</a>
Y <a href="http://www.example.com/login">link</a>
N <a href="http://www.example.net/login">link</a>
Y <a class="foo" href="http://www.example.com">link</a>
N <a class="foo" href="http://www.example.net">link</a>
Y <a class="foo" href="http://www.example.com/login">link</a>
N <a class="foo" href="http://www.example.net/login">link</a>
Y <a class="foo" href="http://www.example.com" id="bar">link</a>
N <a class="foo" href="http://www.example.net" id="bar">link</a>
Y <a class="foo" href="http://www.example.com/login" id="bar">link</a>
N <a class="foo" href="http://www.example.net/login" id="bar">link</a>

HTML;

// \b after "<a" keeps tags such as <abbr> from being treated as anchors
$regex = '~<a\b\s*((href)\s*=\s*(["\'][^"\']*["\']|[^"\'\s]\S*)|(?!\s*href\s*=\s*)[^>"\']+|["\'][^"\']*["\']|\s+)*>(.*?)</a>~i';
echo preg_replace_callback($regex, "replacelink", $string);
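This only removes links (i.e., <a> tags with an href) whose href does not start with one of the allowed URLs; <a> tags without an href are left untouched. It's an excellent opportunity to use an anonymous function, if you have PHP 5.3+.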
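For what it's worth, here is a rough sketch of what the anonymous-function version might look like on PHP 5.3+. The helper name strip_external_links and its parameters are made up for illustration and are not part of the post. The pattern is the same one as above, with a \b after "<a" on the assumption that tags like <abbr> should not be matched. The closure pulls in the allowed list with "use", so the list can be passed in rather than hard-coded in the callback.

```php
<?php

// Hypothetical helper (name and signature are assumptions, not from the post).
// $allowed is a list of URL prefixes that count as "this site".
function strip_external_links($html, array $allowed)
{
    // Same pattern as above; \b stops "<a" from matching "<abbr", "<area", etc.
    $regex = '~<a\b\s*((href)\s*=\s*(["\'][^"\']*["\']|[^"\'\s]\S*)|(?!\s*href\s*=\s*)[^>"\']+|["\'][^"\']*["\']|\s+)*>(.*?)</a>~i';

    // The anonymous function captures $allowed from the enclosing scope.
    return preg_replace_callback($regex, function (array $matches) use ($allowed) {
        list($text, , $islink, $href) = $matches;
        if (!$islink) {
            return $text; // no href attribute: keep the tag
        }
        $href = trim($href, "\"'");
        foreach ($allowed as $url) {
            if (strncasecmp($href, $url, strlen($url)) == 0) {
                return $text; // href starts with an allowed prefix
            }
        }
        return ""; // external link: delete the whole tag
    }, $html);
}

$html = '<a href="http://www.example.com/login">in</a> '
      . '<a href="http://www.example.net/">out</a> '
      . '<a name="top">anchor</a>';

echo strip_external_links($html, array("http://www.example.com"));
```

With this input, the example.net link should be removed while the example.com link and the named anchor come back unchanged. One caveat with a plain prefix check: a URL such as http://www.example.com.evil.example/ also starts with "http://www.example.com", so it would be kept. Listing the prefix with a trailing slash is one way to tighten that, at the cost of rejecting the bare http://www.example.com.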