regex function to test full urls

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

amir
Forum Contributor
Posts: 287
Joined: Sat Oct 07, 2006 4:28 pm

Post by amir »

I've got a string entered by users which can contain <a> tags. I already do all kinds of error checking and just need code for the last bit.

I need to verify that the href="" attribute of each <a> tag begins with http://. Please help me with such a function.

Thanks
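For the original question (checking that every href starts with http://), one alternative that sidesteps most of the regex pitfalls raised in the replies is to let a real HTML parser find the links. This is a minimal sketch, assuming PHP's bundled DOM extension (DOMDocument) is available. The function name hrefsStartWithHttp() is hypothetical, and treating a link with no href as a failure is an assumption, not something the thread specifies.

```php
<?php
// Hypothetical helper: returns true only if every <a> element in the
// user-supplied fragment has an href beginning with http://.
// Assumption: the DOM extension is available. The parser normalises quoting
// and whitespace around attributes, which is what trips up the regexes.
function hrefsStartWithHttp(string $html): bool
{
    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so the parser never receives an empty string.
    $doc->loadHTML('<!DOCTYPE html><html><body>' . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        // getAttribute() returns '' when href is missing, so such links fail too.
        $href = trim($link->getAttribute('href'));
        if (stripos($href, 'http://') !== 0) {
            return false;
        }
    }
    return true;
}

var_dump(hrefsStartWithHttp('Go to <a href="http://example.com/">my site</a>'));
var_dump(hrefsStartWithHttp('Go to <a class="foo" href="ftp://bad-site.tld">my ftp site</a>'));
```

As written it also rejects https:// links, which matches the question literally but may not be what you want in practice.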
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

echo preg_replace('~<a\s+[^>]*href="(?!http://).*?>(.*?)</a>~is', '$1', $string);
Maybe... (untested)

EDIT | If it works, that would turn a string like:

Go to <a class="foo" href="ftp://bad-site.tld">my ftp site</a> and download lots of bad things
into:

Go to my ftp site and download lots of bad things
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

*ahem*

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
Generally, parsing HTML is a pain in the smurf; there are lots and lots of things that can go wrong. If your situation allows it, it is better to use a "safe" replacement like BBCode.
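Running the single-pass preg_replace() from the first reply against that nested input shows the problem concretely: the outer link is stripped, but the inner <a> sits inside the captured link text, and a single pass never rescans its own replacement. This is just the thread's own pattern and input wrapped in a runnable script.

```php
<?php
// The single-pass pattern from the first reply, unchanged.
$pattern = '~<a\s+[^>]*href="(?!http://).*?>(.*?)</a>~is';

// Mordred's nested input.
$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

// preg_replace() scans left to right once; the replacement text ($1) is not rescanned.
$result = preg_replace($pattern, '$1', $string);

echo $result, "\n";
// Go to <a href="ftp://bad-site.tld">my ftp site</ a> and download lots of bad things
```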

Post by Chris Corbyn »

Ok smart-ass :P

<?php

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

while (preg_match('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', $string))
    $string = preg_replace('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', '$1', $string);

echo $string;

Post by Mordred »

I like a game when I see one ;)

<?php

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

while (preg_match('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', $string))
    $string = preg_replace('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', '$1', $string);

echo $string;


?>
Next!
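For the record, this is what the loop above leaves behind. The first pass strips the outer link, but the remaining tag is written href= "..." with a space after the equals sign, so the literal href=" in the pattern no longer matches and the loop stops with the ftp:// link still in place. The script is the same code with the pattern pulled into a variable.

```php
<?php
// The loop pattern from the thread, stored once instead of being repeated.
$pattern = '~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is';
$string  = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

// Keep stripping until nothing matches any more.
while (preg_match($pattern, $string)) {
    $string = preg_replace($pattern, '$1', $string);
}

echo $string, "\n";
// Go to <a href= "ftp://bad-site.tld">my ftp site</a> and download lots of bad things
```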

Post by Chris Corbyn »

:lol: I'm not even gonna bother. It just needs a few strategically placed \s* in there :)
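A sketch of what those strategically placed \s* might look like, assuming the only change is to allow whitespace around the = sign and inside the closing tag. This is a hypothetical variant, not code from the thread, and it still only understands double-quoted href values.

```php
<?php
// Hypothetical variant of the loop pattern: \s* around "=" and inside "</ a >".
// Still assumes the href value is wrapped in double quotes.
$pattern = '~<a\s+[^>]*?href\s*=\s*"(?!http://).*?>(.*?)</\s*a\s*>~is';
$string  = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

// Strip repeatedly so links nested inside stripped links are caught as well.
while (preg_match($pattern, $string)) {
    $string = preg_replace($pattern, '$1', $string);
}

echo $string, "\n";
// Go to my ftp site and download lots of bad things
```

This handles the href= "..." trick, but single quotes, unquoted values and event-handler attributes are still outside what the pattern checks.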

Post by Mordred »

It would also need to handle other quoting styles (single quotes, no quotes) and to check for JavaScript.
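Both gaps are easy to demonstrate with the loop from earlier in the thread. The two inputs below are hypothetical: one puts single quotes around the href, the other has a valid http:// href plus an onmouseover handler (one possible reading of "checks against javascript"). The helper name stripNonHttpLinks() is also made up for the example.

```php
<?php
// Hypothetical wrapper around the thread's loop pattern, unchanged.
function stripNonHttpLinks(string $string): string
{
    $pattern = '~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is';
    while (preg_match($pattern, $string)) {
        $string = preg_replace($pattern, '$1', $string);
    }
    return $string;
}

// Single quotes: the pattern insists on href=" so this ftp link is never touched.
echo stripNonHttpLinks("<a href='ftp://bad-site.tld'>my ftp site</a>"), "\n";

// The href passes the http:// check, so the onmouseover script survives too.
echo stripNonHttpLinks('<a href="http://example.com" onmouseover="alert(1)">hover me</a>'), "\n";
```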

Also, in the specific case of ftp, browsers, being the smart things that they are ;), will automatically use the FTP protocol with URLs like ftp.opera.com (but that's just trivia and has nothing to do with the OP's question).

Actually, I have a similar problem with "native" HTML. I'm using an old piece of code written for the previous version of the software I'm rewriting, but it's not without problems. I've decided to try HTML Purifier, but haven't found the time to look into it yet.

Post by Chris Corbyn »

We're using HTML Purifier on our project. I haven't done much digging into the API, since you can basically use it out of the box, but it seems to work well with the default settings anyhow :)