regex function to test full urls

Posted: Fri Nov 24, 2006 5:47 am
by amir
I've got a string entered by users, which can contain <a> tags. I already do all kinds of error checking and just need code for the last bit.

I need to verify that the href="" attribute of each <a> tag begins with http://. Please help me with such a function.

Thanks

Posted: Fri Nov 24, 2006 5:52 am
by Chris Corbyn

Code:

echo preg_replace('~<a\s+[^>]*href="(?!http://).*?>(.*?)</a>~is', '$1', $string);
Maybe.... (untested)

EDIT | If it works, that would turn a string like:

Code:

Go to <a class="foo" href="ftp://bad-site.tld">my ftp site</a> and download lots of bad things
Into

Code:

Go to my ftp site and download lots of bad things

Posted: Fri Nov 24, 2006 7:21 am
by Mordred
*ahem*

Code:

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
Generally, parsing HTML is a pain in the smurf; there are lots and lots of things that can go wrong. If your situation allows it, it is better to use a "safe" replacement like BBCode.

Posted: Fri Nov 24, 2006 7:43 am
by Chris Corbyn
Ok smart-ass :P

Code:

<?php

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

while (preg_match('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', $string))
    $string = preg_replace('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', '$1', $string);

echo $string;
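
The loop above evaluates the same pattern twice per pass. A minimal sketch of the same idea with a single call per pass, assuming PHP 5.1 or later, where preg_replace() accepts a fifth argument that receives the replacement count. The function name strip_non_http_links() is made up for illustration, and the pattern is unchanged from the post, so it has the same blind spots:

```php
<?php
// Same pattern as in the post above; strip_non_http_links() is a hypothetical name.
function strip_non_http_links($string)
{
    $pattern = '~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is';
    do {
        // $count receives the number of replacements made in this pass;
        // nested tags need another pass, so repeat until nothing changes.
        $string = preg_replace($pattern, '$1', $string, -1, $count);
    } while ($count > 0);
    return $string;
}

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
echo strip_non_http_links($string);
```

The first pass unwraps the outer tag and the second pass unwraps the inner one, which leaves only the link text.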

Posted: Fri Nov 24, 2006 8:52 am
by Mordred
I like a game when I see one ;)

Code:

<?php

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';

while (preg_match('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', $string))
    $string = preg_replace('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', '$1', $string);

echo $string;


?>
Next!

Posted: Fri Nov 24, 2006 10:16 am
by Chris Corbyn
:lol: I'm not even gonna bother. It just needs a few strategically placed \s* in there :)
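
For what it's worth, a rough sketch of what "a few strategically placed \s*" could look like: whitespace is allowed around the = and before the > of the closing tag. The function name strip_non_http_links_loose() is hypothetical, and this is only one guess at the placement. It still assumes double-quoted attribute values, so single-quoted or unquoted hrefs slip through.

```php
<?php
// Hypothetical variant of the pattern from the earlier posts, with \s* added
// around the "=" and before the closing ">" of the end tag.
function strip_non_http_links_loose($string)
{
    $pattern = '~<a\s+[^>]*?href\s*=\s*"(?!http://).*?>(.*?)</\s*a\s*>~is';
    do {
        // Repeat until a pass makes no replacement, so nested tags are unwrapped too.
        $string = preg_replace($pattern, '$1', $string, -1, $count);
    } while ($count > 0);
    return $string;
}

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
echo strip_non_http_links_loose($string);
```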

Posted: Fri Nov 24, 2006 10:41 am
by Mordred
It also needs to handle quotes, and to check against javascript: links.
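
One way to sidestep the quoting details is to let a parser read the attributes and only then test the value. A minimal sketch, assuming PHP's DOM extension is available; find_non_http_hrefs() is a made-up helper name, and it only reports offending href values rather than rewriting the markup:

```php
<?php
// Hypothetical helper: return every href value that does not start with http://.
function find_non_http_hrefs($html)
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // Wrap the fragment so loadHTML() always gets one root element to parse.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bad = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // plain anchors carry no link to check
        }
        // The parser has already dealt with the quoting style and spacing.
        $href = trim($a->getAttribute('href'));
        if (strncasecmp($href, 'http://', 7) !== 0) {
            $bad[] = $href;
        }
    }
    return $bad;
}

print_r(find_non_http_hrefs("<a href='javascript:alert(1)'>x</a> <a href=\"http://example.com/\">ok</a>"));
```

Because the check is a whitelist (the value must start with http://), javascript: links, relative links and everything else are all reported the same way.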

Also, in the specific case with ftp, browsers being the smart things that they are ;) will automatically use the FTP protocol with urls like ftp.opera.com (but that's just trivia, nothing to do with the OP)

Actually I have a similar problem with "native" HTML. I'm using an old piece of code written for the previous version of the software I'm rewriting, but it's not without problems. I've decided to try HTML Purifier, but haven't found the time for it yet.

Posted: Fri Nov 24, 2006 10:54 am
by Chris Corbyn
We're using HTML Purifier on our project. I haven't done much digging into the API, since you can basically just use it out of the box, but it seems to work well with the default settings anyhow :)