regex function to test full urls
Posted: Fri Nov 24, 2006 5:47 am
by amir
I've got a string entered by users, which can contain <a> tags. I already do all kinds of error checking and just need code for the last bit.
I need to verify that the href="" attribute of each <a> tag begins with http://. Please help me with such a function.
Thanks
Posted: Fri Nov 24, 2006 5:52 am
by Chris Corbyn
Code: Select all
echo preg_replace('~<a\s+[^>]*href="(?!http://).*?>(.*?)</a>~is', '$1', $string);
Maybe.... (untested)
EDIT | If it works, that would turn a string like:
Code: Select all
Go to <a class="foo" href="ftp://bad-site.tld">my ftp site</a> and download lots of bad things
Into
Code: Select all
Go to my ftp site and download lots of bad things
Posted: Fri Nov 24, 2006 7:21 am
by Mordred
*ahem*
Code: Select all
$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
Generally, parsing HTML is a pain in the smurf; there are lots and lots of things that can go wrong. If your situation allows it, it is better to use a "safe" replacement like bbcode.
Posted: Fri Nov 24, 2006 7:43 am
by Chris Corbyn
Ok smart-ass
Code: Select all
<?php
$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href="ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
while (preg_match('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', $string)) {
    $string = preg_replace('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', '$1', $string);
}
echo $string;
Posted: Fri Nov 24, 2006 8:52 am
by Mordred
I like a game when I see one
Code: Select all
<?php
$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
while (preg_match('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', $string)) {
    $string = preg_replace('~<a\s+[^>]*?href="(?!http://).*?>(.*?)</\s*a>~is', '$1', $string);
}
echo $string;
?>
Next!
Posted: Fri Nov 24, 2006 10:16 am
by Chris Corbyn

I'm not even gonna bother. It just needs a few strategically placed \s* in there
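For anyone reading along, here is a rough sketch of where those \s* might go. The function name strip_non_http_links is made up for illustration, and this still assumes double-quoted href values, so treat it as a starting point rather than a safe filter.

```php
<?php
// Hypothetical helper (not from the thread): unwrap <a> tags whose
// double-quoted href does not start with http://.
function strip_non_http_links($html)
{
    // Same pattern as in the posts above, with \s* added around "="
    // and inside the closing tag.
    $pattern = '~<a\s+[^>]*?href\s*=\s*"(?!http://).*?>(.*?)<\s*/\s*a\s*>~is';
    do {
        // $count receives the number of replacements made in this pass.
        $html = preg_replace($pattern, '$1', $html, -1, $count);
    } while ($count > 0);
    return $html;
}

$string = 'Go to <a class="foo" href="ftp://bad-site.tld"><a href= "ftp://bad-site.tld">my ftp site</ a></a> and download lots of bad things';
echo strip_non_http_links($string);
```

The do/while replaces the separate preg_match() test by using the replacement count from preg_replace(), so the pattern only has to be written once.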

Posted: Fri Nov 24, 2006 10:41 am
by Mordred
It also needs to handle quotes, and it needs checks against javascript.
Also, in the specific case with ftp, browsers, being the smart things that they are, will automatically use the FTP protocol with URLs like ftp.opera.com (but that's just trivia, nothing to do with the OP).
Actually, I have a similar problem with "native" HTML. I'm using an old piece of code written for the previous version of the software I'm rewriting, but it's not without problems. I've decided to try HTML Purifier, but haven't found the time for it yet.
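To illustrate the quoting point, here is a minimal sketch of a check that accepts double-quoted, single-quoted and unquoted href values. The function name all_hrefs_are_http is hypothetical. Because it is an allowlist (the value must start with http://), a javascript: URL in href fails the same test, but it does not look at anything else in the tag, and it can still be confused by a ">" inside an attribute value.

```php
<?php
// Hypothetical helper (not from the thread): returns true only if every
// href found on an <a> tag starts with http://.
function all_hrefs_are_http($html)
{
    // Capture the href value whether it is double-quoted, single-quoted or unquoted.
    $pattern = '~<a\s[^>]*?\bhref\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // With PREG_SET_ORDER, the last element is the group that actually matched.
        $url = end($m);
        if (strncasecmp($url, 'http://', 7) !== 0) {
            return false;
        }
    }
    return true;
}

var_dump(all_hrefs_are_http('<a href= "javascript:alert(1)">click</a>'));
```

A string with no <a> tags at all passes this check, since there is nothing to reject.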
Posted: Fri Nov 24, 2006 10:54 am
by Chris Corbyn
We're using HTMLPurifier on our project. I haven't done much digging into the API, since you can basically just use it out of the box, but it seems to work well with the default settings anyhow.
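For reference, out-of-the-box usage looks roughly like the sketch below. The include path is an assumption about where the library is unpacked, and the URI.AllowedSchemes line is not a default setting: it is added here only to match the original question (http:// links only), using the single-key config syntax of current releases.

```php
<?php
// Assumption: HTML Purifier is unpacked so that this path reaches its bundled autoloader.
require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
// Not a default: restrict link schemes to http for this thread's use case.
$config->set('URI.AllowedSchemes', array('http' => true));

$purifier = new HTMLPurifier($config);

$string = 'Go to <a class="foo" href="ftp://bad-site.tld">my ftp site</a> and download lots of bad things';
echo $purifier->purify($string);
```

Leaving out the $config->set() call gives the default settings mentioned above.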
