Match ALL urls

Posted: Fri Feb 02, 2007 10:57 am
by Skara
I know it's been done before, but I can't find anything that works for me. Here's what I've got so far:

Code:

$message = preg_replace('#'
    . '(http://)?'
    . '('
    . '(([0-9a-z_!~*\'().&=+$%-]+:)?[0-9a-z_!~*\'().&=+$%-]+@)?' // user@ or user:pass@
    . '(([0-9]{1,3}\.){3}[0-9]{1,3}' // IP- 199.194.52.184
    . '|' // allows either IP or domain
    . '([0-9a-z_!~*\'()-]+\.)*' // tertiary domain(s)- www.
    . '([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.' // second level domain
    . '[a-z]{2,6})' // first level domain- .com or .museum
    . '(:[0-9]{1,5})?' // port number- :80 (ports go up to 65535)
    . '((/?)|' // a slash isn't required if there is no file name
    . '((/[0-9a-z_!~*\'().;?:@&=+$,%\#-]+)+/?)*?)'
    . ')#i',
    '<a href="http://$2" target="_blank">http://$2</a>',
    $message
);
I just pulled it off another site and modified it a little bit.

It's matching these:

Code:

http://www.example.com
www.example.com
example.com
www.example.com/
http://example.com/
..and so on...
but for some reason it leaves out anything after the first slash:

Code:

example.com/folder
makes:
<a href="http://example.com/">http://example.com/</a>folder
Now, I can sorta fix this, via...

Code:

//change this line:
    . '((/?)|' // a slash isn't required if there is no file name
//to this:
    . '((/?)' // a slash isn't required if there is no file name
With that change it only works on URLs that have a folder or file part, not on plain ones.

Code:

example.com/folder  <<works
example.com/  <<doesn't
So. I'm stuck.
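
A cut-down reproduction may help pin this down. The sketch below assumes the cause is the order of the alternation: PCRE tries alternatives left to right, and `(/?)` can always succeed (it matches a slash or nothing), so the path alternative after the `|` never gets a turn. The two patterns are simplified stand-ins made up for illustration, not the full regex above.

```php
<?php
// Hypothetical, simplified patterns: a host followed by the same kind of
// "(slash) | (path)" alternation, once in each order.
$slashFirst = '#[a-z]+\.com((/?)|((/[a-z]+)+/?))#i';
$pathFirst  = '#[a-z]+\.com(((/[a-z]+)+/?)|(/?))#i';

preg_match($slashFirst, 'example.com/folder', $m1);
preg_match($pathFirst, 'example.com/folder', $m2);
preg_match($pathFirst, 'example.com', $m3);

echo $m1[0], "\n"; // example.com/        (stops after the slash)
echo $m2[0], "\n"; // example.com/folder  (path alternative is tried first)
echo $m3[0], "\n"; // example.com         (falls back to the optional slash)
```

If that assumption holds, putting the longer path alternative before `(/?)` should let both the plain and the folder forms match with a single pattern.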

I need something that's definitely going to work beautifully. If someone, say, does this:

Code:

http://sub.example.com/blah/howdy.php?x=2&y=a%20b#here
it needs to work.

I only need http, though; ftp and whatnot I can leave out. It would be nice to add https, but I can't figure that out either. Matching it is easy enough. The problem is the replacement:
if http or https is typed, use it; otherwise, insert http.
As far as I can tell, that means splitting it up into more than one regex.
I've tried that and failed miserably.
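
The "use it if typed, otherwise insert http" rule does not strictly need a second regex; it can also be applied to each matched URL after the fact. A minimal sketch, where `with_scheme` is a hypothetical helper name and not something from this board's code:

```php
<?php
// Hypothetical helper: keep a typed http:// or https:// scheme as-is,
// otherwise prepend http://.
function with_scheme(string $url): string
{
    return preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
}

echo with_scheme('example.com/folder'), "\n";   // http://example.com/folder
echo with_scheme('https://example.com/'), "\n"; // https://example.com/
```

A helper like this would be called from a `preg_replace_callback` callback, once per match.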

I need this to be as generic as possible--in other words, work on everything.


Anyone have a simpler solution?

By the way--I just realized :roll: --this board does this. Anyone know that code?

Posted: Fri Feb 02, 2007 11:53 am
by Skara
bwahaha 8)

I think I got it!

Code:

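// Pass 1: URLs that have a path (folders, files, variables, links).
$message = preg_replace('#'.
        '(http(s)?://)?'.
        '('. //$3: everything after the scheme
            '(?:'. // ip OR domain, so port and path apply to both
                '((\d{1,3}\.){3}\d{1,3})'.                 //ip (four numbers, three dots)
            '|'.
                '([0-9a-z_!~*\'()-]+\.)*'.                 //3ld
                '(([0-9a-z][0-9a-z-]{0,61})?[0-9a-z])\.'.  //sld
                '([a-z]{2,6})'.                            //tld
            ')'.
            '(:[0-9]{1,5})?'.                              //port
            '((/[0-9a-z_!~*\'().;\?:@&=+$,%\#-]+)+/?)'.    //folders,files,variables,links
        ')'.
    '#i', // case-insensitive, like the first version
    '<a href="http$2://$3" target="_blank">http$2://$3</a>',
    $message
);
// Pass 2: bare hosts, with at most a trailing slash, followed by whitespace or the end.
$message = preg_replace('#'.
        '(http(s)?://)?'.
        '('. //$3: everything after the scheme
            '(?:'.
                '((\d{1,3}\.){3}\d{1,3})'.                 //ip
            '|'.
                '([0-9a-z_!~*\'()-]+\.)*'.                 //3ld
                '(([0-9a-z][0-9a-z-]{0,61})?[0-9a-z])\.'.  //sld
                '([a-z]{2,6})'.                            //tld
            ')'.
            '(:[0-9]{1,5})?'.                              //port
            '(/?)'.                                        //trailing slash
            '(?!\S)'.                                      //must not run into more text
        ')'.
    '#i',
    '<a href="http$2://$3" target="_blank">http$2://$3</a>',
    $message
);

// Safety net in case a scheme ever gets doubled up.
$message = str_replace("http://http://","http://",$message);
$message = str_replace("https://https://","https://",$message);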
Hopefully this will work. Has so far.
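
For comparison, here is a rough single-pass sketch of the same idea using `preg_replace_callback`, so the scheme decision happens in PHP instead of in a second regex. The function name `linkify` and the simplified host pattern are assumptions for illustration; this is not the board's code, and it has not been tested beyond the cases shown.

```php
<?php
// Hypothetical single-pass variant: one pattern, scheme decided in the callback.
function linkify(string $message): string
{
    $pattern = '#'
        . '(https?://)?'                              // $1: optional scheme
        . '('                                         // $2: host, port, path
        .   '(?:(?:\d{1,3}\.){3}\d{1,3}'              // IP address
        .   '|(?:[0-9a-z-]+\.)+[a-z]{2,6})'           // or domain name
        .   '(?::[0-9]{1,5})?'                        // optional port
        .   '(?:/[0-9a-z_!~*\'().;?:@&=+$,%\#/-]*)?'  // optional path, query, anchor
        . ')#i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Use the typed scheme if there was one, otherwise insert http://.
        $scheme = $m[1] !== '' ? strtolower($m[1]) : 'http://';
        // The character classes above exclude <, > and ", so the URL is
        // placed into the attribute without further escaping here.
        $url = $scheme . $m[2];
        return '<a href="' . $url . '" target="_blank">' . $url . '</a>';
    }, $message);
}

echo linkify('see example.com/folder now'), "\n";
// see <a href="http://example.com/folder" target="_blank">http://example.com/folder</a> now
```

Because the replacement text is not rescanned within a single call, the generated links are not matched again. Trailing sentence punctuation (for example a full stop right after a path) would still be swallowed into the link, same as with the patterns above.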