regex preg_replace help: URL to a link


ninethousandfeet
Forum Contributor
Posts: 130
Joined: Tue Mar 10, 2009 4:56 pm

regex preg_replace help: URL to a link

Post by ninethousandfeet »

Hi,

I've come close to a working solution to my problem; I just need help getting the last few bugs out.

Users create comments, and a comment can contain a URL.

My current preg_replace, which detects a URL so it can be turned into a link:

Code: Select all

 
$msg = preg_replace("/([^A-z0-9?])(http|ftp|https)([\:\/\/])([^\\s]+)/"," <a href=\"$2$3$4\" rel=\"nofollow\">$2$3$4</a>",$msg);
 
Problems:
* example 1: http://helloworld.com is a great site, check it out. --- the URL is not converted to a link; it seems to be because there are no characters before the URL

* example 2: check out http://helloworld.com, it's a great site! --- converts to a link BUT the comma becomes part of the link, so the link breaks (this also happens if a parenthesis, dash, etc. is attached to the end of the URL)

Any help with either or both of these problems would be great, thanks for taking a look!

Brad
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: regex preg_replace help: URL to a link

Post by tr0gd0rr »

In the first capturing pattern, you need to allow any white space OR beginning of string `(^|\s)`

In the last capturing pattern, you need to use only characters allowed unescaped in urls:
Something like `([a-zA-Z~!@$%&*()_+:?,./;'=#-]+)`

If you want to assume that the url does not end in certain punctuation (which it may), create another capturing pattern:
Something like `([a-zA-Z~@$%&*_+/'=])`
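
To make the placement concrete, here is a minimal sketch of how the three pieces could be assembled into one pattern. The function name `linkify` and the added `0-9` ranges are assumptions for illustration, not part of the suggestion above, and every `/` inside the classes is escaped because `/` is also the pattern delimiter.

```php
<?php
// Hypothetical helper: the name linkify() and the added 0-9 ranges are
// assumptions for illustration, not part of the suggestion above.
function linkify(string $msg): string
{
    $pattern = '/(^|\s)'                               // 1: start of string or whitespace
        . '(https?|ftp)'                               // 2: scheme
        . '(:\/\/)'                                    // 3: literal "://"
        . '([a-zA-Z0-9~!@$%&*()_+:?,.\/;\'=#-]+'       // 4: URL body (wide class) ...
        . '[a-zA-Z0-9~@$%&*_+\/\'=])/';                //    ... last char from the narrower class

    // $1 puts back the whitespace consumed by the first group.
    return preg_replace($pattern, '$1<a href="$2$3$4">$2$3$4</a>', $msg);
}

echo linkify('http://helloworld.com is a great site, check it out.'), "\n";
echo linkify("check out http://helloworld.com, it's a great site!"), "\n";
```

Because the final character has to come from the narrower class, the regex engine backtracks past a trailing comma or parenthesis and leaves it outside the link.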

Also, check out this tool for testing regexes: http://gskinner.com/RegExr/
ninethousandfeet
Forum Contributor
Posts: 130
Joined: Tue Mar 10, 2009 4:56 pm

Re: regex preg_replace help: URL to a link

Post by ninethousandfeet »

Thanks for your help... halfway got it.

This is what I have right now and the beginning space problem seems to be fixed.

I tried various versions of your ending capture code, but came up empty each time. I can send a bunch of the variations I tried if it will help. Can you help me with placement of the code you have written for the end of this preg_replace? (I like that validator regex site... it helped explain things a bit, but couldn't quite get it on there either)

I have:
/(^[A-z0-9]?|\s)(http|ftp|https)([\:\/\/])([^\\s]+)/

I tried replacing the last capturing pattern with your code, also tried to add it to what I already had in that capture pattern in the front and back, and none of it worked.

Thank you for your help!
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: regex preg_replace help: URL to a link

Post by tr0gd0rr »

The following is working for me on your example text:

Code: Select all

(^|\s)(http|ftp|https|mailto)([\:\/\/])([a-zA-Z~!@$%&*()_+:?,./;'=#-]{2,}[a-zA-Z~@$%&*_+/'=])
I may have thrown you off in my previous post because I used back-ticks instead of [code] tags.
ninethousandfeet
Forum Contributor
Posts: 130
Joined: Tue Mar 10, 2009 4:56 pm

Re: regex preg_replace help: URL to a link

Post by ninethousandfeet »

For some reason, that still won't work. When I use your most recent option, the entire comment does not appear. Is it something with the replace portion maybe? Or do you think it's something else?

Code: Select all

 
$msg = preg_replace("/(^|\s)(http|ftp|https|mailto)([\:\/\/])([a-zA-Z~!@$%&*()_+:?,./;'=#-]{2,}[a-zA-Z~@$%&*_+/'=])/"," <a href=\"$2$3$4\" rel=\"nofollow\">$2$3$4</a> ",$msg);
 
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: regex preg_replace help: URL to a link

Post by tr0gd0rr »

Running your code I get the error `Warning: preg_replace() [function.preg-replace]: Unknown modifier ';'` because you need to escape the two slashes in the fourth capturing pattern.
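
A minimal sketch of what is happening, assuming the code from the previous post: the pattern is delimited by `/`, so the first unescaped `/` inside the character class ends the pattern and PHP reads the following `;` as a modifier. On that failure `preg_replace()` emits the warning and returns `null`, which would explain the whole comment disappearing. The sample `$msg` is made up for illustration.

```php
<?php
$msg = "check out http://helloworld.com, it's a great site!";
$replacement = ' <a href="$2$3$4" rel="nofollow">$2$3$4</a> ';

// Unescaped "/" inside the 4th group: the pattern fails to compile and the
// result is null. The @ only silences the warning for this demonstration.
$broken = @preg_replace(
    "/(^|\s)(http|ftp|https|mailto)([\:\/\/])([a-zA-Z~!@$%&*()_+:?,./;'=#-]{2,}[a-zA-Z~@$%&*_+/'=])/",
    $replacement,
    $msg
);
var_dump($broken); // NULL

// Same pattern with both slashes in the 4th group escaped as "\/".
// Note that ([\:\/\/]) is a character class, so it matches only the ":";
// the "//" is picked up by the 4th group.
$fixed = preg_replace(
    "/(^|\s)(http|ftp|https|mailto)([\:\/\/])([a-zA-Z~!@$%&*()_+:?,.\/;'=#-]{2,}[a-zA-Z~@$%&*_+\/'=])/",
    $replacement,
    $msg
);
echo $fixed, "\n";
```

Checking the return value for `null` before echoing the comment is a cheap way to notice this kind of pattern error during development.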
ninethousandfeet
Forum Contributor
Posts: 130
Joined: Tue Mar 10, 2009 4:56 pm

Re: regex preg_replace help: URL to a link

Post by ninethousandfeet »

I've added \ in front of the two / in the 4th capture.

I then ran into a problem with the link being cut off if a number appeared.
Example of this problem:
http://bit.ly/r 7tPR

So, I fixed this by changing the two a-zA-Z ... to ... A-z0-9
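
One caveat with that change, shown as a quick check rather than a definitive fix: in ASCII the range `A-z` also covers the six characters that sit between `Z` and `a`, so it is wider than `a-zA-Z`. The variable names below are only illustrative.

```php
<?php
// Characters between 'Z' (90) and 'a' (97) in ASCII.
$extras = ['[', '\\', ']', '^', '_', '`'];

foreach ($extras as $ch) {
    $wide   = preg_match('/^[A-z0-9]$/', $ch);     // matches: inside the A-z range
    $strict = preg_match('/^[a-zA-Z0-9]$/', $ch);  // no match: not a letter or digit
    echo $ch, ' wide=', $wide, ' strict=', $strict, "\n";
}
```

Writing `a-zA-Z0-9` keeps the class to letters and digits; any extra characters can then be listed explicitly.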

The comma problem is fixed. I'm sure this is very rare, but what can I add to the 1st capture to ignore any characters before the http|ftp... Or maybe a better way to put it is to ignore those characters and start the link at the http...
Example user input:
- (http://me.com)
- hi, go to-http://me.com

Thanks for your help with all of this, regex is a whole new world for me.

Cheers,
Brad
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: regex preg_replace help: URL to a link

Post by tr0gd0rr »

Maybe use a \b instead of the first capturing pattern. \b matches at a word boundary without consuming any characters.

I think that \b behaves much like (^|[^\w]) in this case, except that the group consumes the character before the URL, so you can try that too. BTW, \w is equal to [a-zA-Z0-9_] and it could be used in the regexes above.
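
As a rough sketch of the difference, using the two sample inputs from the post above (the simplified URL body `[\w.\/-]*\w` is an assumption to keep the example short): `\b` is zero-width, so the character in front of `http` stays outside the match, whereas `(^|[^\w])` consumes that character and it has to be put back with `$1`.

```php
<?php
$inputs = ['- (http://me.com)', 'hi, go to-http://me.com'];

foreach ($inputs as $in) {
    // \b: zero-width word boundary, nothing before "http" is consumed.
    $withB = preg_replace('/\b(https?:\/\/[\w.\/-]*\w)/', '<a href="$1">$1</a>', $in);

    // (^|[^\w]): consumes one non-word character, re-emitted through $1.
    $withGroup = preg_replace('/(^|[^\w])(https?:\/\/[\w.\/-]*\w)/', '$1<a href="$2">$2</a>', $in);

    echo $withB, "\n", $withGroup, "\n";
}
```

With the group version, forgetting `$1` in the replacement would silently drop the `(` or `-` in front of the link.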
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Re: regex preg_replace help: URL to a link

Post by s.dot »

I've used the one that came with phpbb a while back

Code: Select all

function make_clickable($text)
{
    $ret = ' ' . $text;
    $ret = preg_replace("#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);
    $ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r<]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);
    $ret = preg_replace("#(^|[\n ])([a-z0-9&\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret);
    $ret = substr($ret, 1);
        
    return $ret;
}
 
$text = make_clickable($text);
It has never failed me.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
ninethousandfeet
Forum Contributor
Posts: 130
Joined: Tue Mar 10, 2009 4:56 pm

Re: regex preg_replace help: URL to a link

Post by ninethousandfeet »

tr0gd0rr - that worked great. The only thing I can't figure out now is what happens when the user submits a new comment: I use JavaScript to display the new comment with the older comments (just like a stream of comments on Facebook, for example).

The problem is that the punctuation before and after the link is part of the link. Then, if you click refresh, it is correct and the punctuation (,.() etc.) is not part of the link. Any idea why this is happening? Not a huge deal, but if it's fixable I'd love to fix it. I'll keep trying and post if I come across a solution. Let me know if you can think of anything or if you need to see more of my code to help.

Thank you!

s.dot - everything worked fine with your code, except that punctuation before and after the URL appeared as part of the link the whole time. I like the cleanliness of the function. Do you know how to overcome this? E.g. http://example.com, check it out ... the comma is included in the link, which causes a broken link.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Re: regex preg_replace help: URL to a link

Post by s.dot »

Just wanted to point out that it's not *my* code ;)

I guess this behavior is because commas and periods are part of valid URLs. That said, I have used this function on a popular forum before and I've never run into any issues with users adding commas, periods or any other punctuation after posting URLs. In fact, I never knew this was an issue.

But since those characters are valid parts of URLs, you cannot simply ignore them. E.g. http://www.example.com/page/3,1,3, may be a valid URL.

However, if you wish to not include trailing punctuation characters, I don't know how to edit the regex to avoid them LOL, I'm admittedly not very good with regular expressions.

EDIT| And, that is weird that phpbb did not link the last comma in that URL I posted. Hmm, maybe the phpbb function I am using is outdated or I edited it at some point.
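
One possible way to keep the simple phpBB-style match and still drop trailing punctuation, offered as a sketch rather than a tested replacement for make_clickable(): match greedily, then peel punctuation off the end inside a callback. The function name `make_clickable_trimmed` and the set of characters treated as trailing punctuation are assumptions; a URL that genuinely ends in one of them (such as the `3,1,3,` example above) would lose that character.

```php
<?php
// Hypothetical variant; the name and the punctuation list are assumptions.
function make_clickable_trimmed(string $text): string
{
    return preg_replace_callback(
        '#(^|[\n ])(\w+?://[^ "\n\r\t<]*)#i',
        function (array $m): string {
            // Split the greedy match into the URL proper and trailing punctuation.
            $url  = rtrim($m[2], '.,;:!?)');
            $tail = substr($m[2], strlen($url));

            return $m[1] . '<a href="' . $url . '" target="_blank">' . $url . '</a>' . $tail;
        },
        $text
    );
}

echo make_clickable_trimmed('http://example.com, check it out (see http://example.com/a).'), "\n";
```

The trimmed characters are appended after the closing tag, so the visible text of the comment stays unchanged.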
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Re: regex preg_replace help: URL to a link

Post by s.dot »

Actually, here is the original function from the phpBB 2.x forum code:

Code: Select all

function make_clickable($text)
{
 
   // pad it with a space so we can match things at the start of the 1st line.
   $ret = " " . $text;
 
   // matches an "xxxx://yyyy" URL at the start of a line, or after a space.
   // xxxx can only be alpha characters.
   // yyyy is anything up to the first space, newline, or comma.
   $ret = preg_replace("#([\n ])([a-z]+?)://([^,\t \n\r]+)#i", "\\1<a href=\"\\2://\\3\" target=\"_blank\">\\2://\\3</a>", $ret);
 
   // matches a "www.xxxx.yyyy[/zzzz]" kinda lazy URL thing
   // Must contain at least 2 dots. xxxx contains either alphanum, or "-"
   // yyyy contains either alphanum, "-", or "."
   // zzzz is optional.. will contain everything up to the first space, newline, or comma.
   // This is slightly restrictive - it's not going to match stuff like "forums.foo.com"
   // This is to keep it from getting annoying and matching stuff that's not meant to be a link.
   $ret = preg_replace("#([\n ])www\.([a-z0-9\-]+)\.([a-z0-9\-.\~]+)((?:/[^,\t \n\r]*)?)#i", "\\1<a href=\"http://www.\\2.\\3\\4\" target=\"_blank\">www.\\2.\\3\\4</a>", $ret);
 
   // matches an email@domain type address at the start of a line, or after a space.
   // Note: Only the followed chars are valid; alphanums, "-", "_" and or ".".
   $ret = preg_replace("#([\n ])([a-z0-9\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)?[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret);
 
   // Remove our padding..
   $ret = substr($ret, 1);
 
   return($ret);
}
This does not link commas at the end, but still does periods.
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: regex preg_replace help: URL to a link

Post by tr0gd0rr »

Yah with regexes you can go as simple or as complicated as you want. For your purposes, sticking with a widely-used method such as the phpBB function should do fine.
ninethousandfeet
Forum Contributor
Posts: 130
Joined: Tue Mar 10, 2009 4:56 pm

Re: regex preg_replace help: URL to a link

Post by ninethousandfeet »

I more or less combined the function with the preg_replace that handles the extra characters at both the beginning and the end of the URL, and got this:

Code: Select all

 
function make_clickable($msg)
{
    $ret = ' ' . $msg;
    $ret = preg_replace("/(\b)(http|ftp|https|mailto)([\:\/\/])([A-z0-9~!@$%&*()_+:?,.\/;'=#-]{2,}[A-z0-9~@$%&*_+\/'=])/","<a href=\"$2$3$4\" rel=\"nofollow\">$2$3$4</a>",$ret);
    $ret = substr($ret, 1);
       
    return $ret;
}
 
$msg = make_clickable($msg);
 

The above code works great when the user reloads the browser. The only problem is that when a new comment is submitted and displayed immediately using JS, the surrounding characters appear as part of the link. Any ideas on how to make the link come out the same when the JS loads the comment, so the user does not have to reload the page to see it displayed properly?

I can provide additional code if this isn't sufficient. Thanks for both of your help with this!
tr0gd0rr
Forum Contributor
Posts: 305
Joined: Thu May 11, 2006 8:58 pm
Location: Utah, USA

Re: regex preg_replace help: URL to a link

Post by tr0gd0rr »

Not sure what you mean about the JS. Is there a JS-driven preview feature? Is there a JS function that does the same thing as the PHP function?