Automatically parsing URLs problem

Post by JayBird »

okay,

I am using the code from phpBB to automatically parse URLs and emails in news stories that can be added to a site. The code is as follows:

Code:

function make_clickable($text)
{

	// pad it with a space so we can match things at the start of the 1st line.
	$ret = ' ' . $text;

	// matches an "xxxx://yyyy" URL at the start of a line, or after a space.
	// xxxx can only be word characters (letters, digits, underscore).
	// yyyy is anything up to the first space, tab, newline, double quote or <
	$ret = preg_replace("#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);

	// matches a "www|ftp.xxxx.yyyy[/zzzz]" kinda lazy URL thing
	// Note: the pattern only requires the dot after www or ftp; it does not enforce a second dot.
	// zzzz is optional.. will contain everything up to the first space, tab, newline,
	// double quote or <.
	$ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r<]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);

	// matches an email@domain type address at the start of a line, or after a space.
	// Note: Only the following chars are valid before the @: alphanums, "&", "-", "_" and ".".
	$ret = preg_replace("#(^|[\n ])([a-z0-9&\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret);

	// Remove our padding..
	$ret = substr($ret, 1);

	return $ret;
}

This all works fine unless the URL is at the end of a sentence and is followed by a full stop.

This is the output I get:
blah blah blah blah <a href="http://www.somesite.com." target="_blank">http://www.somesite.com.</a>

when what I actually want is:
blah blah blah blah <a href="http://www.somesite.com" target="_blank">http://www.somesite.com</a>.

Notice the position of the full stop in the first example: it is part of the link.

How can I change the code above to allow for this?

Thanks

Mark
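
For reference, the behaviour described in this post can be reproduced in isolation. The sketch below pulls out only the first replacement from make_clickable(); the function name link_scheme_urls is hypothetical and used just for this demo. The full stop ends up inside the link because the character class [^ "\n\r\t<] excludes only spaces, quotes, line breaks, tabs and <, so a trailing "." is consumed along with the rest of the URL.

```php
<?php
// Hypothetical helper (not part of phpBB): the first replacement from
// make_clickable(), isolated so the trailing full stop problem is easy to see.
function link_scheme_urls($text)
{
    return preg_replace(
        // The class [^ \"\n\r\t<] does not exclude ".", so a full stop that
        // directly follows the URL is matched as part of it.
        "#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*)#is",
        "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>",
        $text
    );
}

// The href and the link text both end in "com." here.
echo link_scheme_urls("blah blah http://www.somesite.com.");
```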

Post by Ixplodestuff8 »

Regular expressions aren't exactly my cup of tea but maybe this could work:

Code:

<?php
$ret = preg_replace("#(^|[\n ])([\w]+?://[^ \.\"\n\r\t<]*\.[^ \.\"\t\n\r<]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);

$ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \.\"\t\n\r<]*\.[^ \.\"\t\n\r<]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);
?>

All I did was add a \. to the bracketed part after the http:// and www|ftp, then add another \. after the bracket and paste a copy of the bracket back on the end. I didn't actually test this, so hopefully it'll work :p

Edit: Now that I think about it, even if the expression works it would cause problems with the wide variety of URLs (e.g. http://www.google.com would not work, since it would stop matching after google and drop the .com part).
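
One possible alternative, offered only as a sketch and not as phpBB's own fix: keep the original character class, but require the last character of the match to be something other than sentence punctuation. The regex engine then backtracks off a trailing full stop and leaves it outside the link, while dots inside the URL are still allowed. The function name make_clickable_trimmed and the punctuation list .,;:!? are assumptions chosen for illustration; the email replacement is left out for brevity.

```php
<?php
// Hypothetical variant of make_clickable(); the name and the list of
// trailing punctuation characters are assumptions, not phpBB code.
function make_clickable_trimmed($text)
{
    // Pad with a space so a URL at the very start can match, as in the original.
    $ret = ' ' . $text;

    // Same class as the original, followed by one required final character
    // that must not be . , ; : ! or ?. A trailing full stop therefore stays
    // outside the match, but dots in the middle of the URL are kept.
    $ret = preg_replace(
        "#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*[^ \"\n\r\t<.,;:!?])#is",
        "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>",
        $ret
    );

    // The same idea applied to the "www." / "ftp." shorthand form.
    $ret = preg_replace(
        "#(^|[\n ])((www|ftp)\.[^ \"\n\r\t<]*[^ \"\n\r\t<.,;:!?])#is",
        "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>",
        $ret
    );

    // Remove the padding.
    return substr($ret, 1);
}

// The full stop now follows the closing </a> tag.
echo make_clickable_trimmed("blah blah http://www.somesite.com.");
```

The trade-off is that a URL which genuinely ends in one of those punctuation characters would lose its last character, and other trailing characters such as a closing parenthesis are not handled.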