
Automatically parsing URLs problem

Posted: Mon Apr 05, 2004 8:59 am
by JayBird
Okay,

I am using the code from phpBB to automatically parse URLs and emails in news stories that can be added to a site. The code is as follows:

Code:

function make_clickable($text)
{

	// pad it with a space so we can match things at the start of the 1st line.
	$ret = ' ' . $text;

	// matches an "xxxx://yyyy" URL at the start of a line, or after a space.
	// xxxx can only be alpha characters.
	// yyyy is anything up to the first space, newline, comma, double quote or <
	$ret = preg_replace("#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);

	// matches a "www|ftp.xxxx.yyyy[/zzzz]" kinda lazy URL thing
	// Must contain at least 2 dots. xxxx contains either alphanum, or "-"
	// zzzz is optional.. will contain everything up to the first space, newline, 
	// comma, double quote or <.
	$ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \"\t\n\r<]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);

	// matches an email@domain type address at the start of a line, or after a space.
	// Note: Only the followed chars are valid; alphanums, "-", "_" and or ".".
	$ret = preg_replace("#(^|[\n ])([a-z0-9&\-_.]+?)@([\w\-]+\.([\w\-\.]+\.)*[\w]+)#i", "\\1<a href=\"mailto:\\2@\\3\">\\2@\\3</a>", $ret);

	// Remove our padding..
	$ret = substr($ret, 1);

	return($ret);
}
This all works fine if the URL isn't at the end of a sentence followed by a full stop.

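This is the output I get:
blah blah blah blah <a href="http://www.somesite.com." target="_blank">http://www.somesite.com.</a>

when what I actually want is:
blah blah blah blah <a href="http://www.somesite.com" target="_blank">http://www.somesite.com</a>.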

Notice the position of the full stop in the first example: it is part of the link.
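To see why the full stop ends up inside the link, here is a minimal reproduction using only the first pattern from make_clickable(). It assumes the forum stripped the backslashes in front of the double quotes (as posted, the line would not parse), so they are restored here; $pattern, $replacement and $html are just illustrative names. The character class [^ \"\n\r\t<] stops at spaces, tabs, newlines, double quotes and <, but not at '.', so the sentence-ending dot is swallowed into the match.

```php
<?php
// First make_clickable() pattern, with its double quotes escaped so it parses.
// Note that '.' is NOT excluded by the character class.
$pattern = "#(^|[\n ])([\w]+?://[^ \"\n\r\t<]*)#is";
$replacement = "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>";

// Pad with a leading space (as make_clickable() does), then strip it again.
$text = ' blah blah blah blah http://www.somesite.com.';
$html = substr(preg_replace($pattern, $replacement, $text), 1);

echo $html, "\n";
// prints: blah blah blah blah <a href="http://www.somesite.com." target="_blank">http://www.somesite.com.</a>
```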

How can I change the code above to allow for this?

Thanks

Mark

Posted: Mon Apr 05, 2004 5:13 pm
by Ixplodestuff8
Regular expressions aren't exactly my cup of tea, but maybe this could work:

Code:

<?php
$ret = preg_replace("#(^|[\n ])([\w]+?://[^ \.\"\n\r\t<]*\.[^ \.\"\t\n\r<]*)#is", "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);

$ret = preg_replace("#(^|[\n ])((www|ftp)\.[^ \.\"\t\n\r<]*\.[^ \.\"\t\n\r<]*)#is", "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);
?>
All I did was add \. to the character class (the part in square brackets) after http:// and www|ftp, then add another \. after the closing bracket and paste a copy of the character class on the end. I didn't actually test this, so hopefully it'll work :p
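A quick way to check the suggestion is to run the first pattern against the sentence from the original post. This sketch uses the pattern as posted, with the double quotes escaped so it parses; the variable names are only for illustration. Because neither character class may contain a dot, the match seems to allow just one dot after http://, so it stops at the second dot.

```php
<?php
// Suggested pattern: a dot-free run, one literal dot, then another dot-free run.
$pattern = "#(^|[\n ])([\w]+?://[^ \.\"\n\r\t<]*\.[^ \.\"\t\n\r<]*)#is";
$replacement = "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>";

// Same padding trick as make_clickable(): add a leading space, remove it after.
$text = ' blah blah blah blah http://www.somesite.com.';
$html = substr(preg_replace($pattern, $replacement, $text), 1);

echo $html, "\n";
// prints: blah blah blah blah <a href="http://www.somesite" target="_blank">http://www.somesite</a>.com.
```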

Edit: Now that I think about it, even if the expression works, it would cause problems with many kinds of URLs. For example, http://www.google.com would not work, since the match would stop after google and drop the .com part.
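One possible approach (a sketch, not phpBB's own code, and not checked against every URL form) is to keep the original greedy match but require the last character of the URL not to be sentence punctuation. Because the regex engine backtracks, a trailing full stop or comma is then left outside the link, while dots inside the URL, such as the one before .com, still match. The function name make_clickable_trimmed() and the punctuation list .,;:!? are assumptions chosen for illustration. Only the two URL patterns are shown; the email pattern from make_clickable() would stay as it is.

```php
<?php
// Hypothetical variant of make_clickable(): URLs may not END with . , ; : ! ?
function make_clickable_trimmed($text)
{
    // Pad with a space so a URL at the very start of the text can match.
    $ret = ' ' . $text;

    // Characters a URL may contain (same set as the original patterns).
    $body = "[^ \"\n\r\t<]";
    // Characters a URL may end with: the same set minus . , ; : ! ?
    $last = "[^ \"\n\r\t<.,;:!?]";

    // xxxx://yyyy URLs at the start of a line or after a space.
    $ret = preg_replace("#(^|[\n ])([\w]+?://{$body}*{$last})#is",
        "\\1<a href=\"\\2\" target=\"_blank\">\\2</a>", $ret);

    // www.xxxx / ftp.xxxx URLs without a scheme.
    $ret = preg_replace("#(^|[\n ])((www|ftp)\.{$body}*{$last})#is",
        "\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>", $ret);

    // Remove the padding again.
    return substr($ret, 1);
}

echo make_clickable_trimmed('blah blah blah blah http://www.somesite.com.'), "\n";
// prints: blah blah blah blah <a href="http://www.somesite.com" target="_blank">http://www.somesite.com</a>.
echo make_clickable_trimmed('see www.google.com, it works'), "\n";
// prints: see <a href="http://www.google.com" target="_blank">www.google.com</a>, it works
```

The punctuation list is a trade-off: a URL that genuinely ends in one of those characters would lose it. A closing parenthesis is not in the list here, so text like "(see http://www.google.com)" would still pull the ")" into the link.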