Regex for URLs issue

Posted: Wed Aug 04, 2010 8:30 pm
by HiddenS3crets
I'm trying to write a regular expression to extract any URLs from a larger string.

Example:
Given the string "check out google.com and also yahoo.com"

The regex will return google.com and yahoo.com

My current regex is

Code:

<?php
$str = "check out google.com and also yahoo.com";

preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);
?>
The regex works fine, but when it matches a URL in the string, it stores it in $matches with the space at the end (e.g. 'google.com ')

Is there a way to tell the regex to not save the space in $matches?

Re: Regex for URLs issue

Posted: Wed Aug 04, 2010 8:34 pm
by superdezign
You could try making one large group of everything but the characters that you want and extract that from the array rather than the entire match.

Re: Regex for URLs issue

Posted: Wed Aug 04, 2010 8:38 pm
by HiddenS3crets
I'm not really following...

I know I could get rid of the space by doing something like this:

Code:

<?php
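foreach($matches[0] as &$url) // by reference, so the change is written back to $matches[0]
{
  if(substr($url, -1) == ' ') $url = substr($url, 0, -1);
}
unset($url); // break the reference left over after the loop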
?>
But it seems like overkill if there's actually a way to just tell [[:space:]] not to be saved as part of each match.

Re: Regex for URLs issue

Posted: Wed Aug 04, 2010 8:40 pm
by superdezign
preg_match_all saves all matched pieces. It sounds like the OP is trying to use the first element of the array, which holds the full match. What they want to do is create a second array element that holds the whole match except for the space.

@HiddenS3crets: trim() (http://php.net/trim) would make more sense.
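
As a rough sketch of the trim() approach, assuming the same input string as the first post: array_map can apply trim() to every full match in one call. The $urls name is only illustrative, and the hyphen in the character class has been moved to the end as a defensive tweak so it cannot be read as a range.

```php
<?php
$str = "check out google.com and also yahoo.com";

// Pattern from the first post (hyphen moved to the end of the class).
// $matches[0] holds the full matches, which may end in the whitespace
// consumed by (?:[[:space:]]|$).
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9_-]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);

// trim() strips leading and trailing whitespace from each match.
// $urls is a hypothetical variable name, not something from the thread.
$urls = array_map('trim', $matches[0]);

print_r($urls);
```

This leaves $matches untouched and avoids the manual substr() loop.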

Re: Regex for URLs issue

Posted: Wed Aug 04, 2010 8:43 pm
by HiddenS3crets
Ah yes, I see what you mean now. That worked, thank you!

Code:

// old regex:
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);

// new regex
preg_match_all('/((?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*)(?:[[:space:]]|$)/Ui', $str, $matches);
Then I just access the urls through $matches[1] instead of $matches[0] :D
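
For the original question (keeping the space out of the match itself), a lookahead is probably another option: (?=...) checks that whitespace or the end of the string follows, but does not consume it. The sketch below assumes the same input string; swapping (?:...) for (?=...) and moving the hyphen to the end of the character class are variations that nobody in the thread suggested.

```php
<?php
$str = "check out google.com and also yahoo.com";

// Same idea as the regex above, but the trailing group is a lookahead:
// (?=[[:space:]]|$) asserts that whitespace or end-of-string comes next
// without making it part of the match.
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9_-]+\.[a-zA-Z]{2,4}\/?.*(?=[[:space:]]|$)/Ui', $str, $matches);

// With no capture group needed, the clean URLs are in $matches[0].
print_r($matches[0]);
```

With this form there is no extra capture group, so the URLs can be read from $matches[0] directly.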

Re: Regex for URLs issue

Posted: Wed Aug 04, 2010 8:44 pm
by superdezign
Right.

And I must be tired.. I didn't realize that you were OP. lol