Regex for URLs issue
Posted: Wed Aug 04, 2010 8:30 pm
by HiddenS3crets
I'm trying to write a regular expression to extract any URLs from a larger string.
Example:
Given the string "check out google.com and also yahoo.com"
The regex should return google.com and yahoo.com
My current regex is
Code:
<?php
$str = "check out google.com and also yahoo.com";
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);
?>
The regex works fine, but when it matches a URL in the string, it stores it in $matches with the space at the end (e.g. 'google.com ')
Is there a way to tell the regex to not save the space in $matches?
Re: Regex for URLs issue
Posted: Wed Aug 04, 2010 8:34 pm
by superdezign
You could try wrapping everything except the characters you don't want in one large capturing group, then pull that group out of the array rather than the entire match.
Re: Regex for URLs issue
Posted: Wed Aug 04, 2010 8:38 pm
by HiddenS3crets
I'm not really following...
I know I could get rid of the space by doing something like this:
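Code:
<?php
// Strip a single trailing space from each full match.
// &$url iterates by reference so the change is written back into $matches[0].
foreach ($matches[0] as &$url)
{
    if (substr($url, -1) == ' ') $url = substr($url, 0, -1);
}
unset($url); // break the reference left over from the loop
?>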
But it seems like overkill if there's actually a way to just tell [[:space:]] not to be saved as part of each match.
Re: Regex for URLs issue
Posted: Wed Aug 04, 2010 8:40 pm
by superdezign
preg_match_all saves all matched pieces. It sounds like the OP is trying to use the first element of the array, which holds the full match. What they want to do is create a second array element that holds the whole match except for the space.
@HiddenS3crets: [url=http://php.net/trim]trim()[/url] would make more sense.
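For anyone reading along, here is a rough sketch of the trim() route, assuming the original regex and sample string from the first post. The variable name $urls is just for illustration. It leaves the pattern alone and cleans up the full matches afterwards:

```php
<?php
$str = "check out google.com and also yahoo.com";
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);

// $matches[0] holds the full matches, including the trailing space ('google.com ').
// trim() with no second argument strips whitespace from both ends of each string.
$urls = array_map('trim', $matches[0]);

print_r($urls);
```

This replaces the whole foreach loop with a single array_map() call, at the cost of still matching the space in the first place.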
Re: Regex for URLs issue
Posted: Wed Aug 04, 2010 8:43 pm
by HiddenS3crets
Ah yes, I see what you mean now. That worked, thank you!
Code:
// old regex:
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);
// new regex
preg_match_all('/((?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*)(?:[[:space:]]|$)/Ui', $str, $matches);
Then I just access the URLs through $matches[1] instead of $matches[0].
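As a side note, a lookahead is another way to get the same result without the extra group. It was not suggested in this thread, so treat it as an alternative sketch rather than the method used here. The variable names ($withGroup, $withLookahead, $grouped, $lookahead) are made up for the comparison:

```php
<?php
$str = "check out google.com and also yahoo.com";

// Capture-group version from the post above: the URL lands in group 1,
// and the trailing whitespace (or end of string) stays outside the group.
$withGroup = '/((?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*)(?:[[:space:]]|$)/Ui';
preg_match_all($withGroup, $str, $grouped);

// Lookahead version: (?=...) only checks that whitespace or the end of the
// string follows, without consuming it, so it never becomes part of the match.
$withLookahead = '/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?=[[:space:]]|$)/Ui';
preg_match_all($withLookahead, $str, $lookahead);

// Compare group 1 of the first pattern with the full matches of the second.
var_dump($grouped[1] === $lookahead[0]);
```

With the lookahead, the URLs can be read straight from $matches[0], which answers the original question of keeping the space out of the saved match.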

Re: Regex for URLs issue
Posted: Wed Aug 04, 2010 8:44 pm
by superdezign
Right.
And I must be tired.. I didn't realize that you were OP. lol