Regex for URLs issue

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

HiddenS3crets
Forum Contributor
Posts: 119
Joined: Fri Apr 22, 2005 12:23 pm
Location: USA

Regex for URLs issue

Post by HiddenS3crets »

I'm trying to write a regular expression to extract any URLs from a larger string.

Example:
Given the string "check out google.com and also yahoo.com"

The regex should return google.com and yahoo.com

My current regex is

<?php
$str = "check out google.com and also yahoo.com";

preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);
?>
The regex works fine, but when it matches a URL in the string, it stores it in $matches with the space at the end (e.g. 'google.com ')

Is there a way to tell the regex to not save the space in $matches?
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Re: Regex for URLs issue

Post by superdezign »

You could try wrapping everything except the characters you don't want in one large capturing group, then extract that group from the array rather than the entire match.

Re: Regex for URLs issue

Post by HiddenS3crets »

I'm not really following...

I know I could get rid of the space by doing something like this:

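<?php
// Strip a single trailing space from each full match.
// Iterate by reference so the change is written back into $matches[0].
foreach($matches[0] as &$url)
{
  if(substr($url, -1) == ' ') $url = substr($url, 0, -1);
}
unset($url); // break the reference left over from the loop
?>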
But it seems like overkill if there's actually a way to just tell the regex not to save the [[:space:]] as part of each match.
Last edited by HiddenS3crets on Wed Aug 04, 2010 8:41 pm, edited 1 time in total.

Re: Regex for URLs issue

Post by superdezign »

preg_match_all saves all matched pieces. It sounds like the OP is trying to use the first element of the array, which is the full match. What they want to do is create a second array element that holds the whole match except for the space.

@HiddenS3crets: trim() (http://php.net/trim) would make more sense.
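
A minimal sketch of the trim() suggestion, assuming the sample string and the original pattern from the first post. The variable name $urls is made up for this example, and array_map() is just one way to apply trim() to every full match.

```php
<?php
// Sample input and original pattern, both taken from the first post.
$str = "check out google.com and also yahoo.com";
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);

// $urls is a hypothetical name for this sketch.
// trim() removes leading and trailing whitespace from each full match.
$urls = array_map('trim', $matches[0]);

print_r($urls);
```

This leaves the regex untouched and cleans up the results afterwards, so it only helps if the unwanted characters are plain whitespace at the ends of each match.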

Re: Regex for URLs issue

Post by HiddenS3crets »

Ah yes, I see what you mean now, man. That worked, thank you!

// old regex:
preg_match_all('/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?:[[:space:]]|$)/Ui', $str, $matches);

// new regex
preg_match_all('/((?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*)(?:[[:space:]]|$)/Ui', $str, $matches);
Then I just access the URLs through $matches[1] instead of $matches[0] :D
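
For anyone who wants to check the fix end to end, here is a rough sketch that runs the capturing-group regex on the sample string from the first post and dumps both $matches[0] and $matches[1]. It also tries a lookahead variant, which is not something suggested in this thread, just another way to get the same result. The variable names $grouped, $lookahead and $viaLookahead are made up for the example.

```php
<?php
$str = "check out google.com and also yahoo.com";

// Capturing-group version from the post above: group 1 holds the URL,
// while the full match in $matches[0] can still include the trailing space.
$grouped = '/((?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*)(?:[[:space:]]|$)/Ui';
preg_match_all($grouped, $str, $matches);
var_dump($matches[0]); // full matches
var_dump($matches[1]); // group 1 only

// Lookahead variant (an assumption of this sketch, not from the thread):
// (?=...) checks for whitespace or end of string without consuming it,
// so the full match itself carries no trailing space.
$lookahead = '/(?:http:\/\/)?(?:www\.)?[a-z0-9-_]+\.[a-zA-Z]{2,4}\/?.*(?=[[:space:]]|$)/Ui';
preg_match_all($lookahead, $str, $viaLookahead);
var_dump($viaLookahead[0]);
```

With the lookahead there is no extra group to index into, so $viaLookahead[0] can be used directly. The capturing-group version is arguably easier to read if more groups get added later.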

Re: Regex for URLs issue

Post by superdezign »

Right.

And I must be tired.. I didn't realize that you were OP. lol