Get domain without www

Posted: Thu Mar 22, 2007 4:13 pm
by jabbaonthedais
I'm trying to extract the domain from a long URL, without the leading www.

So for http://www.whatever.com the result would be "whatever.com".
But it also needs to work for other domain extensions, such as .co.uk.

So http://whatever.co.uk would give "whatever.co.uk".


I came up with this so far:

Code: Select all

$domain = parse_url($referer);
// take out the www dot
$trimmed = trim($domain['host'], "www.");
echo $trimmed;
But if the first letter of the domain is a W, it strips that too. Any ideas?
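
For reference, the likely cause is that trim() treats its second argument as a list of individual characters to strip from both ends, not as a prefix. A minimal sketch of the effect, with host names made up for illustration:

```php
<?php
// trim() strips any of the characters "w" and "." from both ends,
// so it keeps going past the "www." prefix.
$trimmed = trim('www.whatever.com', 'www.');
echo $trimmed, "\n"; // hatever.com

// A host whose name does not start or end with "w" or "." survives,
// which is why the problem is easy to miss.
$ok = trim('www.example.com', 'www.');
echo $ok, "\n"; // example.com

// The right-hand end is affected too.
$tail = trim('www.site.tw', 'www.');
echo $tail, "\n"; // site.t
```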

Posted: Thu Mar 22, 2007 4:23 pm
by feyd
A regular expression or strpos() could be of use.

Posted: Thu Mar 22, 2007 5:15 pm
by Kieran Huggins

Code: Select all

#http://(?:www\.)?(.*)#
?

Posted: Thu Mar 22, 2007 9:43 pm
by jabbaonthedais
edit: Ok, this seems to be working fine:

Code: Select all

$domain = parse_url($referer);
// take out the www dot
$string = ereg_replace('www.', '', $domain['host']);
echo $string;
Do you see any problems down the road with that? I tried quite a few URLs and they all seem to work.

Posted: Thu Mar 22, 2007 10:19 pm
by Kieran Huggins
PCRE is faster than the ereg functions, and you can be a whole lot safer!

Code: Select all

$domain = preg_replace('#^(?:https?://)?(?:www\.)?(.*?)(?:/.*)?$#','$1',$referer);
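
A minimal sketch of how this pattern can be exercised, written with the closing # delimiter that preg_replace() requires. The function name domain_from_url() and the added i flag (for case-insensitive matching) are assumptions for illustration, not part of the post:

```php
<?php
// Hypothetical helper wrapping the pattern from the post.
// The "i" flag is an addition so that "WWW." and "HTTP://" also match.
function domain_from_url(string $url): string
{
    return (string) preg_replace(
        '#^(?:https?://)?(?:www\.)?(.*?)(?:/.*)?$#i',
        '$1',
        $url
    );
}

echo domain_from_url('http://www.whatever.com/some/page'), "\n"; // whatever.com
echo domain_from_url('http://whatever.co.uk'), "\n";             // whatever.co.uk
echo domain_from_url('mywww.com'), "\n";                         // mywww.com
```

Note that a port number stays in the result, since the capture runs up to the first slash: 'http://www.whatever.com:8080/x' gives 'whatever.com:8080'.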

Posted: Thu Mar 22, 2007 10:39 pm
by feyd
What's wrong with strpos()? It's even faster still.

Posted: Fri Mar 23, 2007 7:39 am
by aaronhall

Code: Select all

if(stripos($domain['host'], 'www.')) {
    $referer = str_ireplace('www.', '', $referer, 1);
}

Posted: Fri Mar 23, 2007 10:30 am
by jabbaonthedais
Ok, this is what I've got now:

Code: Select all

$domain = parse_url($referer);
$newurl = $domain['host'];

// take out the www dot
$found = stripos($newurl, 'www.');
if ($found !== false) {
 $newurl = str_ireplace('www.', '', $newurl);
}
aaronhall, I couldn't ever get that if statement to go off. No clue why. I took the ", 1" out of the end, made the host variable a real string, and still no luck.
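
A probable explanation for the if statement never firing: stripos() returns the offset of the match, and a match at the start of the string is offset 0, which a bare if() treats as false. Separately, the fourth argument of str_ireplace() is a by-reference replacement counter, not a limit, so a literal 1 cannot be passed there. A small sketch, with variable names chosen only for illustration:

```php
<?php
$host = 'www.whatever.com';

// stripos() returns the match offset; a match at the very start is int 0.
$pos = stripos($host, 'www.');
var_dump($pos); // int(0)

// In a bare if(), int 0 is falsy, so the branch is skipped.
$looseCheck = (bool) $pos;          // false
// Strict comparisons tell "found at offset 0" apart from "not found".
$foundAnywhere = ($pos !== false);  // true
$foundAtStart  = ($pos === 0);      // true

// The fourth argument of str_ireplace() receives the replacement count.
// It has to be a variable and does not limit the number of replacements.
$count = 0;
$result = str_ireplace('www.', '', 'www.mywww.com', $count);
echo $result, ' (', $count, " replacements)\n"; // mycom (2 replacements)
```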

Posted: Fri Mar 23, 2007 10:43 am
by Kieran Huggins
Does my code not work? It might be marginally slower, but it's safer...

Posted: Fri Mar 23, 2007 11:20 am
by stereofrog
jabbaonthedais wrote:Ok, this is what I've got now:

Code: Select all

$domain = parse_url($referer);
$newurl = $domain['host'];

// take out the www dot
$found = stripos($newurl, 'www.');
if ($found !== false) {
 $newurl = str_ireplace('www.', '', $newurl);
}
There's no reason to use str_replace when the position of the substring is known exactly. Just strip off the first 4 characters, that's all:

Code: Select all

$host = "www.xyz.com";

if(stripos($host, "www.") === 0) // note three =
	$host = substr($host, 4);

echo $host;

Posted: Fri Mar 23, 2007 11:27 am
by RobertGonzalez
That doesn't account for http://www.something.com. Why can't you just do a straight str_replace on 'www.'? If it is there, it is removed. If it is not there, it is not removed because it can't be.

Posted: Fri Mar 23, 2007 11:33 am
by stereofrog
Everah wrote:That doesn't account for http://www.something.com.
My code is for hostnames only. For parsing full urls, parse_url() should be used first, as OP showed.
Why can't you just do a straight str_replace on 'www.'? If it is there, it is removed. If it is not there, it is not removed because it can't be.
This won't work for e.g. "mywww.com".
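
The two steps described here (parse_url() first, then a prefix check on the host) could be combined roughly as follows. This is only a sketch: the function name host_without_www() is hypothetical, and it assumes the input is a full URL with a scheme, since parse_url() does not report a host otherwise.

```php
<?php
// Hypothetical helper: returns the host without a leading "www.",
// or null when parse_url() finds no host (e.g. no scheme in the input).
function host_without_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    // Only strip when "www." is at offset 0 (note the three =).
    if (stripos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    return $host;
}

echo host_without_www('http://www.whatever.com/page?x=1'), "\n"; // whatever.com
echo host_without_www('http://whatever.co.uk'), "\n";            // whatever.co.uk
echo host_without_www('http://mywww.com/'), "\n";                // mywww.com
```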

Posted: Fri Mar 23, 2007 11:36 am
by RobertGonzalez
I think I like kieran's the best.

Posted: Fri Mar 23, 2007 11:39 am
by jabbaonthedais
Kieran Huggins wrote:does my code not work? It might be marginally slower, but it's safer...
It's not working for me when I use it exactly as you put it... Where can I find a reference on those characters you use in searching? Like below:

Code: Select all

#^(?:https?://)?(?:www\.)?(.*?)(?:/.*)?$#

Posted: Fri Mar 23, 2007 11:42 am
by RobertGonzalez
http://www.devguru.com/Technologies/ecm ... cters.html

It is primarily for JavaScript, but it works pretty well as an explanation of regular expressions. I think d11wtq also wrote a tutorial on regex in the regular expressions forum.
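
As a quick guide to the characters in the pattern asked about, here is the same expression spelled out piece by piece. This is a sketch that assumes PCRE's x (extended) flag, which ignores whitespace and allows # comments inside the pattern. The delimiter is switched to ~ for that reason, and the variable names are illustrative.

```php
<?php
// Same pattern as in the thread, annotated using the "x" flag.
$pattern = '~
    ^                 # start of the string
    (?:https?://)?    # optional scheme: (?: ) groups without capturing, s? makes the s optional
    (?:www\.)?        # optional literal www. (the backslash escapes the dot)
    (.*?)             # capture group 1: any characters, as few as possible
    (?:/.*)?          # optional slash and everything after it
    $                 # end of the string
~x';

$domain = preg_replace($pattern, '$1', 'http://www.whatever.com/some/page');
echo $domain, "\n"; // whatever.com
```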