How to compare domain names within two URLs

Posted: Tue Mar 04, 2008 10:50 am
by cybercytes
I'm trying to compare two URLs to see whether they belong to the same domain name.
1) the URLs can be insecure or secure (http or https)
2) there may or may not be a host designation (www)
3) the host designations may be different (www, ns2, etc.)
example input:
http://www.domain.com/somepage.html
https://www2.domain.com/anotherpage.html
results: same domain name

My current snippet is quite primitive (it only compares strings):

Code:

function check($url1, $url2)
{
    global $settings;
    if(!stristr($url1, $url2))
        return NOT_SAME_DOMAIN;
    if (url_exists($url1))
    {
        $page = join("", file($url1));
        if(stristr($page, $settings['url_option'])==false)
            return NOT_FOUND. $url1;
    }
    else
        return URL_NOT. $url1;
    return false;
}
Any help greatly appreciated.

Re: How to compare domain names within two URLs

Posted: Tue Mar 04, 2008 11:07 am
by hawkenterprises
I'm not sure if you have found this function yet, but it's golden:
http://us.php.net/parse_url

It takes a standard URL and turns it into an array like this:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)

Then all you have to do is explode the host on the period, and you should have all the parts you need to do an accurate comparison. If you need actual code, let us know, but this should be enough for you to figure it out from here.
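
A minimal sketch of the parse_url-plus-explode idea, assuming PHP 7 or later. The helper names host_labels() and same_last_two_labels() are made up for illustration, and the comparison naively assumes the domain name is always the last two labels of the host, so it does not handle country-code hosts such as domain.com.ca.

```php
<?php
// Hypothetical helper: return the dot-separated labels of a URL's host.
function host_labels(string $url): array
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return []; // no host found (relative URL or unparsable input)
    }
    return explode('.', strtolower($host));
}

// Hypothetical helper: compare only the last two labels of each host.
// Assumption: the domain name is exactly two labels long (e.g. domain.com).
function same_last_two_labels(string $url1, string $url2): bool
{
    $a = array_slice(host_labels($url1), -2);
    $b = array_slice(host_labels($url2), -2);
    return count($a) === 2 && $a === $b;
}

var_dump(same_last_two_labels(
    'http://www.domain.com/somepage.html',
    'https://www2.domain.com/anotherpage.html'
)); // bool(true)
```

Lowercasing the host first keeps the comparison case-insensitive, and the scheme (http or https) never enters into it because only the host component is examined.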

Re: How to compare domain names within two URLs

Posted: Tue Mar 04, 2008 2:45 pm
by cybercytes
hawkenterprises,
Thanks for the very useful and timely reply.

This is what I have so far to break down the URLs.
It works OK for most URLs, but breaks on country-TLD URLs that do not include a host designation.

Code:

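$url = '...url string...';

// Extract only the host part of the URL, e.g. "www2.domain.com"
$parsed_url = parse_url($url, PHP_URL_HOST);

// Split the host into its dot-separated pieces
$pieces = explode(".", $parsed_url);

// count() avoids the undefined-offset notices raised by testing !$pieces[2]
if (count($pieces) < 3) {
    $domain = $pieces[0] . "." . $pieces[1];
} elseif (count($pieces) < 4) {
    $domain = $pieces[1] . "." . $pieces[2];
} else {
    $domain = $pieces[1] . "." . $pieces[2] . "." . $pieces[3];
}
echo $domain;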

OUTPUTS:

url with host name: https://www2.domain.com/anypage.html
www2.domain.com
OUTPUT: domain.com
(good)
---
url without host name: http://domain.com/somepage.html
domain.com
OUTPUT: domain.com
(good)
---
url with host name, plus country tld: https://www3.domain.com.ca/anotherpage.html
www3.domain.com.ca
OUTPUT: domain.com.ca
(good)
---
url without host name, plus country tld: http://domain.com.ca/stillanotherpage.html
domain.com.ca
OUTPUT: com.ca
(breaks)


Any thoughts on how to clean this up would be greatly appreciated.

Re: How to compare domain names within two URLs

Posted: Thu Mar 06, 2008 7:37 pm
by hawkenterprises
One way you might address this is to check the length of each part of the host name and assume that anything over X characters (by strlen) is a valid domain name. This isn't a great way, because there are problems with the solution, and someone else might have a better one. However, it is similar to what I use for my crawlers.
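
One possible reading of that length heuristic, as a rough sketch and not the actual crawler code. The function name guess_domain() and the default threshold of 3 for X are assumptions made for illustration. It walks the host from right to left, treats short parts as TLD pieces, and takes the first part longer than X characters as the domain name.

```php
<?php
// Hypothetical helper: guess the domain by label length.
// $x is the assumed threshold ("X" in the post); 3 is an arbitrary default.
function guess_domain(string $host, int $x = 3): string
{
    $labels = explode('.', strtolower($host));
    // Walk from the rightmost label (the TLD) toward the left.
    for ($i = count($labels) - 1; $i >= 0; $i--) {
        if (strlen($labels[$i]) > $x) {
            // First label longer than $x is taken as the domain name;
            // keep it together with every label to its right.
            return implode('.', array_slice($labels, $i));
        }
    }
    // No label is long enough: fall back to the whole host.
    return implode('.', $labels);
}

echo guess_domain('www3.domain.com.ca'), "\n"; // domain.com.ca
echo guess_domain('domain.com.ca'), "\n";      // domain.com.ca
echo guess_domain('www2.domain.com'), "\n";    // domain.com
```

The weaknesses show up quickly: a domain name of three characters or fewer is skipped as if it were part of the TLD, and a TLD longer than three characters (such as info) is mistaken for the domain name itself.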

Re: How to compare domain names within two URLs

Posted: Thu Mar 06, 2008 7:49 pm
by cybercytes
hawkenterprises wrote: This isn't a great way
I agree; a stronger solution is needed.
Thanks