How to compare domain names within two URLs

cybercytes
Forum Newbie
Posts: 3
Joined: Tue Mar 04, 2008 10:32 am

How to compare domain names within two URLs

Post by cybercytes »

I'm trying to compare two URLs to see whether they belong to the same domain name.
1) the URLs can be insecure or secure (http or https)
2) there may or may not be a host designation (www)
3) the host designations may be different (www, ns2, etc.)
example input:
http://www.domain.com/somepage.html
https://www2.domain.com/anotherpage.html
results: same domain name

My current snippet is quite primitive (it only compares strings):

Code:

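// Returns false on success, or an error constant (optionally followed by the URL).
function check($url1, $url2)
{
    global $settings;
    // Primitive: passes only if $url2 occurs inside $url1 as a substring.
    if (stristr($url1, $url2) === false)
        return NOT_SAME_DOMAIN;
    if (url_exists($url1))
    {
        // Fetch the page and look for the configured string in it.
        $page = file_get_contents($url1);
        if ($page === false || stripos($page, $settings['url_option']) === false)
            return NOT_FOUND . $url1;
    }
    else
        return URL_NOT . $url1;
    return false;
}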
Any help greatly appreciated.
hawkenterprises
Forum Commoner
Posts: 54
Joined: Thu Feb 28, 2008 9:56 pm
Location: gresham,oregon

Re: How to compare domain names within two URLs

Post by hawkenterprises »

I'm not sure if you have found this function yet, but it's golden:
http://us.php.net/parse_url

It takes a standard URL and turns it into:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)

Then all you have to do is explode the host on the period, and you should have all the parts you need for an accurate comparison. If you need actual code, let us know, but this should be enough for you to figure it out from here.
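
A minimal sketch of that idea, using the two example URLs from the first post. The helper name hostLabels() is hypothetical (it is not a PHP built-in); only parse_url(), explode(), strtolower() and array_slice() are standard functions. Comparing just the last two labels is an assumption that holds for plain .com-style names only.

```php
<?php
// Hypothetical helper: return the host of a URL split into its dot-separated labels.
function hostLabels(string $url): array
{
    // PHP_URL_HOST makes parse_url() return only the host
    // (null if the URL has no host, false if it is seriously malformed).
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return [];
    }
    // Host names are case-insensitive, so normalise before splitting.
    return explode('.', strtolower($host));
}

$a = hostLabels('http://www.domain.com/somepage.html');
$b = hostLabels('https://www2.domain.com/anotherpage.html');

// Naive comparison: look only at the last two labels ("domain" and "com").
// The scheme and the host designation are ignored.
$sameDomain = array_slice($a, -2) === array_slice($b, -2);
var_dump($sameDomain); // bool(true)
```

This treats every host as if its registrable part were exactly two labels, so it would also report two unrelated sites under a suffix such as "com.ca" as the same domain.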
cybercytes
Forum Newbie
Posts: 3
Joined: Tue Mar 04, 2008 10:32 am

Re: How to compare domain names within two URLs

Post by cybercytes »

hawkenterprises,
Thanks for the very useful and timely reply.

This is what I have so far to break down the URLs.
It works OK for most URLs, but breaks on country-TLD URLs that do not include a host designation.

Code:

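$url = '...url string...';

// Ask parse_url() for the host component only.
$parsed_url = parse_url($url, PHP_URL_HOST);

$pieces = explode(".", $parsed_url);
$count = count($pieces);

// count() avoids the undefined-offset notice that testing $pieces[2] directly raises.
if ($count < 3) {
    $domain = $pieces[0] . "." . $pieces[1];
} elseif ($count < 4) {
    $domain = $pieces[1] . "." . $pieces[2];
} else {
    $domain = $pieces[1] . "." . $pieces[2] . "." . $pieces[3];
}
echo $domain;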

OUTPUTS:

url with host name: https://www2.domain.com/anypage.html
www2.domain.com
OUTPUT: domain.com
(good)
---
url without host name: http://domain.com/somepage.html
domain.com
OUTPUT: domain.com
(good)
---
url with host name, plus country tld: https://www3.domain.com.ca/anotherpage.html
www3.domain.com.ca
OUTPUT: domain.com.ca
(good)
---
url without host name, plus country tld: http://domain.com.ca/stillanotherpage.html
domain.com.ca
OUTPUT: com.ca
(breaks)


Any thoughts on how to clean this up would be greatly appreciated.
hawkenterprises
Forum Commoner
Posts: 54
Joined: Thu Feb 28, 2008 9:56 pm
Location: gresham,oregon

Re: How to compare domain names within two URLs

Post by hawkenterprises »

One way you might address this is to check the length of the part in front of the TLD and assume that anything over X characters (via strlen) is a valid domain name. This isn't a great way, because there are problems with the solution, and someone else might have a better one. However, it is similar to what I use for my crawlers.
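
A rough sketch of that length heuristic, under stated assumptions: the function names guessDomain() and sameDomain() are hypothetical, and the threshold of 3 characters is an arbitrary pick for "X", not a value given in this thread. If the label in front of the TLD is short (such as "com" in "com.ca"), it is assumed to be part of the suffix and three labels are kept; otherwise two are kept.

```php
<?php
// Hypothetical helper illustrating the length heuristic; the threshold is an assumption.
function guessDomain(string $url, int $maxSuffixLen = 3): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no usable host in the URL
    }
    $labels = explode('.', strtolower($host));
    $count  = count($labels);
    if ($count < 3) {
        // Nothing to strip: "domain.com" or a single-label host.
        return implode('.', $labels);
    }
    // Short label before the TLD: assume a two-part suffix and keep three labels.
    // Longer label: assume it is the domain name itself and keep two labels.
    $keep = strlen($labels[$count - 2]) <= $maxSuffixLen ? 3 : 2;
    return implode('.', array_slice($labels, -$keep));
}

// Two URLs match when both hosts reduce to the same guessed domain.
function sameDomain(string $url1, string $url2): bool
{
    $d1 = guessDomain($url1);
    return $d1 !== null && $d1 === guessDomain($url2);
}

echo guessDomain('http://domain.com.ca/stillanotherpage.html'), "\n"; // domain.com.ca
var_dump(sameDomain('http://www.domain.com/somepage.html',
                    'https://www2.domain.com/anotherpage.html'));     // bool(true)
```

The weakness is easy to trigger: a real domain name of three characters or fewer is mistaken for part of the suffix, so a host like www.ibm.com comes back unchanged as www.ibm.com instead of ibm.com.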
cybercytes
Forum Newbie
Posts: 3
Joined: Tue Mar 04, 2008 10:32 am

Re: How to compare domain names within two URLs

Post by cybercytes »

hawkenterprises wrote: This isn't a great way
I agree; a stronger solution is needed.
Thanks