Splitting a domain into sections.

onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k

Help. I've gone all stupid.

I need to break a domain into parts like TLD, domain, and subdomain. This should be trivial. At the moment I have:

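    // Labels that mark where the "TLD" part starts (e.g. "co" in example.co.uk).
    $extensions = array("co","com","org","net","eu","info","tv");

    $tld = $domain = $subdomain = "";

    if ($_SERVER['HTTP_HOST'] != "localhost") {

        $arrDomain = explode(".", $_SERVER['HTTP_HOST']);
        $count = count($arrDomain);

        // Scan from the right (starting at the last valid index) for the first
        // label found in $extensions. $n becomes the number of TLD labels.
        $n = 0;
        for ($x = $count - 1; $x > 0; $x--) {
            if (in_array($arrDomain[$x], $extensions)) {
                $n = $count - $x;
                break;
            }
        }

        if ($n > 0) {
            $tld = implode(".", array_slice($arrDomain, $count - $n));
            $domain = $arrDomain[$count - 1 - $n] . "." . $tld;
            $subdomain = implode(".", array_slice($arrDomain, 0, $count - 1 - $n));
        } else {
            // No known extension found: leave the host unsplit.
            $domain = $_SERVER['HTTP_HOST'];
        }

    } else {

        $domain = "localhost";

    }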
That works well enough for now, but it'll break as soon as a new extension appears. Is there a better way that doesn't rely on an array? I can't see a way to do it without knowing what extensions exist.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski

Not sure I understand fully (just woke up). Have you looked into parse_url()?

Why would your code break if you add an extension? Or do you mean outside forces, like someone using a free .tk domain that your tokenizer/parser wouldn't handle?

I would look into using parse_url() first and then refining those results further, or implementing a custom URL tokenizer (or googling for one). :)
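A minimal sketch of the parse_url() route, using a made-up example URL: parse_url() with the PHP_URL_HOST component returns only the hostname, and explode() then splits it into labels. Deciding which of those labels form the TLD is still left to you.

```php
<?php
// Hypothetical example URL; any URL with a host behaves the same way.
$url = "http://www.example.co.uk:8080/path/page.php?q=1";

// parse_url() isolates the host (scheme, port, path and query are dropped).
$host = parse_url($url, PHP_URL_HOST);

// Splitting the host into labels still has to be done by hand.
$labels = explode(".", $host);

echo $host, "\n";                   // www.example.co.uk
echo implode(" | ", $labels), "\n"; // www | example | co | uk
```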
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Post by VladSun

There are 10 types of people in this world, those who understand binary and those who don't
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k

Hockey wrote: I would look into using parse_url first then further refining those results or implementing a custom URL tokenizer - or googling for one. :)
parse_url() can only extract the hostname as a single block. I need to break it down further into TLD, domain, and subdomain parts.
That's nice, and better than what I've written, but it still suffers from breaking if a domain has a TLD that's not in the array.

EDIT: Hmm... it doesn't handle .co.us domains properly.
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Post by VladSun

If you follow the IANA standard for domain names, you could do it yourself.
A domain name could look like this:

domain.toplevel
domain.tld

or this

domain.secondlevel.toplevel
domain.secondlevel.tld

The toplevel list is small and changes rarely, so you could use a lookup table for it.
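The lookup table could be as simple as an array keyed by label, so that checking a label is a single isset() call. The list below is a deliberately tiny, hypothetical sample (not the real IANA list), and the function name isToplevel() is made up for illustration.

```php
<?php
// Hypothetical, incomplete sample of toplevel labels; a real table would be
// built from the IANA root zone list.
$toplevels = array_flip(array("com", "org", "net", "info", "eu", "uk", "us", "tv"));

// Hypothetical helper: is this label in the toplevel table?
function isToplevel($label, array $toplevels)
{
    // Domain names are case-insensitive, so compare in lower case.
    return isset($toplevels[strtolower($label)]);
}

var_dump(isToplevel("COM", $toplevels));     // bool(true)
var_dump(isToplevel("example", $toplevels)); // bool(false)
```

Flipping the array once means each lookup is a hash check instead of the linear scan that in_array() does.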

The toplevel (TLD) part should always exist in your URL, and it must be 1 to 4 characters long. If no part matching /\.[a-zA-Z]{1,4}$/ exists, then the URL is invalid (or it is localhost :) ).
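The regex rule maps directly onto preg_match(). This sketch mirrors the 1 to 4 letter rule exactly as stated; note that some real TLDs are longer (.museum and .travel, for example), so this check would reject them. The function name hasPlausibleTld() is hypothetical.

```php
<?php
// Applies the rule above: the host must end in a dot followed by 1-4 letters.
// (Longer real TLDs such as .museum fail this check.)
function hasPlausibleTld($host)
{
    if ($host === "localhost") {
        return true; // the special case mentioned above
    }
    return preg_match('/\.[a-zA-Z]{1,4}$/', $host) === 1;
}

var_dump(hasPlausibleTld("www.example.com")); // bool(true)
var_dump(hasPlausibleTld("example.museum"));  // bool(false)
var_dump(hasPlausibleTld("localhost"));       // bool(true)
```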

The secondlevel part is optional. If the label in that position of your URL is 1 or 2 characters long, or it is in the toplevel list, then treat it as the secondlevel part.

Everything else should be treated as the domain (with its subdomains).
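Putting the three rules together, a rough sketch of this heuristic might look like the following. Assumptions: the toplevel list is a small hypothetical sample, splitHost() and its return keys are invented names, and a guard keeps at least one label for the domain so that a bare hp.com is not swallowed as "secondlevel". It still inherits the heuristic's blind spots (www.hp.com comes out wrong, which is the hp.com problem mentioned in the PS).

```php
<?php
// Sketch of the heuristic: toplevel + optional secondlevel + everything else.
// The toplevel sample below is hypothetical and incomplete.
function splitHost($host)
{
    $toplevels = array_flip(array("com", "org", "net", "info", "eu", "uk", "us", "tv"));

    // Rule 1: a 1-4 letter toplevel must be present, otherwise give up.
    if (!preg_match('/\.[a-zA-Z]{1,4}$/', $host)) {
        return null;
    }

    $labels = explode(".", strtolower($host));
    $tld = array(array_pop($labels));

    // Rule 2: the next label is "secondlevel" if it is 1-2 characters long or
    // is itself a toplevel, but only if a label would still be left over.
    $next = end($labels);
    if (count($labels) > 1 && (strlen($next) <= 2 || isset($toplevels[$next]))) {
        array_unshift($tld, array_pop($labels));
    }

    // Rule 3: the last remaining label is the domain; anything before it is
    // the subdomain.
    $name = array_pop($labels);
    $tldString = implode(".", $tld);

    return array(
        "tld"       => $tldString,
        "domain"    => $name . "." . $tldString,
        "subdomain" => implode(".", $labels),
    );
}

print_r(splitHost("www.example.co.uk"));
```

For www.example.co.uk this yields tld "co.uk", domain "example.co.uk" and subdomain "www"; a host without a matching toplevel (such as localhost) returns null.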

PS: That's why I hate big companies with big money - hp.com!