Splitting a domain into sections.

Posted: Tue Dec 04, 2007 9:57 am
by onion2k
Help. I've gone all stupid.

I need to break a domain into parts like TLD, domain, and subdomain. This should be trivial. At the moment I have:

    $extensions = array("co","com","org","net","eu","info","tv");

    if ($_SERVER['HTTP_HOST'] != "localhost") {

        $arrDomain = explode(".",$_SERVER['HTTP_HOST']);

        $n = 0;
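        // Walk the labels from right to left, counting how many make up the TLD.
        // Start at the last valid index; count($arrDomain) itself is out of range.
        for ($x = count($arrDomain) - 1; $x >= 0; $x--) {
            $n++;
            if (in_array($arrDomain[$x], $extensions)) {
                break;
            }
        }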

        $tld = implode(".",array_slice($arrDomain,count($arrDomain)-$n));
        $domain = $arrDomain[count($arrDomain)-1-$n].".".$tld;
        $subdomain = implode(".",array_slice($arrDomain,0,count($arrDomain)-1-$n));

    } else {

        $domain = "localhost";

    }
That works well enough for now but it'll break as soon as a new extension appears. Is there a better way that wouldn't rely on an array? I can't see a way to do it without knowing what extensions there are.

Posted: Tue Dec 04, 2007 12:13 pm
by alex.barylski
Not sure I understand fully (just woke up). Have you looked into: parse_url()?

Why would your code break if you add an extension? Or do you mean by outside forces, like if someone uses a free TK domain? Your tokenizer/parser wouldn't handle it?

I would look into using parse_url first, then further refining those results or implementing a custom URL tokenizer - or googling for one. :)

Posted: Tue Dec 04, 2007 12:48 pm
by VladSun

Posted: Wed Dec 05, 2007 6:12 am
by onion2k
Hockey wrote: I would look into using parse_url first, then further refining those results or implementing a custom URL tokenizer - or googling for one. :)
parse_url() can only extract the hostname as a block ... I need to break it down further into TLD, domain, and subdomain blocks.
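
To illustrate the limitation, here is a minimal sketch; the URL is made up for the example. It shows that parse_url() hands back the whole host as one string, and that a bare host name with no scheme is not even reported as a host.

```php
<?php
// Example URL invented for illustration.
$url = 'http://www.example.co.uk/some/page?x=1';

// With a scheme present, the host comes back as one undivided string.
$host = parse_url($url, PHP_URL_HOST);
echo $host, "\n";

// A bare host name (like the value of $_SERVER['HTTP_HOST']) has no scheme,
// so parse_url() files it under 'path' rather than 'host'.
$parts = parse_url('www.example.co.uk');
echo isset($parts['host']) ? 'host' : 'path', "\n";
```

Either way, splitting the host into TLD, domain, and subdomain still has to happen separately.
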
That's nice, and better than what I've written, but it still suffers from breaking if a domain has a TLD that's not in the array.

EDIT: Hmm.. it doesn't return .co.us domains properly.

Posted: Wed Dec 05, 2007 6:46 am
by VladSun
If you follow the IANA standard for domain names you could do it yourself.
Domain could be this:

domain.toplevel
domain.tld

or this

domain.secondlevel.toplevel
domain.secondlevel.tld

The toplevel list is small and changes rarely, so you could make a lookup for it.

toplevel or tld should always exist in your URL. It must be 1 to 4 symbols long. If no such part (matching /\.[a-zA-Z]{1,4}$/) exists, then the URL is invalid (or it is localhost :) ).

secondlevel is optional. If the part at this place in your URL is 1 or 2 symbols long, or it is in the toplevel list, then consider it to be "secondlevel".

All of the rest should be considered domain (with its subdomains).
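
The rules above could be sketched roughly as follows. This is only an illustration of the heuristic, not a complete solution: the function name splitHost() and the $tlds lookup array are made up for the example, and the check that a label must be left over for the domain is an added assumption. The 1 to 4 letter rule is taken as stated, so longer TLDs such as .museum would be rejected.

```php
<?php
// Hypothetical helper sketching the heuristic described above.
function splitHost($host, array $tlds)
{
    // The last label must be 1 to 4 letters long, per the rule above.
    if (!preg_match('/\.[a-zA-Z]{1,4}$/', $host)) {
        return null; // invalid, or something like "localhost"
    }

    $labels = explode('.', strtolower($host));
    $tld = array_pop($labels);

    // Optional second level: 1 or 2 characters long, or itself in the toplevel list.
    // Assumption: only take it when a label is still left over for the domain.
    $second = '';
    if (count($labels) >= 2) {
        $candidate = $labels[count($labels) - 1];
        if (strlen($candidate) <= 2 || in_array($candidate, $tlds)) {
            $second = array_pop($labels);
        }
    }

    // Next label is the registered name; whatever remains is the subdomain.
    $name = array_pop($labels);
    $suffix = ($second === '') ? $tld : $second . '.' . $tld;

    return array(
        'tld'       => $suffix,
        'domain'    => $name . '.' . $suffix,
        'subdomain' => implode('.', $labels),
    );
}
```

With a lookup such as array('com', 'org', 'net', 'co'), splitHost('www.example.co.uk', ...) yields the domain example.co.uk with subdomain www. Short names still trip it up: www.hp.com is read as domain www.hp.com, because "hp" looks like a second level.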

PS: That's why I hate big companies with big money - hp.com!