I think I need help with substr_replace... or an alternative

erika
Forum Newbie
Posts: 17
Joined: Sat Oct 25, 2008 5:27 pm

Post by erika

I think I either need help understanding exactly how substr_replace works or--more likely--advice on how to accomplish what I'm doing more efficiently.

I am parsing address lines that look like this:

Address: John Doe, 123 Main St, Apt 3 TownName, ST
Address: John Doe, 123 Main St North TownName, ST

And putting them into a tab-delimited file (ST is the state abbreviation).

There is sometimes (but rarely) a comma separating the street address from the town name, usually a comma between the town name and the state, and sometimes a comma within the street address itself. I believe this makes the commas of little use for parsing.
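One way around the unreliable commas is to anchor on the end of the line, where the state code always sits. Below is a minimal sketch, assuming the "Address: " prefix has already been stripped and that the state is always two capital letters preceded by a comma or whitespace; split_state() is a hypothetical helper name, not a built-in:

```php
<?php
// Hypothetical helper: split "...TownName, ST" or "...TownName ST" into the
// text before the state and the two-letter state code.
// Assumes the state is the last thing on the line and is written in capitals.
function split_state(string $line): ?array
{
    $line = trim($line);
    // (.*?)          everything before the separator (lazy, so a trailing
    //                comma is not pulled into it)
    // \s*,\s*|\s+    a comma with optional spaces, or just whitespace
    // ([A-Z]{2})$    exactly two capital letters at the very end
    if (!preg_match('/^(.*?)(?:\s*,\s*|\s+)([A-Z]{2})$/', $line, $m)) {
        return null; // no state code found: flag the line for manual review
    }
    return ['rest' => $m[1], 'state' => $m[2]];
}

$a = split_state('John Doe, 123 Main St, Apt 3 TownName, ST');
$b = split_state('John Doe, 123 Main St North TownName ST');
```

Anything that does not end in a state code comes back as null, which keeps those lines easy to find, much like the rows without a zip code.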

Nearly all the addresses I am parsing come from one of two states, so keeping a list of city names is feasible, though that won't help for those occasional addresses in other states.
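Since a list of town names is feasible, an associative array keyed by state and town could replace the long switch statements. A minimal sketch; lookup_zip() is a hypothetical helper, and the table holds only the three towns mentioned in this post:

```php
<?php
// Hypothetical helper: look up a zip code by state and town name.
function lookup_zip(string $state, string $city): ?string
{
    // State => (TOWN NAME => zip). Only the sample entries from this post;
    // the real table would hold every town on your list.
    static $zips = [
        'NH' => ['ALEXANDRIA' => '03222', 'NEW BOSTON' => '03110'],
        'MA' => ['ACTON' => '01720'],
    ];
    // ?? returns null when either the state or the town is missing, so those
    // rows still end up without a zip code and can be looked up by hand.
    return $zips[strtoupper($state)][strtoupper($city)] ?? null;
}
```

Keeping the zip codes as strings preserves leading zeros such as the one in 03222, which would be lost if they were stored as integers.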

Here's what I'm doing. It's probably more complex than necessary, but it works for single-word town names (suggestions for improvement are welcome):

Code:

function get_address($this_line) {

        // Remove the beginning of the line.
        $address = str_replace("Address: ","",$this_line);

        // trim it in case there is extra white space
        $address = trim($address);

        // Get the state, which should be the last two characters
        $state = substr($address,-2);

        // Now remove the state and the preceding comma and space.
        // Do this with substr instead of str_replace to avoid
        // inadvertently removing anything else that might match.

        $address_length = strlen($address);

        $chop_to = $address_length - 4;

        $address = substr($address,0,$chop_to);

        // Now get the city.  This retrieves only the last word in the address,
        // so it doesn't work for two (or more) word city names unless they
        // are fully hyphenated (like Manchester-by-the-Sea).

        // Get the position of the last space
        $last_space_pos = strrpos($address," ");

        // How long is the address now?
        $address_length = strlen($address);

        // Get the characters after the last space in the address.

        // How many characters should be grabbed?
        $grab = $address_length - $last_space_pos;

        $city = substr($address,-$grab);
        $city = strtoupper(trim($city));

        // What's left is the address.
        $address = strtolower(substr($address,0,$last_space_pos));

        // Remove the commas
        $address = str_replace(",","",$address);

        // Find the zip code (this is temporary until a better method can 
        // be implemented--maybe a database, since I can't use the USPS API)

        if ($state == "NH") {

                switch ($city) {

               // Note to readers: At this point there is a VERY long list of cases 
               // for various cities to generate the zip code. I include only one
               // here for your comfort in reading.

                        case "ALEXANDRIA":
                                $zip = "03222";
                                break;
                        default:
                                $zip = NULL;
                                break;
                }

        }

        else if ($state == "MA") {

                switch($city) {

               // Same here -- long set of cases

                        case "ACTON":
                                $zip = "01720";
                                break;
                        default:
                                $zip = NULL;
                                break;
                }
        }

        else {
                // If the state isn't NH or MA, just set the zip to NULL to
                // look up manually after parsing for now.
                $zip = NULL;
        }

        $full_address['state']  = $state;
        $full_address['street'] = ucwords($address);
        $full_address['city']   = ucwords(strtolower($city));
        $full_address['zip']    = $zip;
        return $full_address;
}

Then I take the $full_address array and drop its elements into the tab-delimited file next to the name and other data.
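For the tab-delimited step, a small helper that joins one record's fields with tabs might look like this. It is a sketch, assuming each record is a flat array of strings (name fields plus the $full_address elements); to_tab_line() is a hypothetical name. It uses implode() rather than fputcsv() because fputcsv() encloses any field containing a space in double quotes by default.

```php
<?php
// Hypothetical helper: turn one record into a tab-separated line.
function to_tab_line(array $fields): string
{
    $clean = array_map(function ($field) {
        // Tabs or line breaks inside a field would shift or split the
        // columns, so replace them with spaces. null becomes ''.
        return str_replace(["\t", "\r", "\n"], ' ', (string) $field);
    }, $fields);
    return implode("\t", $clean) . "\n";
}
```

Each line can then be written with fwrite($handle, to_tab_line($fields)), where $handle comes from fopen().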

This works beautifully--until I come upon a town name that is more than one word--and there are a substantial enough number of them that it's a PITA to process them manually.

What happens is I end up with something like this (I added spaces around the tab characters for clarity, there are no spaces around them in the actual output):

John \t Doe \t 123 Main St North \t TownName \t ST \t 12345 (instead of ...\t North TownName \t ST...)
or
John \t Doe \t 123 Main St North \t TownName \t ST \t

The second instance, without the zip code, occurs when there is a town called North TownName but no town called TownName. This is actually more convenient, because the missing zip code makes the record jump out: I can search for all addresses without a zip code, or just dump the whole thing into a spreadsheet and sort. The first instance, however, is more problematic because I end up with the wrong zip code. (For the purposes of this database I am assuming one zip code per town. I do not have the funding to subscribe to a zip code database, and the USPS says our first-class mailing needs do not qualify for their API, but they manage to deliver the mail as long as I get a zip code for the right town.)

I can't just check to see if something comes after the street name, because the ending of the street address varies (apartment names, floors, etc.) and sometimes a street name ends in "North" or some other word--like Lake Shore Drive North, Westford, MA. I know there is no North Westford, so this is definitely a case where the North belongs to the street address. It's less clear when verifying addresses in a town like North Hampton.
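A different approach to multi-word towns is to compare the end of the address against the list of known towns for that state and keep the longest match, instead of taking the last word. A minimal sketch; split_city() and its sample town lists are hypothetical, and the input is the text left after the state has been removed:

```php
<?php
// Hypothetical helper: peel a known town name off the end of the address.
// $rest is everything before the state, e.g. "John Doe, 123 Main St, North Hampton".
function split_city(string $state, string $rest): ?array
{
    // Sample town lists only; a real list would cover every town you mail to.
    static $towns = [
        'NH' => ['ALEXANDRIA', 'HAMPTON', 'NEW BOSTON', 'NORTH HAMPTON'],
        'MA' => ['ACTON', 'WESTFORD'],
    ];
    $candidates = $towns[strtoupper($state)] ?? [];
    // Longest names first, so "NORTH HAMPTON" is tried before "HAMPTON".
    usort($candidates, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $trimmed = rtrim($rest, ' ,');
    $upper = strtoupper($trimmed);
    foreach ($candidates as $town) {
        $start = strlen($upper) - strlen($town);
        // The town must sit at the very end and start a new word (preceded
        // by a space or a comma), so e.g. "WESTHAMPTON" would not match "HAMPTON".
        if ($start > 0
            && substr($upper, $start) === $town
            && strpos(' ,', $upper[$start - 1]) !== false) {
            return [
                'street' => rtrim(substr($trimmed, 0, $start), ' ,'),
                'city'   => $town,
            ];
        }
    }
    return null; // unknown town: leave it for manual review
}
```

The longest match is still a guess: an address like "123 Main St North Hampton, NH" will always come out as North Hampton, so ambiguous pairs may still need a manual check. As a side effect, New Boston is matched as a whole name, so a trailing "new" never has to be stripped separately.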

So I came up with the idea of checking the addresses in just the two-name towns, and added the following case to my zip code switch:

Code:

                        case "BOSTON":
                          // Check to see if maybe it meant New Boston -- Otherwise
                          // It might have just been a typo!

                          // The whole address is still in lowercase here, so check for
                          // lowercase only.
                          if (substr($address,-4) == " new") {

                                $city = "New Boston";
                                $zip = "03110";

                                $address = str_replace(" new","",$address);
                          }
                                break;

There is no Boston, NH, so if I come across an address that says Boston in NH, I know that either it must be New Boston or someone typed NH instead of MA (more common than you might think).

But then I realized that my str_replace would replace all instances of "new" in the address... so if someone lived at, for example, 123 Newton Rd, New Boston, I'd end up with an address of 123 Ton Rd, New Boston, 03110... which would either get delivered to the wrong person (assuming there is a Ton Rd) or just get sent back to sender.
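To remove "new" only when it is the last word, check the end of the string and cut off exactly that many characters instead of calling str_replace(). A minimal sketch with a hypothetical remove_suffix() helper (on PHP 8 and later, the built-in str_ends_with() can do the check):

```php
<?php
// Hypothetical helper: remove $suffix from the end of $text, but only if
// $text really ends with it. Nothing else in the string is touched.
function remove_suffix(string $text, string $suffix): string
{
    $len = strlen($suffix);
    if ($len > 0 && substr($text, -$len) === $suffix) {
        // Keep everything except the last $len characters.
        return substr($text, 0, -$len);
    }
    return $text;
}
```

In the BOSTON case this would be $address = remove_suffix($address, " new"); so "123 newton rd new" becomes "123 newton rd".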

So I looked up substr_replace but I can't, for the life of me, figure out how to use it... I can't figure out which data goes where in order to accomplish what I'm trying to do.
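For reference, substr_replace() takes the string, the replacement text, a start offset and an optional length, and replaces that stretch of characters. A negative offset counts back from the end of the string, and leaving out the length replaces everything from the offset onward. A short sketch using the lowercase address from the BOSTON case:

```php
<?php
$address = '123 newton rd new';

// substr_replace(string, replacement, offset, length):
// offset -4 starts four characters from the end; with no length given,
// everything from there to the end (" new") is replaced by ''.
$street = substr_replace($address, '', -4);

// offset 4, length 3: replace the three characters "new" at the start of
// "newton" with "NEW"; the rest of the string is untouched.
$marked = substr_replace($address, 'NEW', 4, 3);

// Only strip the ending when it really is " new".
$fixed = $address;
if (substr($fixed, -4) === ' new') {
    $fixed = substr_replace($fixed, '', -4);
}
```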

Any tips and information regarding how to do this--or alternative solutions for parsing that might work better or be more efficient--would be appreciated.

Thank you!
xtiano77
Forum Commoner
Posts: 72
Joined: Tue Sep 22, 2009 10:53 am
Location: Texas

Re: I think I need help with substr_replace... or an alterna

Post by xtiano77

Check out this link for instructions on how to use the “str_replace()” function: http://www.w3schools.com/php/func_strin ... eplace.asp

As for the “Newton” vs. “New Boston” issue, have you considered using regular expressions with the “preg_replace()” function? http://us.php.net/manual/en/function.preg-replace.php

I think between the two you should be able to do what you need.

Possible regex: “/\s+new$/” (matches “new” only when it is the last word, so “newton” is left alone)
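A sketch of the preg_replace() route, assuming the lowercase address string from the BOSTON case. The $ anchor ties the match to the end of the string, so only a final word "new" is removed, and the optional fifth argument reports how many replacements were made:

```php
<?php
$address = '123 newton rd new';

// \s+new$ : whitespace followed by "new" at the very end of the string.
// -1 means "no limit"; $count receives the number of replacements made.
$street = preg_replace('/\s+new$/', '', $address, -1, $count);

// An address that only contains "new" inside another word is left alone.
$untouched = preg_replace('/\s+new$/', '', '123 newton rd', -1, $count2);
```

Checking $count before setting the city to New Boston tells you whether the address really ended in "new".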