existing php libraries for parsing street addresses?


yekibud
Forum Newbie
Posts: 4
Joined: Thu Sep 04, 2008 3:20 pm

existing php libraries for parsing street addresses?

Post by yekibud »

I've got descriptive address strings like this:


 
104 N Main Street, 3rd floor Masonic Bldg
West Water Street next to Bobbys Auto Sales
436 Industry Rd 1/2 M from US 27 South of
111 Bridge St off US 31W @ N end of
Main Street over Blue Daisey Flower Shop
112 S. Main Street               WILLIAMSTOWN
Hwy 55, Springfield Rd & Corporate Drive
US Highway 68 West, behing Subway       CADIZ
 
and so on, and I have to parse the street address out of each one. I started writing my own regex, but I got tired of enumerating all the variants of Rd, Rd., Road, St, St., Street, and so on. Then I thought there must be some generic library to help me out here.

I found Geo::StreetAddress::US on CPAN, which I guess I could get to work, but I wanted to check whether there is anything native to PHP to help me out.

Thanks for the tips.
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: existing php libraries for parsing street addresses?

Post by GeertDD »

PCRE is included in PHP by default. I think it should be able to do the job.

The first step in solving this problem is to determine where the street name sits on each line and what form it takes. Then build a regex for it.

Some statements that should help us get started (correct them if needed):
  • Each line contains one street name;
  • The street name can optionally be preceded by a number;
  • Every word of a street name begins with a capital.
I'm not familiar with English addresses and so I'm not sure what to do about the "N" in the first line of your examples. Also "Hwy 55, Springfield Rd & Corporate Drive" could be tricky if "Hwy" is not the street name you want to extract.
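
One way the statements above might be sketched with PHP's preg_match, purely as a starting point. The function name extractStreet and the short suffix list are assumptions made up for this illustration (a real list would be far longer), and the optional one-letter directional such as "N" or "S." is only a guess at how those should be treated.

```php
<?php
// Hypothetical helper: returns the first street-like match on a line, or null.
// The suffix list is a tiny illustrative subset, not a complete set.
function extractStreet(string $line): ?string
{
    $suffixes = 'Street|St|Road|Rd|Drive|Dr|Avenue|Ave';
    $pattern = '/
        (?:\d+\s+)?                 # optional house number
        (?:[NSEW]\.?\s+)?           # optional directional such as N or S.
        (?:[A-Z][a-z]+\s+)+         # one or more capitalized words
        (?:' . $suffixes . ')\b\.?  # street-type suffix, optional period
    /x';

    return preg_match($pattern, $line, $m) ? $m[0] : null;
}

var_dump(extractStreet('104 N Main Street, 3rd floor Masonic Bldg'));
var_dump(extractStreet('Hwy 55, Springfield Rd & Corporate Drive'));
```

With the sample lines from the first post, this picks "104 N Main Street" out of the first line and "Springfield Rd" out of the "Hwy 55" line. The "US Highway 68 West" line returns null because none of the listed suffixes appears in it, which shows how quickly the simple assumptions run out.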
yekibud
Forum Newbie
Posts: 4
Joined: Thu Sep 04, 2008 3:20 pm

Re: existing php libraries for parsing street addresses?

Post by yekibud »

Thanks for your reply, GeertDD.

What I'm trying to do is avoid writing my own regex. I started to do so, but it felt like reinventing the wheel.

It seems like there should be pre-cooked regex libraries for tasks like this - I'm sure I'm not the only one who has had to pick postal addresses out of strings of text.
GeertDD
Forum Contributor
Posts: 274
Joined: Sun Oct 22, 2006 1:47 am
Location: Belgium

Re: existing php libraries for parsing street addresses?

Post by GeertDD »

You could try http://regexlib.com/
yekibud
Forum Newbie
Posts: 4
Joined: Thu Sep 04, 2008 3:20 pm

Re: existing php libraries for parsing street addresses?

Post by yekibud »

That's a great link! I'll see if I can find anything there.

Thanks.
marcth
Forum Contributor
Posts: 142
Joined: Mon Aug 25, 2008 8:16 am

Re: existing php libraries for parsing street addresses?

Post by marcth »

I'm not sure what country you live in, but it may be worthwhile to visit your federal postal service's website and see what their addressing standards are. It seems to me that if you're going to clean up those addresses, you may as well follow your country's standards.
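
As an illustration, assuming the addresses turn out to be US ones: USPS Publication 28 lists standard street suffix abbreviations (STREET to ST, ROAD to RD, and so on). A minimal sketch of normalizing the last word of an already extracted street name follows. The function name normalizeSuffix is hypothetical, and the map is a small hand-picked subset rather than the full USPS table.

```php
<?php
// Hypothetical helper: rewrites a trailing street-type word to a standard
// abbreviation. The map is an illustrative subset, not the full USPS list.
function normalizeSuffix(string $street): string
{
    $map = [
        'street'  => 'ST',  'st'  => 'ST',
        'road'    => 'RD',  'rd'  => 'RD',
        'drive'   => 'DR',  'dr'  => 'DR',
        'avenue'  => 'AVE', 'ave' => 'AVE',
        'highway' => 'HWY', 'hwy' => 'HWY',
    ];

    // Split off the last word, ignoring a trailing period ("Rd." -> "Rd").
    if (!preg_match('/^(.*?)\s+([A-Za-z]+)\.?\s*$/', trim($street), $m)) {
        return $street; // single word or nothing to split: leave untouched
    }

    $key = strtolower($m[2]);
    return isset($map[$key]) ? $m[1] . ' ' . $map[$key] : $street;
}

var_dump(normalizeSuffix('104 N Main Street'));
var_dump(normalizeSuffix('436 Industry Rd.'));
```

Strings whose last word is not in the map come back unchanged, so the helper only touches what it recognizes.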
yekibud
Forum Newbie
Posts: 4
Joined: Thu Sep 04, 2008 3:20 pm

Re: existing php libraries for parsing street addresses?

Post by yekibud »

Thanks for your reply, marcth. I'm in the US.
marcth wrote: "it may be worthwhile to visit your federal postal service's website and see what their addressing standards are."
Right - that's what I would do if I wanted to write the regex myself. I'm hoping that somebody has already gone to that trouble and I can just implement the solution, like the Perl package I mentioned previously, or maybe by grabbing a snippet from regexlib.com, as GeertDD suggested.