Parsing email addresses

William
Forum Contributor
Posts: 332
Joined: Sat Oct 25, 2003 4:03 am
Location: New York City

Post by William »

Does anyone have a script or regular expression to parse the following kinds of email address formats:

Code: Select all

 
"John Doe" <me@johndoe.com>
John Doe <me@johndoe.com>
me@johndoe.com
 
It should also be able to parse multiple addresses in a row, like the following:

Code: Select all

 
"John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>
 
All of those should work, while malformed input like...

Code: Select all

 
"John Doe <me@johndoe.com> or John Doe me@johndoe.com>
 
shouldn't match at all. Since this format is part of the RFC 2822 specification, I'm hoping there is something out there that I'm just not finding.

If anyone knows of anything it would be greatly appreciated.
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Post by prometheuzz »

I don't know of an existing tool/script, but here's a quick and dirty hack:

Code: Select all

^(?:"[^"@<>]+"|[^"@<>]+)?\s*(?:<[^@<>]+@.*?\.[^.\s]{2,}>|[^@<>]+@.*?\.[^.\s]{2,})(?: *, *(?:"[^"@<>]+"|[^"@<>]+)?\s*(?:<[^@<>]+@.*?\.[^.\s]{2,}>|[^@<>]+@.*?\.[^.\s]{2,}))*$
...

Yeah, ouch!

Of course, if you break that voodoo up, it looks a bit more maintainable:

Code: Select all

<?php
$tests = array(
  '"John Doe" <me@johndoe.com>',
  'John Doe <me@johndoe.com>',
  'me@johndoe.com',
  '"John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>',
  '"John Doe <me@johndoe.com> or John Doe me@johndoe.com>'
);
 
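// A display name: one or more characters other than quotes, @ and angle brackets.
$name = '[^"@<>]+';
// A deliberately loose address: local part, @, domain, and a final label of 2+ characters.
$email = '[^@<>]+@.*?\.[^.\s]{2,}';
// Optional name (quoted or bare), then the address with or without angle brackets.
$name_email = "(?:\"$name\"|$name)?\s*(?:<$email>|$email)";
// One entry, followed by any number of comma-separated entries.
$regex = "/^$name_email(?: *, *$name_email)*$/";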
 
foreach($tests as $t) {
  if(preg_match($regex, $t)) {
    echo "OK    : $t\n";
  } else {
    echo "WRONG : $t\n";
  }
}
/* output:
OK    : "John Doe" <me@johndoe.com>
OK    : John Doe <me@johndoe.com>
OK    : me@johndoe.com
OK    : "John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>
WRONG : "John Doe <me@johndoe.com> or John Doe me@johndoe.com>
*/
?>
Last edited by prometheuzz on Thu Aug 06, 2009 7:55 am, edited 1 time in total.

Post by William »

You're amazing. There were a few things in regular expressions that I couldn't figure out myself. This was a world of help, thanks a lot.

Post by prometheuzz »

You're welcome William. I assume you understand the regex I posted (the simplified one)? If not, feel free to ask (after studying it yourself of course), then I can elaborate a bit. Wouldn't want you to use something you don't fully understand.

Post by William »

prometheuzz wrote: You're welcome William. I assume you understand the regex I posted (the simplified one)? If not, feel free to ask (after studying it yourself of course), then I can elaborate a bit. Wouldn't want you to use something you don't fully understand.
Yeah, I understand it. I was overcomplicating my own. I was trying to figure out how to require a quote at the end whenever there was, let's say, a quote at the start. I didn't think of using a simple OR (alternation) between the quoted and unquoted forms.
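
For anyone reading along, here is a minimal sketch of the difference between the alternation and the "optional quote on each side" approach. The variable names `$balanced` and `$sloppy` are made up for illustration and are not part of the code posted above; only the `$name` pattern is taken from the thread.

```php
<?php
// Name pattern taken from the simplified regex posted earlier in the thread.
$name = '[^"@<>]+';

// Alternation: either a fully quoted name or a bare name, nothing in between.
$balanced = "/^(?:\"$name\"|$name)$/";

// Two independent optional quotes: this also accepts a single stray quote.
$sloppy = "/^\"?$name\"?$/";

foreach (array('"John Doe"', 'John Doe', '"John Doe') as $candidate) {
    printf(
        "%-12s balanced=%d sloppy=%d\n",
        $candidate,
        preg_match($balanced, $candidate),
        preg_match($sloppy, $candidate)
    );
}
```

Only the alternation rejects the third candidate, the one with an opening quote and no closing quote.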

Now I'm just playing around with it so I can parse a user-supplied string containing these addresses and output an array containing the email and name (if any) for each one. If any address in the string failed to match, the whole thing would fail. Maybe it should be two regular expressions? One that validates it, and one that parses the data.

Either way what you've done is helping a lot, thanks again. :)

Post by prometheuzz »

William wrote:...
Maybe it should be two regular expressions? One that validates it, and one that parses the data.
...
Yes, that's what I'd do as well.
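
A rough sketch of that two-step idea, under some loud assumptions: `parse_address_list()` is a hypothetical helper (not an existing PHP function), the per-entry patterns `$with_brackets` and `$bare` are my own and stricter than the validation regex, and display names are assumed never to contain commas, since the list is split on commas.

```php
<?php
// Validation pattern rebuilt from the pieces posted earlier in the thread.
$name = '[^"@<>]+';
$email = '[^@<>]+@.*?\.[^.\s]{2,}';
$name_email = "(?:\"$name\"|$name)?\s*(?:<$email>|$email)";
$validate = "/^$name_email(?: *, *$name_email)*$/";

// Hypothetical helper: returns a list of array('name' => ..., 'email' => ...),
// or null when the whole list (or any single entry) is rejected.
// Assumption: display names never contain commas.
function parse_address_list($input, $validate) {
    // Step 1: validate the list as a whole.
    if (!preg_match($validate, $input)) {
        return null;
    }

    // Step 2: parse each entry with stricter patterns that capture the parts.
    $addr = '[^<>@\s",]+@[^<>@\s",]+';
    $with_brackets = "/^(?:\"([^\"]+)\"|([^\"<>@]+?))?\s*<($addr)>$/";
    $bare = "/^($addr)$/";

    $result = array();
    foreach (explode(',', $input) as $entry) {
        $entry = trim($entry);
        if (preg_match($with_brackets, $entry, $m)) {
            // Groups that did not take part in the match come back as ''.
            $display = $m[1] !== '' ? $m[1] : trim($m[2]);
            $result[] = array('name' => $display, 'email' => $m[3]);
        } elseif (preg_match($bare, $entry, $m)) {
            $result[] = array('name' => '', 'email' => $m[1]);
        } else {
            return null;
        }
    }
    return $result;
}

$list = '"John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>';
print_r(parse_address_list($list, $validate));
```

Keeping the two steps separate means the validation regex can stay loose and readable, while the parsing step only has to deal with one entry at a time.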
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

E-mail addresses can be a lot more complex than that, though. There are a lot of RFC standards on how e-mail addresses can look and what characters they can and cannot have. If you are concerned with not accidentally declaring invalid e-mail addresses as verified or valid e-mail addresses as unverified, it might be a good idea to take a look at them.
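
To illustrate the point, here is a small sketch comparing the loose address pattern from this thread with PHP's built-in `filter_var()` and `FILTER_VALIDATE_EMAIL`. The `$loose` variable and the second test address are made up for the example, and `filter_var()` is not a complete implementation of the RFCs either, so treat this as a sanity check rather than a guarantee.

```php
<?php
// The loose address pattern from earlier in the thread, anchored to a single address.
$loose = '/^[^@<>]+@.*?\.[^.\s]{2,}$/';

$candidates = array('me@johndoe.com', 'john doe@exa mple.com');

foreach ($candidates as $c) {
    $regex_ok = preg_match($loose, $c) === 1;
    // filter_var() returns the address on success and false on failure.
    $filter_ok = filter_var($c, FILTER_VALIDATE_EMAIL) !== false;
    printf(
        "%-24s loose=%s filter_var=%s\n",
        $c,
        $regex_ok ? 'yes' : 'no',
        $filter_ok ? 'yes' : 'no'
    );
}
```

The second address contains unquoted spaces, so the loose pattern accepts it while `filter_var()` rejects it.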

Post by William »

superdezign wrote: E-mail addresses can be a lot more complex than that, though. There are a lot of RFC standards on how e-mail addresses can look and what characters they can and cannot have. If you are concerned with not accidentally declaring invalid e-mail addresses as verified or valid e-mail addresses as unverified, it might be a good idea to take a look at them.
I don't plan on using the email part of the regular expression posted above; the only thing I cared about was being able to handle a full name, etc. attached to the address. I do agree, though: it's annoying how many possibilities there are in an email address.