Parsing email addresses
Posted: Wed Aug 05, 2009 1:06 pm
by William
Does anyone have a script or regular expression to parse the following kinds of email addresses:
Code: Select all
"John Doe" <me@johndoe.com>
John Doe <me@johndoe.com>
me@johndoe.com
It should also be able to parse multiple addresses in a row, like the following:
Code: Select all
"John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>
Those would all work, while addresses like...
Code: Select all
"John Doe <me@johndoe.com> or John Doe me@johndoe.com>
wouldn't work at all. I'm hoping that, since this format is part of the RFC 2822 specification, there is something out there that I'm not finding.
If anyone knows of anything it would be greatly appreciated.
Re: Parsing email addresses
Posted: Thu Aug 06, 2009 7:36 am
by prometheuzz
I don't know of an existing tool/script, but here's a quick and dirty hack:
Code: Select all
^(?:"[^"@<>]+"|[^"@<>]+)?\s*(?:<[^@<>]+@.*?\.[^.\s]{2,}>|[^@<>]+@.*?\.[^.\s]{2,})(?: *, *(?:"[^"@<>]+"|[^"@<>]+)?\s*(?:<[^@<>]+@.*?\.[^.\s]{2,}>|[^@<>]+@.*?\.[^.\s]{2,}))*$
...
Yeah, ouch!
Of course, if you break that voodoo up, it looks a bit more maintainable:
Code: Select all
<?php
// Sample inputs: the first four should match, the last one should not.
$tests = array(
    '"John Doe" <me@johndoe.com>',
    'John Doe <me@johndoe.com>',
    'me@johndoe.com',
    '"John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>',
    '"John Doe <me@johndoe.com> or John Doe me@johndoe.com>'
);

// Building blocks of the pattern.
$name       = '[^"@<>]+';                // display name: no quotes, @ or angle brackets
$email      = '[^@<>]+@.*?\.[^.\s]{2,}'; // loose address: local part, @, domain, dot, 2+ chars
$name_email = "(?:\"$name\"|$name)?\s*(?:<$email>|$email)"; // optional quoted or bare name, then the address
$regex      = "/^$name_email( *, *$name_email)*$/";         // one entry, then any number of comma-separated entries

foreach ($tests as $t) {
    if (preg_match($regex, $t)) {
        echo "OK : $t\n";
    } else {
        echo "WRONG : $t\n";
    }
}

/* output:
OK : "John Doe" <me@johndoe.com>
OK : John Doe <me@johndoe.com>
OK : me@johndoe.com
OK : "John Doe" <me@johndoe.com>, John Doe <me@johndoe.com>, me@johndoe.com, <me@johndoe.com>
WRONG : "John Doe <me@johndoe.com> or John Doe me@johndoe.com>
*/
?>
Re: Parsing email addresses
Posted: Thu Aug 06, 2009 7:38 am
by William
You're amazing; there were a few things I couldn't figure out in regular expressions myself. This was a world of help, thanks a lot.
Re: Parsing email addresses
Posted: Thu Aug 06, 2009 7:58 am
by prometheuzz
You're welcome William. I assume you understand the regex I posted (the simplified one)? If not, feel free to ask (after studying it yourself of course), then I can elaborate a bit. Wouldn't want you to use something you don't fully understand.
Re: Parsing email addresses
Posted: Thu Aug 06, 2009 9:12 am
by William
prometheuzz wrote:You're welcome William. I assume you understand the regex I posted (the simplified one)? If not, feel free to ask (after studying it yourself of course), then I can elaborate a bit. Wouldn't want you to use something you don't fully understand.
Yeah, I understand it; I was overcomplicating my own. I was trying to figure out how to make it so that if it had, let's say, a quote at the start, it would then have to have a quote at the end. I didn't think about doing a simple OR and handling it that way.
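For anyone reading along, the "simple OR" idea can be shown in isolation. This is only a minimal sketch: `$quotedOrBare`, `$samples` and `$results` are names made up for the illustration, and `$name` is the same character class used in the regex posted above. The name must be either fully quoted or not quoted at all, so a lone quote on one side is rejected.

```php
<?php
// Same name character class as in the thread's regex.
$name = '[^"@<>]+';

// Hypothetical pattern: either "name" (quote on both sides) or name (no quotes).
$quotedOrBare = "/^(?:\"$name\"|$name)$/";

$samples = array('"John Doe"', 'John Doe', '"John Doe', 'John Doe"');
$results = array();
foreach ($samples as $s) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    $results[$s] = (preg_match($quotedOrBare, $s) === 1);
    echo ($results[$s] ? "OK    : " : "WRONG : ") . $s . "\n";
}
```

The two balanced samples match; the two with a single stray quote do not, because neither alternative can consume an unpaired quote.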
Now I'm just playing around with it so I can parse a user-supplied string containing these addresses and output an array containing the email and name (if any) for each one. If any address in the string failed to match, the whole thing would fail. Maybe it should be two regular expressions? One that validates it, and one that parses the data.
Either way, what you've done is helping a lot. Thanks again.

Re: Parsing email addresses
Posted: Thu Aug 06, 2009 9:24 am
by prometheuzz
William wrote:...
Maybe it should be two regular expressions? One that validates it, and one that parses the data.
...
Yes, that's what I'd do as well.
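A rough sketch of that two-step idea, under some loud assumptions: `parse_address_list()` is a hypothetical helper (not an existing function), names are assumed never to contain commas, and the second pattern is deliberately stricter than the validation regex (no whitespace or commas inside the address), so an entry that passes step 1 can still be rejected in step 2.

```php
<?php
// Hypothetical helper: returns an array of name/email pairs, or false if the
// list is rejected. Assumes display names never contain commas.
function parse_address_list($input) {
    // Step 1: validate the whole list with the regex from earlier in the thread.
    $name       = '[^"@<>]+';
    $email      = '[^@<>]+@.*?\.[^.\s]{2,}';
    $name_email = "(?:\"$name\"|$name)?\s*(?:<$email>|$email)";
    if (!preg_match("/^$name_email( *, *$name_email)*$/", $input)) {
        return false;
    }

    // Step 2: parse each entry with a stricter pattern that has capture groups.
    // Groups: 1 = quoted name, 2 = bare name, 3 = <address>, 4 = bare address.
    $addr  = '[^@<>\s,]+@[^@<>\s,]+';
    $parse = '/^(?:(?:"([^"@<>]+)"|([^"@<>]+?))?\s*<(' . $addr . ')>|(' . $addr . '))$/';

    $result = array();
    foreach (explode(',', $input) as $piece) {
        if (!preg_match($parse, trim($piece), $m)) {
            return false; // passed step 1, but cannot be split into name and address
        }
        $display  = ($m[1] !== '') ? $m[1] : $m[2];
        $result[] = array(
            'name'  => ($display !== '') ? trim($display) : null,
            'email' => ($m[3] !== '') ? $m[3] : $m[4],
        );
    }
    return $result;
}

$parsed = parse_address_list('"John Doe" <me@johndoe.com>, me@johndoe.com');
print_r($parsed);
```

Requiring angle brackets whenever a name is present is a design choice for the parsing step: without it, a bare `me@johndoe.com` can be read as name `m` plus address `e@johndoe.com`, which is harmless when validating but wrong when extracting.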
Re: Parsing email addresses
Posted: Thu Aug 06, 2009 9:26 am
by superdezign
E-mail addresses can be a lot more complex than that, though. There are a lot of RFC standards on how e-mail addresses can look and what characters they can and cannot have. If you are concerned with not accidentally declaring invalid e-mail addresses as verified or valid e-mail addresses as unverified, it might be a good idea to take a look at them.
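If the address part itself needs a stricter check than a hand-written pattern, one option is PHP's built-in `filter_var()` with `FILTER_VALIDATE_EMAIL`. This is just a sketch: the sample addresses and the `$checked` array are made up for the illustration, and the filter is not a complete implementation of the RFCs either. It only accepts a bare address, so any display name has to be stripped first.

```php
<?php
// Made-up sample inputs for the illustration.
$candidates = array(
    'me@johndoe.com',            // plain address
    'me@@johndoe.com',           // doubled @
    'John Doe <me@johndoe.com>', // display name still attached
);

$checked = array();
foreach ($candidates as $c) {
    // filter_var() returns the filtered value on success and false on failure.
    $checked[$c] = (filter_var($c, FILTER_VALIDATE_EMAIL) !== false);
    echo ($checked[$c] ? "valid   : " : "invalid : ") . $c . "\n";
}
```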
Re: Parsing email addresses
Posted: Thu Aug 06, 2009 9:40 am
by William
superdezign wrote:E-mail addresses can be a lot more complex than that, though. There are a lot of RFC standards on how e-mail addresses can look and what characters they can and cannot have. If you are concerned with not accidentally declaring invalid e-mail addresses as verified or valid e-mail addresses as unverified, it might be a good idea to take a look at them.
I don't plan on using the email part of the regular expression posted above; the only thing I cared about was the ability to have a full name, etc. attached to the email as well. I do agree though, it's annoying how many possibilities there are in an email address.