E-mail address validation

MichaelR
Forum Contributor
Posts: 148
Joined: Sat Jan 03, 2009 3:27 pm

Re: E-mail address validation

Post by MichaelR »

Apparently my regex is now the basis for filter_var(). I guess sometimes it is better to re-invent the wheel.
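For anyone who just wants the built-in check, here is a minimal sketch of calling filter_var() with the FILTER_VALIDATE_EMAIL filter. The helper name isAcceptedByFilterVar() is made up for illustration; filter_var() returns the address on success and false on failure.

```php
<?php
// Hypothetical helper name; filter_var() and FILTER_VALIDATE_EMAIL are PHP built-ins.
function isAcceptedByFilterVar(string $address): bool
{
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

foreach (['michael@example.com', 'not-an-address'] as $candidate) {
    echo $candidate, ' => ', isAcceptedByFilterVar($candidate) ? 'accepted' : 'rejected', PHP_EOL;
}
```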
Last edited by MichaelR on Tue Oct 19, 2010 5:38 pm, edited 2 times in total.

Post by MichaelR »

Okay, for those who are interested, this code now matches folding white space and comments nested to any depth. The entire regular expression is just 777 characters long. Compare that with Perl's infamous regex, which is over 6,500 characters long and matches neither comments nor folding white space.
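To show how nested comments can be matched at all, here is a simplified sketch using PCRE recursion. It is not the pattern from the class: it accepts any character other than parentheses and backslashes as comment text instead of the exact ctext range in RFC 5322, and the function name isComment() is invented for this example.

```php
<?php
// Simplified illustration only, not the validator's actual regular expression.
function isComment(string $text): bool
{
    // Group 1 is a parenthesised comment whose content is any mix of:
    //   [^()\\]++  ordinary characters (an approximation of ctext),
    //   \\.        a quoted-pair (backslash followed by any character),
    //   (?1)       another comment, matched by recursing into group 1.
    $pattern = '/^(\((?:[^()\\\\]++|\\\\.|(?1))*\))$/';

    return preg_match($pattern, $text) === 1;
}

var_dump(isComment('(nested (comment (deeply)))'));
var_dump(isComment('(unbalanced (comment)'));
```

The recursion is what lets a fixed-length pattern cope with any nesting depth; without (?1) each level would have to be spelled out by hand.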

The class also has an option to check for MX RRs at the given domain.
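A rough idea of what an MX check involves, using PHP's built-in checkdnsrr(). This is a sketch and not the class's own code; domainOf() and hasMxRecord() are names made up here, and the lookup needs a working DNS resolver.

```php
<?php
// Hypothetical helpers; only checkdnsrr() is a real PHP function.
function domainOf(string $address): ?string
{
    // Split on the last "@" so quoted local-parts containing "@" still work.
    $at = strrpos($address, '@');
    if ($at === false || $at === strlen($address) - 1) {
        return null;
    }

    return substr($address, $at + 1);
}

function hasMxRecord(string $address): bool
{
    $domain = domainOf($address);

    // checkdnsrr() performs a live DNS query and returns true if a record of the given type exists.
    return $domain !== null && checkdnsrr($domain, 'MX');
}
```

Bear in mind that a domain without an MX record can still receive mail through its address record, so a failed MX lookup is a hint, not proof that the address is undeliverable.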

For those who might not be aware, RFC 5322 and RFC 5321 differ. Considering just the address itself (not the mailbox, route, etc.), the following code should be run for each:

Code:

// RFC 5322

EmailAddressValidator::SetEmailAddress('michael@example.com', false)->Validate();

// RFC 5321

EmailAddressValidator::SetEmailAddress('michael@example.com', false)->SetCFWS(false)->SetObsolete(false)->Validate();

Post by MichaelR »

I've updated to the second version now. The code is much better separated in the class, and the entire regular expression has been reduced to just 585 characters for isValid5322 and just 383 for isValid5321. Part of the reduction comes from no longer checking length limits, in line with RFC 5321, which states "To the maximum extent possible, implementation techniques that impose no limits on the length of these objects should be used." Indeed, the length limit is only a SHOULD, not a MUST.
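If you do want the length limits, they can be applied as a separate step outside the regular expression. Here is a minimal sketch, assuming the usual RFC 5321 figures of 64 octets for the local-part and 255 for the domain, plus the commonly derived 254-octet cap on the whole address (the 256-octet path limit minus the angle brackets). The function name is made up and this is not part of the class.

```php
<?php
// Hypothetical helper; applies length limits separately from syntax validation.
function withinRfc5321Lengths(string $address): bool
{
    $at = strrpos($address, '@');
    if ($at === false) {
        return false;
    }

    $local  = substr($address, 0, $at);
    $domain = substr($address, $at + 1);

    // strlen() counts bytes, which is what the RFC's octet limits refer to.
    return strlen($local) <= 64
        && strlen($domain) <= 255
        && strlen($address) <= 254;
}

var_dump(withinRfc5321Lengths('michael@example.com'));
var_dump(withinRfc5321Lengths(str_repeat('a', 65) . '@example.com'));
```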

The other differences: dot-atom and domain-name domains can no longer be turned off, nor can domain names or dot-atoms be made "strict". Additionally, CFWS is always of the obsolete form. Rather than passing "true" as the optional second parameter when instantiating the object, the two options are now "5321" and "5322": the first turns on quoted strings and domain literals, and the second turns on obsolete local-parts, domain literals, and CFWS. Internationalized labels need to be explicitly allowed if required. Finally, the object can be instantiated with the "new" keyword as normal, rather than only through the static "setEmailAddress()" method.

My article, linked to in my signature, also now includes unit tests and comparisons with other popular validators/parsers.