
Re: E-mail address validation

Posted: Sat Aug 14, 2010 4:58 am
by MichaelR
Apparently my regex is now the basis for filter_var(). I guess sometimes it is better to re-invent the wheel.
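
For anyone who hasn't used it, a minimal sketch of calling PHP's built-in validator follows. The wrapper name isPlausibleEmail() is made up for illustration; only filter_var() and FILTER_VALIDATE_EMAIL come from PHP itself.

```php
<?php
// Hypothetical helper (not from this thread): thin wrapper around PHP's built-in filter.
function isPlausibleEmail(string $address): bool
{
    // filter_var() returns the filtered string on success and false on failure,
    // so compare strictly against false.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(isPlausibleEmail('michael@example.com')); // bool(true)
var_dump(isPlausibleEmail('not an address'));      // bool(false)
```

Note that the built-in filter is a fixed rule set; it has no switches for things like CFWS or obsolete forms.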

Re: E-mail address validation

Posted: Mon Oct 18, 2010 5:30 pm
by MichaelR
Okay, for those who are interested, this code now matches folding white space and arbitrarily nested comments. The entire regular expression is just 777 characters long. Compare that with Perl's infamous regex, which is over 6,500 characters long and handles neither comments nor folding white space.

The class also has an option to check for MX RRs at the given domain.
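
As a rough idea of what such a check involves (this is a sketch, not the class's actual code), PHP's checkdnsrr() can be asked for MX records on the domain part. The function name domainHasMx() and the injectable $lookup parameter are assumptions added here so the logic can be tried without a network. It also assumes the address has already passed syntax validation and has a plain domain name after the last "@", not a domain literal or a comment.

```php
<?php
// Hypothetical helper, not the class's real method.
// $lookup defaults to PHP's checkdnsrr(); it can be swapped for a stub when testing offline.
function domainHasMx(string $address, ?callable $lookup = null): bool
{
    $lookup = $lookup ?? 'checkdnsrr';

    // Use the last "@" because a quoted local-part may itself contain "@".
    $at = strrpos($address, '@');
    if ($at === false || $at === strlen($address) - 1) {
        return false; // no domain part to query
    }

    $domain = substr($address, $at + 1);

    // A trailing dot marks the name as fully qualified, so the resolver
    // does not append any local search suffix.
    return (bool) $lookup(rtrim($domain, '.') . '.', 'MX');
}
```

Calling domainHasMx('michael@example.com') performs a real DNS query, so the result depends on the resolver and on the domain's records at the time.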

For those who might not be aware, there are differences between RFC 5322 and RFC 5321. Taking into account just the address itself (not the mailbox, route, etc.), the following code should be run for each:

Code:

// RFC 5322
EmailAddressValidator::SetEmailAddress('michael@example.com', false)->Validate();

// RFC 5321
EmailAddressValidator::SetEmailAddress('michael@example.com', false)->SetCFWS(false)->SetObsolete(false)->Validate();

Re: E-mail address validation

Posted: Mon Nov 15, 2010 2:36 am
by MichaelR
I've updated to the second version now. The code is much better separated in the class and the entire regular expression has been reduced to just 585 characters for isValid5322 and just 383 for isValid5321. Part of this is due to not checking for length limits, as per RFC 5321, which states "To the maximum extent possible, implementation techniques that impose no limits on the length of these objects should be used." Indeed, the length limit is only a SHOULD, not a MUST.
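
If the length limits are wanted anyway, they can be applied as a separate step after the syntax check. The sketch below is only an illustration: the function name exceedsRfc5321Limits() is invented, and it covers just the two limits from RFC 5321 section 4.5.3.1 that apply to the address parts themselves (64 octets for the local-part, 255 octets for the domain).

```php
<?php
// Hypothetical helper: optional length check, kept apart from the regex.
function exceedsRfc5321Limits(string $address): bool
{
    // Split on the last "@", since a quoted local-part may contain "@".
    $at = strrpos($address, '@');
    if ($at === false) {
        return false; // not an address at all; leave that verdict to the syntax check
    }

    $localPart = substr($address, 0, $at);
    $domain = (string) substr($address, $at + 1);

    // strlen() counts bytes, which matches the RFC's limits in octets.
    return strlen($localPart) > 64 || strlen($domain) > 255;
}
```

Keeping this outside the regular expression means the limits can be switched on or off without touching the pattern.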

The other differences: dot-atom and domain-name domains can no longer be turned off, nor can domain names or dot-atoms be made "strict". CFWS is always of the obsolete form. Rather than passing "true" as the optional second parameter when instantiating the object, the two options are now "5321" and "5322": the first turns on quoted strings and domain literals, and the second turns on obsolete local-parts, domain literals, and CFWS. Internationalized labels need to be explicitly allowed if required. Finally, the object can be instantiated with the "new" keyword as normal, rather than only through the static "setEmailAddress()" method.

My article, linked to in my signature, also now includes unit tests and comparisons with other popular validators/parsers.