PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Posted: Wed Nov 18, 2009 11:52 am
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148

// RFC 5322

  function isValid5322($emailAddress)
    {

      return preg_match('/^((?>(?>(?>((?>[\t ]+(?>\x0D\x0A[\t ]+)*)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?1)(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>(?1)\.(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f0-9]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?9)){3}))\])(?1)$/isD', $emailAddress);

  }

// RFC 5321

  function isValid5321($emailAddress)
    {

      return preg_match('/^(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?>([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}))\])$/iD', $emailAddress);

  }
 



At 585 characters (for validation according to the more expansive RFC 5322), the regex is quite complicated. Even so, it is much simpler than other regexes which try to validate every RFC-compliant address (often against outdated RFCs), which range in the high thousands of characters. It may still be too complex for some, so I've written an article on Email Address Validation which explains the construction of the regular expression step by step (which part validates what) and provides a class which allows for easy manipulation of the regex: you can toggle quoted strings, domain literals, internationalized labels, CFWS and so on. It can then be used both by those who want a simple validator and by those who want to accept every RFC 5321/5322-compliant email address.
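
The toggling idea can be sketched in a few lines. This is a simplified illustration only, not the class itself: the function name buildEmailPattern() and its two parameters are hypothetical, it reuses the dot-atom, quoted-string, domain-name and IPv4 fragments from the RFC 5321 regex above, and it leaves out IPv6 literals, obsolete syntax and CFWS.

```php
<?php

// Hypothetical helper: assemble a pattern from optional parts.
// Only two toggles are modelled here; the real class has more.
function buildEmailPattern($allowQuotedString = false, $allowDomainLiteral = false)
{
    // Dot-atom local part (capture group 1): atoms separated by single dots
    $local = '([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*';

    // Optionally accept a quoted-string local part as an alternative
    if ($allowQuotedString) {
        $local = '(?>' . $local . '|"(?>[ !#-\[\]-~]|\\\[ -~])*")';
    }

    // Domain name (capture group 2): 1 to 127 labels
    $domain = '([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}';

    // Optionally accept an IPv4 domain literal (capture group 3)
    if ($allowDomainLiteral) {
        $ipv4 = '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?3)){3}';
        $domain = '(?>' . $domain . '|\[' . $ipv4 . '\])';
    }

    return '/^' . $local . '@' . $domain . '$/iD';
}

// Quoted strings are rejected by default and accepted once toggled on
var_dump(preg_match(buildEmailPattern(), '"john smith"@example.com'));            // int(0)
var_dump(preg_match(buildEmailPattern(true, false), '"john smith"@example.com')); // int(1)
```

Because the capture groups are numbered in the order they appear, each optional fragment has to be added in a fixed position so that the (?1), (?2) and (?3) references keep pointing at the right group.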

And here it is as a class:

  /**
   * Squiloople Framework
   *
   * LICENSE: Feel free to use and redistribute this code.
   *
   * @author Michael Rushton <michael@squiloople.com>
   * @link http://squiloople.com/
   * @category Squiloople
   * @package Models
   * @subpackage Validators
   * @version 1.0
   * @copyright Copyright © 2009-2010 Michael Rushton
   */


  namespace Models\Validators;

  /**
   * Email Address Validator
   *
   * Validate email addresses according to RFC 5321 or RFC 5322
   */

  final class EmailAddressValidator
  {

    /**
     * The email address to validate
     *
     * @access private
     * @var string $_emailAddress
     */

    private $_emailAddress;

    /**
     * A quoted string local part is either allowed (true) or not (false)
     *
     * @access private
     * @var bool $_quotedString
     */

    private $_quotedString = false;

    /**
     * An obsolete local part is either allowed (true) or not (false)
     *
     * @access private
     * @var bool $_obsolete
     */

    private $_obsolete = false;

    /**
     * A domain literal domain is either allowed (true) or not (false)
     *
     * @access private
     * @var bool $_domainLiteral
     */

    private $_domainLiteral = false;

    /**
     * Comments and folding white spaces are either allowed (true) or not (false)
     *
     * @access private
     * @var bool $_cfws
     */

    private $_cfws = false;

    /**
     * Set the email address and turn on the relevant standard if required
     *
     * @access public
     * @param string $emailAddress
     * @param integer $standard
     */

    public function __construct($emailAddress, $standard = null)
    {

      // Set the email address
      $this->_emailAddress = $emailAddress;

      // Turn on the RFC 5321 standard if requested
      if ($standard == 5321)
      {
        $this->setStandard5321();
      }

      // Otherwise turn on the RFC 5322 standard if requested
      if ($standard == 5322)
      {
        $this->setStandard5322();
      }

    }

    /**
     * Call the constructor fluently
     *
     * @access public
     * @static
     * @param string $emailAddress
     * @param integer $standard
     * @return \Models\Validators\EmailAddressValidator
     */

    public static function setEmailAddress($emailAddress, $standard = null)
    {
      return new self($emailAddress, $standard);
    }

    /**
     * Validate the email address according to RFC 5321 and return itself
     *
     * @access public
     * @param bool $allow
     * @return \Models\Validators\EmailAddressValidator
     */

    public function setStandard5321($allow = true)
    {

      // A quoted string local part is either allowed (true) or not (false)
      $this->_quotedString = $allow;

      // A domain literal domain is either allowed (true) or not (false)
      $this->_domainLiteral = $allow;

      // Return itself
      return $this;

    }

    /**
     * Validate the email address according to RFC 5322 and return itself
     *
     * @access public
     * @param bool $allow
     * @return \Models\Validators\EmailAddressValidator
     */

    public function setStandard5322($allow = true)
    {

      // An obsolete local part is either allowed (true) or not (false)
      $this->_obsolete = $allow;

      // A domain literal domain is either allowed (true) or not (false)
      $this->_domainLiteral = $allow;

      // Comments and folding white spaces are either allowed (true) or not (false)
      $this->_cfws = $allow;

      // Return itself
      return $this;

    }

    /**
     * Either allow (true) or disallow (false) a quoted string local part and return itself
     *
     * @access public
     * @param bool $allow
     * @return \Models\Validators\EmailAddressValidator
     */

    public function setQuotedString($allow = true)
    {

      // Either allow (true) or disallow (false) a quoted string local part
      $this->_quotedString = $allow;

      // Return itself
      return $this;

    }

    /**
     * Either allow (true) or disallow (false) an obsolete local part and return itself
     *
     * @access public
     * @param bool $allow
     * @return \Models\Validators\EmailAddressValidator
     */

    public function setObsolete($allow = true)
    {

      // Either allow (true) or disallow (false) an obsolete local part
      $this->_obsolete = $allow;

      // Return itself
      return $this;

    }

    /**
     * Either allow (true) or disallow (false) a domain literal domain and return itself
     *
     * @access public
     * @param bool $allow
     * @return \Models\Validators\EmailAddressValidator
     */

    public function setDomainLiteral($allow = true)
    {

      // Either allow (true) or disallow (false) a domain literal domain
      $this->_domainLiteral = $allow;

      // Return itself
      return $this;

    }

    /**
     * Either allow (true) or disallow (false) comments and folding white spaces and return itself
     *
     * @access public
     * @param bool $allow
     * @return \Models\Validators\EmailAddressValidator
     */

    public function setCFWS($allow = true)
    {

      // Either allow (true) or disallow (false) comments and folding white spaces
      $this->_cfws = $allow;

      // Return itself
      return $this;

    }

    /**
     * Return the regular expression for a dot atom local part
     *
     * @access private
     * @return string
     */

    private function _getDotAtom()
    {
      return "([!#-'*+\/-9=?^-~-]+)(?>\.(?1))*";
    }

    /**
     * Return the regular expression for a quoted string local part
     *
     * @access private
     * @return string
     */

    private function _getQuotedString()
    {
      return '"(?>[ !#-\[\]-~]|\\\[ -~])*"';
    }

    /**
     * Return the regular expression for an obsolete local part
     *
     * @access private
     * @return string
     */

    private function _getObsolete()
    {

      return '([!#-\'*+\/-9=?^-~-]+|"(?>'
        . $this->_getFWS()
        . '(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*'
        . $this->_getFWS()
        . '")(?>'
        . $this->_getCFWS()
        . '\.'
        . $this->_getCFWS()
        . '(?1))*';

    }

    /**
     * Return the regular expression for a domain name domain
     *
     * @access private
     * @return string
     */

    private function _getDomainName()
    {

      return '([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>'
        . $this->_getCFWS()
        . '\.'
        . $this->_getCFWS()
        . '(?2)){0,126}';

    }

    /**
     * Return the regular expression for an IPv6 address
     *
     * @access private
     * @return string
     */

    private function _getIPv6()
    {
      return '([a-f0-9]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f0-9][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?';
    }

    /**
     * Return the regular expression for an IPv4-mapped IPv6 address
     *
     * @access private
     * @return string
     */

    private function _getIPv6v4()
    {
      return '(?3)(?>:(?3)){5}:|(?!(?:.*[a-f0-9]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?';
    }

    /**
     * Return the regular expression for an IPv4 address
     *
     * @access private
     * @return string
     */

    private function _getIPv4()
    {
      return '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?6)){3}';
    }

    /**
     * Return the regular expression for a domain literal domain
     *
     * @access private
     * @return string
     */

    private function _getDomainLiteral()
    {

      return '\[(?:(?>IPv6:(?>'
        . $this->_getIPv6()
        . '))|(?>(?>IPv6:(?>'
        . $this->_getIPv6v4()
        . '))?'
        . $this->_getIPv4()
        . '))\]';

    }

    /**
     * Return either the regular expression for folding white spaces or its backreference if allowed
     *
     * @access private
     * @param bool $define
     * @return string
     */

    private function _getFWS($define = false)
    {

      // Return an empty string if comments and folding white spaces are not allowed
      if (!$this->_cfws)
      {
        return '';
      }

      // Return the backreference if $define is set to false otherwise return the regular expression
      return !$define ? '(?P>fws)' : '(?<fws>(?>[\t ]+(?>\x0D\x0A[\t ]+)*)?)';

    }

    /**
     * Return the regular expression for comments
     *
     * @access private
     * @return string
     */

    private function _getComments()
    {

      return '(?<comment>\((?>'
        . $this->_getFWS()
        . '(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?P>comment)))*'
        . $this->_getFWS()
        . '\))';

    }

    /**
     * Return either the regular expression for comments and folding white spaces or its backreference if allowed
     *
     * @access private
     * @param bool $define
     * @return string
     */

    private function _getCFWS($define = false)
    {

      // Return the backreference if $define is set to false
      if ($this->_cfws && !$define)
      {
        return '(?P>cfws)';
      }

      // Otherwise return the regular expression
      if ($this->_cfws)
      {

        return '(?<cfws>(?>(?>(?>'
          . $this->_getFWS(true)
          . $this->_getComments()
          . ')+'
          . $this->_getFWS()
          . ')|'
          . $this->_getFWS()
          . ')?)';

      }

      // Return an empty string if comments and folding white spaces are not allowed
      return '';

    }

    /**
     * Establish, and return, the valid format for the local part
     *
     * @access private
     * @return string
     */

    private function _getLocalPart()
    {

      // The local part may be obsolete if allowed
      if ($this->_obsolete)
      {
        return $this->_getObsolete();
      }

      // Or the local part may be either a dot atom or a quoted string if the latter is allowed
      if ($this->_quotedString)
      {
        return '(?>' . $this->_getDotAtom() . '|' . $this->_getQuotedString() . ')';
      }

      // Otherwise the local part may only be a dot atom
      return $this->_getDotAtom();

    }

    /**
     * Establish, and return, the valid format for the domain
     *
     * @access private
     * @return string
     */

    private function _getDomain()
    {

      // The domain may be either a domain name or a domain literal if the latter is allowed
      if ($this->_domainLiteral)
      {
        return '(?>' . $this->_getDomainName() . '|' . $this->_getDomainLiteral() . ')';
      }

      // Otherwise the domain must be a domain name
      return $this->_getDomainName();

    }

    /**
     * Check to see if the domain can be resolved to MX RRs
     *
     * @access private
     * @param array $domain
     * @return bool
     */

    private function _verifyDomain($domain)
    {

      // Return false if the domain cannot be resolved to MX RRs
      if (!checkdnsrr(end($domain), 'MX'))
      {
        return false;
      }

      // Otherwise return true
      return true;

    }

    /**
     * Perform the validation check on the email address's syntax and, if required, call _verifyDomain()
     *
     * @access public
     * @param bool $verify
     * @return bool|integer
     */


    public function validate($verify = false)
    {

      // Return false if the email address has an incorrect syntax
      if (!preg_match(

          '/^'
        . $this->_getCFWS()
        . $this->_getLocalPart()
        . $this->_getCFWS()
        . '@'
        . $this->_getCFWS()
        . $this->_getDomain()
        . $this->_getCFWS(true)
        . '$/isD'
        , $this->_emailAddress

      ))
      {
        return false;
      }

      // Otherwise check to see if the domain can be resolved to MX RRs if required
      if ($verify)
      {

        // Return 0 if the domain cannot be resolved to MX RRs
        if (!$this->_verifyDomain(explode('@', $this->_emailAddress)))
        {
          return 0;
        }

        // Otherwise return true
        return true;

      }

      // Otherwise return 1
      return 1;

    }

  }
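
A note on reading the result of validate(): as written above it returns false for a syntax error, 0 when the syntax is fine but no MX record was found, true when the MX lookup succeeded, and 1 when the syntax is fine and no lookup was requested. Since 0 == false under loose comparison, the cases can only be told apart with ===. The helper below is a hypothetical sketch (describeValidationResult() is not part of the class) showing one way to do that.

```php
<?php

// Hypothetical helper: map the four possible return values of
// EmailAddressValidator::validate() to a readable description.
function describeValidationResult($result)
{
    // Strict comparisons are required: 0 == false and 1 == true loosely
    if ($result === false) {
        return 'invalid syntax';
    }

    if ($result === 0) {
        return 'valid syntax, no MX record found';
    }

    if ($result === true) {
        return 'valid syntax, MX record found';
    }

    if ($result === 1) {
        return 'valid syntax, domain not checked';
    }

    return 'unexpected result';
}

// Intended use with the class above (not executed in this sketch):
// $result = \Models\Validators\EmailAddressValidator::setEmailAddress('user@example.com', 5321)->validate(true);
// echo describeValidationResult($result);
```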


Last edited by MichaelR on Sun Feb 20, 2011 4:59 am, edited 151 times in total.

Post subject: Re: E-mail validation
Posted: Wed Nov 18, 2009 1:56 pm
DevNet Master

Joined: Sun Jan 21, 2007 12:06 am
Posts: 4135
Why is it that just regex is not enough...?


Post subject: Re: E-mail validation
Posted: Wed Nov 18, 2009 6:29 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
superdezign wrote:
Why is it that just regex is not enough...?


It can be enough. It just makes the regex even more complicated. I've updated it to include more in the preg_match. I could perhaps do it again to check for consecutive periods in the local-part and for consecutive hyphens in the domain name.

Edit: I've managed to simplify the regex dramatically even when including the extra functions.


Last edited by MichaelR on Thu Nov 19, 2009 9:26 am, edited 1 time in total.

Post subject: Re: E-mail validation
Posted: Thu Dec 17, 2009 8:05 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
Okay, I think this is as good as I'll get it. I've intentionally not allowed for comments or folding whitespace, because they're "semantically invisible", and also not allowed for obsolete text (as per RFC 5322), the latter of which I may allow at some point.

Note: Obsolete text is now allowed.


Last edited by MichaelR on Mon Feb 08, 2010 7:14 am, edited 1 time in total.

Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 9:13 am
DevNet Master

Joined: Wed Feb 11, 2004 4:23 pm
Posts: 4872
Location: Palm beach, Florida
It's horrendously complex and I got 3/5 false positives:

foo@example.foo.foo.bar.234234234234 # false positive
2@a.a # false positive
2@2 # false positive

Here are a few good ones from a regex library that came with a program I bought (RegexBuddy):

Email address: RFC 2822
This regular expression implements the official RFC 2822 standard for email addresses. Using this regular expression in actual applications is NOT recommended. It is shown to illustrate that with regular expressions there's always a trade-off between what's exact and what's practical.

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Email address: RFC 2822 (simplified)
Matches a normal email address. Does not check the top-level domain.
Requires the "case insensitive" option to be ON.

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?


Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 9:26 am
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
josh wrote:
foo@example.foo.foo.bar.234234234234 # false positive
2@a.a # false positive
2@2 # false positive


They are all perfectly valid (syntactically) email addresses. There is nothing in RFC 5322 or 5321 which prohibits all-numeric TLDs. And according to RFC 5321, single-label domain names (i.e. just the TLD) are acceptable: "In the case of a top-level domain used by itself in an email address, a single string is used without any dots". Single-letter TLDs are also valid; it's just that none are in use. The purpose of the regex is not to check whether the domain name is real (and can be used to receive email). Its purpose is only to check whether the email address is syntactically valid. checkdnsrr() is used to see whether the domain name has MX records.

And the examples you gave do not allow for IPv6 addresses or for internationalized labels, do not limit the length of the local part, the domain part, or the entire email address, and allow consecutive hyphens in the domain name.
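
To make the distinction concrete, here is a minimal sketch that keeps the two checks apart. The function names isSyntacticallyValid() and hasMxRecord() are hypothetical, and the pattern is only the dot-atom and domain-name subset of the RFC 5321 regex from the first post (no quoted strings or domain literals).

```php
<?php

// Syntax only: dot-atom local part, then 1 to 127 domain labels
function isSyntacticallyValid($emailAddress)
{
    return preg_match(
        '/^([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*@([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?2)){0,126}$/iD',
        $emailAddress
    ) === 1;
}

// Deliverability hint: does the domain resolve to at least one MX record?
// This performs a DNS lookup, so the result depends on the network.
function hasMxRecord($emailAddress)
{
    $parts = explode('@', $emailAddress);

    return checkdnsrr(end($parts), 'MX');
}

// All three addresses from the post above pass the syntax check
var_dump(isSyntacticallyValid('foo@example.foo.foo.bar.234234234234')); // bool(true)
var_dump(isSyntacticallyValid('2@a.a'));                                // bool(true)
var_dump(isSyntacticallyValid('2@2'));                                  // bool(true)
```

Whether such an address can actually receive mail is a separate question, which is what a call like hasMxRecord() is meant to answer.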


Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 9:36 am
DevNet Master

Joined: Wed Feb 11, 2004 4:23 pm
Posts: 4872
Location: Palm beach, Florida
Yeah, but the IANA reserves single-letter 2nd-level domains, so what would make me believe they are going to release single-letter TLDs anytime soon?

I guess your regex does work, although I think your example just proves regex is not ideal for everything. For validating such a complex pattern I would prefer a lexer. So I still think your regex is impossible to understand. It has a cyclomatic complexity of around 50 (if that metric can even be used for regex code).


Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 12:17 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
True, it is complex, but as long as it works, you don't need to understand how exactly it functions. I use computers but don't understand their production.

You might also be interested in checking out my article on Email Address Validation. I explain piece-by-piece what part of the regex matches for what, and then provide a class which allows easy manipulation of what to include in the validation.


Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 12:59 pm
DevNet Master

Joined: Wed Feb 11, 2004 4:23 pm
Posts: 4872
Location: Palm beach, Florida
Now that it has documentation, my opinion took a huge 180. Great work.
RegexBuddy shows me the structure *very* clearly but this makes it clear what each section actually does.


Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 1:35 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
I've added a method to the class which allows you to toggle strict TLD labels on and off. If strict, the TLD must be 2 to 6 characters in length and can only contain letters.
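
The method itself is not shown in this thread, so the following is only a sketch of what a strict TLD rule of that kind could look like. The function name hasStrictTld() is hypothetical, and it assumes the check applies to the final label of the domain, whether or not other labels precede it.

```php
<?php

// Hypothetical helper: is the final label 2 to 6 ASCII letters?
function hasStrictTld($domain)
{
    // Take the last dot-separated label as the TLD
    $labels = explode('.', $domain);
    $tld = end($labels);

    // Letters only, 2 to 6 of them; D stops $ matching before a trailing newline
    return preg_match('/^[a-z]{2,6}$/iD', $tld) === 1;
}

var_dump(hasStrictTld('example.com'));                          // bool(true)
var_dump(hasStrictTld('a.a'));                                  // bool(false)
var_dump(hasStrictTld('example.foo.foo.bar.234234234234'));     // bool(false)
```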


Last edited by MichaelR on Thu Sep 16, 2010 5:06 pm, edited 1 time in total.

Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 2:32 pm
DevNet Master

Joined: Wed Feb 11, 2004 4:23 pm
Posts: 4872
Location: Palm beach, Florida
Ooh, I didn't see the class. The class is very well done; you should consider submitting it to Zend Framework.


Post subject: Re: E-mail validation
Posted: Sun Dec 20, 2009 2:39 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
You think so? I'd probably have to comment all the little bits and pieces, something I've never been good at. I'll have to check out phpDocumentor first. That seems to be the standard.


Post subject: Re: E-mail validation
Posted: Tue Dec 22, 2009 3:23 pm
Forum Contributor

Joined: Thu May 11, 2006 8:58 pm
Posts: 305
Location: Utah, USA
What about email addresses that don't conform to the standard but still resolve via dns?

http ://to. is a freakish example. I heard somewhere it is a default page for the entire ".to" TLD and the dot at the end simply prevents your browser from trying to go to "to.com". In theory, wouldn't an email to "2@to" or "2@to." get delivered ok?


Post subject: Re: E-mail validation
Posted: Tue Dec 22, 2009 3:40 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
tr0gd0rr wrote:
What about email addresses that don't conform to the standard but still resolve via dns?

http ://to. is a freakish example. I heard somewhere it is a default page for the entire ".to" TLD and the dot at the end simply prevents your browser from trying to go to "to.com". In theory, wouldn't an email to "2@to" or "2@to." get delivered ok?


That email address does conform to the standard, and is allowed by my regex because of that. Except without the final dot, of course, which isn't part of the hostname; it's just a "browser hack" to prevent browsers from auto-appending .com. For example: http://devnetwork.net. brings you to http://devnetwork.net

So, 2@to is valid (and is allowed by my regex), but 2@to. is not valid (and so is not allowed).
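
The trailing-dot behaviour can be checked with the domain-name fragment of the regex on its own. This is a small sketch; the function name isValidDomainName() is hypothetical and it covers domain names only, not domain literals.

```php
<?php

// Domain-name fragment of the regex: labels separated by single dots,
// with no dot allowed at the start or the end
function isValidDomainName($domain)
{
    return preg_match(
        '/^([a-z0-9](?>[a-z0-9-]*[a-z0-9])?)(?>\.(?1)){0,126}$/iD',
        $domain
    ) === 1;
}

var_dump(isValidDomainName('to'));              // bool(true)
var_dump(isValidDomainName('to.'));             // bool(false)
var_dump(isValidDomainName('devnetwork.net.')); // bool(false)
```

Every dot must be followed by another label, so a final dot leaves the pattern with nothing to match. The D modifier matters here as well: without it, $ would also match just before a trailing newline.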


Last edited by MichaelR on Thu Sep 16, 2010 5:06 pm, edited 2 times in total.

Post subject: Re: E-mail validation
Posted: Wed Dec 23, 2009 5:55 pm
Forum Contributor

Joined: Sat Jan 03, 2009 4:27 pm
Posts: 148
Here's a really interesting website I've found which allows you to see regular expressions visually: Strfriend. Posting my regex into the form makes it (a little) easier to see how it works.

