Email extractor

Posted: Mon Jun 20, 2005 3:54 pm
by anjanesh
I'm writing some code for a friend of mine that extracts email addresses from file contents.

Code:

/(\w+@\w+\.\w+)/i
works, but for addresses like someone@yahoo.co.uk it returns only someone@yahoo.co

How do I extend the pattern so that a second dot (.) and the part after it are optional?

Thanks

Posted: Mon Jun 20, 2005 4:08 pm
by pickle
I don't know where or when, but I'm sure this topic has come up before. Search the forums and I guarantee you'll find something.

Posted: Mon Jun 20, 2005 4:25 pm
by anjanesh
This did work, though I'm not sure it's the 'perfect' solution.

Code:

/(\w+@\w+\.\w+(\.\w+)?)/i
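
To see the difference between the two patterns, here is a minimal sketch in Python (an assumption, since the thread never says which language is being used). The names ONE_DOT, TWO_DOTS and ANY_DOTS are made up for illustration, and ANY_DOTS is a generalisation that is not from the thread: it repeats the dotted part instead of allowing exactly one extra.

```python
import re

# Original pattern: exactly one dot after the @.
ONE_DOT = re.compile(r"(\w+@\w+\.\w+)", re.IGNORECASE)

# Revised pattern: one optional extra ".part" (made non-capturing here
# so that findall() returns plain strings rather than tuples).
TWO_DOTS = re.compile(r"(\w+@\w+\.\w+(?:\.\w+)?)", re.IGNORECASE)

# Hypothetical generalisation: one or more ".part" groups.
ANY_DOTS = re.compile(r"\w+@\w+(?:\.\w+)+", re.IGNORECASE)

text = "Contact someone@yahoo.co.uk or other@example.com for details."

print(ONE_DOT.findall(text))   # ['someone@yahoo.co', 'other@example.com']
print(TWO_DOTS.findall(text))  # ['someone@yahoo.co.uk', 'other@example.com']
print(ANY_DOTS.findall(text))  # ['someone@yahoo.co.uk', 'other@example.com']
```

The optional group fixes the .co.uk case, but it would still cut off an address with three or more dots after the @, which is what the repeated group is meant to cover. Note that \w does not match dots or hyphens, so addresses like first.last@my-host.com are still only partially matched by all three.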

Posted: Mon Jun 20, 2005 5:03 pm
by Skara

Code:

/(\b\S+@(?:[a-z0-9-]+\.)+[a-z0-9\.-]+\b)/i
Modified from my validation pattern. Untested.
Domain names cannot contain underscores, but can contain dashes.
Usernames can contain weird characters as well, even @.
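
A rough way to try the pattern above, sketched in Python (an assumption; the post does not name a language). SKARA is just an illustrative name, and the sample strings are invented.

```python
import re

# The pattern from the post, with the /.../i delimiters replaced by
# re.IGNORECASE. Inside a character class the dot needs no backslash.
SKARA = re.compile(r"(\b\S+@(?:[a-z0-9-]+\.)+[a-z0-9.-]+\b)", re.IGNORECASE)

# Dots in the local part, a hyphenated host and several domain levels
# are all accepted; the underscore in the second host is rejected.
sample = "Write to <first.last@mail.my-host.co.uk>, not to bad@my_host.com."
print(SKARA.findall(sample))  # ['first.last@mail.my-host.co.uk']

# Caveat: \S+ takes any run of non-whitespace before the @, so a
# prefix glued to the address ends up inside the match.
print(SKARA.findall("from:bob@example.com"))  # ['from:bob@example.com']
```

The second case is the price of allowing "weird characters" in the username: the pattern cannot tell a prefix such as from: apart from the local part, so the input may need splitting on punctuation first.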

Posted: Mon Jun 20, 2005 5:15 pm
by Roja
Skara wrote: domain names cannot contain underscores, but can contain dashes.
While the topic is email, it is inaccurate to say that *domain names* cannot contain underscores. In fact, the RFCs for domain names do allow them. *EMAIL* domains cannot contain underscores.

(However, it should be mentioned that all current versions of BIND refuse by default to honor *domain names* with underscores, despite the RFCs allowing them.)

(Relevant RFCs include 1034, 1035, and 2821.)

Posted: Mon Jun 20, 2005 11:05 pm
by nickvd
Untested and stolen from regexlib.com

Code:

^[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})$
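
One thing to watch with the pattern above: the ^ and $ anchors make it a whole-string validator, so on its own it will not pull addresses out of surrounding text. A small Python sketch of the difference (Python is an assumption, and NICKVD and UNANCHORED are illustrative names; the unanchored variant is simply the same pattern with ^ and $ removed, not something from regexlib.com):

```python
import re

# The pattern as posted: anchored at both ends.
NICKVD = re.compile(
    r"^[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})$"
)

# Hypothetical extraction variant: same pattern without ^ and $.
UNANCHORED = re.compile(
    r"[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})"
)

line = "Contact someone@yahoo.co.uk today"

# Validation: only a string that is nothing but an address matches.
print(bool(NICKVD.search("someone@yahoo.co.uk")))  # True
print(bool(NICKVD.search(line)))                   # False

# Extraction: the pattern has capture groups, so use finditer() and
# group(0) to get the full address instead of a tuple of groups.
print([m.group(0) for m in UNANCHORED.finditer(line)])
# ['someone@yahoo.co.uk']
```

The nested repetition in (([_\.\-\+]?[\w]+)*) can backtrack heavily on long runs of word characters that never reach an @, so it is probably worth timing this on real input before running it over large files.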