Email extractor
Posted: Mon Jun 20, 2005 3:54 pm
by anjanesh
I'm writing some code for a friend of mine that extracts email addresses from file contents.
It works, but for an address like
someone@yahoo.co.uk it returns
someone@yahoo.co
How do I change the pattern so that the domain may or may not contain a second dot (.)?
Thanks
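The truncation described in this post can be reproduced with a minimal sketch. The poster's original pattern is not shown in the thread, so both patterns below are hypothetical stand-ins, and Python is assumed only for illustration. The first allows exactly one dot in the domain; the second lets the "label plus dot" part repeat, so further dots are optional.

```python
import re

# Hypothetical pattern (not the poster's original): exactly one dot in the domain.
ONE_DOT = re.compile(r"[\w.+-]+@[a-z0-9-]+\.[a-z]{2,}", re.IGNORECASE)

# Same idea, but "(?:label\.)+" may repeat, so any number of extra dots is accepted.
MANY_DOTS = re.compile(r"[\w.+-]+@(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)

text = "Contact someone@yahoo.co.uk or other@example.com for details."

one_dot_hits = ONE_DOT.findall(text)      # stops after the first dotted label
many_dots_hits = MANY_DOTS.findall(text)  # keeps consuming "label." groups

print(one_dot_hits)
print(many_dots_hits)
```

The only change between the two is wrapping the domain label and its dot in a repeated non-capturing group.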
Posted: Mon Jun 20, 2005 4:08 pm
by pickle
I don't know where or when, but I'm sure this topic has come up before. Search the forums and I guarantee you'll find something.
Posted: Mon Jun 20, 2005 4:25 pm
by anjanesh
This did work, though I'm not sure it's the 'perfect' solution.
Posted: Mon Jun 20, 2005 5:03 pm
by Skara
Code:
/(\b\S+@(?:[a-z0-9-]+\.)+[a-z0-9\.-]+\b)/i
modified from my validation one. untested.
domain names cannot contain underscores, but can contain dashes.
usernames can contain weird characters as well, even @.
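As a rough check of the pattern in this post, here is a small sketch that runs it over a sample string. It assumes Python's `re` module, reads the pattern with ordinary `[` character-class brackets, and uses `re.IGNORECASE` in place of the `/i` flag. The sample text and the names `EMAIL_RE`, `sample`, and `found` are made up for illustration.

```python
import re

# Python transcription of the pattern in this post; IGNORECASE replaces /i.
EMAIL_RE = re.compile(r"(\b\S+@(?:[a-z0-9-]+\.)+[a-z0-9.-]+\b)", re.IGNORECASE)

# Sample text with trailing punctuation after each address.
sample = "Write to someone@yahoo.co.uk, or to other@example.com."

# findall returns the single capturing group, which spans the whole address.
found = EMAIL_RE.findall(sample)
print(found)
```

The trailing `\b` is what keeps the final comma and full stop out of the matches. Because the local part is `\S+`, any run of non-space characters before the `@` is accepted, which fits the remark that usernames can contain unusual characters.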
Posted: Mon Jun 20, 2005 5:15 pm
by Roja
Skara wrote:
domain names cannot contain underscores, but can contain dashes.
While the topic is email, it is inaccurate to say that *domain names* cannot contain underscores. In fact, the RFCs for domain names do allow them. *EMAIL* domains cannot contain underscores.
(However, it should be mentioned that all current versions of BIND refuse by default to honor *domain names* with underscores, despite the RFCs allowing them.)
(Relevant RFCs include 1034, 1035, and 2821.)
Posted: Mon Jun 20, 2005 11:05 pm
by nickvd
Untested and stolen from regexlib.com
Code:
^[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})$
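Note that this pattern is anchored with `^` and `$`, so it validates a whole string rather than finding addresses inside a larger text. One way to use it for extraction is to split the text into tokens and test each one, as sketched below. This assumes Python's `re` module and reads the pattern with ordinary `[` brackets. The tokenising step, the stripped punctuation set, and the names `VALIDATE_RE`, `text`, and `valid` are illustrative assumptions, not part of the regexlib.com entry.

```python
import re

# Python transcription of the anchored pattern in this post.
VALIDATE_RE = re.compile(
    r"^[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})$"
)

text = "Mail someone@yahoo.co.uk, or bad@@example.com today."

# Searching the whole text fails, because ^ and $ pin the match to the full string.
whole_text_match = VALIDATE_RE.search(text)

valid = []
for token in text.split():
    # Assumption: surrounding punctuation is not part of an address.
    candidate = token.strip(".,;:<>()\"'")
    # Only test tokens that contain "@" to avoid needless regex work.
    if "@" in candidate and VALIDATE_RE.match(candidate):
        valid.append(candidate)

print(whole_text_match)
print(valid)
```

The nested quantifiers in this pattern can backtrack heavily on long non-matching input, so pre-filtering tokens as above is a cheap precaution.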