Email extractor

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."

Moderator: General Moderators

anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Post by anjanesh »

I'm writing some code for a friend of mine that extracts email addresses from file contents.

Code:

/(\w+@\w+\.\w+)/i
works, but for addresses like someone@yahoo.co.uk it returns only someone@yahoo.co

How do I extend the pattern so that a second dot (.) and the part after it are optional?

Thanks
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W
Contact:

Post by pickle »

I don't know where or when, but I'm sure this topic has come up before. Search the forums and I guarantee you'll find something.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.

Post by anjanesh »

This worked, though I'm not sure it is the 'perfect' solution.

Code:

/(\w+@\w+\.\w+(\.\w+)?)/i
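
For anyone comparing the two patterns side by side, here is a minimal sketch in Python. The thread does not say which language is being used, so Python's `re` module is an assumption, and the sample text and the `extract` helper are made up for illustration.

```python
import re

# Original pattern: exactly one dot after the @.
ONE_DOT = re.compile(r"(\w+@\w+\.\w+)", re.IGNORECASE)
# Revised pattern: one optional extra ".label" at the end.
TWO_DOTS = re.compile(r"(\w+@\w+\.\w+(\.\w+)?)", re.IGNORECASE)


def extract(pattern, text):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


# Made-up sample input, not taken from the thread.
sample = "Contact someone@yahoo.co.uk or admin@example.com today."

print(extract(ONE_DOT, sample))   # ['someone@yahoo.co', 'admin@example.com']
print(extract(TWO_DOTS, sample))  # ['someone@yahoo.co.uk', 'admin@example.com']
```

The optional group covers only one extra label, so a domain with three or more dots would still be cut short. Also, `\w` excludes dots and dashes, so a local part like first.last would lose everything before the last dot.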
Skara
Forum Regular
Posts: 703
Joined: Sat Mar 12, 2005 7:13 pm
Location: US

Post by Skara »

Code:

/(\b\S+@(?:[a-z0-9-]+\.)+[a-z0-9\.-]+\b)/i
Modified from my validation one. Untested.
Domain names cannot contain underscores, but can contain dashes.
Usernames can contain weird characters as well, even @.
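
Since the pattern above is marked untested, here is a rough sketch of how it behaves, again assuming Python's `re` module (the thread does not name a language). The sample text and the `extract_addresses` helper are hypothetical.

```python
import re

# The pattern from the post, with the /.../i delimiters replaced by
# re.IGNORECASE. \S+ accepts any run of non-whitespace before the @.
ADDRESS = re.compile(r"(\b\S+@(?:[a-z0-9-]+\.)+[a-z0-9\.-]+\b)", re.IGNORECASE)


def extract_addresses(text):
    """Return the full text of every match of ADDRESS in text."""
    return [m.group(0) for m in ADDRESS.finditer(text)]


# Made-up sample: one address in angle brackets, one before a full stop.
sample = "Mail <someone@yahoo.co.uk> or first.last@mail.example.org."

print(extract_addresses(sample))
# ['someone@yahoo.co.uk', 'first.last@mail.example.org']
```

The repeated `(?:[a-z0-9-]+\.)+` group handles any number of domain labels, and the trailing `\b` drops the sentence-ending full stop. Because `\S+` is so permissive, punctuation glued to the front of an address (a comma-separated list without spaces, say) can end up inside the match.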
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

Skara wrote: domain names cannot contain underscores, but can contain dashes.
While the topic is email, it is inaccurate to say that *domain names* cannot contain underscores. In fact, the RFCs for domain names do allow them. *EMAIL* domains cannot contain underscores.

(However, it should be mentioned that all current versions of BIND refuse by default to honor *domain names* with underscores, despite the RFCs allowing them.)

(Relevant RFCs include 1034, 1035, and 2821.)
nickvd
DevNet Resident
Posts: 1027
Joined: Thu Mar 10, 2005 5:27 pm
Location: Southern Ontario
Contact:

Post by nickvd »

Untested and stolen from regexlib.com

Code:

^[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})$
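
Note that this pattern is anchored with ^ and $, so it validates a whole string rather than finding addresses inside longer text. One way to use it for extraction is to test each whitespace-separated token. The sketch below assumes Python's `re` module, and `is_address` and `extract_tokens` are hypothetical helper names.

```python
import re

# The anchored pattern from the post: it must match the entire string.
ANCHORED = re.compile(
    r"^[\w](([_\.\-\+]?[\w]+)*)@([\w]+)(([\.-]?[\w]+)*)\.([A-Za-z]{2,})$"
)


def is_address(token):
    """True if the whole token matches the anchored pattern."""
    return ANCHORED.match(token) is not None


def extract_tokens(text):
    """Split on whitespace and keep the tokens that validate."""
    return [token for token in text.split() if is_address(token)]


# Made-up sample input.
sample = "mail someone@yahoo.co.uk now"

print(ANCHORED.search(sample))   # None: the anchors reject surrounding text
print(extract_tokens(sample))    # ['someone@yahoo.co.uk']
```

Tokens with attached punctuation, such as an address followed by a comma, fail validation with this approach, so stripping punctuation first may be needed. The nested quantifiers can also backtrack heavily on long inputs that almost match.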