pregex matching for valid email

Heavy
Forum Contributor
Posts: 478
Joined: Sun Sep 22, 2002 7:36 am
Location: Viksjöfors, Hälsingland, Sweden

Post by Heavy »

I wrote a regex matching command to check for valid email addresses.
It works fine, but I'm not sure whether what I think is a valid email address is correct:

Code: Select all

<?php
$boolValid = preg_match('/(?i)^([a-z0-9.\-_]+@[a-z0-9.\-_]+)$/', $email);
?>
This matches any email address made up of (in any case) alphanumerics, dots, hyphens and underscores. Are there more characters I should allow in this pattern?
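
A quick way to see what the pattern accepts is to run it over a few sample addresses. The following is only a sketch: the helper name `matches_simple_pattern` and all sample addresses are invented for illustration.

```php
<?php
// Hypothetical helper wrapping the pattern from the post above.
function matches_simple_pattern(string $email): bool
{
    return preg_match('/(?i)^([a-z0-9.\-_]+@[a-z0-9.\-_]+)$/', $email) === 1;
}

// Invented sample inputs.
$samples = [
    'john.doe@example.com',   // plain address
    'John_Doe@Example.COM',   // mixed case, covered by (?i)
    'john+list@example.com',  // "+" is not in the character class
    '.@.',                    // only dots around the "@"
    'no-at-sign',             // no "@" at all
];

foreach ($samples as $s) {
    echo str_pad($s, 24), matches_simple_pattern($s) ? 'match' : 'no match', PHP_EOL;
}
```

With these inputs the first two match, `john+list@example.com` and `no-at-sign` do not, and `.@.` does match, since nothing in the pattern requires a letter or digit on either side. The `+` is one example of a character that is allowed in the name part of real addresses but rejected here.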
Tubbietoeter
Forum Contributor
Posts: 149
Joined: Fri Mar 14, 2003 2:41 am
Location: Germany

Post by Tubbietoeter »

Have a look here:
http://de3.php.net/manual/en/function.preg-match.php

Look at the user comments.

Post by Heavy »

I made some modifications to the example shown in the comments.

Here is the result:

Code: Select all

<?php
function valid_email_syntax($email){
return(preg_match(	'/(?i)'.
					'^(([a-z0-9\-_]\.?)+[a-z0-9\-_]@'.
					'('.
						'('.
							'(([a-z0-9\-_])+\.)*(ad|ae|aero|af|ag|ai|al|am|an|ao|aq|ar|arpa|as|at|au|aw|az|'.
							'ba|bb|bd|be|bf|bg|bh|bi|biz|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|'.
							'ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|com|coop|cr|cs|cu|cv|cx|cy|cz|'.
							'de|dj|dk|dm|do|dz|ec|edu|ee|eg|eh|er|es|et|eu|'.
							'fi|fj|fk|fm|fo|fr|'.
							'ga|gb|gd|ge|gf|gh|gi|gl|gm|gn|gov|gp|gq|gr|gs|gt|gu|gw|gy|'.
							'hk|hm|hn|hr|ht|hu|id|ie|il|in|info|int|io|iq|ir|is|it|'.
							'jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|'.
							'la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|mg|mh|mil|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|museum|mv|mw|mx|my|mz|'.
							'na|name|nc|ne|net|nf|ng|ni|nl|no|np|nr|nt|nu|nz|om|org|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|pro|ps|pt|pw|py|qa|re|ro|ru|rw|'.
							'sa|sb|sc|sd|se|sg|sh|si|sj|sk|sl|sm|sn|so|sr|st|su|sv|sy|sz|tc|td|tf|tg|th|tj|tk|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|um|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw'.
							')$'.
						')'.
						'|'.
						// IPv4 branch: kept inside the domain group so it is only tried after the "@"
						'(([0-9][0-9]?|[0-1][0-9][0-9]|[2][0-4][0-9]|[2][5][0-5])\.){3}([0-9][0-9]?|[0-1][0-9][0-9]|[2][0-4][0-9]|[2][5][0-5])'.
					'))$/i',$email));
}

?>
m3rajk
DevNet Resident
Posts: 1191
Joined: Mon Jun 02, 2003 3:37 pm

Post by m3rajk »

i don't feel like searching for it. i posted a variation on one from a perl book. off the top of my head i remember the following: there's an rfc that has a list of all possible ending domains.

and this is what i used, formulated based on the fact that i want to weed out obviously bad e-mails... it only allows A-Za-z0-9_ (\w), dot (\.) and hyphen in the address, which has to end with 2 or 3 \w characters

Code: Select all

preg_match('/^[\w\.\-]+@[\w\.\-]+\.\w\w\w?$/', $email)
btw: i do allow one thing you don't: CAPITAL LETTERS

and for future use, these will be really helpful:

PERL SHORTS
\w = [A-Za-z0-9_]
\s = ALL WHITE SPACE
\d = [0-9]

the same thing but with capital letters will give you the opposite of those. case DOES matter with perl shorts
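
A small sketch of how these shorthands behave in `preg_match`, including the upper-case negations. The helper name `shorthand_matches` and the test characters are made up for this example.

```php
<?php
// Hypothetical helper: does a single character match the given shorthand class?
function shorthand_matches(string $class, string $char): bool
{
    return preg_match('/^' . $class . '$/', $char) === 1;
}

// Pairs of [shorthand, character] to try.
$checks = [
    ['\w', 'a'], ['\w', '_'], ['\w', '-'],
    ['\W', '-'],
    ['\d', '7'], ['\D', '7'],
    ['\s', ' '], ['\S', ' '],
];

foreach ($checks as [$class, $char]) {
    printf("%s vs '%s': %s\n", $class, $char, shorthand_matches($class, $char) ? 'match' : 'no match');
}
```

Here `\w` matches `a` and `_` but not `-`, while `\W`, `\D` and `\S` give the opposite answer to their lower-case counterparts.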

Post by Heavy »

In humble response to m3rajk:

The "(?i)" part of my pattern makes it case insensitive...

I wanted to strike out all Scandinavian characters from being matched. That's why I didn't use \w, [:alpha:] or other such locale-dependent things. However, I don't know how locale dependent \w is.

I started out doing regex one month ago, so I am not yet very guru with it. But I have tested my pattern thoroughly and it pleases me for the moment. I have not yet found any bug in it.
Tell me if you do...

My pattern disallows email names that start or end with a dot and that applies to the domain name as well. I think it is good enough for me, but I agree I don't HAVE to test for existing top domain names. As well, testing for existing top domain names makes me need to be updated on new top domain names when they arrive. I might remove it, but it works right now...
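
The rule for the name part can be tried in isolation. The sketch below lifts only the part of the pattern before the "@"; the helper name `local_part_ok` and the sample names are assumptions made for illustration.

```php
<?php
// Hypothetical helper: the name part of the pattern, checked on its own.
// Allowed: a-z, 0-9, hyphen, underscore, with single dots only between characters.
function local_part_ok(string $local): bool
{
    return preg_match('/^([a-z0-9\-_]\.?)+[a-z0-9\-_]$/i', $local) === 1;
}

// Invented sample names.
foreach (['john.doe', '.john', 'john.', 'john..doe', 'jörg', 'a'] as $name) {
    echo str_pad($name, 12), local_part_ok($name) ? 'ok' : 'rejected', PHP_EOL;
}
```

Only `john.doe` passes. Leading, trailing and doubled dots are rejected, and so is `jörg`, because the explicit `a-z` class does not include `ö`. Note that the one-character name `a` is rejected as well, since the expression needs at least two characters before the "@".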
mikusan
Forum Contributor
Posts: 247
Joined: Thu May 01, 2003 1:48 pm

Post by mikusan »

While on topic could anyone help me with mine:
For some obscure reason i cannot figure out why it will not accept emails that are like abc@123.server.ca
I have added some trash but i still can't get it working...thanks for the help!!

Code: Select all

$str = ereg_replace( "^[0-9a-z]([-_.]?[0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*[0-9a-z]([-.]?[0-9a-z])*.[a-z]{3,4}$", "<a href=\"mailto:\\0\">\\0</a>", $str );

Post by mikusan »

yes i feel foolish to say... i am trying to do exactly what...well PHPBB just did up top ;)

Post by Heavy »

mikusan wrote:yes i feel foolish to say... i am trying to do exactly what...well PHPBB just did up top ;)
I don't understand that at all... What do you mean?

Post by Heavy »

Doh! I didn't see your first post!

When I see this (though I haven't tested it myself):

Code: Select all

<?php
 "^[0-9a-z]([-_.]?[0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*[0-9a-z]([-.]?[0-9a-z])*.[a-z]{3,4}$"
?>
I think:
Shouldn't that dot be escaped?

Try:

Code: Select all

<?php
"^[0-9a-z]([-_\.]?[0-9a-z])*@[0-9a-z]([-\.]?[0-9a-z])*[0-9a-z]([-\.]?[0-9a-z])*\.?[a-z]{3,4}$"
?>
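
To illustrate what escaping the dot changes, here is a minimal sketch using `preg_match` with a shortened stand-in pattern (not the full expression above). The helper name `pattern_hits` and the sample inputs are invented.

```php
<?php
// Hypothetical helper: true when $pattern matches $input.
function pattern_hits(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

$loose  = '/^[a-z]+@[a-z]+.[a-z]{3,4}$/';   // unescaped dot: any single character
$strict = '/^[a-z]+@[a-z]+\.[a-z]{3,4}$/';  // escaped dot: only a literal "."

$input = 'me@serverXcom';                    // invented sample without any dot

var_dump(pattern_hits($loose, $input));             // bool(true)
var_dump(pattern_hits($strict, $input));            // bool(false)
var_dump(pattern_hits($strict, 'me@server.com'));   // bool(true)

// The backslash reaches the regex engine unchanged in both quoting styles.
var_dump('/a\.b/' === "/a\.b/");                    // bool(true)
```

Outside a character class the unescaped dot stands for any character, so the loose version accepts the `X` where a dot was intended. Inside a class such as `[-_.]` the dot is already literal, so escaping it there makes no difference.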

Post by mikusan »

Nope dots are to be escaped only when you use single quotes, also my regex works fine with emails like me@123.com but not with me@123.whatever.co.uk... i would be happy with me@123.something.something....
McGruff
DevNet Master
Posts: 2893
Joined: Thu Jan 30, 2003 8:26 pm
Location: Glasgow, Scotland

Post by McGruff »

I'm not too hot on regex but this tool might help with refining your expressions, by making it quicker to test ideas:

http://www.weitz.de/regex-coach/#install

Post by m3rajk »

Heavy wrote:In humble response to m3rajk:

The "(?i)" part of my pattern makes it case insensitive...

I wanted to strike out all Scandinavian characters from being matched. That's why I didn't use \w, [:alpha:] or other such locale-dependent things. However, I don't know how locale dependent \w is.

I started out doing regex one month ago, so I am not yet very guru with it. But I have tested my pattern thoroughly and it pleases me for the moment. I have not yet found any bug in it.
Tell me if you do...

My pattern disallows email names that start or end with a dot and that applies to the domain name as well. I think it is good enough for me, but I agree I don't HAVE to test for existing top domain names. As well, testing for existing top domain names makes me need to be updated on new top domain names when they arrive. I might remove it, but it works right now...
[:alpha:] is purely posix. and \w is perl. perl is not locale dependent. the shorts pertain to ascii ranges. it will only match what i mentioned. and ? is not needed after the end delimiter, and i'm not sure that it'll be case insensitive. remember, perl and posix are completely different. what you use for posix may not work for perl and vice versa. for case insensitivity in perl, using | as a delimiter, with global search/replacement, it's |pattern|gi or |pattern|ig

mine does NEARLY the same as yours. it doesn't stop the name from starting with a . but it's legal to start an e-mail address with a . (not common, but legal), and you can prevent that and a leading hyphen by adding \w to mine to get

Code: Select all

preg_match('/^\w[\w\.\-]+@[\w\.\-]+\.\w\w\w?$/', $email)
giving you everything but the top level

mikusan: i'd suggest using perl for this. it's MUCH more elegant than posix.

just use my match check. it will allow that.
breaking down yours it's
any character 0-9 or a-z (not A-Z)
then -, _ or . (optional) followed by another character from the first set, repeated 0 or more times
@
some weird thing based on the first half

and that's a replace... without capturing anything.

tell us what you're trying to do, until then the only good advice anyone can give you is to use my match
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

mikusan wrote:While on topic could anyone help me with mine:
For some obscure reason i cannot figure out why it will not accept emails that are like abc@123.server.ca
I have added some trash but i still can't get it working...thanks for the help!!

Code: Select all

$str = ereg_replace( "^[0-9a-z]([-_.]?[0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*[0-9a-z]([-.]?[0-9a-z])*.[a-z]{3,4}$", "<a href=\"mailto:\\0\">\\0</a>", $str );
Your {3,4} stops it from matching all the two character country codes. It only matches the org/com/mil/net/info/coop/edu type top levels.
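
The effect of the quantifier can be checked with a small sketch. It uses `preg_match` and a simplified stand-in for the pattern above (one domain section, escaped dot), so it is an illustration under those assumptions rather than the original expression. The helper name `ends_ok` is invented.

```php
<?php
// Hypothetical helper: same overall shape as the pattern above, with the
// length quantifier of the last label passed in so both variants can be compared.
function ends_ok(string $email, string $tldQuantifier): bool
{
    $pattern = '/^[0-9a-z]([-_.]?[0-9a-z])*@[0-9a-z]([-.]?[0-9a-z])*\.[a-z]'
             . $tldQuantifier . '$/';
    return preg_match($pattern, $email) === 1;
}

var_dump(ends_ok('abc@123.server.ca', '{3,4}'));  // bool(false): "ca" is too short
var_dump(ends_ok('abc@123.server.ca', '{2,4}'));  // bool(true)
var_dump(ends_ok('me@123.com', '{3,4}'));         // bool(true)
```

Widening the quantifier to `{2,4}` lets two-letter country codes through while still accepting the three- and four-letter endings.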

Post by m3rajk »

i figured i don't need to care about coop, info or museum because i've never seen anyone use them. and all the museums i know are .org.

Post by Heavy »

m3rajk wrote:and i'm not sure that it'll be case insensitive
Hmm... But it works well...
I got it from here:
http://www.php.net/manual/en/pcre.pattern.syntax.php

Read about:
Internal option setting
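
As a rough sketch of that manual section, the internal option `(?i)` can be compared with the trailing `i` modifier. The helper name `is_match` and the sample string are invented for this example.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function is_match(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$inline   = '/(?i)^[a-z]+$/';  // internal option setting inside the pattern
$modifier = '/^[a-z]+$/i';     // trailing pattern modifier
$plain    = '/^[a-z]+$/';      // no option: case sensitive

var_dump(is_match($inline, 'HeAvY'));    // bool(true)
var_dump(is_match($modifier, 'HeAvY'));  // bool(true)
var_dump(is_match($plain, 'HeAvY'));     // bool(false)
```

Both forms make the whole pattern case insensitive here, so using `(?i)` and `/i` together is redundant but harmless.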



I got the [:alpha:] from some tutorial, but won't use it again since it turned out to match Scandinavian characters differently depending on the installed locale. I will check out how locale dependent \w is.

Thanks for the tips though. I'm a real newbie at regex, and have only used it with PHP so far.