E-mail Validation

Posted: Tue Jul 29, 2008 9:59 pm
by Twayne
My question is at the bottom, but you need the preceding parts to understand why I'd ask such a question:

Woof! I've just come from a fairly thorough reading of many articles about e-mail validation: which approaches work, which don't, and why. It all started because the address first.last//something@domain.tld failed my format validity test. Then I thought filter_validate_email might be the thing to use, but alas, my ISP is one revision too old (5.2.2), so all I get is fatal errors, as I found detailed all over the web, to my disgust. Then, to add insult to injury, it turns out that even that filter can't recognize every valid e-mail address format. Add to that the upcoming new TLDs, and it starts to become a real mess. I guess if my ISP were to go to 5.2.6 life would be easier, but ...
Shortening the story, I eventually came across Simon Slick's site (simonslick.com), which had a LOT of good info and what appeared to be workable, comprehensive scripts for e-mail validation, complete with references to all the applicable RFCs.
After getting my head around that, I've finally come to the conclusion: "Big deal; what's the point?" Basically, as long as you don't put too many "@"s in it, it's no problem at all to make up a phony address that passes. So why go through all those ereg et al. gyrations to prove, what, that it doesn't start with a dot and a couple of other minor things? That's not security; that's programming for the sake of programming, IMO. Absolutely NO OFFENSE INTENDED to anyone involved in such complex assemblies; it's just my thought for the moment.
So what it boils down to for me is: screw the overall format validation. I'm just going to check for one @ and at least one dot after it, and then go right to an MX check. If the address has an @ and a dot in the right order, and the MX record exists, that feels just as secure as hundreds of regular-expression characters and so forth. And it's a lot less work, and a lot less troubleshooting if something goes wrong or maintenance is needed.
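A minimal sketch of that plan, written in Python purely for illustration (in PHP the lookup itself would normally be checkdnsrr($domain, 'MX')). The function names are hypothetical, and the MX lookup is passed in as a callable so the sketch stays self-contained; a real script would plug in an actual DNS query there.

```python
from typing import Callable


def has_at_then_dot(address: str) -> bool:
    """Exactly one '@', a non-empty part before it, and a '.' somewhere after it."""
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    return bool(local) and "." in domain


def simple_validate(address: str, has_mx: Callable[[str], bool]) -> bool:
    """Cheap format check first; only then spend a DNS query on the MX lookup."""
    if not has_at_then_dot(address):
        return False
    domain = address.rsplit("@", 1)[1]
    # has_mx is an assumption: any function that says whether the domain has an MX record.
    return has_mx(domain)
```

With a stub lookup such as `lambda d: d == "example.com"`, the address first.last//something@example.com passes, while first.last@localhost is rejected before any DNS query because there is no dot after the @.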

If you're only familiar with name@domain.tld as an e-mail format, you're probably best advised not to bother responding and instead to consider this a learning experience. I assure you there are many, many other formats in use that are completely functional. I recently got one from Iraq with a really strange address format. FWIW, I'm also a newbie, and a lot of today's research was a total surprise to me!

So, here's my question:
Do the more experienced users here agree with me, or do you think I'm missing something important in deciding to check only for an @, then a dot, in the right order, and then go right to the MX check? Do I really need to do anything else?

Thanks for reading; I know it's a bit long-winded.

Thanks for your consideration,

Twayne

Re: E-mail Validation

Posted: Tue Jul 29, 2008 11:45 pm
by omniuni
I believe the thing that is more of a problem is various injections. If you allow certain characters through in what you accept as a valid eMail address, they may trigger unwanted results. That said, it comes to my mind that the following would probably be sufficient validation:

1. Valid eMail should have only one "@"
2. After the @, there should be no special characters but a hyphen, and at least one period
3. Before the @, letters, numbers, hyphens, periods are OK

That should alert you to anything strange, and of course, provide the necessary alert to the user if they enter it incorrectly.
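Taken literally, the three rules above might look like this in Python (a sketch; `omniuni_rules` is a made-up name). Because it is a whitelist, it also keeps CR and LF, the usual header-injection characters, out. Note, though, that rule 3 rejects '/', '_' and '+' in the local part, so it would turn away the first.last//something@domain.tld address that started this thread.

```python
import re

# Rules 2 and 3 happen to allow the same characters: letters, digits, hyphen, period.
_ALLOWED = re.compile(r"[A-Za-z0-9.-]+")


def omniuni_rules(address: str) -> bool:
    # Rule 1: exactly one "@".
    if address.count("@") != 1:
        return False
    local, domain = address.split("@")
    # Rule 3: before the @, only letters, numbers, hyphens and periods (non-empty).
    if not _ALLOWED.fullmatch(local):
        return False
    # Rule 2: after the @, the same character set, plus at least one period.
    return bool(_ALLOWED.fullmatch(domain)) and "." in domain
```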

Someone more learned please correct me if I'm missing something major.

PHOENIXHEART: Right, of course! :lol:

Re: E-mail Validation

Posted: Wed Jul 30, 2008 2:47 am
by Phoenixheart
And one more thing, the period should not come right after the @ sign :D
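Phoenixheart's rule says the first label of the domain can't be empty. The hypothetical helper below checks that and, as an assumption beyond what was posted here, applies the same test to every dot-separated label, which also catches ".." and a trailing dot.

```python
def domain_labels_ok(address: str) -> bool:
    """Reject a period right after the '@' and, by extension, any empty domain label."""
    if "@" not in address:
        return False
    domain = address.rsplit("@", 1)[1]
    # "name@.example.com" splits into ["", "example", "com"]; the empty first
    # label is exactly the "period right after the @" case.
    return all(domain.split("."))
```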

Re: E-mail Validation

Posted: Wed Jul 30, 2008 11:35 am
by Twayne
omniuni wrote:
I believe the thing that is more of a problem is various injections.

For sure. IMO it's gotten to the point where it's easier to test for what's NOT there than for what IS there.

omniuni wrote:
...that the following would probably be sufficient validation:

1. Valid eMail should have only one "@"
2. After the @, there should be no special characters but a hyphen, and at least one period
3. Before the @, letters, numbers, hyphens, periods are OK

That should alert you to anything strange, and of course, provide the necessary alert to the user if they enter it incorrectly.

Thanks for the response; that's how I feel too, but the more I look into this, the more I begin to wonder. It's a lot like discovering there's no Santa Claus :^)

Surprisingly, taking #1: if I read the RFCs and Simon correctly, there actually can be multiple @ signs. In fact, the symbols that CAN be used were rather surprising to me; that's why I turned to the forum here. I haven't finished reading the RFCs myself yet, but based on the validation failure I already had and the following, it's a whole new world out there nowadays, as the military e-mail from Iraq rather rudely showed me<g>.
------------------------
* Local Part (preceding the last '@' symbol; ASCII hex value 40)
* Domain Part (following the last '@' symbol; ASCII hex value 40)
    o IP Address Literal (if used instead of a domain name)
        + IP v4 Address Literal
        + IP v6 Address Literal (full & compressed)
        + IP v6 v4 Address Literal (full & compressed)

Local Part:
-------------------------

1. A non-quoted local part may consist of alpha (a-z) (x61-x7A) (A-Z) (x41-x5A), numeric (0-9) (x30-x39) and the following characters: !#$%&'*+-/=?^_`{|}~ (x21, x23, x24, x25, x26, x27, x2A, x2B, x2D, x2F, x3D, x3F, x5E, x5F, x60, x7B, x7C, x7D, x7E) respectively. - RFC 3696 - 3 & errata, RFC 2822 - 3.2.4

2. Dots may also be present in the local part, but can not be the first nor last character, nor adjacent to another dot (.) (x2E). - RFC 3696 - 3, RFC 2822 - 3.2.4

3. The local part may be a double quoted (") (x22) string consisting of any ASCII characters except the following: NULL (x00), TAB (x09), LF (x0A), CR (x0D), " (x22), \ (x5C). However the following are permitted in a local part double quoted string if escaped (preceded by a backslash, (\), (x5C)): x01 thru x09, x0B, x0C, x0E thru x7F. - RFC 3696 - 3 & errata, RFC 2822 - 3.2.1 - 3.2.5

4. Maximum length of the local part is 64 characters. - RFC 3696 - 3 & errata, RFC 2821 - 4.5.3.1
-----------------
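The local-part rules quoted above translate fairly directly into code. The Python sketch below takes them at face value: it splits at the last '@' (which is how a quoted local part can contain an '@' of its own), then applies rules 1 and 2 to an unquoted local part, rule 3 to a quoted one, and rule 4 to both. It is not a full RFC 2822 parser (comments, folding whitespace and the domain/IP-literal forms are ignored), and all function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Rule 1: characters allowed in an unquoted local part (dots handled by rule 2).
_ATEXT = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")

# Rule 3: characters allowed unescaped inside quotes (ASCII minus NUL, TAB, LF, CR, ", \) ...
_QTEXT = set(range(0x01, 0x80)) - {0x09, 0x0A, 0x0D, 0x22, 0x5C}
# ... and characters allowed after a backslash: x01-x09, x0B, x0C, x0E-x7F.
_ESCAPABLE = set(range(0x01, 0x0A)) | {0x0B, 0x0C} | set(range(0x0E, 0x80))


def split_address(address: str) -> Optional[Tuple[str, str]]:
    """Split at the LAST '@' into (local part, domain part)."""
    local, sep, domain = address.rpartition("@")
    return (local, domain) if sep else None


def unquoted_local_ok(local: str) -> bool:
    # Rule 2: no leading, trailing or doubled dot, i.e. every dot-separated
    # piece must be a non-empty run of rule-1 characters.
    return all(_ATEXT.fullmatch(piece) for piece in local.split("."))


def quoted_local_ok(local: str) -> bool:
    if len(local) < 2 or local[0] != '"' or local[-1] != '"':
        return False
    body, i = local[1:-1], 0
    while i < len(body):
        code = ord(body[i])
        if code == 0x5C:  # backslash: the next character must be escapable
            if i + 1 >= len(body) or ord(body[i + 1]) not in _ESCAPABLE:
                return False
            i += 2
        elif code in _QTEXT:
            i += 1
        else:
            return False
    return True


def local_part_ok(address: str) -> bool:
    parts = split_address(address)
    if parts is None:
        return False
    local = parts[0]
    # Rule 4: at most 64 characters (and not empty).
    if not local or len(local) > 64:
        return False
    if local.startswith('"'):
        return quoted_local_ok(local)
    return unquoted_local_ok(local)
```

Under these rules, first.last//something@domain.tld is fine ('/' is in the rule 1 list), and so is "a@b c"@example.com, while an unquoted a@b@example.com is rejected because '@' is not a rule 1 character.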

So in the end, I just figured, hell, work out something simple, and (you're right) check for newlines etc., and go right for the MX, not that that's any real proof either. As you can see, there can even be multiple @ signs. I don't pretend to understand a quoted local part, but ... it works; I tried it by creating such an address for myself.
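For the "check for newlines etc." step, a sketch along these lines (Python, hypothetical name) flags CR, LF, NUL and every other ASCII control character, which is what header injection depends on. This is deliberately stricter than rule 3 above, which allows some escaped control characters inside a quoted local part; rejecting them costs almost nothing in practice.

```python
def has_control_chars(address: str) -> bool:
    """True if the address contains CR, LF, NUL or any other ASCII control character."""
    return any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in address)
```

For example, the classic injection attempt "victim@example.com\r\nBcc: list@example.net" is flagged, so it never reaches a mail header or the MX check.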

Then you have to balance that against what you'll actually find in use, but ... had I not heard from the guy in Iraq, I'd have had more confidence in setting it up. It makes me wonder how much trouble overseas folks may have because of these filters, and what else might already be in use that I'm not aware of. I didn't think multiple @s would ever happen either, but it turns out there are a lot of holes in what I don't know I know<g>.

Regards,

Twayne

Re: E-mail Validation

Posted: Wed Jul 30, 2008 1:42 pm
by omniuni
I'm so curious now. Can you tell us roughly what this eMail address was?

What exactly was so unusual about it?

I've messed around in cPanel, and I have been unable to create an eMail address with @ or ~ or other special characters. Just "." and "-" and "_".