PHP, regex, and quotes

Posted: Wed Jul 23, 2008 11:51 am
by dhinge
I've found some pretty odd behaviour with PHP's regex and quotes... maybe someone can explain this.

Basically I'm trying to build an email address validator that conforms to RFC 2822. For example, any character can start an email address except for a . (period). So my regex looks like this:

ereg("^[a-zA-Z0-9!#\$%\*/\?\|\^\{\}`~&'\+-=_]", '.')

That returns true, and if I remove the minus sign (-), it returns false. Why is that?

On another note, I was messing around and tried this statement:

ereg("^[\\]", '\')

and got this:

Warning: Unexpected character in input: ''' (ASCII=39) state=1
Warning: Unexpected character in input: '\' (ASCII=92) state=1
Warning: Unexpected character in input: ''' (ASCII=39) state=1
Parse error: syntax error, unexpected ')'

Aren't characters in single-quotes considered literal by PHP?

Using ereg("^[\\]", "\") gives Parse error: syntax error, unexpected T_VARIABLE, which is what I would expect.

What is the explanation for all this behaviour?

Re: PHP, regex, and quotes

Posted: Wed Jul 23, 2008 2:33 pm
by prometheuzz
dhinge wrote:...
For example, any character can start an email address except for a . (period).

Does that mean an email address can start with a whitespace character, or an '@'?
dhinge wrote:So my regex looks like this:

ereg("^[a-zA-Z0-9!#\$%\*/\?\|\^\{\}`~&'\+-=_]", '.')

That returns true, and if I remove the minus sign (-), it returns false. Why is that?

I see a lot of minus signs, so you'll need to explain yourself a bit more. It would help a great deal if you posted the code you're running.

Also, there is no need to escape the normal meta characters like '$', '*', '?', etc. inside a character class. The only chars you need to escape inside a character class are '[', ']', '^' (if used at the start of the character class) and '-' (if not at the start or end of the character class). An unescaped '-' between two other characters is read as a range, not as a literal minus sign.
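
To illustrate the point about '-', here is a rough sketch. It uses preg_match rather than ereg, which is my substitution (ereg was deprecated in PHP 5.3 and removed in PHP 7.0), and the variable names are only for illustration. The idea is that '+-=' in the middle of a class is a range from '+' (ASCII 0x2B) to '=' (ASCII 0x3D), and '.' (ASCII 0x2E) falls inside it:

```php
<?php
// '+-=' inside a class is a range from '+' (0x2B) to '=' (0x3D).
// That range covers ',', '-', '.', '/', the digits, ':', ';' and '<'.
$withRange = preg_match('/^[+-=]/', '.');     // 1: '.' is inside the range

// With '-' moved to the end of the class it is a literal minus sign,
// so the class holds only '+', '=' and '-'.
$withoutRange = preg_match('/^[+=-]/', '.');  // 0: '.' is not in the class
$literalMinus = preg_match('/^[+=-]/', '-');  // 1: '-' itself still matches

var_dump($withRange, $withoutRange, $literalMinus);
```

If the same thing is happening in the posted pattern, moving the '-' to the very end of the class (or escaping it) should stop '.' from matching.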
dhinge wrote:On another note, I was messing around and tried this statement:

ereg("^[\\]", '\')

and got this:

Warning: Unexpected character in input: ''' (ASCII=39) state=1
Warning: Unexpected character in input: '\' (ASCII=92) state=1
Warning: Unexpected character in input: ''' (ASCII=39) state=1
Parse error: syntax error, unexpected ')'

Aren't characters in single-quotes considered literal by PHP?

Using ereg("^[\\]", "\") gives Parse error: syntax error, unexpected T_VARIABLE, which is what I would expect.

What is the explanation for all this behaviour?

Could you post the code which produced that output?
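
In the meantime, a guess, assuming the statements were typed exactly as shown: characters in single quotes are not all literal. Inside a single-quoted PHP string the backslash still escapes a single quote (\') and itself (\\), so '\' never closes the string and the parser runs on into the rest of the line. A minimal sketch of how a lone backslash could be written, again with preg_match swapped in for ereg and with illustrative variable names:

```php
<?php
// In single quotes only \' and \\ are escape sequences, so a lone
// backslash has to be written as '\\'.
$subject = '\\';                // one character: a backslash

// The regex engine needs \\ to mean a literal backslash, and each of
// those backslashes is doubled again inside the PHP string literal.
$pattern = '/^[\\\\]/';         // the regex the engine sees is ^[\\]

$length  = strlen($subject);                // 1
$matched = preg_match($pattern, $subject);  // 1

var_dump($length, $matched);
```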

Re: PHP, regex, and quotes

Posted: Wed Jul 23, 2008 2:43 pm
by prometheuzz
Note that matching an e-mail address according to the RFC is madness: it would take a truly monstrous regex. Read the following article to convince yourself:
http://www.regular-expressions.info/email.html