PHP, regex, and quotes

dhinge
Forum Newbie
Posts: 12
Joined: Mon Jan 08, 2007 1:56 pm

PHP, regex, and quotes

Post by dhinge »

I've found some pretty odd behaviour with PHP's regex and quotes... maybe someone can explain this.

Basically I'm trying to build an email address validator that conforms to RFC 2822. For example, any character can start an email address except for a . (period). So my regex looks like this:

ereg("^[a-zA-Z0-9!#\$%\*/\?\|\^\{\}`~&'\+-=_]", '.')

That returns true, and if I remove the minus sign (-), it returns false. Why is that?

On another note, I was messing around and tried this statement:

ereg("^[\\]", '\')

and got this:

Warning: Unexpected character in input: ''' (ASCII=39) state=1
Warning: Unexpected character in input: '\' (ASCII=92) state=1
Warning: Unexpected character in input: ''' (ASCII=39) state=1
Parse error: syntax error, unexpected ')'

Aren't characters in single-quotes considered literal by PHP?

Using ereg("^[\\]", "\") gives Parse error: syntax error, unexpected T_VARIABLE, which is what I would expect.

What is the explanation for all this behaviour?
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: PHP, regex, and quotes

Post by prometheuzz »

dhinge wrote:...
For example, any character can start an email address except for a . (period).
That means an email address can start with a white space character, or an '@'?
dhinge wrote:So my regex looks like this:

ereg("^[a-zA-Z0-9!#\$%\*/\?\|\^\{\}`~&'\+-=_]", '.')

That returns true, and if I remove the minus sign (-), it returns false. Why is that?
I see a lot of minus signs, so you'll need to explain yourself a bit more. It would greatly help if you post the code you're running.
Also, there is no need to escape the normal meta characters like '$', '*', '?', etc. inside a character class. The only chars you need to escape inside a character class are '[', ']', '^' (if used at the start of the character class) and '-' (if not at the start or end of the character class).
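To illustrate the point about an unescaped '-': in the pattern quoted above, the sequence \+-= sits in the middle of the class, so it is read as a range from '+' (ASCII 43) to '=' (ASCII 61), which happens to include '.' (ASCII 46). A minimal sketch, assuming preg_match() is used in place of ereg() (the ereg functions were removed in PHP 7); the reduced patterns and variable names are made up for illustration:

```php
<?php
// Hypothetical reduced patterns; only the tail of the original class is kept.
$withRange    = '/^[a-z+-=_]/';   // "+-=" is a range: '+' (43) through '=' (61)
$withoutRange = '/^[a-z+=_-]/';   // '-' moved to the end, so it is a literal

// '.' is ASCII 46, which falls inside the 43..61 range.
var_dump(preg_match($withRange, '.'));     // int(1)
var_dump(preg_match($withoutRange, '.'));  // int(0)

// A literal '-' still matches once it sits at the end of the class.
var_dump(preg_match($withoutRange, '-'));  // int(1)
```

Moving the '-' to the end of the class (or escaping it) keeps it as a literal character without creating a range.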
dhinge wrote:On another note, I was messing around and tried this statement:

ereg("^[\\]", '\')

and got this:

Warning: Unexpected character in input: ''' (ASCII=39) state=1
Warning: Unexpected character in input: '\' (ASCII=92) state=1
Warning: Unexpected character in input: ''' (ASCII=39) state=1
Parse error: syntax error, unexpected ')'

Aren't characters in single-quotes considered literal by PHP?

Using ereg("^[\\]", "\") gives Parse error: syntax error, unexpected T_VARIABLE, which is what I would expect.

What is the explanation for all this behaviour?
Could you post the code which produced that output?
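On the single-quote question: PHP single-quoted strings are not entirely literal. Two escape sequences are still recognised, \' and \\, so '\' is read as an opening quote followed by an escaped quote, and the string never terminates. A small sketch of how a single backslash can be written and matched, assuming preg_match() rather than ereg(); the variable names are illustrative only:

```php
<?php
// In single quotes, only \\ and \' are escape sequences.
$backslash = '\\';               // one character: \
var_dump(strlen($backslash));    // int(1)

// The regex engine needs an escaped backslash (\\) inside the class,
// and each of those backslashes must itself be doubled in the PHP string.
$pattern = '/^[\\\\]/';          // PCRE sees: /^[\\]/
var_dump(preg_match($pattern, $backslash));  // int(1)
```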
prometheuzz
Forum Regular
Posts: 779
Joined: Fri Apr 04, 2008 5:51 am

Re: PHP, regex, and quotes

Post by prometheuzz »

Note that matching an e-mail address according to the RFC is madness: it would take a truly monstrous regex. Read the following article to convince yourself:
http://www.regular-expressions.info/email.html
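As a pragmatic alternative to a hand-written pattern, PHP ships a built-in validator. A minimal sketch, assuming approximate validation is acceptable (filter_var() does not claim to cover every corner of the RFC); the helper name is hypothetical:

```php
<?php
// Hypothetical helper: true when PHP's built-in filter accepts the address.
function looksLikeEmail(string $address): bool
{
    // filter_var() returns the filtered value on success and false on failure.
    return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(looksLikeEmail('user@example.com'));  // bool(true)
var_dump(looksLikeEmail('not an address'));    // bool(false)
```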