REGEX questions

paulstanely45
Forum Newbie
Posts: 8
Joined: Mon Nov 02, 2009 11:31 pm

Post by paulstanely45 »

Hello,

Just a quick question for my own curiosity mostly. Hopefully someone can shed light on this.

I have this code:

Code: Select all

<?php
$string = "breaks\kristin.txt";


preg_match("/breaks\\\([0-9A-Za-z\-_]+)\.txt/i", $string, $matches);

if(isset($matches[1])) {
	echo $matches[1];
}

?>
Which of course echoes 'kristin', as it should.

My big question is: why do I need the triple backslash? I thought I would only need two, one backslash to escape the other.

However, if I use this code:

Code: Select all

<?php
$string = "breaks\kristin.txt";


preg_match("/breaks\\([0-9A-Za-z\-_]+)\.txt/i", $string, $matches);

if(isset($matches[1])) {
	echo $matches[1];	
}

?>
I get the following error:

Warning: preg_match(): Compilation failed: unmatched parentheses at offset 23 in /Volumes/DATA1/webserver/sandbox/break.php on line 5


Could someone explain? I would have thought only two were needed. Wouldn't three backslashes leave one escaped backslash, plus another one escaping the opening parenthesis?
AbraCadaver
DevNet Master
Posts: 2572
Joined: Mon Feb 24, 2003 10:12 am
Location: The Republic of Texas

Re: REGEX questions

Post by AbraCadaver »

PHP does not require all backslashes in strings to be escaped. In a double-quoted string you only need to escape a backslash when it would otherwise combine with the next character into an escape sequence (\n, \$, \\ and so on). \( is not an escape sequence, so PHP leaves it in the string as \(. When you write \\( PHP reads the \\ as one literal \, and that single \ ends up directly in front of the (, which escapes it. To keep a literal backslash and still have an unescaped ( you use \\\.

Hope that makes sense.
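
To see the string-level half of this in isolation, here is a minimal sketch that prints what PHP actually stores for each spelling before any regex is involved. The variable names $two and $three are made up for the example.

```php
<?php
// In a double-quoted string, "\\" is an escape sequence for one backslash,
// while "\(" is not an escape sequence, so both of its characters are kept.
$two   = "\\(";    // stored characters: \ (
$three = "\\\(";   // stored characters: \ \ (

echo $two, "\n";             // prints \(
echo $three, "\n";           // prints \\(
echo strlen($two), "\n";     // prints 2
echo strlen($three), "\n";   // prints 3
```

The extra character in $three is the second backslash that the regex engine needs to see.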
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: REGEX questions

Post by Jonah Bron »

Okay, first I'd like to say I was very surprised at this, and for the first few minutes it totally stumped me... but I think I have it figured out now. The problem is that there are two levels of escaping going on: the PHP string, and the regex. Here's an example:

Code: Select all

$string = "\\";
echo $string; // output: "\"

$regex = '/\/';
echo $regex; // output: "/\/"
preg_match($regex, $string, $matches); // regex parse error, no ending delimiter found.  \ escaped /

$regex = '/\\/';
echo $regex; // output: "/\/"
preg_match($regex, $string, $matches); // regex parse error, no ending delimiter found.  \ escaped /

$regex = '/\\\/';
echo $regex; // output: "/\\/"
preg_match($regex, $string, $matches); // successful, \ escaped \
Do you see how that works now?

@AbraCadaver: I think it's because there are two escapes happening.
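
A quick way to check those three cases, as a sketch: the loop below prints the length of each pattern string and what preg_match() returns for it. The @ operator is only there to silence the compile warning, and the names $subject, $candidates and $results are made up for this example.

```php
<?php
$subject = "\\";                          // a single backslash

$candidates = ['/\/', '/\\/', '/\\\/'];   // single-quoted, as in the example above
$results = [];
foreach ($candidates as $regex) {
    // preg_match() returns false when the pattern cannot be compiled
    $ok = @preg_match($regex, $subject);
    $results[] = $ok;
    printf("%s (length %d) => %s\n", $regex, strlen($regex), var_export($ok, true));
}
```

The first two literals produce the same three-character string, so both fail to compile. Only the third one hands the regex engine two backslashes.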
paulstanely45
Forum Newbie
Posts: 8
Joined: Mon Nov 02, 2009 11:31 pm

Re: REGEX questions

Post by paulstanely45 »

AbraCadaver wrote: PHP does not require all backslashes in strings to be escaped. In a double-quoted string you only need to escape a backslash when it would otherwise combine with the next character into an escape sequence (\n, \$, \\ and so on). \( is not an escape sequence, so PHP leaves it in the string as \(. When you write \\( PHP reads the \\ as one literal \, and that single \ ends up directly in front of the (, which escapes it. To keep a literal backslash and still have an unescaped ( you use \\\.

Hope that makes sense.
I think that makes sense.

It seems to me that what needs to be passed to the preg_match function is a string that would actually evaluate to

Code: Select all

"/breaks\\([0-9A-Za-z\-_]+)\.txt/i"
because the regex engine needs to see a double backslash to know that the backslash is meant literally. In a double-quoted PHP string, the triple backslash in \\\( evaluates to a literal double backslash: the first two collapse into one, and the third is kept because \( is not a PHP escape sequence.

I think that's right. Please do correct me if I am wrong.
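
A rough sketch to check this: run both spellings from my first post and print the string that preg_match() actually receives next to the result. The $patterns and $results names are just for this test, and the @ only hides the compile warning.

```php
<?php
$string = "breaks\kristin.txt";   // "\k" is not a PHP escape, so the backslash stays

$patterns = [
    'two backslashes'   => "/breaks\\([0-9A-Za-z\-_]+)\.txt/i",
    'three backslashes' => "/breaks\\\([0-9A-Za-z\-_]+)\.txt/i",
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match and false if the pattern fails to compile
    $ok = @preg_match($pattern, $string, $matches);
    $results[$label] = ($ok === 1) ? $matches[1] : null;
    echo $label, ': ', $pattern, ' => ', $results[$label] ?? 'no match or error', "\n";
}
```

The two-backslash pattern is printed with a single \ before the ( and gives no result. The three-backslash pattern is printed with \\ and captures kristin.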
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Re: REGEX questions

Post by Jonah Bron »

Yes, that's right. You need three because the backslash gets unescaped twice: first by the PHP string parser, and then by the regex engine. Here's the process:

Code: Select all

string input: "/breaks\\\([0-9a-z-_]+)\.txt/i"
Double backslash escapes to one:
        vv
"/breaks\\\([0-9a-z-_]+)\.txt/i"
"/breaks\\([0-9a-z-_]+)\.txt/i"

No more string escapes (\( and \. are not PHP escape sequences, so they are left alone)

regex input: "/breaks\\([0-9a-z-_]+)\.txt/i"
double backslash escapes to one:
        vv
"/breaks\\([0-9a-z-_]+)\.txt/i"
"/breaks\([0-9a-z-_]+)\.txt/i"

backslash escapes dot to literal dot
...
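
If the three-backslash form feels too fragile, one option is to escape every backslash at the string level, so that four backslashes in the source become two for the regex engine. This is only a sketch: the variable names are made up, and the hyphen in the character class is moved to the end so it is unambiguous.

```php
<?php
// All three literals evaluate to the same pattern string:
//   /breaks\\([0-9a-z_-]+)\.txt/i
$a = "/breaks\\\([0-9a-z_-]+)\.txt/i";    // three backslashes, double-quoted
$b = "/breaks\\\\([0-9a-z_-]+)\.txt/i";   // four backslashes, double-quoted
$c = '/breaks\\\\([0-9a-z_-]+)\.txt/i';   // four backslashes, single-quoted

var_dump($a === $b, $b === $c);           // bool(true) for both comparisons

preg_match($b, "breaks\\kristin.txt", $m);
echo $m[1], "\n";                         // prints kristin
```

The four-backslash spelling does not depend on \( being left alone by the string parser, which makes it easier to read back later.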