Dollar $ not really the end

Post by matthijs »

A while ago Stefan Esser wrote about holes in preg_match functions. As I had to do some regex stuff I remembered his post and thought it might be interesting for people here to read about it.

What it comes down to is that the dollar sign $ is commonly used, and recommended, as the end-of-string anchor in a pattern. However, what many people don't know is that $ also matches just before a single trailing newline, so a string with one newline after the last character still passes. When you really need the end of the string, you have to add the D modifier (PCRE_DOLLAR_ENDONLY) after the closing delimiter of the pattern. Please read his post for a better explanation.

A quick example:
filter.php

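<?php
// Request under test: filter.php?var=012345:XYZ%0a (%0a is a URL-encoded newline)
$var = isset($_GET['var']) ? $_GET['var'] : '';

// Without the D modifier, $ also matches just before a single trailing newline.
$clean = array();
if (preg_match("/^[0-9]+:[X-Z]+$/", $var)) {
   $clean['var'] = $var;
}
echo '<br>Clean[\'var\'] is: ' . (isset($clean['var']) ? $clean['var'] : ''); echo 'test';
// Clean['var'] is: 012345:XYZ
// test

// With the D modifier, $ matches only at the very end of the string.
$realclean = array();
if (preg_match("/^[0-9]+:[X-Z]+$/D", $var)) {
   $realclean['var'] = $var;
}
echo '<br>RealClean[\'var\'] is: ' . (isset($realclean['var']) ? $realclean['var'] : ''); echo 'test';
// RealClean['var'] is: test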

You see? In the first test, the value passes preg_match even though it ends with a newline and shouldn't. In the second test, with the D modifier, it is rejected.

Maybe this is old stuff for some of you, but I didn't know it.

Weirdan | Corrected output

Post by GeertDD »

Very true, indeed. This is overlooked often.

Note that instead of using $ in combination with the D modifier, you could also use the \z assertion, which matches only at the very end of the subject string.
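
To make the comparison concrete, here is a minimal sketch that runs the pattern from the first post in both forms against a clean value and one with a trailing newline. The helper names matches_with_dollar_d() and matches_with_z() are made up for this illustration; only preg_match() is a real PHP function here.

```php
<?php
// Hypothetical helpers for illustration; the pattern is the one from the first post.
function matches_with_dollar_d($s) {
    // $ combined with the D modifier: end of string only
    return preg_match('/^[0-9]+:[X-Z]+$/D', $s) === 1;
}

function matches_with_z($s) {
    // \z: end of string only, no modifier needed
    return preg_match('/^[0-9]+:[X-Z]+\z/', $s) === 1;
}

$inputs = array("012345:XYZ", "012345:XYZ\n");
foreach ($inputs as $input) {
    // json_encode() makes the trailing newline visible as \n
    echo json_encode($input),
         ' | $ with D: ', var_export(matches_with_dollar_d($input), true),
         ' | \z: ', var_export(matches_with_z($input), true), "\n";
}
```

Both forms should accept the first value and reject the second. One practical difference is that \z travels with the pattern itself, so it cannot be lost when someone edits the modifiers.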

Post by Kieran Huggins »

Thanks for the warning - I had no idea!

Post by vapoorize »

What about /m?

/m enables "multi-line mode". In this mode, the caret matches immediately after, and the dollar immediately before, any newline in the subject string, in addition to the very start and end.
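
As a rough sketch of what that means for the filter from the first post: with /m, a valid-looking line anywhere inside a multi-line value is enough to pass, and the D modifier no longer helps, because PCRE ignores it when /m is set. The helper passes_filter() and the sample subject are made up for this illustration.

```php
<?php
// Hypothetical helper for illustration; $modifiers is appended after the closing delimiter.
function passes_filter($modifiers, $subject) {
    return preg_match('/^[0-9]+:[X-Z]+$/' . $modifiers, $subject) === 1;
}

// A valid-looking line wrapped in other lines.
$subject = "junk\n012345:XYZ\nmore junk";

var_dump(passes_filter('',   $subject)); // bool(false): ^ and $ anchor to the whole string
var_dump(passes_filter('m',  $subject)); // bool(true): the middle line matches on its own
var_dump(passes_filter('mD', $subject)); // bool(true): D has no effect when m is set
var_dump(passes_filter('D',  $subject)); // bool(false)
```

So for input validation, /m makes the problem worse rather than better. If multi-line mode is needed for other reasons, \A and \z still anchor to the real start and end of the string.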