Dollar $ not really the end

Posted: Sat Jan 05, 2008 3:49 am
by matthijs
A while ago Stefan Esser wrote about holes in preg_match functions. As I had to do some regex stuff I remembered his post and thought it might be interesting for people here to read about it.

What it comes down to is that the dollar sign $ is used, and advocated, as the anchor for the end of a pattern. However, what many people don't know is that $ also matches just before a single newline at the very end of the string, so a value with one trailing newline still passes. When you really need the end of the string, you have to add the D modifier (PCRE_DOLLAR_ENDONLY) after the closing delimiter of the pattern. Please read his post for a better explanation.

A quick example:
filter.php

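<?php
// Request: filter.php?var=012345:XYZ%0a  (%0a is a URL-encoded newline)
$var = isset($_GET['var']) ? $_GET['var'] : '';

// Without D: $ also matches just before a trailing newline, so the value passes.
$clean = array();
if (preg_match('/^[0-9]+:[X-Z]+$/', $var)) {
   $clean['var'] = $var;
}
echo '<br>Clean[\'var\'] is: ' . (isset($clean['var']) ? $clean['var'] : ''); echo 'test';
// Clean['var'] is: 012345:XYZ
// test

// With D: $ matches only at the very end of the string, so the value is rejected.
$realclean = array();
if (preg_match('/^[0-9]+:[X-Z]+$/D', $var)) {
   $realclean['var'] = $var;
}
echo '<br>RealClean[\'var\'] is: ' . (isset($realclean['var']) ? $realclean['var'] : ''); echo 'test';
// RealClean['var'] is: test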
You see? In the first test, the var passes the preg_match even though it shouldn't.

Maybe this is old stuff for some of you, but I didn't know it.

Weirdan | Corrected output

Posted: Sat Jan 05, 2008 9:18 am
by GeertDD
Very true, indeed. This is overlooked often.

Note that instead of using $ in combination with the D modifier, you could also use the \z metacharacter.
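A minimal sketch of that alternative, assuming the same pattern as in the first post. The $input value is only an illustration: a valid value followed by one trailing newline.

```php
<?php
// Hypothetical input: a valid value followed by one trailing newline.
$input = "012345:XYZ\n";

// $ without D: also matches just before a final newline.
var_dump(preg_match('/^[0-9]+:[X-Z]+$/', $input));   // int(1)

// $ with D: matches only at the very end of the subject.
var_dump(preg_match('/^[0-9]+:[X-Z]+$/D', $input));  // int(0)

// \z: always the very end of the subject, no modifier needed.
var_dump(preg_match('/^[0-9]+:[X-Z]+\z/', $input));  // int(0)

// \Z (uppercase) behaves like plain $ here: it tolerates one final newline.
var_dump(preg_match('/^[0-9]+:[X-Z]+\Z/', $input));  // int(1)
```

Mind the case: lowercase \z is the strict one, while uppercase \Z still lets a single trailing newline through.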

Posted: Sat Jan 05, 2008 7:20 pm
by Kieran Huggins
Thanks for the warning - I had no idea!

Posted: Mon Jan 07, 2008 1:38 pm
by vapoorize
What about /m ??

/m enables "multi-line mode". In this mode, the caret and dollar match before and after newlines in the subject string.
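For validation that makes things worse rather than better. A rough sketch, reusing the pattern from the first post with a made-up $input that has extra content after the newline:

```php
<?php
// Hypothetical input: a valid first line followed by arbitrary extra content.
$input = "012345:XYZ\n<script>";

// Without m: rejected, because $ only tolerates a single final newline.
var_dump(preg_match('/^[0-9]+:[X-Z]+$/', $input));    // int(0)

// With m: accepted, because $ now matches before any newline.
var_dump(preg_match('/^[0-9]+:[X-Z]+$/m', $input));   // int(1)

// D is ignored when m is set, so this still passes.
var_dump(preg_match('/^[0-9]+:[X-Z]+$/mD', $input));  // int(1)

// \A and \z anchor to the whole subject regardless of m.
var_dump(preg_match('/\A[0-9]+:[X-Z]+\z/m', $input)); // int(0)
```

So with /m a single matching line anywhere in the input is enough to pass. If you need /m for other reasons, \A and \z still pin the match to the whole string.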