removing characters using regular expressions

Posted: Wed Nov 02, 2005 7:44 am
by jasongr
Hello,

I need to write code that, given a string, strips from it any of the following characters:
%, &, \, ", /, ?, ', <, >, :, *, |, ^, ,

(Note that the comma is also not permitted.)

I know that I can achieve this with the following code:

Code:

$illegalCharacters = array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");
foreach ($illegalCharacters as $character) {
	$text = str_replace($character, '', $text);
}
This seems like a very inefficient solution.
Is there a better one, possibly using a regular expression?

Note:
I also need to strip the * and ? characters.
It is critical that their use inside a regular expression does not lead to unexpected results,
such as stripping more characters than intended or missing some from the list above.

Any help would be appreciated.
Thanks in advance.

Posted: Wed Nov 02, 2005 7:54 am
by foobar
Regex.

Code:

// One character class; \\\\ in a single-quoted PHP string reaches the regex as \\, i.e. one literal backslash.
$yourtext = preg_replace('/[%&\\\\"\/?\'<>:*|^,]+/', '', $yourtext);
Not tested, but should work.
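
One way to address the worry about * and ? is a character class: inside [...] those two are literal, so only the backslash, the pattern delimiter, and the single quote of the PHP string need escaping. A minimal sketch, assuming the 14 characters from the first post; the function name strip_illegal is made up for illustration:

```php
<?php
// Hypothetical helper: removes the 14 characters listed in the first post.
function strip_illegal($text)
{
    // Inside a character class, * ? | : < > are literal.
    // \\\\ -> regex \\ -> one literal backslash; \/ escapes the delimiter;
    // ^ is literal because it is not the first character of the class.
    return preg_replace('/[%&\\\\"\/?\'<>:*|^,]/', '', $text);
}

$sample = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';
echo strip_illegal($sample), "\n";
```

Characters outside the list, such as ., -, _ and spaces, should pass through untouched, which is easy to confirm by feeding the helper a string that contains them.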

Posted: Wed Nov 02, 2005 8:23 am
by feyd

Code:

$illegalCharacters = array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");
// Escape each character for a '#'-delimited pattern, then join them into one character class.
$quoted = array_map(function ($a) { return preg_quote($a, '#'); }, $illegalCharacters);
$pattern = '#[' . implode('', $quoted) . ']+#';
$text = preg_replace($pattern, '', $text);
Untested.

Posted: Wed Nov 02, 2005 8:41 am
by yum-jelly
strtr() would also work:

Code:

<?php

$bad = array ( '%' => '',  '&' => '',  '\\' => '',  '"' => '',  '/' => '',  '?' => '',  '\'' => '',  '<' => '',  '>' => '',  ':' => '',  '*' => '',  '|' => '',  '^' => '',  ',' => '' );

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

echo strtr ( $string, $bad );

?>

yj
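
If the list of characters already exists as a plain array, the replacement map for strtr() doesn't have to be typed out by hand. A sketch using array_fill_keys() (available since PHP 5.2); the variable names are only illustrative. The array form is used because the three-argument string form, strtr($s, $from, $to), maps characters one-to-one and can't delete anything.

```php
<?php
$illegalCharacters = array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");

// Map every illegal character to the empty string: array('%' => '', '&' => '', ...)
$bad = array_fill_keys($illegalCharacters, '');

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';
$clean  = strtr($string, $bad);

echo $clean, "\n";
```

This keeps a single source of truth for the character list, so the str_replace() and strtr() variants can't drift apart.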

Posted: Wed Nov 02, 2005 9:10 am
by Jenk

Code:

<?php

$illegalCharacters = Array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");

$text = str_replace($illegalCharacters, '', $text);

?>

Posted: Wed Nov 02, 2005 9:17 am
by jasongr
Thanks for all the suggestions.
Can anyone tell me how they compare in terms of efficiency (assuming they all work, of course)?
I need to decide which would be the best solution.

Posted: Wed Nov 02, 2005 9:36 am
by yum-jelly
It depends on your server and PHP version. It's always best to test...

On 3 of my servers (2 Windows, 1 Linux), strtr() is almost 2 times faster than str_replace().

On 2 other Linux servers it's 1 times faster than str_replace().

On my last server str_replace() is a touch faster!


Tested on PHP 5.

Test str_replace():

Code:

<?php

$s = microtime ( true );

$illegalCharacters = Array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ","); 

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

$i = 10000;

while ( --$i > 0 )
{
	str_replace($illegalCharacters, '', $string); 
}


$e = microtime ( true );

echo ( $e - $s );

?>

Test strtr():

Code:

<?php

$s = microtime ( true );

$bad = array ( '%' => '', '&' => '', '\\' => '', '"' => '', '/' => '', '?' => '', '\'' => '', '<' => '', '>' => '', ':' => '', '*' => '', '|' => '', '^' => '', ',' => '' ); 

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

$i = 10000;

while ( --$i > 0 )
{
	strtr ( $string, $bad ); 
}

$e = microtime ( true );

echo ( $e - $s );

?>

Testing is always a good idea!

yj
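
To compare the approaches from this thread under identical conditions, everything can go into one script so that setup stays outside the timed loop. This is only a sketch: time_it() is a hypothetical helper, the closures need PHP 5.3 or later, and the numbers will differ per machine and PHP version, so none are quoted here.

```php
<?php
// Hypothetical harness: time a callable over a fixed number of iterations.
function time_it($fn, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$chars   = array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");
$map     = array_fill_keys($chars, '');
$pattern = '/[%&\\\\"\/?\'<>:*|^,]/';
$string  = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

$candidates = array(
    'str_replace'  => function () use ($chars, $string) { return str_replace($chars, '', $string); },
    'strtr'        => function () use ($map, $string) { return strtr($string, $map); },
    'preg_replace' => function () use ($pattern, $string) { return preg_replace($pattern, '', $string); },
);

$results = array();
foreach ($candidates as $name => $fn) {
    $results[$name] = $fn(); // keep the output to confirm all approaches agree
    printf("%-12s %.4f s\n", $name, time_it($fn, 10000));
}
```

The closure call adds the same overhead to each candidate, so the relative ordering is what matters, not the absolute times. A short test string like this one may also not reflect behaviour on long inputs.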

Posted: Wed Nov 02, 2005 3:38 pm
by Chris Corbyn
I'd say the str_replace() call without the loop will be the fastest in all cases.

Are you sure it's *only* those characters you need to block? A regex could block out all special chars if you wanted instead.

Code:

$str = preg_replace('/\W+/', '', $original); //This one will allow underscores

//or

$str = preg_replace('/[\W_]+/', '', $original); //This one only allows numbers and letters
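
Worth checking before adopting \W: it removes everything that isn't a letter, digit, or underscore, which includes spaces and punctuation such as . and - that the original list leaves alone. A quick sketch with a made-up input, comparing it against a character class built from the 14 characters in the first post:

```php
<?php
$original = 'my file-name_v2.txt, final?';

$listOnly  = preg_replace('/[%&\\\\"\/?\'<>:*|^,]/', '', $original); // only the 14 listed characters
$nonWord   = preg_replace('/\W+/', '', $original);                    // keeps letters, digits, underscore
$alnumOnly = preg_replace('/[\W_]+/', '', $original);                 // keeps letters and digits only

echo $listOnly, "\n", $nonWord, "\n", $alnumOnly, "\n";
```

If the requirement really is "strip exactly these characters and nothing else", the explicit list is the safer choice; the \W variants suit cases where only alphanumerics should survive.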