removing characters using regular expressions

jasongr
Forum Contributor
Posts: 206
Joined: Tue Jul 27, 2004 6:19 am

Post by jasongr »

Hello

I need to write code that, given a string, strips any of the following characters from it:
%, &, \, ", /, ?, ', <, >, :, *, |, ^, ,

(note that the comma is also not permitted)

I know that I can achieve this using the following code:

Code: Select all

$illegalCharacters = Array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");			
foreach ($illegalCharacters as $character) {
	$text = str_replace($character, '', $text);
}
This seems like a very inefficient solution.
Is there a better one, possibly using a regular expression?

Note:
I also need to strip the * and ? characters.
It is critical that using them inside a regular expression does not lead to unexpected results,
such as stripping more characters than allowed or missing some from the list above.

Any help would be appreciated.
Thanks in advance.
foobar
Forum Regular
Posts: 613
Joined: Wed Sep 28, 2005 10:08 am

Post by foobar »

Regex.

Code: Select all

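// A character class matches any one of the listed characters; only the backslash and the / delimiter need escaping here.
$yourtext = preg_replace('/[%&\\\\"\/?\'<>:*|^,]+/', '', $yourtext);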
Not tested, but should work.
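
Since the original question worries about * and ? misbehaving inside a regex, a quick self-check might help. This is only a minimal sketch: the helper name strip_illegal() and the $safe sample are made up for illustration, and the pattern is one possible character class for the list in the first post, not necessarily the one anybody here ended up using.

```php
<?php
// Sketch only: $illegal, $safe and strip_illegal() are illustrative names, not from the thread.
$illegal = array('%', '&', '\\', '"', '/', '?', "'", '<', '>', ':', '*', '|', '^', ',');

function strip_illegal($text)
{
    // Inside [...] the characters * ? | < > are literal. Only the backslash and
    // the / delimiter need escaping, and ^ is literal when it is not first.
    return preg_replace('/[%&\\\\"\/?\'<>:*|^,]/', '', $text);
}

// Every illegal character on its own must be removed.
foreach ($illegal as $char) {
    if (strip_illegal('a' . $char . 'b') !== 'ab') {
        echo "pattern missed: $char\n";
    }
}

// Characters outside the list must survive untouched.
$safe = 'abc XYZ 0189 _-.;!@#$()[]{}+=~`';
echo strip_illegal($safe) === $safe ? "safe characters kept\n" : "pattern strips too much\n";
```

If the loop prints nothing and the last line reports that the safe characters were kept, the pattern removes the listed characters and leaves the sampled others alone.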
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Code: Select all

$illegalCharacters = Array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");
// preg_quote() escapes each character so it is taken literally inside the character class
$pattern = '#[' . implode('', array_map(function ($a) { return preg_quote($a, '#'); }, $illegalCharacters)) . ']+#';
$text = preg_replace($pattern, '', $text);
untested..
yum-jelly
Forum Commoner
Posts: 98
Joined: Sat Oct 29, 2005 9:16 pm

Post by yum-jelly »

strtr() would also work...

Code: Select all

<?php

$bad = array ( '%' => '',  '&' => '',  '\\' => '',  '"' => '',  '/' => '',  '?' => '',  '\'' => '',  '<' => '',  '>' => '',  ':' => '',  '*' => '',  '|' => '',  '^' => '',  ',' => '' );

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

echo strtr ( $string, $bad );

?>

yj
Jenk
DevNet Master
Posts: 3587
Joined: Mon Sep 19, 2005 6:24 am
Location: London

Post by Jenk »

Code: Select all

<?php

$illegalCharacters = Array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ",");

$text = str_replace($illegalCharacters, '', $text);

?>
jasongr
Forum Contributor
Posts: 206
Joined: Tue Jul 27, 2004 6:19 am

Post by jasongr »

Thanks for all the suggestions.
Can anyone tell me how they compare in terms of efficiency? (Assuming they all work, of course.)
I need to decide which would be the best solution.
yum-jelly
Forum Commoner
Posts: 98
Joined: Sat Oct 29, 2005 9:16 pm

Post by yum-jelly »

It depends on your server and PHP version. It's always best to test...

On 3 of my servers (2 Windows, 1 Linux), strtr() is almost 2 times faster than str_replace().

On 2 other Linux servers it's 1 times faster than str_replace().

On my last server, str_replace() is a touch faster!


PHP 5 //testing...

test str_replace();

Code: Select all

<?php

$s = microtime ( true );

$illegalCharacters = Array("%", "&", "\\", '"', "/", "?", "'", "<", ">", ":", "*", "|", "^", ","); 

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

$i = 10000;

while ( --$i > 0 )
{
	str_replace($illegalCharacters, '', $string); 
}


$e = microtime ( true );

echo ( $e - $s );

?>
test strtr();

Code: Select all

<?php

$s = microtime ( true );

$bad = array ( '%' => '', '&' => '', '\\' => '', '"' => '', '/' => '', '?' => '', '\'' => '', '<' => '', '>' => '', ':' => '', '*' => '', '|' => '', '^' => '', ',' => '' ); 

$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

$i = 10000;

while ( --$i > 0 )
{
	strtr ( $string, $bad ); 
}

$e = microtime ( true );

echo ( $e - $s );

?>

Testing is always a good idea!
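
If it helps, the two scripts above could be folded into one harness so every approach runs on the same input in the same process. This is just a rough sketch under a few assumptions: time_it() and the $candidates labels are names invented here, closures need PHP 5.3 or later, and the regex variant uses one possible character class for the list in the first post. The numbers it prints will differ from machine to machine.

```php
<?php
// Sketch only: time_it() and the $candidates labels are illustrative, not from the thread.
$chars  = array('%', '&', '\\', '"', '/', '?', "'", '<', '>', ':', '*', '|', '^', ',');
$map    = array_fill_keys($chars, '');   // strtr() wants a char => replacement map
$string = 'h%a*p|p,y B^i\\r&t:h/d>a<y t?o y"o\'u';

$candidates = array(
    'str_replace'  => function ($s) use ($chars) { return str_replace($chars, '', $s); },
    'strtr'        => function ($s) use ($map) { return strtr($s, $map); },
    'preg_replace' => function ($s) { return preg_replace('/[%&\\\\"\/?\'<>:*|^,]+/', '', $s); },
);

// Run $fn on $input $runs times and return the elapsed seconds.
function time_it($fn, $input, $runs = 10000)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// Print each label, the cleaned string (to confirm they agree) and the timing.
foreach ($candidates as $label => $fn) {
    printf("%-13s %s  %.4f s\n", $label, $fn($string), time_it($fn, $string));
}
```

The closure call adds the same small overhead to every candidate, so the relative order should still be meaningful, but it is worth rerunning a few times before trusting small differences.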

yj
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

It'll be the str_replace() without the loop in all cases I'd say.

Are you sure it's *only* those characters you need to block? A regex could block out all special chars if you wanted instead.

Code: Select all

$str = preg_replace('/\W+/', '', $original); //This one will allow underscores

//or

$str = preg_replace('/[\W_]+/', '', $original); //This one only allows numbers and letters
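
One thing to keep in mind with \W is that it also removes spaces and any punctuation that is not on the original list. A small sketch of the difference, using a made-up sample string and one possible character class for the explicit list from the first post:

```php
<?php
// Sketch only: the sample string is made up for illustration.
$original = 'happy_birthday, to: you!';

// Explicit list from the first post: only the listed characters are removed.
echo preg_replace('/[%&\\\\"\/?\'<>:*|^,]+/', '', $original), "\n"; // happy_birthday to you!

// \W+ removes every non-word character, spaces and "!" included.
echo preg_replace('/\W+/', '', $original), "\n";                     // happy_birthdaytoyou

// [\W_]+ removes underscores as well.
echo preg_replace('/[\W_]+/', '', $original), "\n";                  // happybirthdaytoyou
```

So the \W versions are stricter than the list in the question, which may or may not be what is wanted.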