how to traverse a string

Posted: Fri Nov 04, 2005 5:38 am
by jasongr
Hello

I have a string that contains characters in different encodings.
I need to traverse it and remove any character whose code is below 32 or above 127.

I am trying the following code:

Code:

for ($i=0; $i<mb_strlen($file); $i++) {
	$ch = $file[$i];
	$value = ord($ch);
	if ($value < 32 || $value > 127) {						
		$file = mb_substr($file, 0, $i) . mb_substr ($file, ($i+1));
	}
}
For some reason, this code doesn't remove all the bad characters.
The problem could be either in the line:
$ch = $file[$i];
or in the line:
$file = mb_substr($file, 0, $i) . mb_substr ($file, ($i+1));

here is an example:
$file ="×

Posted: Fri Nov 04, 2005 6:18 am
by jasongr
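I found the bug.
It has nothing to do with encoding; the problem is how the iteration index advances. After a character is removed, the next character moves into position $i, so incrementing $i skips it.

Here is a corrected version of the code: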

Code:

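// Note: $file[$i] reads a single byte while the mb_* functions count characters;
// the two indexes agree only when the mbstring internal encoding is single-byte.
for ($i=0; $i<mb_strlen($file); ) {
	$ch = $file[$i];
	$value = ord($ch);
	if ($value < 32 || $value > 127) {
		// removed: stay at $i
		$file = mb_substr($file, 0, $i) . mb_substr ($file, ($i+1));
	}
	else {
		// kept: advance
		$i++;
	}
}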
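
To see the effect of the index handling in isolation, here is a minimal sketch that runs both loop styles on a string with two consecutive control characters. The function names are hypothetical (they do not appear in the thread), and the sketch works on bytes with strlen/substr only, so it assumes nothing about mbstring settings.

```php
<?php
// Hypothetical helpers for illustration only.

// Always increments $i, like the first version of the loop.
function strip_always_increment(string $s): string
{
    for ($i = 0; $i < strlen($s); $i++) {
        $value = ord($s[$i]);
        if ($value < 32 || $value > 127) {
            $s = substr($s, 0, $i) . substr($s, $i + 1);
        }
    }
    return $s;
}

// Increments $i only when the current byte is kept.
function strip_conditional_increment(string $s): string
{
    for ($i = 0; $i < strlen($s); ) {
        $value = ord($s[$i]);
        if ($value < 32 || $value > 127) {
            $s = substr($s, 0, $i) . substr($s, $i + 1);
        } else {
            $i++;
        }
    }
    return $s;
}

$input = "a\t\tb"; // two tabs in a row

var_dump(strip_always_increment($input) === "a\tb");    // bool(true): second tab survives
var_dump(strip_conditional_increment($input) === "ab"); // bool(true): both tabs removed
```

The first function only fails when two unwanted bytes are adjacent, which is why the original loop removed some of the bad characters but not all of them.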

Posted: Fri Nov 04, 2005 7:21 am
by TJ
You could use a simple regular expression:

Code:

$subject = "the quick BROWN FOX \t 012345 $ \r\nthis is the next line";
echo preg_replace('/[^\x20-\x7F]/','', $subject);
What this does is replace all characters that don't match the range 0x20 - 0x7F (32 - 127).

Posted: Fri Nov 04, 2005 8:59 am
by Weirdan
TJ wrote: What this does is replace all characters that don't match the range 0x20 - 0x7F (32 - 127).
preg_replace is not multibyte-safe unless the /u modifier is used and the input string is UTF-8 encoded.

here's more info: http://us2.php.net/manual/en/reference. ... .php#58409
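
As a rough illustration of what the /u modifier changes, the sketch below applies the same character class with and without it. It assumes the input is valid UTF-8; the sample string and the variable names are made up for the example.

```php
<?php
// "café" written with explicit bytes so the example does not depend on
// how this file is saved: é is the two-byte UTF-8 sequence C3 A9.
$subject = "caf\xC3\xA9";

// Without /u the pattern is applied byte by byte: two bytes, two replacements.
$bytewise = preg_replace('/[^\x20-\x7F]/', '?', $subject);

// With /u the subject is decoded as UTF-8: one character, one replacement.
$charwise = preg_replace('/[^\x20-\x7F]/u', '?', $subject);

// With an empty replacement both variants give the same result here,
// because every byte of a multi-byte UTF-8 sequence is 0x80 or higher.
$removedBytewise = preg_replace('/[^\x20-\x7F]/', '', $subject);
$removedCharwise = preg_replace('/[^\x20-\x7F]/u', '', $subject);

echo $bytewise, "\n"; // caf??
echo $charwise, "\n"; // caf?
var_dump($removedBytewise === $removedCharwise); // bool(true)
```

With /u, preg_replace returns NULL when the subject is not valid UTF-8, so the return value is worth checking if the input encoding is uncertain.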

below should work as well

Posted: Fri Nov 04, 2005 10:13 am
by wtf

Code:

$bad = array("b", "a", "d");

$where = str_replace($bad, "", $where);

Re: below should work as well

Posted: Fri Nov 04, 2005 10:35 am
by Chris Corbyn
wtf wrote:

Code:

$bad = array("b", "a", "d");

$where = str_replace($bad, "", $where);
?????

Posted: Fri Nov 04, 2005 10:46 am
by yum-jelly
without the mb_ type functions!

Code:

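$str = ''; // string to strip...

$new = ''; // collects the characters that are kept

for ( $i = 0; $i < strlen ( $str ); $i++ )
{
	// read one byte and its numeric value
	$a = ord ( ( $b = substr ( $str, $i, 1 ) ) );

	if ( $a >= 128 )
	{
		// lead byte of a multi-byte character: skip the byte after it as well
		// (this assumes every such character is exactly two bytes long)
		$i++;
	}
	else if ( $a >= 32 )
	{
		// printable ASCII (32 - 127): keep it
		$new .= $b;
	}
}

echo $new;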
::EDIT::

made it not repeat code


yj
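
If the input is known to be UTF-8, where characters can be up to four bytes long and every byte of a multi-byte sequence is 0x80 or higher, a byte loop can simply drop each byte outside 32 - 127 without skipping ahead. The following is only a sketch under that assumption; the function name is hypothetical.

```php
<?php
// Hypothetical helper; assumes $str is UTF-8 (or plain ASCII).
function keep_printable_ascii(string $str): string
{
    $new = '';
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $a = ord($str[$i]);

        // Keep bytes 32 - 127 only; control bytes and all bytes of
        // multi-byte sequences fall outside this range and are dropped.
        if ($a >= 32 && $a <= 127) {
            $new .= $str[$i];
        }
    }

    return $new;
}

// Euro sign (three bytes: E2 82 AC), then "a", a tab, and "b".
echo keep_printable_ascii("\xE2\x82\xACa\tb"), "\n"; // prints "ab"
```

This would not be safe for legacy double-byte encodings whose second byte can fall in the ASCII range; for those, the trail byte has to be skipped explicitly.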