remove non-alphanum characters from string
Posted: Wed Feb 08, 2006 6:45 am
by ed209
Hi,
I'm stuck with getting a nice string out the other end of this function. I'm using it to create a file name so I only want alphanumeric characters to be used.
The $string comes from something that a user might input - which may contain ", ; : @ etc. In the place of these I want to use "_".
I have a function that works, but I can't figure out how to remove excessive ______
Code:
<?php
function removeNonAlphaNum($string){
    //$string = stripslashes($string);
    $previous_i = 0;
    $string_length = strlen($string);
    $returned_string = "";
    for( $i = 0 ; $i <= $string_length; $i++){
        $sub_string = substr($string, $previous_i, 1);
        $returned_string .= (ctype_alnum($sub_string)) ? $sub_string : "_";
        $previous_i += 1;
    }
    return $returned_string;
}
echo removeNonAlphaNum("I am a non-alphaNumeric string £232!@£$$...");
// outputs
// I_am_a_non_alphaNumeric_string__232_________
// I want it to output
// I_am_a_non_alphaNumeric_string_232
?>
any ideas?
Posted: Wed Feb 08, 2006 6:53 am
by JayBird
Not a direct answer to your question, but something like this might be a neater way to replace the characters:
Code:
$string = "I am a non-alphaNumeric string £232!@£$$...";
$new_string = ereg_replace("[^A-Za-z0-9]", "_", $string);
echo $new_string;
Posted: Wed Feb 08, 2006 7:31 am
by ed209
Thanks for that, yours is quicker too.
for 1000 executions:
Time : 0.6401 seconds (mine)
Time : 0.0542 seconds (yours)
But I still have the problem of too many '_____' . Is there a way to only ever have one '_' in a row?
Thanks,
ed.
Posted: Wed Feb 08, 2006 8:14 am
by JayBird
This nearly works, but it leaves an underscore on the end:
Code:
$string = "I am a non-alphaNumeric string £232!@£$$...";
$new_string = ereg_replace("[^A-Za-z0-9]", "_", $string);
$final_string = ereg_replace("[_]+", "_", $new_string);
echo $final_string;
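A note for readers on current PHP: the ereg_* functions were deprecated in PHP 5.3 and removed in PHP 7.0, so the snippet above will not run there. Below is a minimal sketch of the same two passes ported to preg_replace. It assumes byte-wise matching is acceptable (no /u modifier), and like the original it still leaves the trailing underscore.

```php
<?php
// Sketch only: the same two passes as above, using preg_replace.
$string = 'I am a non-alphaNumeric string £232!@£$$...';

// Pass 1: every byte that is not a letter or digit becomes "_".
$new_string = preg_replace('/[^A-Za-z0-9]/', '_', $string);

// Pass 2: collapse each run of underscores into a single one.
$final_string = preg_replace('/_+/', '_', $new_string);

echo $final_string; // I_am_a_non_alphaNumeric_string_232_
```

The input is single-quoted here so that PHP does not try to interpolate anything after the "$" signs.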
Posted: Wed Feb 08, 2006 8:14 am
by Benjamin
Code:
function cleanup_filename($filename_to_clean)
{
    // we use dashes because underscores will not wrap in a table!!!!
    // the following array contains everything we want to remove from the filename
    $invalid_in_filename = array("#", "~", "`", "!", "@", "$", "%", "^", "&", "*", "(", ")", "=", "+", "<", ",", ">", "/", "?", "\"", "'", ";", ":", "{", "[", "]", "}", "|", "\\");
    // remove invalid characters
    $replace_with = "";
    $filename_to_clean = str_replace($invalid_in_filename, $replace_with, $filename_to_clean);
    // convert underscores to dashes
    $filename_to_clean = str_replace("_", "-", $filename_to_clean);
    // convert spaces to dashes
    $filename_to_clean = str_replace(" ", "-", $filename_to_clean);
    // get rid of multiple dashes (i.e. "--")
    while (substr_count($filename_to_clean, "--") > 0)
    {
        $filename_to_clean = str_replace("--", "-", $filename_to_clean);
    }
    return $filename_to_clean;
}
Posted: Wed Feb 08, 2006 8:17 am
by JayBird
The above function will still leave an erroneous dash on the end of the filename.
Dunno if that is acceptable for your application or not.
Also, the above function lists what isn't allowed in the filename...better to specify what IS allowed IMO
Posted: Wed Feb 08, 2006 8:23 am
by Benjamin
It's just something I had lying around so I figured I would throw it up there.
Getting rid of the last character is easy...
Code:
$trimmed = rtrim($text, "..\-..\_");
Not sure if that code is right but it's close to that.
Posted: Wed Feb 08, 2006 8:24 am
by ed209
thanks for your help, problem solved.
I'll give them both a go. The file name isn't the end of the world, it just needs to resemble the title of the page.
Posted: Wed Feb 08, 2006 9:07 am
by feyd
psst. preg_ would run faster!

Posted: Wed Feb 08, 2006 9:12 am
by JayBird
agtlewis wrote:
It's just something I had lying around so I figured I would throw it up there.
Getting rid of the last character is easy...
Code:
$trimmed = rtrim($text, "..\-..\_");
Not sure if that code is right but it's close to that.
It would be a little more than that, because you would only want to remove the last character if the last character was an underscore.
Posted: Wed Feb 08, 2006 9:23 am
by Benjamin
That is what rtrim does according to what I understood from the documentation. You supply a list of characters to strip.
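A small sketch of that behaviour, using made-up sample strings: rtrim() with a character list strips any trailing run of the listed characters and leaves the string unchanged when it does not end in one of them, so no separate "is the last character an underscore" check should be needed. The list here is simply "-_"; note that ".." has a special range meaning inside the list.

```php
<?php
// Sample values are hypothetical, chosen only to show both cases.
$with_trailing    = rtrim('my-file-name-_', '-_'); // ends in listed characters
$without_trailing = rtrim('my-file-name', '-_');   // nothing to strip

echo $with_trailing, "\n";    // my-file-name
echo $without_trailing, "\n"; // my-file-name
```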
Posted: Wed Feb 08, 2006 9:28 am
by feyd
here's the preg version of pimp's last submission with the cleaning as requested:
Code:
$new_string = rtrim(preg_replace("#[^a-z0-9]+#i", "_", $string), '_');
Posted: Wed Feb 08, 2006 12:46 pm
by Christopher
I would recommend the preg "remove characters not in set" method (e.g. preg_replace('/[^a-z0-9]/', '', $mystring)) of specifying the characters that you want rather than attempting to remove bad characters. The latter method inevitably misses something.
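Putting the thread's suggestions together, here is one possible sketch of the allow-list approach as a reusable function. The name slugify_filename and the $separator parameter are hypothetical, not from the thread, and the type declarations assume PHP 7 or later. Unlike the inline example above, it replaces each disallowed run with a separator rather than deleting it, then trims the separator from both ends.

```php
<?php
// Hypothetical helper: keeps only ASCII letters and digits.
function slugify_filename(string $input, string $separator = '_'): string
{
    // Replace each run of characters outside the allowed set with one separator.
    $collapsed = preg_replace('/[^A-Za-z0-9]+/', $separator, $input);

    // Strip any separator left at the start or end.
    // Assumes $separator contains no ".." (which trim() treats as a range).
    return trim($collapsed, $separator);
}

echo slugify_filename('I am a non-alphaNumeric string £232!@£$$...'), "\n";
// I_am_a_non_alphaNumeric_string_232
```

Because the pattern matches whole runs with "+", no second pass is needed to collapse repeated separators.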