remove non-alphanum characters from string

Post by ed209 »

Hi,

I'm stuck trying to get a clean string out of this function. I'm using it to create a file name, so I only want alphanumeric characters in the result.

The $string comes from user input, which may contain characters such as " , ; : @ etc. I want to replace each of these with "_".

I have a function that works, but I can't figure out how to collapse the runs of repeated underscores (______).

Code: Select all

 
<?php
 
function removeNonAlphaNum($string){
    
    //$string = stripslashes($string);
 
    $previous_i = 0;
    $string_length = strlen($string);
 
    $returned_string = "";
    
    for( $i = 0 ; $i <= $string_length; $i++){
        $sub_string = substr($string, $previous_i, 1);
        $returned_string .= (ctype_alnum($sub_string)) ? $sub_string : "_";
        $previous_i += 1;
    }
    
    return $returned_string;
 
}
 
echo removeNonAlphaNum("I am a non-alphaNumeric string £232!@£$$...");
// outputs 
// I_am_a_non_alphaNumeric_string__232_________
 
//I want it to output
//I_am_a_non_alphaNumeric_string_232
 
?>
 
Any ideas?

Post by JayBird »

Not a direct answer to your question, but something like this might be a neater way to replace the characters:

Code: Select all

 
$string = "I am a non-alphaNumeric string £232!@£$$...";
 
$new_string = ereg_replace("[^A-Za-z0-9]", "_", $string);
 
echo $new_string;

Post by ed209 »

Thanks for that, yours is quicker too.

for 1000 executions:
Time : 0.6401 seconds (mine)
Time : 0.0542 seconds (yours)
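
A rough way to reproduce a comparison like this is sketched below. The helper names (replaceWithLoop, replaceWithRegex, timeIt) are made up for illustration, the code assumes PHP 7 or later, and preg_replace stands in for ereg_replace because the ereg functions were removed in PHP 7. The test string drops the "£" so the result does not depend on file encoding. Absolute timings will vary by machine, so none are shown.

```php
<?php
// Hypothetical helper names; none of these appear in the thread.

// Character-by-character version, similar in spirit to the loop in the first post.
function replaceWithLoop(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $out .= ctype_alnum($s[$i]) ? $s[$i] : '_';
    }
    return $out;
}

// Single regular-expression call.
function replaceWithRegex(string $s): string
{
    return preg_replace('/[^A-Za-z0-9]/', '_', $s);
}

// Run $fn on $input $runs times and return the elapsed seconds.
function timeIt(callable $fn, string $input, int $runs = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

$input = 'I am a non-alphaNumeric string 232!@$...';
printf("loop:  %.4f seconds\n", timeIt('replaceWithLoop', $input));
printf("regex: %.4f seconds\n", timeIt('replaceWithRegex', $input));
```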


But I still have the problem of too many '_____' . Is there a way to only ever have one '_' in a row?

Thanks,
ed.

Post by JayBird »

This nearly works, but it leaves an underscore on the end:

Code: Select all

$string = "I am a non-alphaNumeric string £232!@£$$...";
 
$new_string = ereg_replace("[^A-Za-z0-9]", "_", $string);
 
$final_string = ereg_replace("[_]+", "_", $new_string);
 
echo $final_string;
Last edited by JayBird on Wed Feb 08, 2006 8:14 am, edited 1 time in total.

Post by Benjamin »

Code: Select all

function cleanup_filename($filename_to_clean)
{
  // we use dashes because underscores will not wrap in a table!!!!
  // the following array contains everything we want to remove from a color field
  $invalid_in_filename = array("#", "~", "`", "!", "@", "$", "%", "^", "&", "*", "(", ")", "=", "+", "<", ",", ">", "/", "?", "\"", "'", ";", ":", "{", "[", "]", "}", "|", "\\");

  // remove invalid characters
  $replace_with = "";
  $filename_to_clean = str_replace($invalid_in_filename, $replace_with, $filename_to_clean);

  // convert underscores to dashes
  $filename_to_clean = str_replace("_", "-", $filename_to_clean);

  // convert spaces to dashes
  $filename_to_clean = str_replace(" ", "-", $filename_to_clean);

  // get rid of multiple dashes (i.e. "--")
  while (substr_count($filename_to_clean, "--") > 0)
  {
    $filename_to_clean = str_replace("--", "-", $filename_to_clean);
  }

  return $filename_to_clean;
}

Post by JayBird »

The above function will still leave an erroneous dash on the end of the filename.

Dunno if that is acceptable for your application or not.

Also, the above function lists what isn't allowed in the filename...better to specify what IS allowed IMO
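
To illustrate the difference, here is a minimal sketch comparing the two approaches. The shortened deny-list and the sample input are invented for this example and are not taken from the function above. The point is only that a character missing from the list (here a tab) slips through, while the allow-list pattern catches it.

```php
<?php
// Hypothetical, shortened deny-list for illustration only.
$denied = ['*', '?', '|', '/', '\\'];

// The input contains a tab, which the deny-list does not mention.
$input = "my\tfile*name";

// Deny-list: strips only the characters it knows about.
$byDenyList = str_replace($denied, '', $input);

// Allow-list: every run of characters outside A-Z, a-z, 0-9 becomes "_".
$byAllowList = preg_replace('/[^A-Za-z0-9]+/', '_', $input);

var_dump($byDenyList === "my\tfilename");  // bool(true): the tab survived
var_dump($byAllowList === 'my_file_name'); // bool(true)
```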

Post by Benjamin »

It's just something I had lying around, so I figured I would throw it up there.

Getting rid of the last character is easy...

Code: Select all

 
$trimmed = rtrim($text, "..\-..\_");
 
Not sure if that code is right but it's close to that.

Post by ed209 »

thanks for your help, problem solved.

I'll give them both a go. The file name isn't the end of the world, it just needs to resemble the title of the page.

Post by feyd »

psst. preg_ would run faster! :)

Post by JayBird »

agtlewis wrote:It's just something I had laying around so I figured I would throw it up there.

Getting rid of the last character is easy...

Code: Select all

$trimmed = rtrim($text, "..\-..\_");
Not sure if that code is right but it's close to that.
It would be a little more than that, because you would only want to remove the last character if the last character was an underscore.

Post by Benjamin »

That is what rtrim does according to what I understood from the documentation. You supply a list of characters to strip.
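
A small sketch of that behaviour, using made-up sample strings: the second argument to rtrim() is a plain list of characters, and only characters at the very end of the string are stripped. A string that does not end in one of them comes back unchanged, and underscores in the middle are left alone.

```php
<?php
// Trailing underscore is removed; interior underscores are untouched.
$a = rtrim('my_file_name_', '_');
// Any mix of the listed characters at the end is removed.
$b = rtrim('my-file-name-_-', '-_');
// Nothing to strip, so the string is returned as-is.
$c = rtrim('my_file_name', '_');

echo $a, "\n"; // my_file_name
echo $b, "\n"; // my-file-name
echo $c, "\n"; // my_file_name
```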

Post by feyd »

here's the preg version of pimp's last submission with the cleaning as requested:

Code: Select all

$new_string = rtrim(preg_replace("#[^a-z0-9]+#i", "_", $string), '_');
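
For anyone wanting to drop this into a project, the one-liner could be wrapped roughly as follows. This is only a sketch assuming PHP 7 or later. The function name toFilenameSlug and the 'untitled' fallback are invented here, and trim() is used instead of rtrim() so that a leading underscore is removed as well. The sample input omits the "£" from the original post so the result does not depend on file encoding.

```php
<?php
// Hypothetical wrapper name and fallback value; neither appears in the thread.
function toFilenameSlug(string $title, string $fallback = 'untitled'): string
{
    // Replace each run of non-alphanumeric bytes with a single underscore.
    $slug = preg_replace('/[^A-Za-z0-9]+/', '_', $title);
    // Drop underscores left at either end.
    $slug = trim($slug, '_');
    // An input with no alphanumerics at all would otherwise give an empty name.
    return $slug === '' ? $fallback : $slug;
}

echo toFilenameSlug('I am a non-alphaNumeric string 232!@$...'), "\n";
// I_am_a_non_alphaNumeric_string_232
echo toFilenameSlug('...'), "\n";
// untitled
```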

Post by Christopher »

I would recommend the preg "remove characters not in set" method (e.g. preg_replace('/[^a-z0-9]/i', '', $mystring)), which specifies the characters you want to keep rather than trying to list the bad ones. The latter method inevitably misses something.