[Solved] Regex to remove invalid Windows filesystem chars
Posted: Sun Jul 23, 2006 4:54 am
by daedalus__
I'm working on an error handling class, which I will be making a topic about later (look for it in theory).
I want to give the option to log errors, if none are fatal, to a file. In order to do this, I need to be able to remove characters that are invalid in the Windows filesystem from the path and filename of the log.
Here is my problem:
I am so bad with regular expressions that I can't even check for word characters without using \w (\W?).
I've been searching Google and the forums for a while but I can't turn anything up.
EDIT: I almost forgot, perl compatible, if you could. (preg_)
Posted: Sun Jul 23, 2006 5:33 am
by daedalus__
omfsweetjesus
I got it!!!!!!!!!!!!!!!!!!!
Code:
$string = 'this(is_an_invalid)file&name.lame';
echo $string.'<br />';
echo '<p>'.preg_replace('/[^a-zA-Z0-9._]/', '', $string).'</p>';
Outputs:
Code:
<p>thisis_an_invalidfilename.lame</p>
!!!!!!!!!!!!!!!!!!!!!
"Remove any character that is not a letter (a-z, A-Z), a digit (0-9), a period, or an underscore."
I searched for almost 40 minutes before I tried to do it myself.
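For comparison, here is a rough sketch of the opposite approach: instead of whitelisting "safe" characters, strip only the characters Windows itself rejects in a file name (< > : " / \ | ? * and the control characters 0x00-0x1F). The function name sanitize_windows_filename is made up for this example, and it does not handle reserved device names such as CON or NUL, so treat it as a starting point rather than a complete solution.

```php
<?php
// Hypothetical helper (not from the thread): removes characters that
// Windows does not allow in a file name.
function sanitize_windows_filename($name)
{
    // Reserved characters: < > : " / \ | ? * plus control chars 0x00-0x1F.
    // '\\\\' in a single-quoted PHP string reaches PCRE as an escaped backslash.
    $clean = preg_replace('/[<>:"\/\\\\|?*\x00-\x1F]/', '', $name);

    // Windows also rejects names that end in a period or a space.
    return rtrim($clean, '. ');
}

// Parentheses and ampersands are legal on Windows, so this one is unchanged.
echo sanitize_windows_filename('this(is_an_invalid)file&name.lame'), "\n";

// Colon, question mark and angle brackets are removed.
echo sanitize_windows_filename('error:log?<2006>.txt'), "\n";
```

Note that this treats the input as a bare file name: slashes are removed too, so a full path would have to be split into its parts first. The whitelist in the post above is stricter and arguably the safer default for log files.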
Posted: Sun Jul 23, 2006 5:43 am
by daedalus__
[^a-zA-Z0-9._\/] preserves the forward slash in path names as well.
Posted: Sun Jul 23, 2006 7:27 am
by Chris Corbyn
Daedalus- wrote: [^a-zA-Z0-9._\/] preserves the forward slash in path names as well.
Didn't know you could have forward slashes, but you forgot to escape the dot, which matches "any" character.

I put dash in there too. So that condenses down to:
Since \w is the same as [a-zA-Z0-9_]

Posted: Sun Jul 23, 2006 8:55 am
by feyd
I hate to keep repeating this, but \w is not the same as a-zA-Z0-9_, it covers far more characters than just those.
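A small sketch of where the difference can show up. It assumes a PHP build where the u modifier also turns on Unicode character properties (the case on current releases, as far as I know); without the u modifier, what \w matches beyond ASCII depends on the locale. The sample file name is invented for the example.

```php
<?php
// "café_1.log" written with explicit UTF-8 bytes for the accented letter,
// so the result does not depend on how this source file is saved.
$name = "caf\xC3\xA9_1.log";

// Explicit ASCII class: both bytes of the accented letter are stripped.
$ascii = preg_replace('/[^a-zA-Z0-9._]/', '', $name);

// \w with the u modifier: the accented letter counts as a word character
// (assumption: Unicode properties are enabled together with UTF-8 mode).
$word = preg_replace('/[^\w.]/u', '', $name);

echo $ascii, "\n"; // caf_1.log
echo $word, "\n";  // café_1.log
```

So if the goal is a strictly ASCII file name, spelling out a-zA-Z0-9_ is the more predictable choice.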
Posted: Sun Jul 23, 2006 9:46 am
by Chris Corbyn
feyd wrote:I hate to keep repeating this, but \w is not the same as a-zA-Z0-9_, it covers far more characters than just those.
Does it cover UTF-8 characters too or something? Like accented letters? This is new to me

Posted: Sun Jul 23, 2006 9:49 am
by feyd
d11wtq wrote:Does it cover UTF-8 characters too or something? Like accented letters? This is new to me

I guess you missed my reply to another of your recommendations to use \w before:
viewtopic.php?p=245010#245010
Posted: Sun Jul 23, 2006 10:07 am
by Chris Corbyn
feyd wrote:d11wtq wrote:Does it cover UTF-8 characters too or something? Like accented letters? This is new to me

I guess you missed my reply to another of your recommendations to use \w before:
viewtopic.php?p=245010#245010
Yeah sorry I did. Interesting. Thanks

Posted: Sun Jul 23, 2006 11:51 am
by daedalus__
I thought the period didn't have to be escaped since it is inside the square brackets (a character class).
It works fine?
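A quick check of this with preg_match and preg_replace: inside square brackets the period is a literal character, so escaping it is allowed but makes no difference. The test strings are made up for the example.

```php
<?php
// Outside brackets, "." matches any single character (except a newline).
$any = preg_match('/^.$/', 'x');          // 1

// Inside brackets, "." only matches a literal period.
$literal_x   = preg_match('/^[.]$/', 'x'); // 0
$literal_dot = preg_match('/^[.]$/', '.'); // 1

// Escaped and unescaped versions of the character class behave the same.
$escaped = preg_replace('/[^a-zA-Z0-9\._]/', '', 'a.b-c');
$plain   = preg_replace('/[^a-zA-Z0-9._]/', '', 'a.b-c');

var_dump($any, $literal_x, $literal_dot);
var_dump($escaped === $plain); // bool(true)
echo $plain, "\n";             // a.bc
```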