simple word filter for comment system
Posted: Mon Feb 02, 2009 6:50 pm
by Sindarin
I am trying to implement a simple word filter for common bad words in my comment system, but I can't seem to block variations that mix uppercase and lowercase letters.
For example, I have 'badword1' listed, but the user can enter 'BADWORD1' or 'bAdwOrd1' and bypass the filter. How can I make the code below case-insensitive?
/* REPLACE BAD WORDS WITH ASTERISKS */
Code: Select all
function word_filter($str)
{
    $bad_words = array(
        "badword1", "badword2", "badword3", "badword4"
    );
    $replacements = array(
        "**********"
    );
    for ($i = 0; $i < sizeof($bad_words); $i++) {
        srand((double)microtime() * 1000000);
        $rand_key = (rand() % sizeof($replacements));
        $str = eregi_replace($bad_words[$i], $replacements[$rand_key], $str);
    }
    return $str;
}
Re: simple word filter for comment system
Posted: Mon Feb 02, 2009 11:38 pm
by watson516
How about converting all of the text to lowercase before you check?
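A minimal sketch of that idea, assuming the goal is only to detect a listed word (lowercasing the whole comment before replacing would also change the text that gets stored). The helper name contains_bad_word is made up for illustration, the word list is assumed to be lowercase already, and strtolower() should only be relied on for ASCII letters.

```php
<?php
// Hypothetical helper: returns true if any listed word occurs in $str,
// ignoring ASCII letter case. Assumes $bad_words holds lowercase entries.
function contains_bad_word($str, array $bad_words)
{
    $lower = strtolower($str); // fold the input once, not once per word
    foreach ($bad_words as $word) {
        if (strpos($lower, $word) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_bad_word("some bAdwOrd1 here", array("badword1", "badword2")));
// bool(true)
```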
Re: simple word filter for comment system
Posted: Mon Feb 02, 2009 11:41 pm
by nor0101
You might also want to check out str_ireplace(). It's less computationally expensive than using a regex. See http://us3.php.net/manual/en/function.str-ireplace.php for details.
Re: simple word filter for comment system
Posted: Tue Feb 03, 2009 3:40 am
by Sindarin
Good, this worked nicely. Thanks.
Code: Select all
for ($i = 0; $i < sizeof($bad_words); $i++) {
    // Replace the current word, ignoring (ASCII) letter case.
    $str = str_ireplace($bad_words[$i], $replacements[0], $str);
}
return $str;
}
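Since str_ireplace() also accepts an array of search strings, the loop can probably be dropped altogether. A sketch, assuming the same word list as in the first post and a single fixed replacement string:

```php
<?php
function word_filter($str)
{
    $bad_words = array("badword1", "badword2", "badword3", "badword4");
    // An array of search strings is handled in a single call;
    // matching ignores ASCII letter case.
    return str_ireplace($bad_words, "**********", $str);
}

echo word_filter("This is BADWORD1 and bAdwOrd2."), "\n";
// This is ********** and **********.
```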
Re: simple word filter for comment system
Posted: Tue Feb 03, 2009 6:35 am
by Sindarin
I just noticed, it doesn't work correctly for non-English characters. Why is that?
It can detect e.g. "Κακή" but not "Κακη", "ΚΑΚΗ" or variations like "κΑΚη".
Re: simple word filter for comment system
Posted: Tue Feb 03, 2009 8:19 am
by mattpointblank
Because they're not the same characters? They might have the same meaning linguistically, but their ASCII/Unicode/UTF-8 encoding will be different.
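To make the difference concrete, here is a small sketch that prints the UTF-8 bytes of three of the letters involved. It assumes PHP 7 or later for the "\u{...}" escapes. Note that "ή" (with accent) and "η" (without) are distinct letters, not just case variants, and that byte-oriented functions such as str_ireplace() generally only fold the case of ASCII letters.

```php
<?php
// The letters are written as Unicode escapes (PHP 7+) so the result
// does not depend on how this file is saved.
$letters = array(
    "\u{03AE}", // ή  Greek small letter eta with tonos
    "\u{03B7}", // η  Greek small letter eta
    "\u{0397}", // Η  Greek capital letter eta
);
foreach ($letters as $letter) {
    // bin2hex() shows the raw bytes that byte-oriented functions see.
    echo $letter, " = ", bin2hex($letter), "\n";
}
// ή = ceae
// η = ceb7
// Η = ce97
```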
Re: simple word filter for comment system
Posted: Tue Feb 03, 2009 8:23 am
by mickeyunderscore
Just thought I'd add a note on for loops. Your for loop at the moment:
Code: Select all
for($i=0;$i < sizeof($bad_words);$i++){
This sets $i to 0. Then, at the beginning of each iteration, it counts your array of bad words and compares $i to the result; at the end of each iteration, it increments $i.
It would be better for you to use:
Code: Select all
for($i=0, $size = sizeof($bad_words);$i < $size;$i++){
This way it only counts the array once, rather than once per iteration.
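A runnable sketch of the cached-size loop, next to a foreach that avoids the counter entirely. The sample array is made up for illustration. For PHP arrays, sizeof() (an alias of count()) is cheap, so the saving is mostly the repeated function call.

```php
<?php
$bad_words = array("badword1", "badword2", "badword3"); // sample data

// The size is computed once, in the loop initialiser.
for ($i = 0, $size = sizeof($bad_words); $i < $size; $i++) {
    echo $i, ": ", $bad_words[$i], "\n";
}

// foreach needs neither a counter nor a size check.
foreach ($bad_words as $index => $word) {
    echo $index, ": ", $word, "\n";
}
```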
Re: simple word filter for comment system
Posted: Thu Feb 05, 2009 3:20 am
by Sindarin
mattpointblank wrote:Because they're not the same characters..? They might have the same meaning linguistically, but their ascii/unicode/UTF-8 etc symbol will be different.
So there is no way to check for those combinations easily?
Re: simple word filter for comment system
Posted: Thu Feb 05, 2009 3:29 am
by mickeyunderscore
Sindarin wrote:So there is no way to check for those combinations easily?
Writing your own word filter would be very difficult if you wanted it to be effective. I'd look into an open-source solution; there are bound to be some around.
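For the case-insensitivity part at least, a Unicode-aware regex may get fairly far. The following is only a sketch, assuming UTF-8 input and PHP 7 or later (for the "\u{...}" escapes in the usage line); the function name word_filter_utf8 is made up. It does not treat accented and unaccented letters as equal, so "Κακή" and "Κακη" would both need to be listed.

```php
<?php
// Hypothetical Unicode-aware variant; assumes $str and the words are UTF-8.
function word_filter_utf8($str, array $bad_words)
{
    if (count($bad_words) === 0) {
        return $str; // an empty pattern would match at every position
    }
    // preg_quote() escapes regex metacharacters in each word.
    $quoted = array();
    foreach ($bad_words as $word) {
        $quoted[] = preg_quote($word, '/');
    }
    // i = case-insensitive, u = treat pattern and subject as UTF-8,
    // so case folding also applies to non-ASCII letters such as Greek.
    $pattern = '/' . implode('|', $quoted) . '/iu';
    $result = preg_replace($pattern, '**********', $str);
    // preg_replace() returns null on error (for example invalid UTF-8).
    // This sketch falls back to the unfiltered input; a real system
    // might prefer to reject the comment instead.
    return $result === null ? $str : $result;
}

$listed = "\u{03BA}\u{03B1}\u{03BA}\u{03AE}"; // κακή (lowercase, accented)
$input  = "\u{039A}\u{0391}\u{039A}\u{0389}"; // ΚΑΚΉ (uppercase, accented)
echo word_filter_utf8($input . " test", array($listed)), "\n";
// ********** test
```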