bad word filter
Hi,
I'm having trouble finding a function to check a variable against an array of bad words. The closest I can come is ereg, preg, etc. What would be the easiest way to do this?
Thanks,
Eric
From the PHP Manual:
That should give you the right idea.
Code: Select all
<?php
$phrase = "You should eat fruits, vegetables, and fiber every day.";
$healthy = array("fruits", "vegetables", "fiber");
$yummy = array("pizza", "beer", "ice cream");
$newphrase = str_replace($healthy, $yummy, $phrase);
?>
A while ago I played a bit with different functions to filter bad words, so I'll just give you the code. There are many ways to filter strings and/or arrays, with many functions. It also depends on whether you want to replace the bad words or just test whether they exist. Anyway, here's the code. It can probably be improved.
[edit] Foobar is quicker, and gives a simpler and maybe better example
Code: Select all
<?php
// first variant: test whether any bad word occurs
$input = array('Badwords are in this language',
               'more bad words in this sentence',
               'and even more bad words');
$string = implode("\r\n", $input);
$badwords = array('bad', 'word', 'smurf');

function checkforbadwords($badwords, $string) {
    foreach ($badwords as $value) {
        // stristr() is case-insensitive; it returns FALSE when $value is absent
        if (stristr($string, $value) !== FALSE) {
            return true;
        }
    }
    // every word was checked and none was found
    return false;
}

if (checkforbadwords($badwords, $string)) { echo "Bad words found"; }
// second variant: replace every bad word with ***
function word_filter3($array, $badwords) {
    $clean = array();
    foreach ($array as $val) {
        // str_ireplace() matches case-insensitively without lowercasing the whole line
        $clean[] = str_ireplace($badwords, "***", $val);
    }
    return $clean;
}

// one bad word per line; skip blank lines and strip line endings
$badwords = array();
$lines = file("bad_words.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    $badwords[] = trim($line);
}

$filteredinput = word_filter3($input, $badwords);
foreach ($filteredinput as $val) {
    echo '<p>After Word_filter3: ' . $val . '</p>';
}
?>
Yet another:
Code: Select all
function isSafeWord($word){
    include_once(INCLUDES_ROOT.'includes/forbiddenwords.inc.php');
    $a_forbidden_wrd = getForbiddenWords();
    foreach($a_forbidden_wrd as $ak => $val){
        // the forbidden words are the array keys; quote them so they match literally
        $ak = preg_quote(strtoupper($ak), '/');
        if(preg_match("/$ak/", strtoupper($word))){
            return FALSE;
        }
    }
    return TRUE;
}
//Forbidden words
function getForbiddenWords(){
    $A_FORBIDDEN_WORDS =
        array('IE'=>1,'MICROSOFT' => 1);
    return $A_FORBIDDEN_WORDS;
}
That's certainly a nice function. But, sorry to criticize, aren't some bad words missing?
Better safe than sorry
Code: Select all
array('IE'=>1,'MICROSOFT' => 1, 'XP' => 1 , 'BILL' => 1, 'WINDOWS' => 1, 'EXPLORER' => 1);
Do consider that bad word filters are (by design) attempting to limit the impossible to limit. Consider:
- S p a c e s b e t w e e n w o r d s w i l l d e f e a t i t.
- S0 w1ll using numb3rs 1nstead of 1etters.
- Don' even gets me staryed on missspelllins
- Lets remember that every fraqing word has a farg-ing good replacement that gets the point across perfectly, ice-hole
- For the more literate, I would say that any attempt at building such a system is full of bovine excrement. Perfectly legal, and twice as funny for the brains in the bunch.
- Comb1ning a n y of the above is fare game too, for those bastages that want to outdo you.
The list goes on, and on, and on. People that want to go around the filter, will.
So make sure that *before* you code a filter, you code a "block user" button so other users can block them. It's a much better solution, and it ensures that even if the bad guys get past your protection, the other users have a defense.
Darned Some of a batches, making everything difficult.
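To make the point above concrete, here is a rough sketch of what "normalizing" input before matching might look like. The helper names (normalize_for_filter, contains_bad_word) and the substitution table are my own assumptions for illustration, not anything from the PHP manual, and the table is nowhere near complete. It catches spacing and a few digit swaps, and it still misses plain misspellings, which is exactly the problem described.

```php
<?php
// Hypothetical helper: fold a few common evasions before matching.
// The substitution table is an illustrative assumption, not a complete list.
function normalize_for_filter($text)
{
    $text = strtolower($text);
    // Map a handful of look-alike digits/symbols back to letters.
    $text = strtr($text, array(
        '0' => 'o', '1' => 'i', '3' => 'e', '4' => 'a',
        '5' => 's', '@' => 'a', '$' => 's',
    ));
    // Drop everything that is not a letter, so "b a d" and "b-a-d" become "bad".
    return preg_replace('/[^a-z]/', '', $text);
}

// Hypothetical helper: true if any bad word occurs in the normalized text.
function contains_bad_word($text, array $badwords)
{
    $normalized = normalize_for_filter($text);
    foreach ($badwords as $word) {
        if ($word !== '' && strpos($normalized, strtolower($word)) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(contains_bad_word('s m u r f', array('smurf'))); // bool(true)
var_dump(contains_bad_word('5mur f!', array('smurf')));   // bool(true)
var_dump(contains_bad_word('smrf', array('smurf')));      // bool(false), misspellings still slip through
```

Note the trade-off: because all spaces are stripped, a bad word can now "appear" across two innocent neighbouring words, so this kind of normalization buys a little coverage at the cost of more false positives.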
stuffradio wrote:
You could do that... but why don't you do both? As in, ban them only if they keep doing it to an excessive extent, instead of right away?
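A minimal sketch of the "do both" idea, assuming a simple strike counter: the filter flags a post, and a ban only kicks in after repeated hits. The function name moderate_post, the MAX_STRIKES threshold of 3, and the in-memory $strikes array are all hypothetical; a real site would keep the counts in a database or session.

```php
<?php
// Hypothetical threshold; tune it for your own site.
define('MAX_STRIKES', 3);

// Returns 'ok', 'warn', or 'ban' and updates the per-user strike count.
// $strikes is passed by reference; in a real application it would live
// in a database rather than an in-memory array.
function moderate_post($user, $text, array $badwords, array &$strikes)
{
    $hit = false;
    foreach ($badwords as $word) {
        // stripos() is a case-insensitive substring search
        if ($word !== '' && stripos($text, $word) !== false) {
            $hit = true;
            break;
        }
    }
    if (!$hit) {
        return 'ok';
    }
    $strikes[$user] = isset($strikes[$user]) ? $strikes[$user] + 1 : 1;
    return $strikes[$user] >= MAX_STRIKES ? 'ban' : 'warn';
}

$strikes = array();
$badwords = array('smurf');
echo moderate_post('alice', 'hello there', $badwords, $strikes), "\n"; // ok
echo moderate_post('alice', 'you smurf', $badwords, $strikes), "\n";   // warn
echo moderate_post('alice', 'SMURF again', $badwords, $strikes), "\n"; // warn
echo moderate_post('alice', 'smurf smurf', $badwords, $strikes), "\n"; // ban
```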
Roja is right: filtering out 100% is an illusion. I also agree with stuffradio that a combination of measures is best. I think it all depends on the situation (duh). Sometimes a bad word filter with 5 words can filter out 90% of automated comment spam on, for example, an online guestbook. But the tactics of these spambots change, and a live person writing bad (misspelled) stuff will be hard to filter. In the end, some more intelligent system, driven by a collective human effort, will be a better solution.
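For the short-list case (a handful of words against guestbook spam), one possible approach is a single regular expression that only matches whole words, so that 'bad' does not trigger on 'badge'. This is just a sketch: build_badword_pattern is a made-up helper name, and the word list is a placeholder. It covers both uses mentioned earlier in the thread, testing whether a word exists and replacing it.

```php
<?php
// Hypothetical helper: build one case-insensitive pattern that matches
// any listed word as a whole word (using \b word boundaries).
function build_badword_pattern(array $badwords)
{
    $quoted = array();
    foreach ($badwords as $word) {
        $word = trim($word);
        if ($word !== '') {
            // preg_quote() escapes regex metacharacters and the / delimiter
            $quoted[] = preg_quote($word, '/');
        }
    }
    // An empty list yields a pattern that can never match.
    if (count($quoted) === 0) {
        return '/(?!)/';
    }
    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

$pattern = build_badword_pattern(array('bad', 'word'));

// Test only: does the text contain a listed word?
var_dump(preg_match($pattern, 'A bad day') === 1);         // bool(true)
var_dump(preg_match($pattern, 'A badge of honour') === 1); // bool(false)

// Replace: mask each whole-word hit.
echo preg_replace($pattern, '***', 'Bad words, badly worded'), "\n"; // *** words, badly worded
```

Whole-word matching cuts down on false positives, but it also means plurals and other variations ('words', 'badly') pass through unless you list them too.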