bad word filter

Posted: Sat Jan 21, 2006 10:26 am
by ferric
Hi,

I'm having trouble finding a function to check a variable against an array of bad words. The closest I can come is ereg, preg, etc. What would be the easiest way to do this?

Thanks,
Eric

Posted: Sat Jan 21, 2006 10:59 am
by foobar
From the PHP Manual:

Code:

<?php

$phrase  = "You should eat fruits, vegetables, and fiber every day.";
$healthy = array("fruits", "vegetables", "fiber");
$yummy  = array("pizza", "beer", "ice cream");

$newphrase = str_replace($healthy, $yummy, $phrase);

?>
That should give you the right idea.
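
As a minimal sketch of how the manual example carries over to bad words: the word list and the `***` placeholder below are made up for illustration. When the replacement is a single string, `str_replace()` uses it for every search term; `str_ireplace()` (PHP 5 and later) does the same without regard to case.

```php
<?php
// Hypothetical word list and sample comment, chosen only for illustration.
$badwords = array('darn', 'heck');
$comment  = 'Well, darn it. What the Heck?';

// Case-sensitive: "Heck" is left alone because only "heck" is listed.
$filtered = str_replace($badwords, '***', $comment);

// Case-insensitive variant of the same call.
$filteredAnyCase = str_ireplace($badwords, '***', $comment);

echo $filtered, "\n";        // Well, *** it. What the Heck?
echo $filteredAnyCase, "\n"; // Well, *** it. What the ***?
```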

Posted: Sat Jan 21, 2006 11:16 am
by matthijs
A while ago I played a bit with different functions to filter bad words. I'll just give you the code. There are many ways to filter strings and/or arrays, with many functions. It also depends on whether you want to replace the bad words or just test whether they exist. Anyway, here's the code. It can probably be improved.

Code:

<?php
// first variant
$input = array ('Badwords are in this language',
                'more bad words in this sentence',
                'and even more bad words');
$string = implode("\r\n", $input);

$badwords = array('bad', 'word', 'smurf');

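function checkforbadwords($badwords,$string) {

  // Return true as soon as any bad word is found (case-insensitive substring match).
  foreach ($badwords as $value)
  {
    if(stristr($string, $value) !== FALSE)
    {
		 return true;
    }
  }
  // None of the bad words matched.
  return false;
}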
if(checkforbadwords($badwords,$string)) { echo "Bad words found"; }

// second variant

function word_filter3 ($array,$badwords) {   
		
    $clean = array();
		
    foreach ($array as $val) 
    {
        // str_ireplace() (PHP 5+) ignores case and leaves the rest of the text as typed.
        $clean[] = str_ireplace($badwords,"***",$val);

    } 
    return $clean;
}
$badwords = array();
$lines = file("bad_words.txt");
foreach ($lines as $line_num => $line) {
    $badwords[] = trim($line);	
}
$filteredinput = word_filter3($input,$badwords);
foreach ($filteredinput as $key => $val) 
		{
		echo '<p>After Word_filter3: ' . $val .'</p>';
		}
?>
[edit] Foobar is quicker, and gives a simpler and maybe better example
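
Both variants above match substrings, so a listed word is also caught inside a longer, innocent word. If whole-word matching is wanted instead, one possible sketch uses `preg_match()` with `\b` word boundaries. The function name `containsBadWord` is hypothetical, and whether whole-word matching is the right trade-off depends on the site, since it lets run-together words such as "Badwords" through.

```php
<?php
// Hypothetical helper: true when $text contains any of $badwords as a whole word.
function containsBadWord(array $badwords, $text)
{
    foreach ($badwords as $word) {
        // preg_quote() escapes regex metacharacters in the word;
        // \b anchors the match to word boundaries and /i ignores case.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            return true;
        }
    }
    return false;
}

$badwords = array('bad', 'word');
var_dump(containsBadWord($badwords, 'This is a BAD idea'));   // bool(true)
var_dump(containsBadWord($badwords, 'Badminton and swords')); // bool(false)
```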

Posted: Sat Jan 21, 2006 1:50 pm
by hawleyjr
Yet another:

Code:

function isSafeWord($word){
	
	include_once(INCLUDES_ROOT.'includes/forbiddenwords.inc.php'); 
	
	$a_forbidden_wrd = getForbiddenWords();

	foreach($a_forbidden_wrd as $ak => $val){
		// Escape regex metacharacters in the word; the /i flag makes the match case-insensitive.
		if(preg_match('/' . preg_quote($ak, '/') . '/i', $word)){
			return FALSE;
		}
	}
	
	return TRUE;
}


//Forbidden words

function getForbiddenWords(){
	$A_FORBIDDEN_WORDS = 
                array('IE'=>1,'MICROSOFT' => 1);

	return $A_FORBIDDEN_WORDS;

}

Posted: Sat Jan 21, 2006 4:09 pm
by matthijs
That's certainly a nice function. But - sorry to criticize - aren't some bad words missing?

Code:

array('IE'=>1,'MICROSOFT' => 1, 'XP' => 1 , 'BILL' => 1, 'WINDOWS' => 1, 'EXPLORER' => 1);
Better safe than sorry :D

Posted: Sat Jan 21, 2006 8:20 pm
by Roja
Do consider that bad word filters are (by design) attempting to limit the impossible to limit. Consider:

- S p a c e s b e t w e e n w o r d s w i l l d e f e a t i t.
- S0 w1ll using numb3rs 1nstead of 1etters.
- Don' even gets me staryed on missspelllins
- Lets remember that every fraqing word has a farg-ing good replacement that gets the point across perfectly, ice-hole
- For the more literate, I would say that any attempt at building such a system is full of bovine excrement. Perfectly legal, and twice as funny for the brains in the bunch.
- Comb1ning a n y of the above is fare game too, for those bastages that want to outdo you.

The list goes on, and on, and on. People that want to go around the filter, will.

So make sure that *before* you code a filter, you code a "block user" button for other users to be able to block them. It's a much better solution, and it ensures that even if the bad guys get past your protection, the other users have a defense.

Darned Some of a batches, making everything difficult.
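
To make the evasion tricks listed above concrete, here is a rough sketch of normalizing text before matching. The function name `normalizeForFilter` and the digit map are assumptions for illustration, not a complete defense. It also shows the limits: "1" stands for "i" in "w1ll" but for "l" in "1etters", so no fixed map can get both right, and stripping spaces can create matches across word boundaries.

```php
<?php
// Hypothetical normalizer: undoes spacing and a few digit-for-letter swaps.
function normalizeForFilter($text)
{
    $text = strtolower($text);
    // Map some common digit substitutions back to letters (illustrative, incomplete).
    $text = strtr($text, array('0' => 'o', '1' => 'i', '3' => 'e', '4' => 'a', '5' => 's'));
    // Drop everything that is not a letter, so "b a d" and "b-a-d" become "bad".
    return preg_replace('/[^a-z]/', '', $text);
}

echo normalizeForFilter('B 4 d w0rd5'), "\n"; // badwords
echo normalizeForFilter('1etters'), "\n";     // ietters (the map guesses wrong here)
```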

Posted: Sun Jan 22, 2006 2:03 am
by stuffradio
Roja wrote:So make sure that *before* you code a filter, you code a "block user" button for other users to be able to block them. It's a much better solution, and it ensures that even if the bad guys get past your protection, the other users have a defense.
You could do that... but why don't you do both? As in, if they are doing it to an excessive extent, then you can ban them, instead of banning them right away?
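
One way the "ban only after excessive use" idea might look is a per-user strike counter. This is only a sketch: `addStrike`, the threshold of 3, and the in-memory array are all assumptions, and a real site would presumably keep the counts in its database.

```php
<?php
// Hypothetical helper: records one strike for $user and reports whether the
// user has now reached the ban threshold. $strikes is passed by reference.
function addStrike(array &$strikes, $user, $maxStrikes = 3)
{
    if (!isset($strikes[$user])) {
        $strikes[$user] = 0;
    }
    $strikes[$user]++;
    return $strikes[$user] >= $maxStrikes;
}

$strikes = array();
$banned  = false;
// Simulate three posts by the same user that each tripped the filter.
for ($i = 0; $i < 3; $i++) {
    $banned = addStrike($strikes, 'some_user');
}
var_dump($banned); // bool(true): the third strike reaches the threshold
```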

Posted: Sun Jan 22, 2006 2:38 am
by matthijs
Roja is right, filtering 100% is an illusion. I also agree with stuffradio that a combination of measures is best. I think it all depends on the situation (duh). Sometimes a bad word filter with 5 words can filter out 90% of automated comment spam on, for example, an online guestbook. But the tactics of these spambots change, and bad (misspelled) stuff written by a live person will be hard to filter. In the end, some more intelligent system, driven by a collective human effort, will be a better solution.

Posted: Sun Jan 22, 2006 7:16 am
by Roja
stuffradio wrote:You could do that... but why don't you do both?
Roja wrote:So make sure that *before* you code a filter, you code a "block user" button for other users to be able to block them.
The word "Before" in my suggestion implies that both will occur.