Preg_replace whole word only

Posted: Fri Mar 12, 2010 12:50 pm
by cesarcesar
I'm trying to make a naughty word filter. It removes bad words fine, but words that merely contain a bad word, like "assist" and "asses", get caught in the filter as well. Strangely, though, if the sentence is "My asses to assist me." the clean version reads "My asses to ***ist me." It seems to let the first occurrence inside another word through, but then blocks the rest. Any ideas? My script is below. Thanks.

Code: Select all

 
 
function cleanWords($value) {
 
    /*   strip naughty words   */
    $bad_word_file = 'standards/badwords.txt';
    $strtofile = fopen($bad_word_file, "r");
    $badwords = explode("\n", fread($strtofile, filesize($bad_word_file)));
    fclose($strtofile);
    
    for ($i = 0; $i < count($badwords); $i++) {
        $wordlist .= str_replace(chr(13),'',$badwords[$i]).'|';
    }
    $wordlist = substr($wordlist,0,-1);
 
    $value = preg_replace("/\b($wordlist)\b/ie", 'preg_replace("/./","*","\\1")', $value);  
    return $value;
 
}
 
 

Re: Preg_replace whole word only

Posted: Fri Mar 12, 2010 3:25 pm
by tr0gd0rr
The following works for me on the sentence "My asses to assist me":

Code: Select all

\b(asses|ass)\b
 
What does your final regex look like? Maybe you have "\r\n" as newlines?
 
Also note that you can omit line 8 (the explode() call) and lines 11 through 14 (the loop and substr() that build $wordlist) and instead run the following (faster) code on the raw file contents:

Code: Select all

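$wordlist = str_replace("\n", "|", trim($badwords));
 
Or, if you have "\r\n" or "\r" line endings (replace "\r\n" first so it doesn't turn into an empty "||" alternative):

Code: Select all

$wordlist = str_replace(array("\r\n", "\r", "\n"), "|", trim($badwords));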
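
To see why the line endings matter, here is a small sketch. The two-word `$raw` string is a made-up stand-in for the contents of the word file, saved with Windows line endings. Splitting on "\n" alone leaves a "\r" stuck to each word, so the pattern never matches; replacing "\r\n" first gives a clean list.

```php
<?php
// Stand-in for the file contents (assumption: Windows "\r\n" line endings).
$raw = "asses\r\nass\r\n";

// Splitting on "\n" only leaves a trailing "\r" on every word but the last.
$broken = '/\b(' . implode('|', explode("\n", trim($raw))) . ')\b/i';

// Replace "\r\n" first, then any lone "\r" or "\n".
$fixed = '/\b(' . str_replace(array("\r\n", "\r", "\n"), '|', trim($raw)) . ')\b/i';

$text = 'My asses to assist me';
$brokenHits = preg_match_all($broken, $text, $m1); // no whole-word match
$fixedHits  = preg_match_all($fixed, $text, $m2);  // matches "asses" only
```

Printing the final pattern with `var_dump($broken)` is a quick way to spot stray "\r" characters like these.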

Re: Preg_replace whole word only

Posted: Sat Mar 13, 2010 5:40 pm
by cesarcesar
I ended up finding that the word "a.s.s." was in my list. I think the dots were messing up the expression. For those interested, this is my new code. Thanks for any suggestions to get it where it is.

Code: Select all

 
$_SESSION[wordlist] = join("|", array_map('trim', file('standards/badwords.txt')));
 
function cleanWords($value) {
 
    global $_SESSION;
 
    $value = preg_replace("/\b($_SESSION[wordlist])\b/ie", 'str_repeat("*", strlen("\\1")) ', $value);  
    return $value;
 
}
 

Re: Preg_replace whole word only

Posted: Mon Mar 15, 2010 12:04 pm
by tr0gd0rr
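This script looks like it has several problems: the `wordlist` key is unquoted outside of a string, and `global $_SESSION` is unnecessary because `$_SESSION` is a superglobal. What about the following?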

Code: Select all

$_SESSION["wordlist"] = str_replace("\n", "|", trim(file_get_contents('standards/badwords.txt')));
 
function cleanWords($value) {
    // The /e modifier is gone as of PHP 7, so use a callback to build the asterisks.
    return preg_replace_callback("/\b({$_SESSION['wordlist']})\b/i", function ($m) {
        return str_repeat("*", strlen($m[1]));
    }, $value);
}

And yes, you need to run preg_quote on each bad word.
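
A minimal sketch of that escaping step, assuming the words are already in an array. The `buildWordPattern()` helper and the three-word list are made up for illustration. Passing `/` as the second argument to `preg_quote()` also escapes the pattern delimiter, and `a.s.s.` becomes `a\.s\.s\.`, so the dots only match literal dots:

```php
<?php
// Hypothetical helper: builds a whole-word, case-insensitive pattern
// from a list of literal words.
function buildWordPattern(array $words) {
    $quoted = array();
    foreach ($words as $word) {
        // Escape regex metacharacters and the "/" delimiter.
        $quoted[] = preg_quote($word, '/');
    }
    return '/\b(' . implode('|', $quoted) . ')\b/i';
}

$pattern = buildWordPattern(array('asses', 'ass', 'a.s.s.'));

// Mask each match with asterisks, using a callback instead of /e.
$clean = preg_replace_callback($pattern, function ($m) {
    return str_repeat('*', strlen($m[1]));
}, 'My asses to assist me');
```

One caveat: `\b` only matches next to a letter, digit, or underscore, so an entry that ends in a dot, like `a.s.s.`, will not match when it is followed by a space. Entries like that would need different boundary handling.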

If I were you, I would pre-process the badwords list once with a script like this and save the result, so you could just include a file called `badwords.php` with one line that would look something like this:

Code: Select all

<?php
$badlistRegex = "/\b(asses|ass|a\.s\.s\.)\b/i"; // no "e" modifier (removed in PHP 7); use with preg_replace_callback()
?>
It would be much faster than reading and processing the file on every request. And you probably don't want the same bad-word list stored in each user's session file!
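
A rough sketch of what such a pre-processing script could look like, assuming it is run once whenever `standards/badwords.txt` changes. The `buildBadwordsPhp()` helper name is invented here, and the generated pattern uses `/i` without `e` so it can be passed to `preg_replace_callback()`:

```php
<?php
// Hypothetical build step: turn a newline-separated word list into the
// source code of a one-line badwords.php file.
function buildBadwordsPhp($rawList) {
    // Split on any line ending (\n, \r\n or \r) and drop empty lines.
    $words = preg_split('/\R/', $rawList, -1, PREG_SPLIT_NO_EMPTY);
    $quoted = array();
    foreach ($words as $word) {
        $word = trim($word);
        if ($word !== '') {
            // Escape regex metacharacters and the "/" delimiter.
            $quoted[] = preg_quote($word, '/');
        }
    }
    $regex = '/\b(' . implode('|', $quoted) . ')\b/i';
    // var_export() emits a correctly escaped PHP string literal.
    return "<?php\n\$badlistRegex = " . var_export($regex, true) . ";\n";
}

// Assumed usage (paths taken from the thread):
// file_put_contents('badwords.php',
//     buildBadwordsPhp(file_get_contents('standards/badwords.txt')));
$source = buildBadwordsPhp("asses\r\nass\r\na.s.s.\r\n");
```

The filter page would then only need `include 'badwords.php';` and could pass `$badlistRegex` straight to `preg_replace_callback()`.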