Using regular expressions to limit vulgarity in forums

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

MikeK
Forum Newbie
Posts: 5
Joined: Wed Jul 23, 2003 12:43 am

Using regular expressions to limit vulgarity in forums

Post by MikeK »

I'm trying to put a "filter" on my guestbook to keep people from using vulgar language. Currently I use something like:

else if (eregi(" ass.",$comments)) {printf("<font color=#ffffff>Post unacceptable."); $why="Used word beginning with ass";}
However, this causes a problem because it is too general. My site is a race track site, so I commonly see postings about problems with "assemblies", or someone needing to "assure" another that the "association" rules say so.

The reason I placed this request for help on the advanced board: I can use an array of blacklisted ass* words (and/or whitelists) to disallow posts, but how do I identify whether a posting contains both an allowable and a blacklisted word?

When using the code above I blacklist every ass word. I want posts like "I assure you the assembly meets association rules" to go through.

BUT, and here is the kicker (for me at least): depending on my if structure, posts like "take a look at the assembly smurf" will get through, because I will have a whitelist match on "assembly" and thus jump out of my else if...


Do you begin to see the complexity? Has anyone else tackled this kind of filtering for their forums/guestbooks?
templatesforall
Forum Newbie
Posts: 5
Joined: Thu Jul 24, 2003 2:39 am
Location: Tucson, Arizona

Post by templatesforall »

Actually, I have the same problem with my guestbook script. It censors words with "ass" in them :P glass, grass, class... You get the picture. I tried putting a space on each side of the words and it stopped working.
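
One likely reason the space-on-each-side approach stops matching: a word at the very start or end of the comment, or one followed by punctuation, has no space next to it. A minimal sketch of an alternative using a word boundary instead, assuming the PCRE functions (`preg_match`) are available. The helper name `contains_exact_word` is hypothetical, not something from the scripts discussed here:

```php
<?php
// Hypothetical helper: true if $word appears in $text as a whole word.
// \b matches at a word boundary (start/end of string, spaces, punctuation),
// so no literal spaces are needed around the word.
function contains_exact_word($word, $text) {
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_match($pattern, $text) === 1;
}

var_dump(contains_exact_word("ass", "glass, grass, class"));    // bool(false)
var_dump(contains_exact_word("ass", "You are an ass."));         // bool(true)
var_dump(contains_exact_word("ass", "Ass! That was my line."));  // bool(true)
```

This only handles the exact word; it does nothing about compounds built on it, which is the harder part of the thread.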
MikeK
Forum Newbie
Posts: 5
Joined: Wed Jul 23, 2003 12:43 am

Post by MikeK »

We had that one too... there are so many words that end in "ass" that we just dropped the ".ass" pattern from our checks. Since the responses have been slow in coming, I'm toying with the idea of a loop that runs through a whitelist. My problem still remains, though: "association smurf" gets through while "smurf association" gets blocked. Now that I type this out, the only thing I can think of is exploding the comments and running them through a loop, checking each and every word and building a dynamic array of "yays" and "nays", then checking that array at the end to see if there were any cuss words (so each word in the comment gets run through the white AND black lists). Sounds like an inefficient answer.
qartis
Forum Contributor
Posts: 271
Joined: Sat Dec 14, 2002 4:43 pm
Location: BC, Canada

Post by qartis »

Not sure if this is practical enough, but..

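<?php
function cuss_check($message){
	// Words that are rejected when they appear as a whole word.
	$badwords[] = "ass";
	$badwords[] = "loser";
	$badwords[] = "moron";
	$badwords[] = "penis";

	// Punctuation to strip so that "ass." or "ass!" still matches.
	$punctuation = array(".",",","'","\"","!","@");

	// Lowercase first: in_array() compares case-sensitively.
	$message = str_replace($punctuation,"",strtolower($message));

	$words_array = explode (" ",$message);

	// Check every word; a single bad word rejects the whole message.
	foreach ($words_array as $word){
		if (in_array(trim($word),$badwords)){
			return false;
		}
	}
	
	return true;
}


$message = "I'd like to call you an associate, if I may...";

if (cuss_check($message)){
	echo "Clean as a whistle";
} else {
	echo "omigod you're going to hell.";
}
?>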
MikeK
Forum Newbie
Posts: 5
Joined: Wed Jul 23, 2003 12:43 am

Post by MikeK »

That looks like what I want, except I also want to filter for any permutations of ass words like assmunch, assbiter, assface... so the

eregi(' ass.',$message)
works, except I need to whitelist words like association, assure, assemblies, etc.

Got any thoughts on how to incorporate that requirement?
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

One possibility:

1. Exact matches to your "template word" are censored.
2. Prefix/suffix/infix matches are looked up in a dictionary file. If a match is found, let the word through; if not, censor it.

This would require placing all "real" words that are also naughty in the template, or removing them from the dictionary file. But it should still stop most of the non-real-word combos without explicit listings.
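
A rough sketch of how that two-step rule might look, not a definitive implementation. The function name `is_word_allowed` and both word lists are assumptions made up for illustration; in practice the dictionary would probably be loaded from a word-list file rather than written inline.

```php
<?php
// Sketch of the two-step rule: exact template matches are censored;
// words that merely contain a template word pass only if the
// dictionary knows them. Both lists below are illustrative assumptions.
function is_word_allowed($word, array $templates, array $dictionary) {
    $word = strtolower($word);
    if (in_array($word, $templates, true)) {
        return false;                       // step 1: exact match
    }
    foreach ($templates as $template) {
        if (strpos($word, $template) !== false) {
            // step 2: prefix/suffix/infix match, consult the dictionary
            return in_array($word, $dictionary, true);
        }
    }
    return true;                            // no template word involved
}

$templates  = array("ass");
$dictionary = array("assembly", "association", "assure", "class", "glass");

foreach (array("ass", "assembly", "glass", "assface", "track") as $w) {
    echo $w, ": ", is_word_allowed($w, $templates, $dictionary) ? "ok" : "censored", "\n";
}
```

With a large dictionary, a lookup keyed by word (for example via `array_flip` and `isset`) would likely be faster than `in_array`, which scans the list each time.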
templatesforall
Forum Newbie
Posts: 5
Joined: Thu Jul 24, 2003 2:39 am
Location: Tucson, Arizona

Post by templatesforall »

I was thinking: why not have PHP count the letters in the blacklisted word and in the word being checked? If the number is the same, censor it; if not, leave it alone.
MikeK
Forum Newbie
Posts: 5
Joined: Wed Jul 23, 2003 12:43 am

huh??

Post by MikeK »

Book, smurf

Same # of char

assures, smurf

Same # of char
qartis
Forum Contributor
Posts: 271
Joined: Sat Dec 14, 2002 4:43 pm
Location: BC, Canada

Post by qartis »

If you're worried about specifics like "assmonger" or "assmonkey", just make an array of all the ass* words you WILL allow. I'm sure there aren't that many:

associate
associating
associated
association
assimilate
assimilation
assimilated
assassin
assassination
assassinate
assault
assaulted
assaulting
assaulter
assiniboine

And then incorporate that into your filter.
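
One way the whitelist could be folded into a per-word check, as a sketch only. The function name `cuss_check_prefix` and the shortened whitelist are assumptions for illustration. Because every word is tested, an allowed word earlier in the post cannot shield a bad one later on, whatever the order.

```php
<?php
// Sketch: reject any word starting with "ass" unless it is whitelisted.
// Every word is checked, so one allowed word cannot shield a bad one.
function cuss_check_prefix($message, array $whitelist) {
    // Split on anything that is not a letter, dropping empty pieces.
    $words = preg_split('/[^a-z]+/', strtolower($message), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (strpos($word, "ass") === 0 && !in_array($word, $whitelist, true)) {
            return false;   // blacklisted prefix, not on the whitelist
        }
    }
    return true;
}

// Illustrative sample; a real list would hold every allowed form.
$whitelist = array("associate", "association", "assure", "assembly", "assemblies");

var_dump(cuss_check_prefix("I assure you the assembly meets association rules", $whitelist)); // bool(true)
var_dump(cuss_check_prefix("take a look at the assembly assface", $whitelist));               // bool(false)
```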
MikeK
Forum Newbie
Posts: 5
Joined: Wed Jul 23, 2003 12:43 am

Post by MikeK »

Yes, that goes with the blacklist/whitelist approach. The thing I hate about this approach is exploding $comments into an array and checking each and every word to see if there are any blacklisted words... I'm really surprised no one else has created the solution. We can't be the only site trying to limit vulgarity (I notice this forum automatically censored a word for me: smurf!!)
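
For what it's worth, the explode step might be avoidable with a single pattern. The sketch below assumes the PCRE functions (`preg_match`) rather than `eregi`, which was later removed from PHP. The helper name `build_ass_pattern` and the three-word whitelist are hypothetical, and the whitelist is assumed to be non-empty.

```php
<?php
// Sketch: one pattern, no explode(). The negative lookahead (?!...)
// refuses to start a match at a whitelisted word, so only the other
// ass* words are caught. Assumes $whitelist is not empty.
function build_ass_pattern(array $whitelist) {
    $allowed = implode('|', array_map(function ($w) {
        return preg_quote($w, '/');
    }, $whitelist));
    return '/\b(?!(?:' . $allowed . ')\b)ass\w*/i';
}

// Illustrative sample whitelist.
$pattern = build_ass_pattern(array("assure", "assembly", "association"));

var_dump(preg_match($pattern, "I assure you the assembly meets association rules")); // int(0)
var_dump(preg_match($pattern, "association assface"));                               // int(1)
var_dump(preg_match($pattern, "assface association"));                               // int(1)
```

The whitelist is matched as whole words, so each allowed form ("assure", "assures", "assured") has to be listed separately.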