forum post validation regex

Posted: Tue Jul 11, 2006 2:56 am
by sa_Joshua
Hi,

I'm trying to filter out spam on my forum. I added two tables to my database: one called spamwords and another called spamAlphabet.

The spamAlphabet contains records like:

a aàáâåãäæ\@
b b6
c cç6
d db
e eéèêë3
f f4
g qgp9
h h4
i iìíîï¡1\|l\!
j ji1

I use these records to construct a regex in a SQL stored procedure.

Using the above substitution, the word "tour" becomes

/[t7\+][\\W]*[o0óòôøõö(\(\))\*\.][\\W]*[\\W]*[r]/gi


Essentially, I would like to find any form or shape of the word "tour", even if there is punctuation or white space between the letters.

Is the resulting regex in the correct syntax to trap these sorts of occurrences? If not, please tell me how I should change it.

Thanks in Advance
Joshua
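
For the pattern construction described above, here is a minimal JavaScript sketch, not a definitive implementation. The table contents and the names spamAlphabet (as a plain object), escapeForClass and buildSpamRegex are assumptions for illustration; the real rows live in the database. Two things worth knowing about the literal quoted above: inside a JavaScript regex literal, [\\W] matches a backslash or a capital W rather than non-word characters (that would be \W), and the pattern as posted has no character class for the letter "u". The sketch puts [\W_]* between letters so that underscores also count as filler.

```javascript
// Hypothetical substitution table, loosely modelled on the spamAlphabet rows
// quoted in the question; the entries for t, o, u and r are illustrative only.
const spamAlphabet = {
  t: "t7+",
  o: "o0óòôøõö()*.",
  u: "uùúûüv",
  r: "r",
};

// Escape the characters that are special inside a character class.
function escapeForClass(chars) {
  return chars.replace(/[\\\]\[^-]/g, "\\$&");
}

// Build a pattern with one character class per letter, allowing any run of
// non-word characters or underscores between consecutive letters.
function buildSpamRegex(word, alphabet) {
  const classes = Array.from(word.toLowerCase()).map((letter) => {
    const variants = alphabet[letter] || letter; // fall back to the letter itself
    return "[" + escapeForClass(variants) + "]";
  });
  return new RegExp(classes.join("[\\W_]*"), "i");
}
```

For example, buildSpamRegex("tour", spamAlphabet).test("t.0-u r") returns true. The sketch uses only the i flag, because a regex with the g flag keeps state in lastIndex between test() calls. There are no word boundaries, so "detour" would match as well.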

Posted: Wed Jul 12, 2006 12:04 am
by Benjamin
You could strip all unwanted characters from the string and then run your check against what is left. For example, this function removes every character that is not a letter or a number. I'm not sure whether this would work in your application.

Untested...

Code:

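// Returns $String with every character that is not an ASCII letter or digit removed.
function LettersAndNumbersOnly($String) {
  $WhiteList = array('0','1','2','3','4','5','6','7','8','9',
                     'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
                     'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z');
  $NewString = '';  // start from an empty string rather than null
  $LengthOfString = strlen($String);
  // strlen() counts bytes, so the bytes of multibyte characters are dropped as well.
  for($CharacterToCheck = 0; $CharacterToCheck < $LengthOfString; $CharacterToCheck++) {
    // Strict comparison: keep the character only if it is exactly a whitelist entry.
    if(in_array($String[$CharacterToCheck], $WhiteList, true)){
      $NewString .= $String[$CharacterToCheck];
    }
  }
  return $NewString;
}

// Run the regex for the word or phrase against the cleaned string here.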
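
Roughly the same idea in JavaScript, as a hedged sketch rather than a drop-in filter. The helper name containsBannedWord is an assumption for illustration and is not part of the PHP above. Note that stripping only removes filler characters: a substitution such as "t0ur" survives as "t0ur" and is not caught by a plain substring check, and accented letters are removed along with the punctuation.

```javascript
// Keep only ASCII letters and digits, mirroring the PHP whitelist above.
function lettersAndNumbersOnly(text) {
  return text.replace(/[^0-9a-zA-Z]/g, "");
}

// Hypothetical helper: look for a banned word in the stripped, lower-cased text.
function containsBannedWord(text, bannedWord) {
  const stripped = lettersAndNumbersOnly(text).toLowerCase();
  return stripped.includes(bannedWord.toLowerCase());
}
```

With these definitions, lettersAndNumbersOnly("t.o-u r!") gives "tour", so containsBannedWord("T o-U.r deals", "tour") is true.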

Posted: Wed Jul 12, 2006 12:19 am
by sa_Joshua
Thanks~!

I don't know PHP that well, but I'll adapt it to JavaScript. Nice going!