
regex for filtering non-standard characters

Posted: Thu Apr 16, 2009 9:00 am
by jazz090
I have a <textarea> that is meant to accept around 350 words. The audience of this website will be pasting into it from Word more than 99% of the time, which creates a problem for me because Word's autoformatting converts straight quotes to curly ones (“”). That is only a small part of the problem, though, because I'm already filtering those out:

Code:

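// Map Word's "smart" punctuation to plain ASCII equivalents.
// str_replace() compares bytes, so this only matches when this file and $data share the same encoding.
$data = str_replace(array("‘", "’"), "'", $data);
$data = str_replace(array("“", "”"), '"', $data);
$data = str_replace("–", "-", $data);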
But what about the rest? Word has all these symbols, such as alpha, beta, neq, etc., and they just get messed up in MySQL. On one occasion, when I combined symbols and curly quotes (“”) in the textarea and processed it, the above code was completely ignored and the word enclosed by the (“”) was completely garbled. I need some sort of regex to get rid of these unwanted characters. Any regex or wisdom is appreciated. Safe.

Re: regex for filtering non-standard characters

Posted: Thu Apr 16, 2009 9:21 am
by prometheuzz
Regex is not the way you'd want to go. Have a look at http://nl3.php.net/mysql_real_escape_string instead.

Re: regex for filtering non-standard characters

Posted: Thu Apr 16, 2009 9:26 am
by jazz090
I know about this function. When I say the characters get messed up in MySQL, I mean after being escaped by the aforementioned mysql_real_escape_string().

Re: regex for filtering non-standard characters

Posted: Thu Apr 16, 2009 9:30 am
by prometheuzz
jazz090 wrote: I know about this function. When I say the characters get messed up in MySQL, I mean after being escaped by the aforementioned mysql_real_escape_string().
Well, then your original post was missing quite a bit of information.

Best of luck of course.

Re: regex for filtering non-standard characters

Posted: Thu Apr 16, 2009 10:02 am
by jazz090
A better question would be: how do you use regex to identify non-ASCII characters?

Re: regex for filtering non-standard characters

Posted: Wed Apr 22, 2009 11:58 am
by GeertDD
jazz090 wrote: A better question would be: how do you use regex to identify non-ASCII characters?
Kohana contains three ASCII-related functions:
  • is_ascii()
  • strip_ascii_ctrl()
  • strip_non_ascii()
View the code here: http://dev.kohanaphp.com/browser/tags/2 ... 8.php#L135
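
For anyone who can't follow the link, here is a rough sketch of what helpers like these could look like. The `_sketch` function names and the exact character classes are my own assumptions for illustration, not Kohana's actual source. The idea is that without the `/u` modifier PCRE works on raw bytes, so `[^\x00-\x7F]` matches any byte outside the 7-bit ASCII range.

```php
<?php
// Hypothetical helpers, loosely named after the Kohana functions listed above.
// Illustrative sketches only, not copied from Kohana.

// Returns true when the string contains only 7-bit ASCII bytes.
function is_ascii_sketch($str)
{
    // Without the /u modifier the pattern is matched byte by byte.
    return !preg_match('/[^\x00-\x7F]/S', $str);
}

// Removes ASCII control characters but keeps tab, line feed and carriage return.
function strip_ascii_ctrl_sketch($str)
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/S', '', $str);
}

// Removes every byte outside the 7-bit ASCII range.
function strip_non_ascii_sketch($str)
{
    return preg_replace('/[^\x00-\x7F]+/S', '', $str);
}

// UTF-8 bytes for: café “quoted”
$input = "caf\xC3\xA9 \xE2\x80\x9Cquoted\xE2\x80\x9D";

var_dump(is_ascii_sketch($input));    // bool(false)
echo strip_non_ascii_sketch($input);  // caf quoted
```

Keep in mind that stripping like this throws away every accented letter and symbol, not just Word's curly quotes, so it makes sense to run any replacements you want to keep (like the str_replace() calls in the first post) before stripping.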