I had exactly the same problem recently. I wanted to prevent users from entering curly quotes, emdashes, elipses anywhere in my solution. There's more on this problem here
http://www.cs.tut.fi/~jkorpela/www/windows-chars.html
Code: Select all
// REPLACE ANY STUPID UTF8 CHARS
function utf8_char_replace(&$var) {
$trans_table = array(
chr(0xe2).chr(0x80).chr(0x9a) => '\'', //SINGLE LOW-9 QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x9e) => '"', //DOUBLE LOW-9 QUOTATION MARK
chr(0xe2).chr(0x80).chr(0xa6) => '...', //HORIZONTAL ELLIPSIS
chr(0xe2).chr(0x80).chr(0x98) => '\'', //LEFT SINGLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x99) => '\'', //RIGHT SINGLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x9c) => '"', //LEFT DOUBLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x9d) => '"', //RIGHT DOUBLE QUOTATION MARK
chr(0xe2).chr(0x80).chr(0x93) => '-', //EN DASH
chr(0xe2).chr(0x80).chr(0x94) => '-' //EM DASH
);
foreach ($trans_table as $utf8_code => $replace) {
$var = str_replace($utf8_code, $replace, $var);
}
return $var;
}
array_walk_recursive($_GET, 'utf8_char_replace');
array_walk_recursive($_POST, 'utf8_char_replace');
array_walk_recursive($_COOKIE, 'utf8_char_replace');
Add this in the top of your database "save" script, before you get the POST/GET data into variables and save into MySQL. It basically goes through $_GET, $_POST, $_COOKIE and replaces any annoying characters with their regular equivalents. So an en/em dash gets replaced with a regular dash. And curly quotes get replaced with their regular equivalents.
It took me a while to figure this out, but when the user pastes from word into the textarea in the browser, it's in the stupid windows iso-8859-1 charset.
But when the user presses submit, it was being converted (by the browser I guess) into UTF-8. So this function actually replaces the UTF-8 codes for curly quotes, with their simpler equivalents.
For a couple of days I was trying to replace the iso-8859-1 chars, and couldn't figure out why they weren't being replaced.
I used these tables to get the UTF8 values that I needed to replace:
http://www.manderby.com/mandalex/a/ascii.php so you could add further values to the $trans_table array.
This function sits in my site-wide config file and gets run on every page, just after my fixing magic quotes global function. I use UTF-8 throughout my scripts and database.
Hope this helps you out
Cheers, B