PHP Smart Quotes & Encoding

Posted: Wed Mar 16, 2005 11:40 am
by bjh5537
I've recently been having trouble with PHP and encoding when sending e-mails. We have a form where our customers can enter a message and send it to a specific person. Its purpose is such that it makes sense to paste in content copied from websites such as MSNBC. MSNBC in particular uses "smart quotes", which are angled and differ between the opening and closing mark. These characters and others end up in the text box. When the send button is pressed and the message comes across, the smart quotes end up like this:
â€œ and â€

Posted: Wed Mar 16, 2005 3:56 pm
by Ambush Commander
Ah... this is a big problem because of stupid Microsoft Word. Stupid Smart Quotes. I use this code to clean my output:

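function speckit($string) {
   // Assumes single-byte (ISO-8859-1 / Windows-1252) input. The single-byte
   // table is requested explicitly so that ord() sees whole characters.
   $trans = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');

   // Swap each named entity for its numeric form, e.g. &eacute; becomes &#233;
   foreach ($trans as $key => $value) {
      $trans[$key] = '&#'.ord($key).';';
   }

   // Reverse lookup for the non-breaking space and soft hyphen (not used below)
   $trap = array_flip($trans);
   $texcep1 = $trap['&#160;'];
   $texcep2 = $trap['&#173;'];
   
   // Extra translations. The literal keys only match if this file is saved
   // in the same encoding as the incoming text.
   $trans['–'] = "--";
   $trans['"'] = "&quot;";
   $trans["’"] = "'";
   $trans["‘"] = "'";
   $trans['“'] = "&quot;";
   $trans['”'] = "&quot;";
   $trans['…'] = "...";
   
   return strtr($string, $trans);
}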
It fetches the HTML entity translation table with get_html_translation_table() and adds a few extra translations (that I found useful). Hope it helps.
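
For anyone following along, here is a minimal sketch of what the translation table does on its own. It assumes single-byte ISO-8859-1 input; the variable names and the sample string are made up for illustration and are not part of the function above.

```php
<?php
// Sketch only: request the single-byte (ISO-8859-1) entity table explicitly.
$named = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');

// Convert every named entity to its numeric form, one byte per character.
$numeric = [];
foreach ($named as $char => $entity) {
    $numeric[$char] = '&#' . ord($char) . ';';
}

// Hypothetical sample: "\xE9" is e-acute in ISO-8859-1.
$sample = "caf\xE9 & bar";

echo strtr($sample, $named), "\n";   // caf&eacute; &amp; bar
echo strtr($sample, $numeric), "\n"; // caf&#233; &#38; bar
```

Note that ord() only ever looks at one byte, so this approach is only safe while every character in the input is a single byte.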

Posted: Mon Mar 21, 2005 8:03 am
by bjh5537
Unfortunately this only made things worse. It seems like some kind of conversion is happening in the background with PHP. I submit a form with the open smart quote, and I get back this after running it through your function:

�

It's as if PHP is converting that smart quote character into three characters or something and your script is picking out the first one.
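
That guess can be checked with a few lines. This is only a sketch, assuming the browser submits the form as UTF-8 and that the mbstring extension is installed; the variable names are made up for illustration.

```php
<?php
// Left curly quote (U+201C) written out as its three UTF-8 bytes.
$quote = "\xE2\x80\x9C";

// strlen() counts bytes, not characters.
$byteCount = strlen($quote);

// ord() only looks at the first byte, which is 0xE2 (226).
$firstByte = ord($quote);

// Read each byte as a Windows-1252 character and re-encode as UTF-8,
// which is roughly what a client does when it assumes the wrong charset.
$misread = mb_convert_encoding($quote, 'UTF-8', 'Windows-1252');

echo $byteCount, "\n";  // 3
echo $firstByte, "\n";  // 226
echo $misread, "\n";    // â€œ
```

So one visible character really is three bytes, and any code that works byte by byte will only see the first of them.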

It absolutely boggles my mind that seemingly no one in the PHP community has run into this issue. In our specific case, it is old Macintosh computers using our forms to submit data (not even copied from Word or anything), and these characters still come up. I have no clue how to even begin diagnosing this.

Posted: Mon Mar 21, 2005 8:07 am
by feyd
It's often caused by character-encoding mismatches between locales. PHP doesn't process strings as UTF-8 by default; you need to use the mbstring functions for that, I believe.
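
A rough sketch of the mbstring route, in case it helps. The function name normalize_quotes() is hypothetical, and the code assumes that anything which is not valid UTF-8 is Windows-1252. That is a heuristic, not a guarantee. It needs PHP 7 or later for the \u{...} escapes and the mbstring extension.

```php
<?php
/**
 * Hypothetical helper: return $text as UTF-8 with typographic
 * punctuation replaced by plain ASCII.
 * Assumption: input that is not valid UTF-8 is Windows-1252.
 */
function normalize_quotes(string $text): string
{
    // Bring everything to UTF-8 first so the map below can match.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }

    // Keys are whole UTF-8 characters, so strtr() replaces all of
    // their bytes at once instead of only the first one.
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '--',  // en dash
        "\u{2026}" => '...', // ellipsis
    ];

    return strtr($text, $map);
}

// The same text arriving as UTF-8 and as Windows-1252.
echo normalize_quotes("\xE2\x80\x9CHello\xE2\x80\x9D"), "\n"; // "Hello"
echo normalize_quotes("\x93Hello\x94"), "\n";                 // "Hello"
```

If the quotes should be kept rather than flattened, the conversion step alone may be enough, provided the outgoing e-mail also declares the matching charset.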