Handle malformed latin1 encoding

Posted: Wed Sep 01, 2010 11:25 am
by pickle
Hi all,

In short:

I've got a latin1 encoded MySQL table with utf8 characters in it. How can I properly escape the utf8 characters in PHP?

In detail:

I've got a database table with the latin1 character set. I can't convert it to utf8 because that would seriously cut down the allowed size of the row, and I need it as large as possible. I've imported French characters, which in the latin1 encoding get stored as a ?. When I take that value out of the DB and try to display it with PHP, htmlspecialchars() chokes and breaks the output. Is there a way I can either:
  1. Properly escape the string when reading from the tables, or
  2. Properly convert utf8 characters (such as é -> e) when being imported
Thanks.

Re: Handle malformed latin1 encoding

Posted: Wed Sep 01, 2010 12:05 pm
by Weirdan
So they are utf-8 characters stored in a latin1-encoded table as if they were just binary? If so, you can set the connection character set to latin1 (SET NAMES latin1) and, I believe, get your utf-8 back from the database unaffected. After that, if you need to embed those texts in a latin1-encoded page, you'll have to use utf8_decode(). Forget about proper sorting with 'ORDER BY' though.
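
A rough sketch of the utf8_decode() step described above, in case it helps. It assumes the mbstring extension is loaded and uses mb_convert_encoding() in place of utf8_decode(), which is deprecated as of PHP 8.2. The helper name utf8ToLatin1() and the sample value are made up for illustration; the database read itself is not shown.

```php
<?php
// Sketch only: assumes the mbstring extension is loaded.
// utf8ToLatin1() is a hypothetical helper name, not something from this thread.
function utf8ToLatin1(string $utf8): string
{
    // Same job as utf8_decode(): re-encode UTF-8 bytes as ISO-8859-1.
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// "\xC3\xA9" is the two-byte UTF-8 form of é; latin1 represents it as "\xE9".
$fromDb = "caf\xC3\xA9";               // stand-in for a value read over a latin1 connection
$forLatin1Page = utf8ToLatin1($fromDb);

echo bin2hex($forLatin1Page), "\n";    // 636166e9
```

This only makes sense if the stored bytes really are UTF-8 and the page really is latin1; if either assumption is wrong, the conversion will mangle the text.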

Re: Handle malformed latin1 encoding

Posted: Wed Sep 01, 2010 12:16 pm
by pickle
They are French characters stored in a latin1-encoded table, over a latin1 connection. Everything about the database is latin1. The webpage is utf8 encoded, however. I'll try that utf8_decode().

Re: Handle malformed latin1 encoding

Posted: Wed Sep 01, 2010 12:29 pm
by pickle
It seems that htmlentities() does what I need.

utf8_decode() didn't work as well as I'd hoped. Rather than Firefox displaying a diamond with a question mark in it (kind of a WTF symbol), it displayed the ? as it appears in the DB, but cut off the string after the ?.
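
A possible explanation, sketched under assumptions: if the value is single-byte latin1 and htmlentities() is told so, every accented character becomes an ASCII entity, which is safe on a utf8 page. The sample string and the explicit 'ISO-8859-1' argument are assumptions, not necessarily the exact call used here. The second half shows one way htmlspecialchars() can "break": treated as UTF-8, a lone latin1 byte is an invalid sequence and the function returns an empty string.

```php
<?php
// Sketch only: assumes the value really is single-byte latin1 (ISO-8859-1).
$fromDb = "caf\xE9 & cr\xE8me";   // "café & crème" as latin1 bytes

// Naming the input charset lets htmlentities() map \xE9 to &eacute; and so on.
$safe = htmlentities($fromDb, ENT_QUOTES, 'ISO-8859-1');
echo $safe, "\n";                 // caf&eacute; &amp; cr&egrave;me

// For comparison: read as UTF-8, the lone \xE9 byte is an invalid sequence,
// so htmlspecialchars() returns an empty string (no ENT_SUBSTITUTE/ENT_IGNORE set).
$broken = htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8');
var_dump($broken === '');         // bool(true)
```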

Re: Handle malformed latin1 encoding

Posted: Wed Sep 01, 2010 3:17 pm
by Weirdan
Well, if you have real latin1 coming out of your database and want to put it into a utf8 page, you should have used utf8_encode(). I thought you had utf8 stored in the database over the latin1 connection. That usually appears to work fine, except that you can't really use string manipulation functions in SQL and can't properly order by such fields.
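
A minimal sketch of that latin1-to-utf8 route, assuming the mbstring extension is loaded. It uses mb_convert_encoding() rather than utf8_encode(), which is deprecated as of PHP 8.2. The helper name latin1ToUtf8Html() is hypothetical.

```php
<?php
// Sketch only: assumes mbstring is loaded and the input is ISO-8859-1.
// latin1ToUtf8Html() is a hypothetical helper name, not something from this thread.
function latin1ToUtf8Html(string $latin1): string
{
    // Step 1: re-encode latin1 bytes as UTF-8 (the job utf8_encode() used to do).
    $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

    // Step 2: escape for HTML, telling PHP the string is now UTF-8.
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

$fromDb = "<b>caf\xE9</b>";            // latin1 bytes, as read from the table
echo latin1ToUtf8Html($fromDb), "\n";  // &lt;b&gt;café&lt;/b&gt; (é as UTF-8 bytes)
```

One caveat: MySQL's latin1 is actually closer to Windows-1252 than to strict ISO-8859-1, so if the data contains characters such as curly quotes or the euro sign, 'Windows-1252' may be the better source encoding to pass.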

Re: Handle malformed latin1 encoding

Posted: Wed Sep 01, 2010 3:21 pm
by pickle
For the purposes of this application, we'll never be doing string manipulation, and any sorting errors will be fairly inconsequential. Nonetheless, thanks for mentioning the drawbacks.