Handle malformed latin1 encoding

pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Handle malformed latin1 encoding

Post by pickle »

Hi all,

In short:

I've got a latin1 encoded MySQL table with utf8 characters in it. How can I properly escape the utf8 characters in PHP?

In detail:

I've got a database table with the latin1 character set. I can't convert it to utf8 because that would seriously cut down the allowed size of the row, and I need it as large as possible. I've imported French characters, which in the latin1 encoding get stored as a ?. When I take that value out of the DB and try to display it with PHP, htmlspecialchars() chokes and breaks the output. Is there a way I can either:
  1. Properly escape the string when reading from the tables, or
  2. Properly convert utf8 characters (such as é -> e) when being imported
Thanks.
Real programmers don't comment their code. If it was hard to write, it should be hard to understand.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Handle malformed latin1 encoding

Post by Weirdan »

So they are utf-8 characters stored in a latin1-encoded table as if they were just binary? If so, then you can set the connection character set to latin1 (SET NAMES latin1) and get your utf-8 out of the database unaffected, I believe. After that, if you need to embed those texts in a latin1-encoded page, you'll have to use utf8_decode(). Forget about proper sorting with 'ORDER BY' though.
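A minimal sketch of the utf8_decode() step, assuming the value read from the database really is UTF-8 bytes. The helper name toLatin1() is hypothetical, and mb_convert_encoding() (which needs the mbstring extension) is swapped in for utf8_decode() here because utf8_decode() is deprecated as of PHP 8.2; the conversion is the same.

```php
<?php
// Hypothetical helper: convert UTF-8 text to ISO-8859-1 (latin1)
// so it can be embedded in a latin1-encoded page.
function toLatin1(string $utf8): string
{
    // Assumption: the input is valid UTF-8.
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

// In a real script this value would come from a query run after
// "SET NAMES latin1"; here it is hard-coded for illustration.
$fromDb = "caf\xC3\xA9"; // "café" as UTF-8 bytes

echo bin2hex(toLatin1($fromDb)), "\n"; // prints 636166e9
```

The two-byte UTF-8 sequence C3 A9 collapses to the single latin1 byte E9.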
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Re: Handle malformed latin1 encoding

Post by pickle »

They are French characters stored in a latin1-encoded table, over a latin1 connection. Everything about the database is latin1. The webpage is utf8-encoded, however. I'll try that utf8_decode().
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Re: Handle malformed latin1 encoding

Post by pickle »

It seems that htmlentities() does what I need.

utf8_decode() didn't work as well as I'd hoped. Rather than Firefox displaying a diamond with a question mark in it (kind of a WTF symbol), it displayed the ? as it appears in the DB, but cut off the string after the ?.
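For anyone hitting the same thing, here is a small sketch of why htmlentities() can work where htmlspecialchars() breaks. It assumes the string holds latin1 bytes and that the charset is passed explicitly; the sample value is made up. Since PHP 5.4 both functions default to UTF-8, so the 'ISO-8859-1' argument matters on current versions.

```php
<?php
$latin1 = "caf\xE9"; // "café" as latin1 bytes (illustrative value)

// Treated as UTF-8, the lone byte 0xE9 is an invalid sequence. Without
// ENT_SUBSTITUTE or ENT_IGNORE the function returns an empty string.
$broken = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// Declaring the real input encoding lets htmlentities() emit a named,
// ASCII-only entity that renders the same in a utf8 or latin1 page.
$safe = htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1');

var_dump($broken === ''); // bool(true)
echo $safe, "\n";         // caf&eacute;
```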
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Handle malformed latin1 encoding

Post by Weirdan »

Well, if you have real latin1 coming out of your database and want to put it into a utf8 page, you should have used utf8_encode(). I thought you had utf8 stored in the database over the latin1 connection. That usually appears to work fine, except you can't really use string manipulation functions in SQL and can't properly ORDER BY such fields.
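A rough sketch of that direction, assuming genuine latin1 bytes from the database and a utf8 page. The function name latin1ToUtf8Html() is hypothetical, and mb_convert_encoding() (mbstring extension) stands in for utf8_encode(), which is deprecated as of PHP 8.2 but performs the same latin1-to-UTF-8 conversion.

```php
<?php
// Hypothetical helper: latin1 bytes in, HTML-safe UTF-8 out.
function latin1ToUtf8Html(string $latin1): string
{
    // Step 1: re-encode latin1 as UTF-8 (what utf8_encode() does).
    $utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

    // Step 2: escape markup, telling PHP the string is now UTF-8.
    return htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
}

$row = "caf\xE9 <b>"; // illustrative latin1 value, not real data

echo latin1ToUtf8Html($row), "\n"; // bytes: caf\xC3\xA9 &lt;b&gt;
```

Converting first and escaping second keeps htmlspecialchars() from ever seeing bytes that are invalid in the charset it was told to expect.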
pickle
Briney Mod
Posts: 6445
Joined: Mon Jan 19, 2004 6:11 pm
Location: 53.01N x 112.48W

Re: Handle malformed latin1 encoding

Post by pickle »

For the purposes of this application, we'll never be doing string manipulation, and any sorting errors will be fairly inconsequential. Nonetheless, thanks for mentioning the drawbacks.