Page 1 of 1

MySQL not handling UTF-8 encoding correctly?

Posted: Thu May 01, 2008 12:42 pm
by nutkenz
In my common.php file - used in all pages - I've defined the following;

Code: Select all

mb_language('uni');
mb_http_output("UTF-8");
mb_internal_encoding("UTF-8");
header("Cache-control: private");
header("Content-type: text/html; charset=utf-8");
This is what I've inserted into my DB: "Beschermt DVD’s, CD’s, foto’s, CD-roms en...".

This seems to be okay; I can get the value back from the DB and display it the same way in a browser (set to prefer UTF-8).

However, in phpMyAdmin I see the description as: "Beschermt DVD’s, CD’s, foto’s, CD-roms en..."

Collation is set to utf8_unicode_ci on the connection (global config), column, table and database.
The character encoding is set to UTF-8 Unicode (utf8).

I'm running phpMyAdmin v2.11.1 and MySQL 5.0.45 on a Windows system

In config.inc.php for phpMyAdmin I see the following is defined:

Code: Select all

// Default language to use, if not browser-defined or user-defined
$cfg['DefaultLang'] = 'en-iso-8859-1';
 
// Default connection collation (used for MySQL >= 4.1)
$cfg['DefaultConnectionCollation'] = 'utf8_unicode_ci';
 
// Default charset to use for recoding of MySQL queries, does not take
// any effect when charsets recoding is switched off by
// $cfg['AllowAnywhereRecoding'] or in language file
// (see $cfg['AvailableCharsets'] to possible choices, you can add your own)
$cfg['DefaultCharset'] = 'utf-8';
 
// Available charsets for MySQL conversion. currently contains all which could
// be found in lang/* files and few more.
// Charsets will be shown in same order as here listed, so if you frequently
// use some of these move them to the top.
$cfg['AvailableCharsets'] = array(
    'utf-16',
    'utf-8',
);
My my.cnf file for MySQL contains:

Code: Select all

character-set-server = utf8
collation-server = utf8_unicode_ci
Does anyone know what's going on here? Why is phpMyAdmin screwing up the characters?

Re: MySQL not handling UTF-8 encoding correctly?

Posted: Sun May 04, 2008 1:10 pm
by nutkenz
Does anyone have a clue?

Re: MySQL not handling UTF-8 encoding correctly?

Posted: Mon May 05, 2008 8:49 am
by dml
The first thing to confirm is whether the data is encoded correctly in the database, which will isolate the problem to either the side of the system that's inserting data, or the side that's selecting data.

A quick check that can find a subset of utf8 encoding problems is to call char_length(x), where x is the column the data is stored in. If the character length returned is longer than the number of characters in the string, there's definitely a problem.

More thoroughly... it looks like your problem character is the 'RIGHT SINGLE QUOTATION MARK', which is 0x92 in latin1, Unicode code point u2019, utf8-encoded as 0xE28099. If you do 'SELECT charset(x), hex(x) from table', the hex representation should be consistent with the encoding, so:

Code: Select all

 
charset: latin1, hex: ...92... // CORRECT
charset: utf8, hex: ...E28099... // CORRECT
charset: latin1, hex: ...E28099.... // INCORRECT
 

Re: MySQL not handling UTF-8 encoding correctly?

Posted: Mon May 05, 2008 11:38 am
by nutkenz
I inserted é into the database and ran the following query using phpmyadmin:

Code: Select all

 
SELECT domain, char_length(domain) 
FROM advertisers 
WHERE id = 4
 
Resulting in:

char_length(domain): é
domain: 2

It seems like the data is not being inserted as UTF-8? Though I don't know why this is happening... Are there any other places to indicate I'm using UTF-8?

Re: MySQL not handling UTF-8 encoding correctly?

Posted: Mon May 05, 2008 12:30 pm
by dml
Before you do the insertion, run 'set names utf8'. This will tell the database to interpret any strings sent in as utf8.

Re: MySQL not handling UTF-8 encoding correctly?

Posted: Mon May 05, 2008 12:40 pm
by nutkenz
dml wrote:Before you do the insertion, run 'set names utf8'. This will tell the database to interpret any strings sent in as utf8.
That seems to have helped... but will I have to do this before each and every query? Can't MySQL do this automatically?

Re: MySQL not handling UTF-8 encoding correctly?

Posted: Tue May 06, 2008 3:59 am
by dml
Not before every query, but on connection setup, so after mysql_connect and mysql_select_db, it's one extra line to configure the character sets.

If the application is only going to be used in environments where you control the database server, another way of doing it would be to set the defaults for mysqld by adding entries for character_set_results, character_set_connection, and character_set_client params in my.cnf and restarting mysqld.