MySQL not handling UTF-8 encoding correctly?

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
nutkenz
Forum Contributor
Posts: 155
Joined: Tue Jul 19, 2005 12:25 pm

MySQL not handling UTF-8 encoding correctly?

Post by nutkenz »

In my common.php file - used in all pages - I've defined the following;

Code: Select all

mb_language('uni');
mb_http_output("UTF-8");
mb_internal_encoding("UTF-8");
header("Cache-control: private");
header("Content-type: text/html; charset=utf-8");
This is what I've inserted into my DB: "Beschermt DVD’s, CD’s, foto’s, CD-roms en...".

This seems to be okay; I can get the value back from the DB and display it the same way in a browser (set to prefer UTF-8).

However, in phpMyAdmin I see the description as: "Beschermt DVD’s, CD’s, foto’s, CD-roms en..."

Collation is set to utf8_unicode_ci on the connection (global config), column, table and database.
The character encoding is set to UTF-8 Unicode (utf8).

I'm running phpMyAdmin v2.11.1 and MySQL 5.0.45 on a Windows system

In config.inc.php for phpMyAdmin I see the following is defined:

Code: Select all

// Default language to use, if not browser-defined or user-defined
$cfg['DefaultLang'] = 'en-iso-8859-1';
 
// Default connection collation (used for MySQL >= 4.1)
$cfg['DefaultConnectionCollation'] = 'utf8_unicode_ci';
 
// Default charset to use for recoding of MySQL queries, does not take
// any effect when charsets recoding is switched off by
// $cfg['AllowAnywhereRecoding'] or in language file
// (see $cfg['AvailableCharsets'] to possible choices, you can add your own)
$cfg['DefaultCharset'] = 'utf-8';
 
// Available charsets for MySQL conversion. currently contains all which could
// be found in lang/* files and few more.
// Charsets will be shown in same order as here listed, so if you frequently
// use some of these move them to the top.
$cfg['AvailableCharsets'] = array(
    'utf-16',
    'utf-8',
);
My my.cnf file for MySQL contains:

Code: Select all

character-set-server = utf8
collation-server = utf8_unicode_ci
Does anyone know what's going on here? Why is phpMyAdmin screwing up the characters?
nutkenz
Forum Contributor
Posts: 155
Joined: Tue Jul 19, 2005 12:25 pm

Re: MySQL not handling UTF-8 encoding correctly?

Post by nutkenz »

Does anyone have a clue?
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: MySQL not handling UTF-8 encoding correctly?

Post by dml »

The first thing to confirm is whether the data is encoded correctly in the database, which will isolate the problem to either the side of the system that's inserting data, or the side that's selecting data.

A quick check that can find a subset of utf8 encoding problems is to call char_length(x), where x is the column the data is stored in. If the character length returned is longer than the number of characters in the string, there's definitely a problem.

More thoroughly... it looks like your problem character is the 'RIGHT SINGLE QUOTATION MARK', which is 0x92 in latin1, Unicode code point u2019, utf8-encoded as 0xE28099. If you do 'SELECT charset(x), hex(x) from table', the hex representation should be consistent with the encoding, so:

Code: Select all

 
charset: latin1, hex: ...92... // CORRECT
charset: utf8, hex: ...E28099... // CORRECT
charset: latin1, hex: ...E28099.... // INCORRECT
 
nutkenz
Forum Contributor
Posts: 155
Joined: Tue Jul 19, 2005 12:25 pm

Re: MySQL not handling UTF-8 encoding correctly?

Post by nutkenz »

I inserted é into the database and ran the following query using phpmyadmin:

Code: Select all

 
SELECT domain, char_length(domain) 
FROM advertisers 
WHERE id = 4
 
Resulting in:

char_length(domain): é
domain: 2

It seems like the data is not being inserted as UTF-8? Though I don't know why this is happening... Are there any other places to indicate I'm using UTF-8?
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: MySQL not handling UTF-8 encoding correctly?

Post by dml »

Before you do the insertion, run 'set names utf8'. This will tell the database to interpret any strings sent in as utf8.
nutkenz
Forum Contributor
Posts: 155
Joined: Tue Jul 19, 2005 12:25 pm

Re: MySQL not handling UTF-8 encoding correctly?

Post by nutkenz »

dml wrote:Before you do the insertion, run 'set names utf8'. This will tell the database to interpret any strings sent in as utf8.
That seems to have helped... but will I have to do this before each and every query? Can't MySQL do this automatically?
dml
Forum Contributor
Posts: 133
Joined: Sat Jan 26, 2008 2:20 pm

Re: MySQL not handling UTF-8 encoding correctly?

Post by dml »

Not before every query, but on connection setup, so after mysql_connect and mysql_select_db, it's one extra line to configure the character sets.

If the application is only going to be used in environments where you control the database server, another way of doing it would be to set the defaults for mysqld by adding entries for character_set_results, character_set_connection, and character_set_client params in my.cnf and restarting mysqld.
Post Reply