So I thought I had the conversion of a db to UTF-8 figured out. But I don't or at least I'm confused on how to use it. I've converted the db over to UTF-8 both in collation and charset.
When I try to enter a UTF-8 character, say c2 b4 (´, the ACUTE ACCENT), it works fine in phpMyAdmin. When I do a mysqldump and hexdump the data, I see it in the insert string as c2 b4.
However, when I try to insert it via PHP (5.3) with something like this:
[text]UPDATE `database`.`table` SET `content` = 'You don\xc2\xB4t know UTF-8?' WHERE `table`.`id` =1;[/text]
When I save this string to a file (UTF-8) and run hexdump on it, it looks right, with c2 b4. But looking in the database I get [text]You don´t know UTF-8?[/text]
where ´ is c3 82 42 34, and a hexdump of the mysqldump file shows it as "You don..B4t know", where the B4 seems to hint at the original two bytes c2 b4.
Any ideas?
If I just use B4 it works, but that isn't UTF-8, is it?
[text]UPDATE `database`.`table` SET `content` = 'You don\xB4t know UTF-8?' WHERE `table`.`id` =1;[/text]
And when I hexdump this it shows up as c2 b4 and looks correct.
UTF-8 confusion
Re: UTF-8 confusion
What's your connection character set?
Re: UTF-8 confusion
Good point, thanks. After all the converting I didn't look to see what the database thought of it all. A few variables are still latin1:
[text]Variable_name Value
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8[/text]
Looks like I have to do some more changes to get everything into UTF8. I'm sure the character set is set to UTF8 for each database that is uploaded, so this is a little weird to see, and I'm not sure why the server is set to latin1.
Re: UTF-8 confusion
Well, I don't get it still.
The MySQL manual says: "The server character set and collation are used as default values if the database character set and collation are not specified in CREATE DATABASE statements. They have no other purpose."
And when I look at the structure for the database table it is UTF8. In fact here's the mysqldump for creating it:
[text]-- phpMyAdmin SQL Dump
-- version 3.2.5
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Nov 08, 2011 at 02:04 PM
-- Server version: 5.1.54
-- PHP Version: 5.3.5-1ubuntu7.3
SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
--
-- Database: `database`
--
-- --------------------------------------------------------
--
-- Table structure for table `table`
--
CREATE TABLE IF NOT EXISTS `table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` longtext,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;[/text]
I did go in and run "ALTER DATABASE database CHARACTER SET utf8;". Then I tried to store the \xc2\xb4 string again and got a different result, with c3 82 c2 b4 (You don´t know UTF-8?) showing up in the database.
Re: UTF-8 confusion
MySQL tries to be smart, so it converts the data you send from your character set (which it assumes to be latin1 by default) into the column's character set. You need to tell it that the data you're sending is in utf8, using either mysql_query("set names utf8") or mysql_set_charset("utf8") or whatever method your database access library provides.
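To make the conversion described above concrete, here is a minimal sketch that reproduces the byte patterns from this thread in plain PHP, without a database. It assumes the iconv extension is available, and it uses ISO-8859-1 as a stand-in for MySQL's latin1 (the two agree for the bytes c2 and b4). The function name store_as_if_latin1 is made up for illustration; it is not a MySQL or PHP API.

```php
<?php
// Raw UTF-8 bytes for U+00B4 ACUTE ACCENT.
$utf8 = "\xC2\xB4";

// Hypothetical helper: mimics a server that believes the client is sending
// latin1 and converts each incoming byte to UTF-8 before storing it.
function store_as_if_latin1($bytes)
{
    return iconv('ISO-8859-1', 'UTF-8', $bytes);
}

// What the script sends: already valid UTF-8.
echo bin2hex($utf8), "\n";                      // c2b4

// UTF-8 bytes misread as two latin1 characters and encoded a second time.
echo bin2hex(store_as_if_latin1($utf8)), "\n";  // c382c2b4

// A single latin1 byte B4 is converted once and ends up as correct UTF-8.
echo bin2hex(store_as_if_latin1("\xB4")), "\n"; // c2b4
```

This would explain both observations: sending only B4 "works" because the data really is latin1 as far as the connection is concerned, while sending c2 b4 gets encoded twice.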
Re: UTF-8 confusion
Thanks. I thought I had this covered, but I looked into the manual and found the following, so I'll have to try this out:
charset: The character set.
Prior to PHP 5.3.6, this element was silently ignored. The same behaviour can be partly replicated with the PDO::MYSQL_ATTR_INIT_COMMAND driver option, as the following example shows.
Even though pdo_mysql accepts an additional "charset" parameter in the DSN (see mysql_driver.c:442), it does not do anything with it.
A possible workaround to set the charset to UTF-8, for example, could be:
<?php
$dbh = new PDO("mysql:$connstr", $user, $password);
$dbh -> exec("set names utf8");
?>
EDIT: That was it. Finally! That was the last piece of the puzzle. Thanks!
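For anyone landing here later, a rough sketch of the two approaches the manual text mentions, combined: the charset element in the DSN (honored from PHP 5.3.6 on) and the PDO::MYSQL_ATTR_INIT_COMMAND driver option for older versions. The helper name utf8_pdo_settings and the host, database name, user and password are placeholders, not anything from a real setup.

```php
<?php
// Hypothetical helper: builds a DSN and driver options that both ask for utf8.
function utf8_pdo_settings($host, $dbname)
{
    // The charset element is silently ignored before PHP 5.3.6.
    $dsn = "mysql:host=$host;dbname=$dbname;charset=utf8";

    $options = array();
    // Fallback for older PHP: run SET NAMES right after connecting.
    // Guarded so the script still loads when pdo_mysql is not installed.
    if (defined('PDO::MYSQL_ATTR_INIT_COMMAND')) {
        $options[PDO::MYSQL_ATTR_INIT_COMMAND] = 'SET NAMES utf8';
    }

    return array($dsn, $options);
}

list($dsn, $options) = utf8_pdo_settings('localhost', 'database');
echo $dsn, "\n";

// Connecting needs real credentials, so it is left commented out here:
// $dbh = new PDO($dsn, 'user', 'password', $options);
```

Setting both is redundant on newer PHP versions but harmless, and it avoids depending on which PHP version the code ends up running on.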