Character encoding

Posted: Tue Oct 02, 2007 4:57 am
by shiznatix
Ok I have this strange problem. I have a username saved in my database that has this word in it: Söze

Now when I view this record in phpMyAdmin all is well; it shows the ö no problem. But when I pull it out of the database and display it on the website it won't show the ö, and instead shows the 'unknown character' symbol. But if I copy it from phpMyAdmin and put it as static text on the site it shows the ö no problem.

So what is happening between the database and the site that's screwing it up so much?

Posted: Tue Oct 02, 2007 5:11 am
by s.dot
Are you using htmlentities() on the data? If so, are you using the correct character set? It defaults to ISO-8859-1, so if you didn't specify a character set, this could be the problem.

Also what's the collation of your database table?

Posted: Tue Oct 02, 2007 5:20 am
by shiznatix
I am running the data through phptal, which does use htmlentities() or something like it, but that's not the problem, since if I just give it a string with that letter in it, it shows it just fine. It's the data coming from the database that's the problem.

My database collation is: latin1_swedish_ci

Posted: Wed Oct 03, 2007 6:53 am
by s.dot
If I'm correct, that collation is for Western European languages. I actually have this same problem with Unicode characters showing improperly on one of my web sites (same collation). I never dug into it to solve it, though. I wonder if setting the collation to a utf8_* one would solve the problem.

Posted: Wed Oct 03, 2007 6:56 am
by onion2k
Collation has no effect on how characters are stored, only on how they're compared and ordered when they're selected from the database. It's the character set of the table you need to worry about, e.g. charset= in the CREATE TABLE syntax.

Posted: Wed Oct 03, 2007 12:00 pm
by shiznatix
My charset is such:

DEFAULT CHARSET=latin1

I am sure this is on all my tables because it's just the default. But what I don't understand is why phpMyAdmin can extract the data and show it with no problems at all, but when I do it using plain, simple MySQL queries on my website it messes up. It's the same server and everything, so I don't understand.

Posted: Wed Oct 03, 2007 11:22 pm
by cade
I also have a problem when generating RSS. I have Chinese characters stored in the database, and I pull the content (in Chinese) out of the database and write it into the RSS feed. But when I view the XML file, the characters show up as '????' instead of the encoded text. I use html_entity_decode on the result from the database. What is the right character set setting for this?

Posted: Thu Oct 04, 2007 3:15 am
by onion2k
shiznatix wrote:My charset is such:

DEFAULT CHARSET=latin1

I am sure this is on all my tables because it's just the default. But what I don't understand is why phpMyAdmin can extract the data and show it with no problems at all, but when I do it using plain, simple MySQL queries on my website it messes up. It's the same server and everything, so I don't understand.
The best way to be sure it'll work is to make sure everything matches. This means you need to check:

The character set of the database table.
The character set of the database client (eg PHP .. you can set it with mysql_query("SET NAMES UTF-8"); .. or use CONVERT() in each SQL statement ).
The character set of the HTML page (set with header("Content-Type: text/html; charset=utf-8"); .. or a meta tag .. or both. )
The character set of the browser (It defaults to autodetect which sets it to the header/meta character set, but if you've changed it to a manual setting things won't display right).

Plus, the character set of the incoming form data needs to be correct, or else you'll end up trying to put wrongly encoded characters into the database (it should be the same as the HTML page unless your form has a language setting).
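
To make the mismatch concrete, here is a rough sketch of what probably happens to the ö on its way to the browser. It is written in Python only because the raw bytes are easy to inspect there; the variable names are made up for illustration, and it assumes the database hands back latin1 bytes while the page is declared as UTF-8.

```python
# Assumption: the database returns the name as latin1 (ISO-8859-1) bytes.
name = "Söze"
latin1_bytes = name.encode("latin-1")   # ö is the single byte 0xF6
utf8_bytes = name.encode("utf-8")       # ö is the two bytes 0xC3 0xB6

# A page declared as UTF-8 cannot decode the lone 0xF6 byte, so the
# reader substitutes U+FFFD, the "unknown character" symbol.
shown_wrong = latin1_bytes.decode("utf-8", errors="replace")

# When the client connection and the page agree, the text survives intact.
shown_right = utf8_bytes.decode("utf-8")

print(repr(shown_wrong))
print(repr(shown_right))
```

The same string pasted as static text works because the file it is saved in already matches the page's declared encoding, so no conversion is needed.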

Posted: Mon Oct 08, 2007 5:02 am
by shiznatix
onion2k wrote:The character set of the database client (eg PHP .. you can set it with mysql_query("SET NAMES UTF-8"); ..
Hooray! That did the trick, but remember kids, the query is ("SET NAMES 'UTF8'")

Thanks onion

Posted: Mon Oct 08, 2007 5:06 am
by shiznatix
Whoops, spoke too soon.

That worked for getting the guy's username out of the database without problems, but I have a lot of serialized arrays stored as TEXT in the database, and now pulling those out just screws everything up, with letters becoming crazy symbols and stuff. Is there any way around this, or a way to convert everything stored in the database to utf8 or something?

Posted: Wed Oct 10, 2007 2:44 am
by shiznatix
bump ^^

Posted: Wed Oct 10, 2007 7:12 am
by Weirdan
What field type are you using to store that data? If it's TEXT, switch to BLOB; it shouldn't autoconvert the data as TEXT does.

Posted: Thu Oct 11, 2007 7:52 am
by shiznatix
Weirdan wrote:What field type are you using to store that data? If it's TEXT, switch to BLOB; it shouldn't autoconvert the data as TEXT does.
And that was it. Thanks very much, with that and the SET NAMES everything is working all happy dandy.

By the way, what is the difference between TEXT and BLOB?

Posted: Thu Oct 11, 2007 12:35 pm
by Weirdan
TEXT has a character set and collation, while BLOB is just binary storage.
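
A likely reason the serialized arrays broke: PHP's serialize() records each string's length in bytes, and a charset conversion on a TEXT column changes the byte count without touching that recorded length. The sketch below mimics this in Python so the bytes are visible. The two helper functions are hypothetical stand-ins, not PHP's real serializer, and it assumes the data was written as latin1 and read back over a utf8 connection.

```python
def php_string_token(raw: bytes) -> bytes:
    """Build a serialize()-style string token: s:<byte length>:"<bytes>";"""
    return b's:' + str(len(raw)).encode("ascii") + b':"' + raw + b'";'


def length_prefix_matches(token: bytes) -> bool:
    """Check that the declared byte length equals the actual payload length."""
    head, rest = token.split(b':"', 1)
    declared = int(head[2:].decode("ascii"))
    payload = rest[:-2]  # drop the trailing ";
    return declared == len(payload)


# Written while everything was latin1: ö is one byte, so the prefix is 4.
stored = php_string_token("Söze".encode("latin-1"))

# TEXT column read over a utf8 connection: the characters are re-encoded,
# but the length prefix inside the string stays as it was.
converted = stored.decode("latin-1").encode("utf-8")

print(length_prefix_matches(stored))
print(length_prefix_matches(converted))
```

A BLOB column has no character set, so the bytes come back exactly as stored and the recorded lengths stay valid.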