Character encoding

shiznatix
DevNet Master
Posts: 2745
Joined: Tue Dec 28, 2004 5:57 pm
Location: Tallinn, Estonia

Post by shiznatix »

OK, I have a strange problem. I have a username saved in my database that contains this word: Söze

When I view this record in phpMyAdmin all is well; it shows the ö with no problem. But when I pull it out of the database and display it on the website, it won't show the ö and shows the 'unknown character' symbol instead. If I copy it from phpMyAdmin and put it on the site as static text, it shows the ö with no problem.

So what is happening between the database and the site that's screwing it up so much?
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Are you using htmlentities() on the data? If so, are you using the correct character set? It defaults to ISO-8859-1 (in PHP versions before 5.4), so if you didn't specify a character set, this could be the problem.

Also what's the collation of your database table?

Post by shiznatix »

I am running the data through PHPTAL, which does use htmlentities() or something like it, but that's not the problem: if I just give it a string with that letter in it, it shows it just fine. It's the data coming from the database that's the problem.

My database collation is: latin1_swedish_ci

Post by s.dot »

If I'm correct, that collation is for Western European languages. I actually have this same problem with Unicode characters showing improperly on one of my web sites (same collation). I never dug into it to solve it, though. I wonder whether setting the collation to a utf8_* one would solve the problem.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

Collation has no effect on how characters are stored, only on how they're compared and sorted when they're selected from the database. It's the character set of the table you need to worry about, e.g. charset= in the CREATE TABLE syntax.
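
To make that concrete, here is a minimal sketch (not from the original posts) of how the same word ends up as different bytes depending on the character set. It assumes the mbstring extension is loaded, uses ISO-8859-1 as a stand-in for MySQL's latin1 (the two agree for ö), and assumes the page is served as UTF-8.

```php
<?php
// "Söze" as a latin1 column stores it: ö is the single byte 0xF6.
$latin1 = "S\xF6ze";

// The same word in UTF-8: ö becomes the two bytes 0xC3 0xB6.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), "\n"; // 53f67a65
echo bin2hex($utf8), "\n";   // 53c3b67a65

// A lone 0xF6 byte is not valid UTF-8, so a page declared as UTF-8
// shows the "unknown character" symbol when it receives latin1 bytes.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

The collation never enters into it; only the byte sequence and the encoding the page claims to use decide what the browser draws.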

Post by shiznatix »

My charset is such:

DEFAULT CHARSET=latin1

I am sure this is on all my tables because it's just the default. But what I don't understand is why phpMyAdmin can extract the data and show it with no problems at all, while plain, simple MySQL queries on my website mess it up. It's the same server and everything, so I don't understand.
cade
Forum Commoner
Posts: 55
Joined: Tue Jul 03, 2007 8:18 pm

Post by cade »

I also have a problem when generating RSS. I have Chinese characters stored in the database and pull that content out to write it into the RSS feed. But when I view the XML file, the characters show as '????' instead of the encoded ones. I use html_entity_decode on the result from the database. What is the right character setting for this?

Post by onion2k »

shiznatix wrote:My charset is such:

DEFAULT CHARSET=latin1

I am sure this is on all my tables because it's just the default. But what I don't understand is why phpMyAdmin can extract the data and show it with no problems at all, while plain, simple MySQL queries on my website mess it up. It's the same server and everything, so I don't understand.
The best way to be sure it'll work is to make sure everything matches. This means you need to check:

The character set of the database table.
The character set of the database client (eg PHP .. you can set it with mysql_query("SET NAMES UTF-8"); .. or use CONVERT() in each SQL statement ).
The character set of the HTML page (set with header("Content-Type: text/html; charset=utf-8"); .. or a meta tag .. or both. )
The character set of the browser (It defaults to autodetect which sets it to the header/meta character set, but if you've changed it to a manual setting things won't display right).

Plus, the character set of the incoming form data needs to be correct, or else you'll end up trying to put wrongly encoded characters into the database (it should be the same as the HTML page unless your form has a language setting).
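
As a rough sketch of the client and page items in that checklist: the helper names contentTypeFor() and connectWithCharset() are made up for illustration, and the credentials in the usage comment are placeholders. It uses mysqli rather than mysql_query(), since the old mysql_* functions were removed in PHP 7. Note that MySQL spells the charset name without a hyphen (utf8, utf8mb4), while HTTP and HTML use UTF-8.

```php
<?php
// Hypothetical helper: build the Content-Type header matching a MySQL
// charset name. MySQL says "utf8"/"utf8mb4"; HTTP and HTML say "UTF-8".
function contentTypeFor(string $mysqlCharset): string
{
    $map = ['utf8' => 'UTF-8', 'utf8mb4' => 'UTF-8'];
    if (!isset($map[$mysqlCharset])) {
        throw new InvalidArgumentException("No mapping for charset: $mysqlCharset");
    }
    return 'Content-Type: text/html; charset=' . $map[$mysqlCharset];
}

// Hypothetical helper: open a connection and set the client character set.
// mysqli::set_charset() changes the session charset like SET NAMES does,
// and also tells the client library which charset is in use.
function connectWithCharset(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset
): mysqli {
    $link = new mysqli($host, $user, $pass, $db);
    if (!$link->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $link->error);
    }
    return $link;
}

// Usage (placeholder credentials, so left commented out):
// $link = connectWithCharset('localhost', 'user', 'secret', 'mydb', 'utf8');
// header(contentTypeFor('utf8'));

echo contentTypeFor('utf8'), "\n"; // Content-Type: text/html; charset=UTF-8
```

Driving both the connection and the header from one charset value is just one way to keep them from drifting apart.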

Post by shiznatix »

onion2k wrote:The character set of the database client (eg PHP .. you can set it with mysql_query("SET NAMES UTF-8"); ..
Hooray! That did the trick, but remember kids, the query is ("SET NAMES 'UTF8'")

Thanks onion

Post by shiznatix »

Whoops, spoke too soon.

That worked for getting the guy's username out of the database without problems, but I have a lot of serialized arrays stored as text in the database, and pulling those out now screws everything up, with letters becoming crazy symbols. Is there any way around this, or a way to convert everything stored in the database to utf8?
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

What field type are you using to store that data? If it's TEXT, switch to BLOB; it shouldn't auto-convert the data the way TEXT does.
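
A minimal sketch of why the conversion hurts serialized data in particular, assuming the arrays were serialized while the strings were latin1 and that mbstring is available. The mb_convert_encoding() call only imitates what the server does to a TEXT column for a utf8 client; it is not the actual MySQL code path.

```php
<?php
// A serialized array holding "Söze" in latin1 (ö is the single byte 0xF6).
// serialize() records the byte length, so the value is written as s:4:"...".
$stored = serialize(['name' => "S\xF6ze"]);

// Imitate a TEXT column read over a utf8 connection: every character is
// re-encoded, ö grows to two bytes, and the recorded s:4 no longer matches.
$converted = mb_convert_encoding($stored, 'UTF-8', 'ISO-8859-1');
$broken = @unserialize($converted);
var_dump($broken); // bool(false)

// A BLOB column hands back the original bytes, so unserialize() still works.
// Individual strings can then be converted after unserializing.
$intact = unserialize($stored);
$name = mb_convert_encoding($intact['name'], 'UTF-8', 'ISO-8859-1');
echo bin2hex($name), "\n"; // 53c3b67a65
```

Converting after unserialize(), one string at a time, keeps the length prefixes out of harm's way.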

Post by shiznatix »

Weirdan wrote:What field type are you using to store that data? If it's TEXT, switch to BLOB; it shouldn't auto-convert the data the way TEXT does.
And that was it. Thanks very much, with that and the SET NAMES everything is working all happy dandy.

By the way, what is the difference between TEXT and BLOB?

Post by Weirdan »

TEXT has a character set and collation, while BLOB is just binary storage.