How do YOU handle Character Encoding?

Not for 'how-to' coding questions but for PHP theory: this forum is for those of us who want to learn about the design aspects of programming with PHP.



How do you handle character encoding?

- Encoding? What encoding? (4 votes, 36%)
- I use Latin-1, international users begone! (0 votes)
- UTF-8 with painstakingly hand-crafted subroutines (1 vote, 9%)
- Ah, UTF-8, but it's very casual (5 votes, 45%)
- I use an external library (tell us which!) (0 votes)
- Waiting for PHP 6 (1 vote, 9%)

Total votes: 11

Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US


Post by Ambush Commander

Just curious. I'm trying to figure out whether I should build character-encoding handling into a parser I wrote, and it's quite a knotty issue.
daedalus__
DevNet Resident
Posts: 1925
Joined: Thu Feb 09, 2006 4:52 pm

Post by daedalus__

?
fastfingertips
Forum Contributor
Posts: 242
Joined: Sun Dec 28, 2003 1:40 am

Post by fastfingertips

You can set the encoding in the escaping method, so I suppose this process will be triggered when you select data from the DB. If you also have translations, then depending on how you handle them (DB or file), you may decide to move the process from the data layer (DL) to the view.
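To illustrate setting the encoding in the escaping method: PHP's htmlspecialchars() takes the charset as its third argument, so a view-layer helper can name it explicitly instead of relying on the default_charset setting. This is a minimal sketch written for PHP 7 or later; the helper name escape_html() is hypothetical.

```php
<?php
// Hypothetical view helper: escape text for HTML output with an explicit charset.
// ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid byte
// sequences with U+FFFD instead of making the whole result an empty string.
function escape_html(string $text, string $charset = 'UTF-8'): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, $charset);
}

// "\xC3\xA9" is "é" encoded as UTF-8.
echo escape_html("Tom & \"Jerry\" <caf\xC3\xA9>"), "\n";
// prints: Tom &amp; &quot;Jerry&quot; &lt;café&gt;
```

Doing this in the view keeps the charset decision in one place, which fits the idea of moving the process from the data layer to the view.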
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander

Encoding in the database is one issue, but most people store their text as Latin-1 regardless of what character set it is actually in. That only matters when you rely on the database's collation functionality (sorting and comparing strings). MySQL 4.0 doesn't have good Unicode support (MySQL 4.1 basically fixes all the problems), which is why most of these people do it this way.
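Collation (and length limits) go wrong when UTF-8 text sits in a Latin-1 column because every non-ASCII character takes more than one byte in UTF-8, while a Latin-1 column counts each byte as a separate character. A small illustration comparing byte and character counts, written for PHP 7 or later:

```php
<?php
// "café" in UTF-8: the "é" is the two bytes 0xC3 0xA9.
$word = "caf\xC3\xA9";

// strlen() counts bytes, which is what a Latin-1 column treats as characters.
$bytes = strlen($word);

// With the /u modifier, "." matches one UTF-8 character (code point) at a time.
$chars = preg_match_all('/./su', $word);

printf("%d bytes, %d characters\n", $bytes, $chars);
// prints: 5 bytes, 4 characters
```

Interpreted as Latin-1, the two bytes of "é" are "Ã" and "©", and those are what a Latin-1 collation sorts and compares.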

However, character encoding also applies to the output and processing of data. Here are some issues:

1. Do you use Unicode? There is absolutely no reason you shouldn't be using Unicode. Read this to find out more about Unicode in general and common issues: http://www.phpwact.org/php/i18n/charsets

2. Do you explicitly declare the character set by calling header('Content-type: text/html; charset=utf-8')? Do you also specify it in an http-equiv meta tag?

3. Do you assume that everything users submit is in the correct encoding? With forms this generally isn't a big problem: even though virtually no one specifies accept-charset, browsers are usually smart enough to encode the submission in the encoding of the page that contains the form.

However, once you offer other ways for user data to get in, such as file uploads, you can't assume anything about its encoding. You have to figure out what the encoding is, strip the byte order mark (if there is one), and convert the text to UTF-8 (if it isn't already).

4. Do you account for low-quality browsers that mangle Unicode characters in textareas? MediaWiki works around this by transparently converting all non-ASCII characters to entities when one of the problem browsers shows up. Do you?
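For point 2, a minimal sketch of declaring UTF-8 in both places, the HTTP header and the http-equiv meta tag. It is written for PHP 7.1 or later, and the function names are hypothetical; the headers_sent() check only avoids PHP's "headers already sent" warning when output has already started.

```php
<?php
// Hypothetical helpers: declare UTF-8 in the HTTP header and in the markup.

// When both are present, browsers normally let the HTTP header win.
function send_utf8_content_type(): void
{
    // header() warns if output has already started, so check first.
    if (!headers_sent()) {
        header('Content-type: text/html; charset=utf-8');
    }
}

// The http-equiv fallback for the page's <head>; it still applies when the
// page is saved to disk and opened without any HTTP headers.
function utf8_meta_tag(): string
{
    return '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
}

send_utf8_content_type();
echo utf8_meta_tag(), "\n";
```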
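For the file-upload case in point 3, a minimal sketch of the three steps described there: work out the encoding, strip the byte order mark, and convert to UTF-8. It assumes PHP 7 or later with the mbstring extension enabled, and it only recognizes UTF-8 and UTF-16 BOMs. When there is no BOM and the bytes are not valid UTF-8, it falls back to Latin-1 (ISO-8859-1), which is a guess the bytes themselves cannot confirm. The function name is hypothetical.

```php
<?php
// Hypothetical helper: normalize an uploaded text file to UTF-8.
// Assumes mbstring is enabled; BOM-less, non-UTF-8 input is assumed to be $fallback.
function normalize_upload_to_utf8(string $bytes, string $fallback = 'ISO-8859-1'): string
{
    // A byte order mark identifies the encoding; strip it before converting.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
        $from = 'UTF-8';
    } elseif (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
        $from = 'UTF-16LE';
    } elseif (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        $bytes = substr($bytes, 2);
        $from = 'UTF-16BE';
    } elseif (preg_match('//u', $bytes) === 1) {
        // No BOM, but the /u pattern only matches well-formed UTF-8.
        $from = 'UTF-8';
    } else {
        // No BOM and not valid UTF-8: fall back to the assumed encoding.
        $from = $fallback;
    }

    return $from === 'UTF-8' ? $bytes : mb_convert_encoding($bytes, 'UTF-8', $from);
}

// Latin-1 "café" (0xE9 for "é") comes back as UTF-8 (0xC3 0xA9).
var_dump(normalize_upload_to_utf8("caf\xE9") === "caf\xC3\xA9");
// prints: bool(true)
```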
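For point 4, a minimal sketch of the idea behind the MediaWiki workaround (not MediaWiki's actual code): before filling the textarea for a browser known to mangle Unicode, turn every non-ASCII character into a numeric entity, then turn the entities back into characters when the form is submitted. It assumes PHP 7 or later, mbstring, and a UTF-8 page; deciding which browsers need it is left out, and the function and constant names are hypothetical.

```php
<?php
// Conversion map for mb_*_numericentity(): [start, end, offset, mask].
// Covers every code point from U+0080 up, so ASCII is left untouched.
const NON_ASCII_MAP = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Before output: "é" becomes "&#233;", leaving only ASCII in the textarea.
function to_safe_textarea(string $utf8): string
{
    return mb_encode_numericentity($utf8, NON_ASCII_MAP, 'UTF-8');
}

// After submission: turn the numeric entities back into UTF-8 characters.
function from_safe_textarea(string $submitted): string
{
    return mb_decode_numericentity($submitted, NON_ASCII_MAP, 'UTF-8');
}

echo to_safe_textarea("caf\xC3\xA9"), "\n";
// prints: caf&#233;
```

The converted text still has to be HTML-escaped when it goes into the textarea (for example with htmlspecialchars()), so the browser shows and submits the literal text "caf&#233;", which is plain ASCII. One caveat: decoding on submit also converts any non-ASCII numeric entities the user typed on purpose, so a real implementation needs some way to tell the two apart.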