I understand that by using UTF-8 you have the best chance of working well with a wide range of (international) characters. However, a lot of information is (still) encoded as latin-1 (ISO-8859-1), which leads to the familiar problem of weird characters showing up on your web pages.
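The "weird characters" come from bytes written in one encoding being read as another. A minimal Python sketch reproduces both directions of the mismatch. The function names and the sample word "café" are made up for illustration; they are not from the original post.

```python
def utf8_read_as_latin1(text: str) -> str:
    """UTF-8 bytes interpreted as ISO-8859-1 (e.g. a UTF-8 page labelled latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def latin1_read_as_utf8(text: str) -> str:
    """ISO-8859-1 bytes interpreted as UTF-8 (e.g. latin-1 data on a UTF-8 page).

    Invalid UTF-8 sequences become U+FFFD, the replacement character
    browsers usually draw as a question mark in a diamond.
    """
    return text.encode("latin-1").decode("utf-8", errors="replace")


print(utf8_read_as_latin1("café"))  # prints: cafÃ©
print(latin1_read_as_utf8("café"))  # prints: caf followed by U+FFFD
```

The two-byte UTF-8 sequence for "é" turns into two latin-1 characters, while the single latin-1 byte for "é" is not valid UTF-8 at all.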
My gut feeling is that I should try to stick with UTF-8 everywhere. But, as I said, some data in my database might still be latin-1, and new data coming in might be latin-1 as well.
For example, in one of my projects I was handed a spreadsheet, whose data I imported into my database. The data was latin-1. Should I be pragmatic and just set

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

or should I convert the data to UTF-8 first (and then use <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />)?
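If the choice is to convert, one option is to re-encode the data once at import time, so that the database and every page can stay UTF-8. A minimal sketch, assuming the spreadsheet was exported as a latin-1 CSV file; the function names are hypothetical.

```python
import csv
import io


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 bytes."""
    return raw.decode("latin-1").encode("utf-8")


def read_latin1_csv(raw: bytes) -> list[list[str]]:
    """Parse a latin-1 CSV export into rows of Unicode strings.

    The rows can then be written to a UTF-8 database connection.
    newline="" lets the csv module handle line endings itself.
    """
    text = raw.decode("latin-1")
    return list(csv.reader(io.StringIO(text, newline="")))
```

Converting at import means the meta tag never has to vary per page. The catch is that the source encoding must be known: decoding as latin-1 accepts any byte sequence without complaint, so it will silently mangle input that was really UTF-8.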
I also discovered that the Zend Framework doesn't send a Content-Type header that explicitly sets the character set to UTF-8. Is that on purpose, or is that left to the server?
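For comparison, this is roughly what declaring the charset in the HTTP header looks like. It is sketched as a hypothetical Python WSGI app rather than Zend Framework code; in plain PHP the same effect comes from a `header('Content-Type: text/html; charset=utf-8')` call or the `default_charset` ini setting. When a header and a `<meta>` tag are both present, browsers give the HTTP header precedence.

```python
from typing import Callable, Iterable


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical WSGI app that states its charset in the HTTP header."""
    body = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        "<title>Demo</title></head>"
        "<body><p>Caf\u00e9 costs 3 \u20ac</p></body></html>"
    )
    payload = body.encode("utf-8")  # the bytes must really be UTF-8
    start_response("200 OK", [
        # The header and the meta tag agree, so nothing is left to guessing.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]
```

Declaring the charset in both places is redundant but harmless. It keeps the page correct even when it is saved to disk and later opened without the header.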
[edit:]
Another issue, apparently, is browsers:
http://dev.mysql.com/tech-resources/art ... icode.html
"If your HTML page contains a form, browsers will generally send the results back in the character set of the page. So if your page is sent in UTF-8, you will (usually) get UTF-8 results back. The default encoding of HTML documents is ISO-8859-1, so by default you will get form data encoded as ISO-8859-1, with one big exception: some browsers (including Microsoft Internet Explorer and Apple Safari) will actually send the data encoded as Windows-1252, which extends ISO-8859-1 with some special symbols, like the euro (€) and the curly quotes (“”)."

Does this mean that if people use Internet Explorer, they are going to send Windows-1252 anyway, regardless of what character encoding I set? If that's the case, I might as well forget about using UTF-8.
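As quoted, the Windows-1252 exception concerns pages left at the ISO-8859-1 default; a page served as UTF-8 should usually get UTF-8 back. On the server side, one defensive heuristic (not something the article prescribes) is to try UTF-8 first and fall back to Windows-1252 rather than ISO-8859-1, since Windows-1252 covers the euro sign and curly quotes. A minimal Python sketch with a hypothetical function name:

```python
def decode_form_value(raw: bytes) -> str:
    """Decode submitted form bytes: UTF-8 first, Windows-1252 as fallback.

    Windows-1252 matches ISO-8859-1 for 0xA0-0xFF but maps most of
    0x80-0x9F to printable characters such as the euro sign (0x80) and
    curly quotes (0x93, 0x94). Python's cp1252 codec leaves 0x81, 0x8D,
    0x8F, 0x90 and 0x9D undefined, hence errors="replace".
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")
```

Input that happens to be valid UTF-8 never reaches the fallback. This guards against stray Windows-1252 submissions, but serving the form page itself as UTF-8 remains the main fix.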