UTF8 Problems

Post by $var »

Hi!

I'm taking a blog feed, parsing it into an array, and spitting it out through my site.
The feed itself doesn't have any funky characters, but when I pass it through the site (which IS ISO-8859-1 encoded) I get this:
Last night, as the performance was about to begin, the emcee’s instructions were clear.“Hold up your cellphones, Blackberrys, iPhones or what have you in front of you;
When I run it through utf8_decode(), it does change the wacky characters ... but to question marks!

This is a common problem with the sites I work on: the blog platform uses a different encoding than the pages.
Anything you can suggest about this?
Re: UTF8 Problems

Post by Apollo »

$var wrote: The feed itself doesn't have any funky characters,
Actually, it does. The correct representation of your text is:

... the emcee’s instructions were clear.“Hold up ...

And this contains two funky chars: the ’ (Unicode 8217, i.e. U+2019, instead of the regular ' single quote) and “ (Unicode 8220, i.e. U+201C, instead of the regular " double quote).
These funky characters most likely come from some noob copy/pasting stuff from MS Word, which has the tendency to replace regular quotes with funky ones.
$var wrote: but when I pass it through the site (which IS ISO-8859-1 encoded) I get this:
Well there's your problem: those quote characters can't be represented in ISO-8859-1. It's a single-byte ("ANSI") encoding with only 256 code points. Just like Chinese or Klingon characters can't be expressed in ISO-8859-1, neither can the exotic characters above. That is also why utf8_decode() gives you question marks: it converts to ISO-8859-1 and substitutes ? for anything that encoding cannot hold.
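To see where the three-character garbage comes from, here is a minimal sketch (the variable names are just for illustration) that dumps the UTF-8 bytes of the two quote characters. It assumes PHP 7 or later for the "\u{...}" escape syntax.

```php
<?php
// U+2019 (decimal 8217) and U+201C (decimal 8220), written as UTF-8 strings.
$rightSingleQuote = "\u{2019}";
$leftDoubleQuote  = "\u{201C}";

// Each character occupies three bytes in UTF-8.
$bytesSingle = bin2hex($rightSingleQuote); // "e28099"
$bytesDouble = bin2hex($leftDoubleQuote);  // "e2809c"

// A page declared as a single-byte encoding renders each of those bytes
// as a separate character, hence three junk characters per quote.
echo $bytesSingle, "\n", $bytesDouble, "\n";
```

So the feed bytes are fine; it is the page's declared encoding that makes the browser read them one byte at a time.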
$var wrote: Anything you can suggest about this?
Use UTF-8 everywhere: in your HTML headers, in your content, and in your database character sets and collations.
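A rough sketch of the header and content part, assuming the feed text already arrives as UTF-8. The helper name render_utf8_page is made up for this example, and the database side is not shown.

```php
<?php
// Hypothetical helper: wraps feed text in a minimal UTF-8 HTML page.
function render_utf8_page(string $feedText): string
{
    // Escape as UTF-8 so multi-byte characters pass through unchanged.
    $safe = htmlspecialchars($feedText, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html><head><meta charset=\"utf-8\"></head>\n"
         . "<body><p>{$safe}</p></body></html>\n";
}

// Declare the encoding in the HTTP header as well as in the markup.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo render_utf8_page("the emcee\u{2019}s instructions were clear.");
```

With the header and the meta tag both saying utf-8, the browser decodes the three-byte quote as one character and no conversion is needed.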

If you still prefer an ANSI encoding, then why the heck would you pick the extremely limited ISO-8859-1 (as opposed to Windows-1252, for example, which contains all the printable ISO-8859-1 characters plus some funky ones such as the strange quote thingies)?
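For illustration only, here is a hand-written map of the four curly quotes to their Windows-1252 bytes. It is a tiny subset, not a full converter; in real code you would more likely use iconv() or mb_convert_encoding(), if those extensions are available on your host.

```php
<?php
// Illustrative subset: the four curly quotes and their Windows-1252 bytes.
// In ISO-8859-1 the range 0x80-0x9F holds control codes instead.
$utf8ToCp1252 = [
    "\u{2018}" => "\x91", // left single quote
    "\u{2019}" => "\x92", // right single quote
    "\u{201C}" => "\x93", // left double quote
    "\u{201D}" => "\x94", // right double quote
];

$utf8   = "emcee\u{2019}s \u{201C}Hold up";
$cp1252 = strtr($utf8, $utf8ToCp1252);

// Each three-byte UTF-8 quote is now a single Windows-1252 byte.
echo bin2hex($cp1252), "\n";
```

The page would then have to declare windows-1252 as its charset for those bytes to show up as quotes.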

Alternatively (or on top of that), replace freaky quote chars with regular ones in any content.
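Something along these lines would do the replacement. It is only a sketch: plain_quotes is a made-up name, it assumes the input is UTF-8, and it covers just the four common curly quotes.

```php
<?php
// Hypothetical helper: swap typographic quotes for plain ASCII ones.
// Assumes $utf8Text is UTF-8 encoded.
function plain_quotes(string $utf8Text): string
{
    return strtr($utf8Text, [
        "\u{2018}" => "'", // left single quote
        "\u{2019}" => "'", // right single quote
        "\u{201C}" => '"', // left double quote
        "\u{201D}" => '"', // right double quote
    ]);
}

echo plain_quotes("the emcee\u{2019}s instructions were clear.\u{201C}Hold up"), "\n";
// the emcee's instructions were clear."Hold up
```

The result is pure ASCII for these characters, so it displays the same whether the page is ISO-8859-1, Windows-1252 or UTF-8.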