Minimum Every Software Developer Should Know About Unicode
Posted: Fri Dec 01, 2006 6:15 pm
by Luke
I looked around to make sure this article hadn't already been posted on this site and found nothing, so apologies if it's already here. I found it very helpful: I was pretty clueless about Unicode and character sets, and it really enlightened me.
http://www.joelonsoftware.com/articles/Unicode.html
I hope it helps you guys.

Posted: Fri Dec 01, 2006 11:16 pm
by RobertGonzalez
Great tutorial/advice/teaching, Ninja.
Posted: Sat Dec 02, 2006 12:40 am
by m3mn0n
Great find! Lots and lots of detail and history about the whole world of character sets.
I am interested in doing some internationalization/localization for a site of mine and this saves me buying a book about Unicode.

Posted: Sat Dec 02, 2006 3:38 am
by Chris Corbyn
That's brilliant!

I sense this could go in a useful-resources sticky somewhere, but I'm not sure where.

Posted: Sat Dec 02, 2006 7:46 am
by RobertGonzalez
Me neither. I was going to move it, but I couldn't find an acceptable location. I was thinking Usability, but that seems to shrink the scope of what the tutorial teaches. Anyway, it is still a great little piece of information. Thanks again, Ninja goat man.
Posted: Sat Dec 02, 2006 8:56 am
by Chris Corbyn
LOL

So why are we so confused about where it goes? Isn't that what Miscellaneous is for?
(And why am I laughing? It's not even funny. Beer)
Posted: Sat Dec 02, 2006 9:21 am
by RobertGonzalez
I feel you. I was up at 5:00 AM yesterday for work, came home and was up till 11:00 with the wife. Fell asleep on the couch, woke up at 1:45 AM and have been coding ever since. It is about 7:20 AM and I am frickin loopy.
Posted: Sat Dec 02, 2006 4:38 pm
by Maugrim_The_Reaper
I probably see it as hugely relevant because I literally have a "funny" character in my name: Pádraic. It's incredible to see modern-day applications, from email systems to online open source websites (cough...SourceForge...cough), serve HTML that clearly states it's UTF-8 and yet mysteriously replace á with a squiggly A character and a horizontal bar. Sometimes this can be put down to PHP, when someone is manipulating the name string with a plain PHP string function instead of mbstring or a similar multibyte-aware alternative.
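For anyone wondering where the squiggly A comes from, here is a minimal sketch in Python (used purely for illustration; nobody in the thread posted code). It assumes the broken software decodes UTF-8 bytes as Latin-1, one byte per character; under that assumption the second byte of á shows up as ¡, and other legacy code pages may draw a different glyph. The byte slice stands in for PHP's byte-based substr(), and the character slice for what mb_substr() with a UTF-8 encoding aims to do.

```python
# Sketch: how UTF-8 "á" becomes "Ã" plus a stray symbol, and how a
# byte-oriented substring can split a multibyte character in half.

name = "Pádraic"

# UTF-8 encodes "á" (U+00E1) as the two bytes 0xC3 0xA1.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'P\xc3\xa1draic'

# Mojibake: the same bytes decoded as Latin-1 (one byte = one character)
# produce two characters where there should be one "á".
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)  # PÃ¡draic

# Byte-oriented truncation (like PHP's substr() on a UTF-8 string):
# keeping 2 bytes cuts "á" in half, leaving an invalid sequence that
# decodes to the U+FFFD replacement character.
first_two_bytes = utf8_bytes[:2]
print(ascii(first_two_bytes.decode("utf-8", errors="replace")))  # 'P\ufffd'

# Character-aware truncation keeps the whole "á".
print(name[:2])  # Pá
```

The same two bytes are perfectly valid UTF-8; the damage only happens when one layer of the stack assumes a single-byte encoding.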
I saw the funny side a few months back, when a blog post I wrote was syndicated through Devzone, PHP-Planet and PHPDeveloper. I think Devzone still requires using the &aacute; entity to cope...
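The entity workaround can be sketched in a few lines, again in Python for illustration only. The helper name `to_entities` is hypothetical, and this is not how Devzone or any of those aggregators actually process posts. It swaps each non-ASCII character for its HTML 4 named entity from the standard `html.entities.codepoint2name` table, or a numeric reference when no name exists, so the output is pure ASCII and survives a pipeline that mangles raw UTF-8.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity.

    Uses the HTML 4 named entity (e.g. &aacute;) when one exists,
    otherwise a numeric reference such as &#337; for "ő".
    Note: ASCII specials like & < > are NOT escaped here; run
    html.escape() on untrusted text first.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity
        else:
            out.append(f"&#{cp};")  # numeric character reference
    return "".join(out)


print(to_entities("Pádraic"))  # P&aacute;draic
```

It's a workaround rather than a fix: the page still renders correctly, but the underlying encoding problem in the pipeline is left in place.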
But there's worse: people serving static HTML that declares a UTF-8 charset but isn't actually encoded in UTF-8 at all... This is the worst of the lot, since that's when you're most likely to get ???? marks in place of non-ASCII characters in a browser decoding the page as UTF-8. Even more confusing, the web designer/developer may not notice this immediately, because saving a file that contains only ASCII and characters like áíúóé as UTF-8 is futile unless you originally created the file as UTF-8 and added a "funny" character BEFORE saving it. Any other route forces you to open the file, save it as UTF-8 (again), and then, and ONLY then, type characters outside standard ASCII.
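Both effects can be reproduced in a short Python sketch (illustrative only). It assumes the editor wrote the file as Latin-1 (Windows-1252 gives the same bytes for these characters) while the page declares UTF-8. Under that assumption every accented letter is an invalid UTF-8 sequence, so it decodes to U+FFFD, which browsers typically draw as a question mark in a diamond. The second part shows the likely reason an editor "forgets" the UTF-8 choice: an ASCII-only file is byte-for-byte identical in ASCII, Latin-1 and UTF-8 without a BOM, so there is nothing for it to detect on reopening.

```python
# Sketch: a page labelled UTF-8 but saved as Latin-1, and why an
# ASCII-only file carries no trace of the encoding it was "saved as".

text = "áíúóé"

# A Latin-1 save writes one byte per character.
latin1_bytes = text.encode("latin-1")
print(latin1_bytes)  # b'\xe1\xed\xfa\xf3\xe9'

# The browser trusts "charset=utf-8" and decodes the bytes as UTF-8.
# None of them forms a valid UTF-8 sequence here, so each one becomes
# the replacement character U+FFFD.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(ascii(shown))  # '\ufffd\ufffd\ufffd\ufffd\ufffd'

# ASCII-only content encodes to identical bytes in all three encodings
# (assuming no UTF-8 BOM is written), so an editor reopening the file
# cannot tell it was ever meant to be UTF-8.
ascii_only = "<p>Hello</p>"
print(ascii_only.encode("ascii") == ascii_only.encode("utf-8") == ascii_only.encode("latin-1"))  # True
```

Adding a non-ASCII character before the first save gives the editor multibyte sequences it can recognise as UTF-8, which matches the trick described above.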
Personally, I think half the editors available online are unreliable without a bit of coaxing.