Working on a tutorial for UTF-8

Ollie Saunders
DevNet Master
Posts: 3179
Joined: Tue May 24, 2005 6:01 pm
Location: UK

Post by Ollie Saunders »

Slightly OT, but does anyone know where I can find a list of encodings that are single-byte only, that is, ones that all of PHP's native string processing functions will have no difficulty with? I've googled and found lists of encodings, but I have to read about each one to find out whether it is single-byte. A categorized list would be great.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

All of the ISO and Windows ones are 8-bit. Chinese and Japanese ones definitely are not 8-bit.

Post by Ollie Saunders »

Are you sure about Windows? One of them is Hebrew.

Post by Ambush Commander »

Hebrew is 8-bit. The ISO ones are, by definition, 8-bit; the Windows ones are usually 8-bit, excluding the obvious Chinese, Japanese, Korean and Vietnamese ones. Note that Windows encodings will often use code points reserved for control characters to represent glyphs, which is forbidden in many web contexts.
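To illustrate why the single-byte distinction matters for PHP's native string functions, here is a minimal sketch. The sample word and variable names are made up for illustration, and the bytes are written as hex escapes so the result does not depend on how the script file itself is saved.

```php
<?php
// The word "café" in two encodings.
$latin1 = "caf\xE9";     // ISO-8859-1: every character is one byte
$utf8   = "caf\xC3\xA9"; // UTF-8: "é" takes two bytes

// strlen() counts bytes, not characters.
$latin1Bytes = strlen($latin1); // 4
$utf8Bytes   = strlen($utf8);   // 5

// strrev() reverses bytes, which is safe only for single-byte encodings.
$reversedLatin1 = strrev($latin1); // "\xE9fac", still valid ISO-8859-1
$reversedUtf8   = strrev($utf8);   // "\xA9\xC3fac", the two-byte sequence is split

// The /u modifier makes PCRE reject invalid UTF-8, so this acts as a validity check.
$stillValidUtf8 = preg_match('//u', $reversedUtf8) === 1; // false
```

With the mbstring extension installed, `mb_strlen($utf8, 'UTF-8')` would report 4 characters for the UTF-8 string.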

Post by Ollie Saunders »

Right OK, thanks AC.

Post by Ambush Commander »

I've updated it with information on forms. I've also fleshed out the rest of the document to get a handle on what I still have to write.

Post by Ambush Commander »

Bump!

I've added a LOT more information near the very end, so any comments on the new stuff?
Luke
The Ninja Space Mod
Posts: 6424
Joined: Fri Aug 05, 2005 1:53 pm
Location: Paradise, CA

Post by Luke »

non-php section wrote: You may, for whatever reason, may need to set the character encoding on non-PHP files, usually plain ol' HTML files.
Take out the second "may".
xml section wrote: In reality, this happens only when the XHTML is actually served as legit XML and not HTML, which is almost always never due to Internet Explorer's lack of support for application/xhtml+xml (even though doing so is often argued to be good practice).
"which is almost always never due"
later in xml section wrote: In short, if you use XHTML and have gone through the trouble of adding the XML header, be sure to make sure it jives with your META tags and HTTP headers.
"be sure to make sure it jives": "make sure to check that it jives" would probably be better.

That's as far as I got (the XML part). I'll read the rest later.
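As a rough sketch of the point in the last quote (keeping the XML header, META tag and HTTP header in agreement), all three declarations can be generated from a single value. The helper name `charset_declarations()` is hypothetical and not taken from the tutorial, and `text/html` is assumed as the content type.

```php
<?php
// Hypothetical helper: build every charset declaration from one value
// so that the three places cannot drift apart.
function charset_declarations(string $charset): array
{
    return [
        'http' => "Content-Type: text/html; charset=$charset",
        'xml'  => "<?xml version=\"1.0\" encoding=\"$charset\"?>",
        'meta' => "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=$charset\" />",
    ];
}

$decl = charset_declarations('UTF-8');

// The HTTP header has to go out before any body output.
if (!headers_sent()) {
    header($decl['http']);
}
// $decl['xml'] and $decl['meta'] are then echoed at the top of the document and inside <head>.
```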

Post by Ambush Commander »

Fixed. Thanks a lot for the proofreading; it is greatly appreciated.

Post by Luke »

No problem. I just got around to reading the rest, and I could find only one other thing to nitpick.

I would not abbreviate internationalization with I18N, but if you must, I would link to the Wikipedia article that explains it or wrap an acronym tag around it. http://en.wikipedia.org/wiki/Internatio ... calization

That is an outstanding tutorial. Nice work. I have to ask though, how did you become such an expert? Generally knowing this much about a subject comes from necessity. Are you multilingual?

Post by Ambush Commander »

Luke wrote: I would not abbreviate internationalization with I18N
Fixed.
Luke wrote: I have to ask though, how did you become such an expert? Generally knowing this much about a subject comes from necessity. Are you multilingual?
Not really (passing knowledge of French and Chinese, but I can write neither).

I'd attribute it to research skills and perfectionism. I thirst to understand a problem, not have an answer that "just works".

But I'd already encountered most of the problems before, because I've managed a Taiji Club website that supports Chinese and English (my parents are multilingual), and I had to tackle the same troubles when I was implementing HTML Purifier. A lot of the advice about database migration, unfortunately, comes straight from the documentation, and though I presume it works, I haven't actually tried to apply it. This is bad, and I expect that sooner or later someone will shoot me an email stating that they couldn't get my advice to work (hopefully they won't have nuked their database in the process). ;-)

I don't think I'm missing any more major points, but I could be wrong. Does everything seem to be covered?

Post by Ambush Commander »

DONE! Two new sections and a shiny table of contents.