Hi guys
I've always wondered what the best practice is for this.
Back in my HTML learning days (a frightening 12 years ago) I always thought you should encode accented chars and special chars into their entities... either the named entity or the numeric code, with the numeric code preferred. IIRC the W3C validator checked that characters were properly entity-ised back in those days.
Recently I ran the validator over a UTF-8 site which had many accented chars non-entityised (just pasted into the HTML as raw text characters) and the validator didn't flag those up. They all displayed correctly, to me anyway. I was under the impression that they should be converted to entities.
Obviously you still need to entityise HTML special chars (" > <), but should you still entity-ise other characters? Accents, symbols etc?
The reason I ask is I'm building a CMS for our website on our intranet where select users will be able to type HTML code directly into our websites.
I want to know whether to advise them to always entity-encode accents/symbols/anything... or just use entities for HTML special chars?
Cheers, B
HTML entities on UTF-8 site, best practice
Re: HTML entities on UTF-8 site, best practice
Characters like 'āēūīķļņšž' (and their equivalents in other languages) don't need to be converted to entities.
From http://www.w3.org/TR/xhtml1/#a_dtd_Special_characters "Entity Sets":
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
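To illustrate the point: a named entity, a numeric reference and the raw character all end up as the same character once parsed, so on a UTF-8 page nothing is gained by converting. A minimal sketch in Python (the language and the `decode_all` helper are my own choices for illustration, not anything from the thread), using the standard library's `html.unescape`:

```python
import html

def decode_all(spellings):
    """Decode each spelling of a character the way an HTML parser would."""
    return [html.unescape(s) for s in spellings]

# Named entity, decimal reference, hex reference, and the raw character.
euro_spellings = ["&euro;", "&#8364;", "&#x20AC;", "€"]
decoded = decode_all(euro_spellings)

# All four spellings collapse to the same single character.
all_same = len(set(decoded)) == 1

# A raw accented character is just ordinary bytes in a UTF-8 file.
raw_bytes = "ā".encode("utf-8")
```

So the choice between `&euro;` and a pasted `€` is about how the source is written, not about what the browser ends up showing, provided the page really is served as UTF-8.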
Re: HTML entities on UTF-8 site, best practice
Ah ok
But anything with an official entity equivalent... euro, copyright, accented western euro chars etc... should all be done using entities?
Cheers, B
Re: HTML entities on UTF-8 site, best practice
Right, after reading plenty of articles and being in IRC all day, I've got my plan of attack.
Only HTML special chars... < > & " should be represented as entities
Apostrophes I only absolutely need to do when outputting XML (eg: RSS) or when using apostrophes to enclose attribute values eg:
Code:
<a title='dave's computer'> //ERROR
<a title='dave&#39;s computer'> //CORRECT
<a title="dave's computer"> //CORRECT
Everything else should be stored as the plain text UTF-8 character in the DB
At least doing it this way I can make it consistent, so it's easy to convert at a later stage
Hope this helps someone out
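The plan above could be sketched roughly like this. It is only an illustration in Python, and `escape_text` and `title_attr` are hypothetical helper names, not part of any CMS mentioned in the thread. The standard `html.escape` touches only `&`, `<`, `>` and, with `quote=True`, the two quote characters; everything else passes through as raw UTF-8. Note that it writes the apostrophe as `&#x27;`, which is equivalent to `&#39;`.

```python
import html

def escape_text(text: str) -> str:
    """Escape only the HTML special characters; leave other characters as raw UTF-8."""
    return html.escape(text, quote=True)

def title_attr(value: str) -> str:
    """Build an <a> tag whose title is enclosed in apostrophes (hypothetical helper)."""
    return f"<a title='{escape_text(value)}'>"

# Accents survive untouched, the special characters become entities.
body = escape_text("dave's café & <bar>")

# The apostrophe inside an apostrophe-quoted attribute is escaped.
tag = title_attr("dave's computer")
```

This is meant for plain-text values being dropped into a page. It is not suitable for the case where users type HTML directly, since it would escape their tags as well.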
DaiLaughing
Re: HTML entities on UTF-8 site, best practice
That works as long as your server is set up properly. Ubuntu's default setup isn't, for one: PHP-generated content loses its UTF-8 encoding unless you manually change php.ini.
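The post doesn't say which php.ini setting is involved, so the following is only a guess at the symptom: I'm assuming the response goes out without a UTF-8 charset in its Content-Type header, and the browser then decodes the bytes as something else. A small Python sketch of how you might check a header value and what the breakage looks like (`declared_charset` and `is_utf8` are hypothetical names):

```python
from email.message import Message
from typing import Optional

def declared_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def is_utf8(content_type: str) -> bool:
    """True if the header explicitly declares UTF-8."""
    return declared_charset(content_type) == "utf-8"

# What a reader sees when UTF-8 bytes are decoded as Latin-1 instead.
mojibake = "é".encode("utf-8").decode("latin-1")
```

If raw accented characters show up as two-character garbage like this, the stored text is probably fine and it is the declared encoding that needs fixing, not the characters.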