HTML entities on UTF-8 site, best practice
Posted: Fri Jun 26, 2009 7:12 am
Hi guys
I've always wondered what the best practice is here.
Back in my HTML learning days (a frightening 12 years ago) I always thought you should encode accented chars and special chars as entities... either the named entity or the numeric code, with the numeric code preferred. IIRC the W3C validator checked that characters were properly entity-ised back in those days.
Recently I ran the validator over a UTF-8 site which had many accented chars that weren't entity-ised (just pasted into the HTML as raw text characters) and the validator didn't flag them up. They all displayed correctly, to me anyway. I was under the impression that they should be converted to entities.
Obviously you still need to entity-ise HTML special chars (" & > <), but should you still entity-ise other characters? Accents, symbols etc?
The reason I ask is I'm building a CMS for our website on our intranet where select users will be able to type HTML code directly into our websites.
I want to know whether to advise them to always entity-encode accents/symbols/anything... or just use entities for HTML special chars?
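To make the two options concrete, here is a rough Python sketch of what each would do to a piece of text. It's only an illustration, not code from our CMS: the function names and the sample string are made up, and it assumes the page is served as UTF-8.

```python
import html


def escape_specials_only(text: str) -> str:
    """Option 1: escape only the HTML special chars (& < > " ').

    Accented characters are left as raw text, relying on the page
    being served as UTF-8.
    """
    return html.escape(text, quote=True)


def escape_everything(text: str) -> str:
    """Option 2: escape the special chars, then turn every non-ASCII
    character into a numeric character reference (e.g. é -> &#233;).
    """
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = 'Café "menu" <b> & crème'
print(escape_specials_only(sample))
# Café &quot;menu&quot; &lt;b&gt; &amp; crème
print(escape_everything(sample))
# Caf&#233; &quot;menu&quot; &lt;b&gt; &amp; cr&#232;me
```

Both outputs should render the same in a browser; the only difference is whether the accented characters travel as raw UTF-8 bytes or as numeric references.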
Cheers, B