htmlspecialchars to and from the database

Posted: Mon Aug 24, 2009 7:38 am
by superdezign
On my blog, I have my posts set up with a raw content field and a parsed content field in the database. The raw content field is for editing; the parsed content field is already parsed as HTML. I'm having a problem, however, with HTML character codes that contain the pound sign. Standard characters such as ñ (&ntilde;) and þ (&thorn;) work fine. When they are submitted, they go to the database in their raw form, and are translated into their HTML character code equivalent in their parsed form. However, complex characters such as ★ (&#9733;) and ☆ (&#9734;) are saved in their HTML character code format, which causes htmlspecialchars() to parse the ampersand as a special character. For example, "&#9734;" becomes "&amp;#9734;", which displays as "&#9734;" instead of "☆".

Are there any suggestions for fixing this behavior? What is causing these character codes to be submitted into the database incorrectly?


EDIT: For now, I am combating this by using str_replace('&amp;#', '&#', htmlspecialchars($content)). I'd prefer a less hackish solution, though.
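
For readers following along, here is a minimal sketch of the behavior described above and of the workaround from the EDIT, which turns the escaped "&amp;#" back into "&#". The sample string and the variable names are assumptions made up for illustration; they are not taken from the blog's actual code.

```php
<?php
// Assumed sample: raw content in which the star was already submitted
// as a numeric character reference rather than as the character itself.
$content = 'Rated &#9734; by <b>readers</b>';

// htmlspecialchars() escapes every ampersand, including the one that
// starts "&#9734;", so the reference gets encoded a second time.
$parsed = htmlspecialchars($content, ENT_COMPAT, 'UTF-8');
echo $parsed, "\n"; // Rated &amp;#9734; by &lt;b&gt;readers&lt;/b&gt;

// The workaround from the EDIT: undo the escaping for "&#" only.
$fixed = str_replace('&amp;#', '&#', $parsed);
echo $fixed, "\n"; // Rated &#9734; by &lt;b&gt;readers&lt;/b&gt;
```

One drawback of this approach is that a literal "&#" typed as ordinary text is also left unescaped.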

Re: htmlspecialchars to and from the database

Posted: Mon Aug 24, 2009 12:16 pm
by akuji36
Take a look at the following video tutorial regarding regular expressions:

http://www.phpvideotutorials.com/regex/

It should help you out. It addresses issues like stripslashes and magic quotes.

Rod

Re: htmlspecialchars to and from the database

Posted: Mon Aug 24, 2009 2:26 pm
by Darhazer
The definition of the function:

Code:

string htmlspecialchars ( string $string [, int $quote_style = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )

Set the last parameter to false and existing entities won't be encoded twice.
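
A short sketch of that suggestion, assuming PHP 5.2.3 or later and UTF-8 content (the sample string is made up for illustration):

```php
<?php
// Assumed sample mixing a bare ampersand, an existing numeric
// reference, and a character that still needs escaping.
$content = 'Tom & Jerry &#9734; <3';

// Fourth argument false: anything that already looks like an entity
// is left alone, while everything else is escaped as usual.
$parsed = htmlspecialchars($content, ENT_COMPAT, 'UTF-8', false);
echo $parsed, "\n"; // Tom &amp; Jerry &#9734; &lt;3
```

Note that with double_encode off, text such as "&lt;" that an author typed on purpose is also left as-is, so it will display as "<" in the browser.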

Re: htmlspecialchars to and from the database

Posted: Sun Aug 30, 2009 10:04 am
by superdezign
Ahh, I didn't know that double_encode had to do with that. Thank you. :D

Re: htmlspecialchars to and from the database

Posted: Sun Aug 30, 2009 5:13 pm
by cpetercarter
htmlspecialchars() only transforms the few characters that are special in HTML markup: &, <, > and quotes. To encode every character that has a named entity, you need to use htmlentities().
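
A small comparison of the two functions, as a sketch that assumes a UTF-8 source file and the default HTML 4.01 entity table (variable names are made up for illustration):

```php
<?php
// Assumed sample: one character with a named entity (ñ), one markup
// character (<), and one character with no HTML 4.01 named entity (★).
$text = 'ñ < ★';

// Only the markup character is escaped.
$special = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');
echo $special, "\n"; // ñ &lt; ★

// Characters with a named entity are converted as well; the star has
// none in the HTML 4.01 table, so it is passed through unchanged.
$entities = htmlentities($text, ENT_COMPAT, 'UTF-8');
echo $entities, "\n"; // &ntilde; &lt; ★
```

So neither function produces numeric references like &#9734; on its own; those were already in the submitted text.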

The double_encode parameter was only added in PHP 5.2.3. If you have an earlier PHP version, you can prevent double encoding by first decoding the string (for example with html_entity_decode()) and then encoding it again.
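
A minimal sketch of the decode-then-encode approach, assuming UTF-8 content and a page served as UTF-8. The helper name encode_once() is hypothetical and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper, not part of PHP or of this thread.
function encode_once($content)
{
    // Turn existing entities (named or numeric) back into characters...
    $decoded = html_entity_decode($content, ENT_COMPAT, 'UTF-8');
    // ...then escape the markup characters exactly once.
    return htmlspecialchars($decoded, ENT_COMPAT, 'UTF-8');
}

$result = encode_once('Tom & Jerry &#9734; &amp; co');
echo $result, "\n"; // Tom &amp; Jerry ☆ &amp; co
```

The numeric reference ends up as the literal ☆ character, which displays correctly as long as the page encoding matches.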