Posted: Thu Oct 12, 2006 7:29 pm
by batfastad
Anyone have any info on this charset and entities issue?


Thanks

Ben

Posted: Thu Oct 12, 2006 9:24 pm
by neophyte
Which issue? The mysql injection issue?

Search around http://shiflett.org/ and you'll find that one there somewhere...

Posted: Fri Oct 13, 2006 3:34 am
by Mordred
No, chr() is for ASCII. Check out the docs for html_entity_decode()
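
A minimal sketch of what that might look like, assuming PHP 5 or later and UTF-8 as the target encoding. The sample string and variable names are made up for illustration. When given a multibyte charset, html_entity_decode() handles named and numeric entities in one call:

```php
<?php
// Illustrative input: the same two symbols as numeric and as named entities.
$input = 'Acme&#8482; costs &#8364;5 (&trade;, &euro;)';

// The third argument names the output encoding; UTF-8 can represent both symbols.
$decoded = html_entity_decode($input, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";
```

The decoded string is UTF-8, so the database connection and columns would also need to be able to hold those bytes for the symbols to survive storage.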

Posted: Fri Oct 13, 2006 4:36 am
by batfastad
Hi guys

I've got the MySQL injection thing sorted now. The final thing is converting various HTML entities into their characters, and storing those in MySQL.

I'm using...

html_entity_decode()

and

function numericentitieshtml($str) {
        return utf8_encode(preg_replace('/&#(\d+);/e', 'chr(str_replace(";", "", str_replace("&#","","$0")))', $str));
}

These convert named HTML entities and numeric HTML entities, respectively, into their proper characters.

The problem is that the numericentitieshtml() function (from the PHP manual somewhere) converts the numeric HTML entities, but into the wrong characters.

So running the following entities through my numericentitieshtml() function (without the periods, obviously)...

&.#8482; becomes a " quote mark, whereas it should be a TM symbol
&.#8364; becomes a ¬ character, whereas it should be a euro symbol

I'm guessing this is because the TM and euro symbols aren't in the UTF-8 charset.
So what charset do I need to use instead to get these chars into my database properly?
The column collations in my database tables are all currently set to latin1_swedish_ci.
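
For what it's worth, the two wrong characters reported above match what chr() does with out-of-range input: it produces a single byte, so a number above 255 is reduced to its low 8 bits. A small sketch of that arithmetic, with illustrative variable names:

```php
<?php
// Code points from the post: U+2122 (trade mark) and U+20AC (euro).
$codepoints = array('TM' => 8482, 'euro' => 8364);

$lowBytes = array();
foreach ($codepoints as $name => $cp) {
    // chr() yields one byte, so only the low 8 bits of the number survive.
    $lowBytes[$name] = $cp & 0xFF;
    printf("%s: %d -> byte %d (0x%02X)\n", $name, $cp, $lowBytes[$name], $lowBytes[$name]);
}
// Byte 34 is the ASCII double quote; byte 172 is the not sign in ISO-8859-1.
```

The utf8_encode() call in the function then treats that single byte as ISO-8859-1, which would explain the quote mark and the ¬ character.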

Hope this makes things a bit clearer!


Ben

Posted: Fri Oct 13, 2006 4:48 am
by Mordred
Hurray, my crystal ball is working, I answered two minutes before the question! :)

(bogus answer, so batfastad would hopefully receive a notice)