How to store character entities in a database - best practice

batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

Anyone have any info on this charset and entities issue?


Thanks

Ben
neophyte
DevNet Resident
Posts: 1537
Joined: Tue Jan 20, 2004 4:58 pm
Location: Minnesota

Post by neophyte »

Which issue? The mysql injection issue?

Search around http://shiflett.org/; you'll find that one there somewhere...
Mordred
DevNet Resident
Posts: 1579
Joined: Sun Sep 03, 2006 5:19 am
Location: Sofia, Bulgaria

Post by Mordred »

No, chr() is for ASCII. Check out the docs for html_entity_decode()
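
As a rough sketch of what the html_entity_decode() docs describe: when the function is called with a UTF-8 target charset, it decodes both named and numeric entities, so a separate chr()-based helper should not be needed. The wrapper name decode_entities() below is made up for illustration, and the ENT_QUOTES | ENT_HTML5 flags are one reasonable choice rather than the only one.

```php
<?php
// Hypothetical helper; the name is illustrative and not from the thread.
// Decodes named entities (&euro;) and numeric entities (&#8364;) alike,
// returning the result as a UTF-8 encoded string.
function decode_entities(string $str): string
{
    return html_entity_decode($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Example input mixing numeric and named entities.
echo decode_entities('&#8482; &#8364; &euro; &amp;'), "\n";
```

The result is a UTF-8 string, so anything that stores or displays it has to treat it as UTF-8 too.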

Post by batfastad »

Hi guys

I've got the MySQL injection thing sorted now. The final thing is converting various HTML entities into their characters, and storing those in MySQL.

I'm using...

Code:

html_entity_decode()

and

function numericentitieshtml($str) {
        return utf8_encode(preg_replace('/&#(\d+);/e', 'chr(str_replace(";", "", str_replace("&#","","$0")))', $str));
}
to change HTML named entities and HTML numeric entities, respectively, into their proper characters.

The problem is that the numericentitieshtml() function (from the PHP manual somewhere) is converting the HTML numeric entities, but into the wrong characters.

So running the following entities through my numericentitieshtml() function (without the periods, obviously)...

Code:

&.#8482; becomes a " quote mark whereas it should be a TM symbol
&.#8364; becomes a ¬ character whereas it should be a euro symbol
I'm guessing this is because the TM and euro symbols aren't in the UTF-8 charset.
So what charset do I need to use instead to get these chars into my database properly?
The columns in my database tables are all currently set to the latin1_swedish_ci collation.

Hope this makes things a bit clearer!


Ben
Last edited by batfastad on Fri Oct 13, 2006 5:14 am, edited 1 time in total.
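
The two results above are consistent with chr() returning a single byte: a code point above 255 wraps around modulo 256, so 8482 ends up as byte 34 (the quote mark) and 8364 as byte 172 (¬ in Latin-1). Both symbols do exist in UTF-8, as three-byte sequences. A small sketch to check this, where the function names wrapped_byte() and utf8_hex() are invented for illustration:

```php
<?php
// Illustrative helpers; the names are not from the thread.

// chr() yields exactly one byte, so large values wrap modulo 256.
function wrapped_byte(int $codePoint): string
{
    return chr($codePoint % 256);
}

// Hex dump of the raw bytes in a UTF-8 encoded string.
function utf8_hex(string $utf8): string
{
    return bin2hex($utf8);
}

var_dump(wrapped_byte(8482) === '"');    // 8482 % 256 = 34, the quote mark
var_dump(wrapped_byte(8364) === "\xAC"); // 8364 % 256 = 172, ¬ in Latin-1
echo utf8_hex("\u{2122}"), "\n";         // TM sign as UTF-8 bytes: e284a2
echo utf8_hex("\u{20AC}"), "\n";         // euro sign as UTF-8 bytes: e282ac
```

If the decoded text is stored as UTF-8, the MySQL connection and column charset would presumably need to be UTF-8 as well (utf8mb4 in current MySQL versions). That part is an assumption about the setup, not something shown in the thread.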

Post by Mordred »

Hurray, my crystal ball is working, I answered two minutes before the question! :)

(bogus answer, so batfastad would hopefully receive a notice)