I'm writing an open-source app that will hopefully be used by people on a wide variety of hosting configurations, so a database abstraction library seemed necessary. I chose AdoDB after some research, but it wouldn't be disastrous at this stage to change to another library, as my code isn't particularly coupled to it.
My app deals with messaging - it's a kind of hub for receiving and distributing messages between any number of formats - bulletin boards, email, RSS, NNTP, PDF, etc. I have been thinking for the last few days that perhaps I should address the issue of character encodings as soon as possible, so that I don't have to do it retrospectively when people start asking me why they get funny ??? marks in their emails.
UTF-8 seems the way to go unless I'm mistaken. I've had a look at this nice article, and have also found an interesting library from Harry Fuecks which may help.
There are a few points which worry me:
- From the first article: "PHP's internal encoding should be the same the one in which the PHP files are saved in. To set the encoding, call mb_internal_encoding at the very beginning of your script: mb_internal_encoding('UTF-8');" Is there any way I can do the same thing but cater for a larger audience (i.e. those who don't have the mbstring extension installed)?
- I don't see much, if anything, on setting the various stages of encoding and collation with AdoDB. If it is possible to combine AdoDB and UTF-8 successfully, please tell me what I need to do. Otherwise, what's my next option? One of the reasons I chose AdoDB was the XML schema library for creating and updating tables. To a relative SQL novice like me, it seemed attractive for my aims of greater portability and preventing bugs. So my question, I guess, is whether there is a good DB abstraction library which supports full UTF-8 (connections, queries, result sets, etc.) and has the XML schema creation that a lot of them seem to boast. I looked at PEAR::MDB2 too (which also has the XML facility), but as far as I can tell the charset functions haven't been fully implemented yet. The library must be PHP 4 and 5 compatible.
- In order to support a multitude of languages without encoding problems, am I right to go with UTF-8 now? Or is there a simpler solution? Or should I forget about it for now, get the app written (a few months away at least), and address encoding later, perhaps when MDB2 is fully charset-aware?
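To make the first point concrete, this is roughly the kind of guard I have in mind. It's only a sketch: app_init_encoding, app_utf8_strlen and app_utf8_strlen_fallback are names I made up for illustration (they aren't from the article or any library), and the fallback assumes the input is already valid UTF-8.

```php
<?php
// Hypothetical helper: set PHP's internal encoding to UTF-8 only when the
// mbstring extension is available. Returns true if mbstring accepted the
// encoding, false if mbstring is missing or the call failed.
function app_init_encoding()
{
    if (function_exists('mb_internal_encoding')) {
        return mb_internal_encoding('UTF-8');
    }
    return false;
}

// Hypothetical fallback: count characters without mbstring by counting every
// byte that is NOT a UTF-8 continuation byte (0x80-0xBF).
// Assumption: $str is valid UTF-8; malformed input gives a wrong count.
function app_utf8_strlen_fallback($str)
{
    return preg_match_all('/[\x00-\x7F\xC0-\xFF]/', $str, $matches);
}

// Hypothetical wrapper: use mbstring when present, else the fallback above.
function app_utf8_strlen($str)
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, 'UTF-8');
    }
    return app_utf8_strlen_fallback($str);
}
```

The idea would be to call app_init_encoding() once at the top of each entry script and route every string operation through wrappers like app_utf8_strlen(), so that hosts without mbstring still get character counts rather than byte counts. I don't know whether this approach scales to all the string functions I'd need.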
Thanks for any advice you can give!