MySQL Collation - what should I have

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

MySQL Collation - what should I have

Post by batfastad »

Hi everyone

I'm not too clear on charsets and everything, and we're gradually developing a small intranet company/contact manager app in PHP and MySQL.
I'm gradually porting data over from a FileMaker database.

Having set up the MySQL database when viewing the DB structure in phpMyAdmin, the collation of the tables appears to be latin1_swedish_ci

Is this the collation I want?
In FileMaker we currently store all sorts of strange characters... swedish accented chars as well as many eastern European accented characters.

Is latin1_swedish_ci going to be good enough for what we need to store?

Thanks
Ben
User avatar
arjan.top
Forum Contributor
Posts: 305
Joined: Sun Oct 14, 2007 4:36 am
Location: Hoče, Slovenia

Post by arjan.top »

utf-8, all that you'll ever need :D
User avatar
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

I have switched all collation on our servers to utf-8. It was Latin_general_ci for a while, but there are a few bugs in the Stored Proc engine when executing queries with Latin collation that sometimes munge the general into swedish and cause clash.

I have not had any problems with UTF-8.
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

So UTF-8 it is :D
And I also need to remember to change the connection collation as well.

What impacts will this have on my PHP scripts?
Is there anything I'll need to do in PHP to adjust to this?

I googled and found this which seems like some great information...
http://www.nicknettleton.com/zine/php/p ... cheatsheet

Will I have to do this if I just want to change DB collation?

Is it best to change the collation now, before the database is being used massively?
Or can I just switch it later when all parts are working?

Thanks
Ben
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Yes, you will need to use the mbstring extension for php. You may already have it set up that way (check phpinfo()).
I don't like the idea for the wrapper function for htmlentities(). It's not that hard to just type in 'UTF-8' as the third parameter, rather than create a foriegn function that another programmer might not know what it is.
Setting the meta tags to use UTF-8 charset is a given.

This isn't just for UTF-8, but are things that should be done for any character set.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

Excellent!

Just checked the phpinfo() on our intranet server, and it is enabled with the following settings:

Code: Select all

mbstring
Multibyte Support 	enabled
Multibyte string engine 	libmbfl
Multibyte (japanese) regex support 	enabled
Multibyte regex (oniguruma) version 	3.7.1

mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.

Directive	Local Value	Master Value
mbstring.detect_order	no value	no value
mbstring.encoding_translation	Off	Off
mbstring.func_overload	0	0
mbstring.http_input	pass	pass
mbstring.http_output	pass	pass
mbstring.internal_encoding	ISO-8859-1	no value
mbstring.language	neutral	neutral
mbstring.strict_detection	Off	Off
mbstring.substitute_character	no value	no value
So if I follow any of the steps needed in that tutorial, and change the collation of my database then I'm good to go!
And shouldn't have any problems with any different characters that get thrown my way - that's great :D
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

Also, can I just change the collation using phpMyAdmin and it will convert any characters that it needs to?

Or do I need to check/verify/convert any data that's in the tables already myself?

Thanks
Ben
User avatar
batfastad
Forum Contributor
Posts: 433
Joined: Tue Mar 30, 2004 4:24 am
Location: London, UK

Post by batfastad »

So just to confirm...

Here's the MySQL settings and variables screen from phpMyAdmin:
Image

My final questions on this...

1) Do I need to change all of those settings in the mysql.ini to utf8... from latin1?
Or is just converting the collation of the databases enough?

2) And utf8_unicode_ci is the one I want to set in phpMyAdmin?

2) Will MySQL munge my existing data in the databases when I do this conversion?

3) Do I have to start using the mb_* functions in PHP straight away to get proper operation?
And change the charsets of headers and HTML files?
Or can I leave it until the solution is fully operational?


Thanks
Ben
User avatar
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

My only advice to you at this point is try some stuff. You will find out a lot of stuff if you try.

And I would change the settings globally for your collation. Otherwise new databases may be set up using latin_swedish_ci instead of utf_8
Post Reply