Dealing with non-english characters in a database search

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
User avatar
social_experiment
DevNet Master
Posts: 2793
Joined: Sun Feb 15, 2009 11:08 am
Location: .za

Dealing with non-english characters in a database search

Post by social_experiment »

I have the following data in a table : À, Á, Â, Ã, à, á, â, ã . The html below

Code: Select all

<select id="tdd" name="tdd" >
<option value="À" >À</option>
<option value="Á" >Á</option>
<option value="Ã" >Ã</option>
<option value="Â" >Â</option>
</select>
And the php

Code: Select all

<?php
$term = mysqli_real_escape_string($conn, $_POST['tdd']);
//
$sql = "SELECT COUNT(`id`) FROM `test_table` WHERE `character_field` = '" . $term . "' ";
$qry = mysqli_query($conn, $sql);
$ary = mysqli_fetch_array($qry);
$row = $ary[0];
//		
echo $row;
echo '<br />';
echo $sql;
?>
Each time I select an option my results look like this
--
8
SELECT COUNT(`id`) FROM `test_table` WHERE `character_field` = 'Á'
--
8 being the number of records in my table and the query below it with the specific character selected.

How do i deal with these type of characters when searching for them in the database?

The collation for the field in question (character_field) is latin1_swedish_ci. Do i have to change this when dealing with non-english characters? The charset for the page is iso-8859-1.

Thanks in advance

Edit
Setting the collation for the specific field to latin1_general_ci returns 2 rows per search; it seems the problem is that the à and ã is seen as the same character for some reason.

Edit
Looks like i found a solution : latin1_general_cs as collation solves the problem.
“Don’t worry if it doesn’t work right. If everything did, you’d be out of a job.” - Mosher’s Law of Software Engineering
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Dealing with non-english characters in a database search

Post by Christopher »

Yes, it is really annoying that MySQL defaults to latin1_swedish_ci and not latin1_general_cs. You might also want to try UTF8.
(#10850)
User avatar
social_experiment
DevNet Master
Posts: 2793
Joined: Sun Feb 15, 2009 11:08 am
Location: .za

Re: Dealing with non-english characters in a database search

Post by social_experiment »

Christopher wrote:You might also want to try UTF8.
Would this work for those type of characters because if i make the charset of the page utf-8 then they are displayed incorrectly (a little black square with a question mark in it usually). Or is it only for storage purposes, the UTF8?
“Don’t worry if it doesn’t work right. If everything did, you’d be out of a job.” - Mosher’s Law of Software Engineering
Eric!
DevNet Resident
Posts: 1146
Joined: Sun Jun 14, 2009 3:13 pm

Re: Dealing with non-english characters in a database search

Post by Eric! »

It's been a while since I've done much with sql/php and utf8, but here's some of the things I remember struggling with.

PHP has problems with UTF8 strings because you have to use the multibyte functions.

Also I usually check the settings

Code: Select all

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_http_output('UTF-8');
ob_start('mb_output_handler');
(In addition to setting up the database as utf8) Use the following for mysql after connecting:
[text]SET NAMES utf8;
SET CHARACTER SET utf8;[/text]

Some other things to watch out for http://www.phpwact.org/php/i18n/utf-8

It seems like there's a couple other tricks, but I don't have my utf8 database code with me to check through right now.

EDIT: Right, I remember now. There was a problem with PDO not setting the character set viewtopic.php?f=2&t=132655&hilit=+utf8
Post Reply