Page 1 of 1

Dealing with non-english characters in a database search

Posted: Thu Nov 29, 2012 4:09 am
by social_experiment
I have the following data in a table : À, Á, Â, Ã, à, á, â, ã . The html below

Code: Select all

<select id="tdd" name="tdd" >
<option value="À" >À</option>
<option value="Á" >Á</option>
<option value="Ã" >Ã</option>
<option value="Â" >Â</option>
</select>
And the php

Code: Select all

<?php
$term = mysqli_real_escape_string($conn, $_POST['tdd']);
//
$sql = "SELECT COUNT(`id`) FROM `test_table` WHERE `character_field` = '" . $term . "' ";
$qry = mysqli_query($conn, $sql);
$ary = mysqli_fetch_array($qry);
$row = $ary[0];
//		
echo $row;
echo '<br />';
echo $sql;
?>
Each time I select an option my results look like this
--
8
SELECT COUNT(`id`) FROM `test_table` WHERE `character_field` = 'Á'
--
8 being the number of records in my table and the query below it with the specific character selected.

How do i deal with these type of characters when searching for them in the database?

The collation for the field in question (character_field) is latin1_swedish_ci. Do i have to change this when dealing with non-english characters? The charset for the page is iso-8859-1.

Thanks in advance

Edit
Setting the collation for the specific field to latin1_general_ci returns 2 rows per search; it seems the problem is that the à and ã is seen as the same character for some reason.

Edit
Looks like i found a solution : latin1_general_cs as collation solves the problem.

Re: Dealing with non-english characters in a database search

Posted: Thu Nov 29, 2012 8:20 am
by Christopher
Yes, it is really annoying that MySQL defaults to latin1_swedish_ci and not latin1_general_cs. You might also want to try UTF8.

Re: Dealing with non-english characters in a database search

Posted: Thu Nov 29, 2012 4:51 pm
by social_experiment
Christopher wrote:You might also want to try UTF8.
Would this work for those type of characters because if i make the charset of the page utf-8 then they are displayed incorrectly (a little black square with a question mark in it usually). Or is it only for storage purposes, the UTF8?

Re: Dealing with non-english characters in a database search

Posted: Thu Nov 29, 2012 10:41 pm
by Eric!
It's been a while since I've done much with sql/php and utf8, but here's some of the things I remember struggling with.

PHP has problems with UTF8 strings because you have to use the multibyte functions.

Also I usually check the settings

Code: Select all

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_http_output('UTF-8');
ob_start('mb_output_handler');
(In addition to setting up the database as utf8) Use the following for mysql after connecting:
[text]SET NAMES utf8;
SET CHARACTER SET utf8;[/text]

Some other things to watch out for http://www.phpwact.org/php/i18n/utf-8

It seems like there's a couple other tricks, but I don't have my utf8 database code with me to check through right now.

EDIT: Right, I remember now. There was a problem with PDO not setting the character set viewtopic.php?f=2&t=132655&hilit=+utf8