Special characters Search Problem

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

Goodie :)
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

Hi Appolo,

I am facing some other problem with this special charters.I have integrated the code of search in our original site search module. Results are not displaying.

I have tested with test database and tables its working fine.But if i use original database its not showing the results. The data is saved in database as with utf8 encode.

Is there any solution for this to get the search results without any modifications in existing data.


Please help me

-Harpaddy
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

Just using utf8 encoding is not necessarily enough, you also need utf8_unicode_ci collation (or utf8_general_ci if you don't care about ligatures, such as german ß being equal to 'ss').
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

hi apollo,

Thanks for giving reply.

We are using character set as utf8 and collation is utf8_general_ci for database and tables.

-HarPaddy
Last edited by HarPaddy on Fri Apr 03, 2009 7:47 am, edited 1 time in total.
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

Trying going through the same debug procedure again.

Can you enforce the connection to utf8_unicode_ci ? And then do you get results when you try selecting on WHERE 'À'='a' ? Does converting the search term to utf8 (if necessary, with the iconv line I gave earlier) make a difference? If no go, just like last time try converting the search term to ansi instead (windows-1252 would be a safe choice) and keep trying the 'À'='a' clause check until you get a non-empty result.

FYI, when putting the strings in a php test script: chr(0xC0) is 'À' in windows-1252 encoding (as well as iso-8859-1, the two most common ansi standards) and chr(0xC3).chr(0x80) is 'À' in utf-8 encoding.
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

Hi Apollo,

I have tried the same converting search word to ansi and utf8 with same iconv procedure , but not getting any result set.

And when i keep WHERE 'À'='a' all the records present in database are showing all the records presented.

So, how could i reach to get the results of search word like 'àngels'?


Thanks,
HarPaddy
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

1. In your current php script / connection setup etc, which of these two result in ALL entries, and which results in NONE?

Code: Select all

$s1 = chr(0xC0);
$s2 = chr(0xC3).chr(0x80);
$query1 = "SELECT * FROM table WHERE '$s1'='a'";
$query2 = "SELECT * FROM table WHERE '$s2'='a'";
2. In the same script (using the same connection and everything), what do these two search queries result in?

Code: Select all

$s1 = $_POST['searchword'];
$s2 = $s1;
if ($s1 == iconv('utf-8', 'utf-8', $s1)) $s1 = iconv('utf-8', 'windows-1252', $s1);
if ($s2 != iconv('utf-8', 'utf-8', $s2)) $s2 = iconv('windows-1252', 'utf-8', $s2);
$query1 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s1)."%'";
$query2 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s2)."%'";
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

Thanks For reply.

I have executed the queries, these are the outputs we got.

$s1 = chr(0xC0);
$query1 = "SELECT * FROM table WHERE '$s1'='a'";
I got all the records existed in the table


$s2 = chr(0xC3).chr(0x80);
$query2 = "SELECT * FROM table WHERE '$s2'='a'";
Empty result set for this query

And also i have executed with the post search word
$s1 = $_POST['searchword'];
$s2 = $s1;
if ($s1 == iconv('utf-8', 'utf-8', $s1)) $s1 = iconv('utf-8', 'windows-1252', $s1);
if ($s2 != iconv('utf-8', 'utf-8', $s2)) $s2 = iconv('windows-1252', 'utf-8', $s2);
$query1 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s1)."%'";
$query2 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s2)."%'";

For these two queries we got empty result set with the post search word as 'àngels'

But for the search word "Àngels":

if ($s1 == iconv('utf-8', 'utf-8', $s1)) $s1 = iconv('utf-8', 'windows-1252', $s1);
$query1 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s1)."%'";

We got empty result set in the above conversion of search word

if ($s2 != iconv('utf-8', 'utf-8', $s2)) $s2 = iconv('windows-1252', 'utf-8', $s2);
$query2 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s2)."%'";

We got one record for this conversion and the result is : 'Els Àngels restaurant'

These are the outputs i got for the queries.Please check once.


Thanks,
HarPaddy
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

Are you really sure you executed these two queries within the same PHP script, using the very same connection?
HarPaddy wrote:$s1 = chr(0xC0);
$query1 = "SELECT * FROM table WHERE '$s1'='a'";
I got all the records existed in the table

(...)

if ($s2 != iconv('utf-8', 'utf-8', $s2)) $s2 = iconv('windows-1252', 'utf-8', $s2);
$query2 = "SELECT * FROM table WHERE name LIKE '%".mysql_real_escape_string($s2)."%'";

We got one record for this conversion and the result is : 'Els Àngels restaurant'
I'm asking because the first uses ansi encoding, and the other uses utf-8, so either one of them should absolutely fail. Also, the first tests points out that the search is case insensitive ('À'='a') but when using the searchword, it suddenly makes a difference wether you use 'Àngels' or 'àngels', which seems contradictory. If you're sure they're both made from the same connection, please check your server/PHP settings or SQL server config again, because then there's really something messed up with your system.

Anyway, can you try the same tests I posted above again two times, but now once with this before the first query:

Code: Select all

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
and once with

Code: Select all

mysql_query("SET NAMES 'latin1' COLLATE 'latin1_general_ci'");
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

Hi appollo,

I have executed these two scripts

Case1:
mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
$se1 = chr(0xC0);
$se2 = chr(0xC3).chr(0x80);
$q1 = "SELECT * FROM restarant_details WHERE '$se1'='a'";
$r1 = mysql_query($q1);
echo "static ansi ".mysql_num_rows($r1);
$q2 = "SELECT * FROM restarant_details WHERE '$se2'='a'";
$r2 = mysql_query($q2);
echo " ".mysql_num_rows($r2);exit;

I got the output as " static ansi 0 5 "


Case 2:
mysql_query("SET NAMES 'latin1' COLLATE 'latin1_general_ci'");
$se1 = chr(0xC0);
$se2 = chr(0xC3).chr(0x80);
$q1 = "SELECT * FROM restarant_details WHERE '$se1'='a'";
$r1 = mysql_query($q1);
echo "static ansi ".mysql_num_rows($r1);
$q2 = "SELECT * FROM restarant_details WHERE '$se2'='a'";
$r2 = mysql_query($q2);
echo " ".mysql_num_rows($r2);exit;

I got the output as " static ansi 0 0 ".



Thanks,
HarPaddy
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

Ok, and what do you get here? (of course change 'name' to the appropriate column name if necessary)

Code: Select all

mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
$s1 = chr(0xC3).chr(0x80).'ngels'; // 'Àngels' in utf-8 encoding
$s2 = chr(0xC3).chr(0xA0).'ngels'; // 'àngels' in utf-8 encoding
 
echo 'without escaping:';
$q1 = "SELECT * FROM restarant_details WHERE name LIKE '%$s1%'";
$r1 = mysql_query($q1);
echo ' upper '.mysql_num_rows($r1);
$q2 = "SELECT * FROM restarant_details WHERE name LIKE '%$s2%'";
$r2 = mysql_query($q2);
echo ' lower '.mysql_num_rows($r2);
 
echo ' with escaping:'
$s1 = mysql_real_escape_string($s1);
$s2 = mysql_real_escape_string($s2);
$q1 = "SELECT * FROM restarant_details WHERE name LIKE '%$s1%'";
$r1 = mysql_query($q1);
echo ' upper '.mysql_num_rows($r1);
$q2 = "SELECT * FROM restarant_details WHERE name LIKE '%$s2%'";
$r2 = mysql_query($q2);
echo ' lower '.mysql_num_rows($r2);
exit;
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

Hi appollo,

Thanks for reply

I have executed the script, which u send in last message.
And i got the output as like this.

output:

without escaping:
upper 0
lower 0
with escaping:
upper 0
lower 0


Thanks,
HarPaddy
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

I really think you have a buggy system somehow. Is it possible to update your MySQL server or something? What kind of server are you running, Wamp / Lamp?

Anyway, final ultra verbose test:

Code: Select all

$search = array(
 chr(0xC3).chr(0x80).'ngel', // 'Àngel' in utf-8 encoding
 chr(0xC3).chr(0xA0).'ngel', // 'àngel' in utf-8 encoding
 chr(0xC0).'ngel', // 'Àngel' in ansi encoding
 chr(0xE0).'ngel', // 'àngel' in ansi encoding
 'ANGEL', 'angel', 'NGEL', 'ngel' );
 
for ($p=0; $p<10; $p++)
{
    $esc = ($p&1);
    $enc = intval($p/2);    
    echo '<br>with'.($esc?'':'out').' escaping: ';
    echo $enc.'<br>';
    foreach($search as $s)
    {
        if ($enc==1 || $enc==2) mysql_query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'");
        if ($enc==3 || $enc==4) mysql_query("SET NAMES 'latin1' COLLATE 'latin1_general_ci'");
        if (($enc==1 || $enc==3) && $s!=iconv('utf-8','utf-8',$s)) $s = iconv('iso-8859-1','utf-8',$s);
        if (($enc==2 || $enc==4) && $s==iconv('utf-8','utf-8',$s)) $s = iconv('utf-8','iso-8859-1',$s);
        if ($esc) $s = mysql_real_escape_string($s);
        echo '('.bin2hex(substr($s,0,3)).') ';
        $q = "SELECT * FROM restarant_details WHERE name LIKE '%$s%'";
        $r = mysql_query($q);
        echo mysql_num_rows($r).', ';
    }
}
exit;
If you still get all zeroes here, I give up :)
HarPaddy
Forum Commoner
Posts: 40
Joined: Tue Feb 03, 2009 12:54 am

Re: Special characters Search Problem

Post by HarPaddy »

Hi Appollo,

Thanks for reply.

Please gone through the output once.

without escaping: 0
(c3806e) 1, (c3a06e) 0, (c06e67) 0, (e06e67) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
with escaping: 0
(c3806e) 1, (c3a06e) 0, (c06e67) 0, (e06e67) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
without escaping: 1
(c3806e) 0, (c3a06e) 0, (c3806e) 0, (c3a06e) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
with escaping: 1
(c3806e) 0, (c3a06e) 0, (c3806e) 0, (c3a06e) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
without escaping: 2
(c06e67) 0, (e06e67) 0, (c06e67) 0, (e06e67) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
with escaping: 2
(c06e67) 0, (e06e67) 0, (c06e67) 0, (e06e67) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
without escaping: 3
(c3806e) 1, (c3a06e) 0, (c3806e) 1, (c3a06e) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
with escaping: 3
(c3806e) 1, (c3a06e) 0, (c3806e) 1, (c3a06e) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
without escaping: 4
(c06e67) 0, (e06e67) 0, (c06e67) 0, (e06e67) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,
with escaping: 4
(c06e67) 0, (e06e67) 0, (c06e67) 0, (e06e67) 0, (414e47) 0, (616e67) 0, (4e4745) 1, (6e6765) 1,



I got '0' fro some search word like all the lower case search words.

Thanks,
HarPaddy
User avatar
Apollo
Forum Regular
Posts: 794
Joined: Wed Apr 30, 2008 2:34 am

Re: Special characters Search Problem

Post by Apollo »

Great, it doesn't do what you need but I think I got it :)

The following will probably perform better?

Code: Select all

$search = array(
 chr(0xC3).chr(0x80).'ngel', // 'Àngel' in utf-8 encoding
 chr(0xC3).chr(0xA0).'ngel', // 'àngel' in utf-8 encoding
 chr(0xC0).'ngel', // 'Àngel' in ansi encoding
 chr(0xE0).'ngel', // 'àngel' in ansi encoding
 'ANGEL', 'angel', 'NGEL', 'ngel' );
 
mysql_query("SET NAMES 'latin1' COLLATE 'latin1_german1_ci'");
foreach($search as $s)
{
  if ($s==iconv('utf-8','utf-8',$s)) $s = iconv('utf-8','windows-1252',$s);
  echo '('.bin2hex(substr($s,0,3)).') ';
  $q = "SELECT * FROM restarant_details WHERE name LIKE '%$s%'";
  $r = mysql_query($q);
  echo mysql_num_rows($r).', ';
}
exit;
Post Reply