
Storing language specific chars

Posted: Wed Oct 12, 2005 3:21 am
by Ree
I have a problem storing language-specific chars in MySQL, even with utf8_unicode_ci set on the db, tables, and fields.

Here's an example. I entered the value 'дфрывафрв' in the 'headline' field via an HTML form. When I retrieve the field's value using PHP and display it on an HTML page, it displays correctly. But when I checked the same field value in phpMyAdmin, it looked terrible, like this:

Code: Select all

дфрывафрв
I have tried running the following query:

Code: Select all

SELECT * FROM news WHERE headline='дфрывафрв'
And of course it did not work (no records found). So that means I cannot search the db when records contain language-specific chars.

Could anyone explain how to solve this problem? I thought utf8_unicode_ci would take care of it.
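
One plausible explanation, which the thread itself does not confirm, is that the UTF-8 bytes travel over a connection whose character set is still latin1, so each byte is stored as a separate character. The sketch below only reproduces that effect in plain PHP. It assumes the mbstring extension is loaded and the file is saved as UTF-8, and it uses Windows-1252 as a stand-in for MySQL's latin1. If this is the cause, the usual remedy is to set the connection character set (for example with `SET NAMES utf8`) before writing or reading.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$headline = 'дфрывафрв';

// Treat every UTF-8 byte as a single Windows-1252 character and
// encode the result as UTF-8 again (double encoding).
$garbled = mb_convert_encoding($headline, 'UTF-8', 'Windows-1252');
echo $garbled, "\n";

// Reversing the step recovers the original text, which is why a page
// that reads the data back over the same connection still looks right.
$restored = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
echo $restored, "\n";
```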

Posted: Wed Oct 12, 2005 7:03 am
by feyd
Use BLOB types, maybe.

Posted: Wed Oct 12, 2005 8:06 am
by Ree
Well, this seems to work, but it doesn't solve another problem: I cannot use my string truncating function. It takes a string and a character count as arguments and returns a truncated string with length <= character count, without cutting words in half (I use it with news items; it lets me display part of a news item).

Maybe convert each char to its HTML equivalent before storing it in the db (you know, those &#int; entities)? But that's awkward...

There must be a way to store chars normally...

Posted: Wed Oct 12, 2005 8:15 am
by feyd
If you store the data in UTF-8, then the multibyte string functions in PHP (mbstring) can handle it.
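
A small illustration of the difference, assuming the mbstring extension is loaded and the file is saved as UTF-8. The sample word is made up for this sketch.

```php
<?php
// Assumption: mbstring is available and this file is saved as UTF-8.
$word = 'ąčę';

$bytes = strlen($word);              // strlen() counts bytes
$chars = mb_strlen($word, 'UTF-8');  // mb_strlen() counts characters
echo $bytes, " bytes, ", $chars, " characters\n";

// substr() cuts by byte and can split a character in half;
// mb_substr() cuts by character.
$firstByte = substr($word, 0, 1);
$firstChar = mb_substr($word, 0, 1, 'UTF-8');
echo bin2hex($firstByte), " vs ", $firstChar, "\n";
```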

Posted: Wed Oct 12, 2005 8:42 am
by Ree
As I mentioned before, everything is stored in UTF-8, but that does not allow me to truncate strings.

Code: Select all

function truncate($str, $chars)
{
  $str = substr($str, 0, $chars + 1);
  $length = strlen($str);
  for ($i = $length - 1; $i > 0; $i--)
  {
    if (substr($str, $i, 1) == ' ')
    {
      $check = $i;
      break;
    }
  }
  if (isset($check))
  {
    $str = substr($str, 0, $check + 1) . '...';
  }
  else
  {
    $str = '';
  }
  return $str;
}

$str = 'ąeęąčėę ūųūĮŠĮŠĖ įšęįąę ąčęę ąčę ąą ąąą šėš';
echo truncate($str, 40);
You should get this:

Code: Select all

ąeęąčėę ūųūĮŠĮŠĖ įšęįąę ąčęę ąčę ąą ąąą ...
But you'll get this:

Code: Select all

ąeęąčėę ūųūĮŠĮŠĖ ...
Still can't find a solution...

Posted: Wed Oct 12, 2005 8:46 am
by feyd
Your "truncate" works on bytes, not characters: substr() and strlen() count bytes, and most of the letters in your test string take two bytes each in UTF-8.

Posted: Wed Oct 12, 2005 8:54 am
by Ree
What should I change then?
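
One possible change, as a minimal sketch: swap the byte-based calls for their mbstring counterparts. It assumes the mbstring extension is loaded and the input is valid UTF-8. The name truncate_mb is made up for this example and is not from the thread.

```php
<?php
// Hypothetical multibyte-aware variant of truncate() from the thread.
// Assumption: mbstring is available and $str is valid UTF-8.
function truncate_mb($str, $chars)
{
    // Take $chars + 1 characters (not bytes), mirroring the original substr() call.
    $str = mb_substr($str, 0, $chars + 1, 'UTF-8');

    // Position of the last space, counted in characters.
    $check = mb_strrpos($str, ' ', 0, 'UTF-8');

    // The original loop never inspects index 0, so a space there does not count.
    if ($check === false || $check === 0) {
        return '';
    }

    // Keep everything up to and including that space, then append the ellipsis.
    return mb_substr($str, 0, $check + 1, 'UTF-8') . '...';
}

$str = 'ąeęąčėę ūųūĮŠĮŠĖ įšęįąę ąčęę ąčę ąą ąąą šėš';
$result = truncate_mb($str, 40);
echo $result, "\n";
```

Like the original, this returns an empty string when the first $chars + 1 characters contain no space, so a single very long word is dropped entirely.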