Text in utf8

Posted: Tue Jul 05, 2011 9:50 am
by giomach
I'm having difficulty working with utf8 strings.

Using $line{$i} to work through a string read from a utf8 file seems to give one byte at a time, which is not useful when I want to detect characters that may span several bytes.

I looked for something that deals with finding characters in strings and came up with mb_substr, but it doesn't seem to be available on the server I use (and those who would know about such things are on holiday). Is mb_substr what I need for utf8, or is it designed for something else entirely? Is there some other way?
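One possible way to walk a string character by character without the mbstring extension is sketched below. It assumes the server's PCRE build supports the /u (UTF-8) modifier, which is common but not guaranteed. The helper name utf8_chars is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper (not a built-in): split a UTF-8 string into an
// array of characters using PCRE's /u modifier.
function utf8_chars($s)
{
    // With /u, the empty pattern matches between code points rather than
    // between bytes. PREG_SPLIT_NO_EMPTY drops the empty leading and
    // trailing pieces.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false when $s is not valid UTF-8.
    return $chars === false ? array() : $chars;
}

$line = "caf\xC3\xA9"; // "café": the last character occupies two bytes
foreach (utf8_chars($line) as $ch) {
    echo $ch, " (", strlen($ch), " byte(s))\n";
}
```

Where the mbstring extension is installed, mb_substr($line, $i, 1, 'UTF-8') returns the character at position $i and serves the same purpose.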

Apart from that, here's a specific problem: I'm trying to handle the BOM as follows:

$infile = "input.txt";
$lines = file($infile);
foreach ($lines as $line_num => $line)
{
    if ($line_num == 0)
    {
        $bom = substr($lines, 0, 3); // [EDIT: change $lines to $line and all is well]
        echo "bom is ".$bom."/".ord($bom{0})."/".ord($bom{1})."/".ord($bom{2})."/".chr(13).chr(10);
        ...
but $bom is echoed as an empty string, and the three bytes are each shown as zero. In an editor, the file starts with hex EF BB BF as expected. Why is PHP doing this?

Re: Text in utf8

Posted: Tue Jul 05, 2011 1:55 pm
by flying_circus
giomach wrote: [EDIT: change $lines to $line and all is well]
I was in the middle of a reply stating just the same thing :)
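
For reference, the fix works because substr expects a string, and $lines is the whole array returned by file(), so only $line holds the first three bytes. A minimal sketch of detecting and removing the BOM follows. The helper name strip_utf8_bom is hypothetical, and the sample array stands in for the contents of input.txt.

```php
<?php
// Hypothetical helper (not a built-in): drop a leading UTF-8 byte order
// mark (bytes EF BB BF) if one is present.
function strip_utf8_bom($line)
{
    if (substr($line, 0, 3) === "\xEF\xBB\xBF") {
        // The cast covers older PHP versions, where substr returns false
        // when nothing follows the BOM.
        return (string) substr($line, 3);
    }
    return $line;
}

// Stand-in for the array that file("input.txt") would return.
$lines = array("\xEF\xBB\xBFfirst line\r\n", "second line\r\n");

// Only the first line of a file can carry the BOM.
$lines[0] = strip_utf8_bom($lines[0]);
echo $lines[0];
```

The check uses array-style offsets and substr on $line rather than the $bom{0} form; curly-brace string offsets are no longer accepted as of PHP 8.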