Text in utf8

giomach
Forum Newbie
Posts: 18
Joined: Wed Jun 29, 2011 6:52 pm

Post by giomach »

I'm having difficulty working with UTF-8 strings.

Using $line{$i} to work through a string read from a UTF-8 file seems to give one byte at a time, which is not useful when I want to detect characters that may span several bytes.

I looked for something that deals with finding characters in strings and came up with mb_substr, but it doesn't seem to be available on the server I use (and those who would know about such things are on holiday). Is mb_substr what I need for UTF-8, or is it designed for something else entirely? Is there some other way?
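
If mbstring really is unavailable, one possible workaround is the PCRE `u` modifier, which makes `preg_split` treat the subject as UTF-8 and split it between characters rather than bytes. The following is a minimal sketch, assuming the server's PCRE has UTF-8 support and PHP 7 or later (for the type declarations); `utf8_chars` is a hypothetical helper name, not a built-in.

```php
<?php
// Hypothetical helper (not a PHP built-in): split a UTF-8 string into
// an array of characters without relying on the mbstring extension.
function utf8_chars(string $s): array
{
    // An empty pattern with the "u" modifier matches between every
    // UTF-8 character; PREG_SPLIT_NO_EMPTY drops the empty edge pieces.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false when the subject is not valid UTF-8.
    return $chars === false ? [] : $chars;
}

// "h", "é" (two bytes: C3 A9), "l", "l", "o"
$chars = utf8_chars("h\xC3\xA9llo");
echo count($chars), "\n";        // 5 characters
echo strlen("h\xC3\xA9llo"), "\n"; // 6 bytes
```

Where mbstring is installed, `mb_substr($s, $i, 1, 'UTF-8')` is the more direct tool for picking out the character at position `$i`.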

Apart from that, here's a specific problem: I'm trying to handle the BOM as follows:

Code:

$infile = "input.txt";
$lines = file($infile);
foreach ($lines as $line_num => $line)
{
    if ($line_num == 0)
    {
        $bom = substr($lines, 0, 3); // [EDIT: change $lines to $line and all is well]
        echo "bom is ".$bom."/".ord($bom{0})."/".ord($bom{1})."/".ord($bom{2})."/".chr(13).chr(10);
        ...
but $bom is echoed as an empty string and the three bytes are each shown as zero. Looking at the file with an editor, it starts with hex EF BB BF as expected. Why is PHP doing this?
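
As the edit note in the code says, the cause is that `substr()` was handed the array `$lines` rather than the string `$line`. With that fixed, checking for and removing the BOM can be sketched as below. This is only an illustration, assuming PHP 7 or later; `strip_utf8_bom` is a hypothetical helper name, and an in-memory string stands in for the first line of `input.txt`.

```php
<?php
// Hypothetical helper (not a PHP built-in): remove a leading UTF-8
// byte order mark (bytes EF BB BF) from a string, if one is present.
function strip_utf8_bom(string $line): string
{
    $bom = "\xEF\xBB\xBF";

    // Compare only the first three bytes of the line with the BOM.
    if (strncmp($line, $bom, 3) === 0) {
        return substr($line, 3);
    }
    return $line;
}

// Stand-in for the first element of file("input.txt").
$first = "\xEF\xBB\xBFhello";

echo bin2hex(substr($first, 0, 3)), "\n"; // efbbbf
echo strip_utf8_bom($first), "\n";        // hello
```

In the original loop this would be applied only when `$line_num == 0`, since the BOM can appear only at the very start of the file.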
flying_circus
Forum Regular
Posts: 732
Joined: Wed Mar 05, 2008 10:23 pm
Location: Sunriver, OR

Re: Text in utf8

Post by flying_circus »

giomach wrote: [EDIT: change $lines to $line and all is well]
I was in the middle of a reply stating just the same thing :)