Text in utf8
Posted: Tue Jul 05, 2011 9:50 am
I'm having difficulty working with utf8 strings.
Using $line{$i} to work through a string read from a utf8 file seems to give one byte at a time, which is not useful when I want to detect characters which may cover several bytes.
I looked for something which talked about finding characters in strings and came up with mb_substr, but it doesn't seem to be available on the server I use (and those who would know about such things are on holiday). Is mb_substr what I need for utf8, or is it designed for something else entirely? Is there some other way?
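For what it's worth, mb_substr belongs to the optional mbstring extension, which is meant for multibyte encodings such as utf8. Where that extension is missing, one possible workaround is to split the string with a PCRE pattern using the /u modifier. The following is only a minimal sketch: it assumes PCRE on the server was built with UTF-8 support, and utf8_chars is a made-up helper name, not a PHP built-in.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters without mbstring.
// Assumes PCRE supports the /u (UTF-8) modifier on this server.
function utf8_chars($s)
{
    // With /u, the empty pattern matches between code points, not between bytes.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split returns false when $s is not valid UTF-8.
    return $chars === false ? array() : $chars;
}

// "naïve €5" written as explicit UTF-8 bytes so the source file encoding does not matter.
$line = "na\xC3\xAFve \xE2\x82\xAC5";
foreach (utf8_chars($line) as $i => $ch) {
    // strlen still counts bytes, so it shows how many bytes each character covers.
    echo $i, ': ', $ch, ' (', strlen($ch), " byte(s))\n";
}
```

Here the string is 11 bytes long but splits into 8 characters, so multi-byte characters can be detected by checking strlen($ch) > 1.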
Apart from that, here's a specific problem: I'm trying to handle the BOM with the code below, but $bom is echoed as an empty string and the three bytes are each shown as zero. Looking at the file with an editor, it starts with hex EF, BB, BF as expected. Why is PHP doing this?
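As the edit note in the posted code says, the cause is that substr is given $lines (the array returned by file) rather than $line (the string for the current line). A minimal sketch of the BOM check with that correction follows. It assumes the BOM is the usual three bytes EF BB BF; strip_utf8_bom is a hypothetical helper name, and the $lines array is a stand-in for the result of file("input.txt").

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
function strip_utf8_bom($line)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($line, $bom, 3) === 0) {
        // Cast because substr returned false (not "") on older PHP when nothing is left.
        return (string) substr($line, 3);
    }
    return $line;
}

// Stand-in for file("input.txt"): the first line begins with a BOM.
$lines = array("\xEF\xBB\xBFfirst line\r\n", "second line\r\n");
foreach ($lines as $line_num => $line) {
    if ($line_num == 0) {
        // Use $line (a string), not $lines (the whole array).
        $first3 = substr($line, 0, 3);
        echo "bom bytes: ", ord($first3[0]), "/", ord($first3[1]), "/", ord($first3[2]), "\r\n";
        $line = strip_utf8_bom($line);
    }
    echo $line;
}
```

With the stand-in data the three bytes print as 239/187/191, which are EF, BB and BF in decimal.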
Code:
$infile = "input.txt";
$lines = file ($infile);
foreach ($lines as $line_num => $line)
{
    if ($line_num == 0)
    {
        $bom = substr ($lines,0,3); //  [EDIT: change $lines to $line and all is well]
        echo "bom is ".$bom."/".ord($bom{0})."/".ord($bom{1})."/".ord($bom{2})."/".chr(13).chr(10);
...