Using $line{$i} to step through a string read from a UTF-8 file seems to give me one byte at a time, which isn't useful when I want to detect characters that may span several bytes.
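Here is a minimal sketch of the difference, assuming PHP 7 or later. Indexing a string with [] (or the older {}) returns single bytes. Splitting with a /u (UTF-8 mode) regular expression returns whole characters, and it needs nothing beyond the PCRE functions bundled with PHP. The sample string is made up for illustration and is written as explicit bytes, so it doesn't depend on how the script file itself is encoded.

```php
<?php
// Hypothetical sample: "a", then "é" (2 bytes: C3 A9), then "€" (3 bytes: E2 82 AC).
$text = "a\xC3\xA9\xE2\x82\xAC";

// Byte-wise: strlen() counts bytes and $text[$i] returns a single byte.
echo strlen($text), "\n";               // 6
for ($i = 0; $i < strlen($text); $i++) {
    printf("%02X ", ord($text[$i]));    // 61 C3 A9 E2 82 AC
}
echo "\n";

// Character-wise: the /u modifier makes PCRE treat the subject as UTF-8,
// so splitting on the empty pattern yields whole (multi-byte) characters.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n";               // 3
foreach ($chars as $ch) {
    echo $ch, " (", strlen($ch), " bytes)\n";
}
```

preg_split() returns false if the input isn't valid UTF-8, so real code may want to check for that.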
I looked for something about finding characters in strings and came up with mb_substr, but it doesn't seem to be available on the server I use (and the people who would know about such things are on holiday). Is mb_substr what I need for UTF-8, or is it designed for something else entirely? Is there some other way?
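For what it's worth, mb_substr() belongs to the mbstring extension. That extension isn't enabled by default, which may be why it's missing on the server. It does take an encoding argument such as 'UTF-8'. Below is a hedged sketch of a helper that uses mb_substr() when it exists and otherwise falls back to PCRE's /u mode. utf8_substr is a made-up name, not a PHP built-in. iconv_substr() from the iconv extension is another option if that extension happens to be loaded.

```php
<?php
// Hypothetical helper (not a built-in): return $length characters of a UTF-8
// string starting at character offset $start.
function utf8_substr(string $s, int $start, int $length): string
{
    if (function_exists('mb_substr')) {
        // mbstring is available: let it count characters, not bytes.
        return mb_substr($s, $start, $length, 'UTF-8');
    }
    // Fallback: split into whole characters with PCRE's UTF-8 mode.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';   // invalid UTF-8; a real implementation might throw instead
    }
    return implode('', array_slice($chars, $start, $length));
}

$word = "na\xC3\xAFve";                        // "naïve": 5 characters, 6 bytes
var_dump(bin2hex(utf8_substr($word, 2, 1)));   // string(4) "c3af", the whole "ï"
var_dump(bin2hex(substr($word, 2, 1)));        // string(2) "c3", half of "ï"
```

The fallback builds an array of every character, so it's slower than mb_substr() on long strings. It should be fine for line-sized input.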
Apart from that, here's a specific problem: I'm trying to handle the BOM (byte order mark) as follows:
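$infile = "input.txt";
$lines = file($infile);               // array of lines, each still ending in its newline
foreach ($lines as $line_num => $line)
{
    if ($line_num == 0)
    {
        // First three bytes of the first line; a UTF-8 BOM is EF BB BF.
        $bom = substr($line, 0, 3);   // [EDIT: was $lines (the whole array); $line fixes it]
        // [] instead of {} for offsets: curly-brace string offsets were removed in PHP 8.
        echo "bom is ".$bom."/".ord($bom[0])."/".ord($bom[1])."/".ord($bom[2])."/\r\n";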
...
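With $line in place of $lines, a file that starts with a UTF-8 BOM should print 239/187/191 (hex EF BB BF) for those three bytes. Here is a hedged sketch of detecting the BOM and stripping it. strip_utf8_bom and the sample $lines array are hypothetical. The array stands in for what file() would return, so the sketch runs without input.txt.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF (239 187 191).
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: remove a leading UTF-8 BOM from $s if present.
function strip_utf8_bom(string $s): string
{
    return strncmp($s, UTF8_BOM, 3) === 0 ? substr($s, 3) : $s;
}

// Simulated result of file("input.txt") for a file saved with a BOM.
$lines = ["\xEF\xBB\xBFfirst line\n", "second line\n"];

// Only the first line can carry a BOM, so test $lines[0] (a string),
// not $lines (the array).
$hasBom = strncmp($lines[0], UTF8_BOM, 3) === 0;
$lines[0] = strip_utf8_bom($lines[0]);

echo $hasBom ? "BOM found\n" : "no BOM\n";       // BOM found
echo bin2hex(substr($lines[0], 0, 3)), "\n";     // 666972 ("fir")
```

Comparing raw bytes with strncmp() works here because the BOM is a fixed byte sequence, so no multi-byte string functions are needed for this step.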