utf 8 characters problem

Posted: Sat Oct 17, 2009 6:28 pm
by martinpmf
I'm new to this forum, so I'm sorry if this topic has been discussed before; I couldn't find it. (I'll ask the administrators to move this post if it's in the wrong place.)
Anyway, I'm trying to read from a .txt file which contains UTF-8 encoded strings, one per line, separated by line breaks. For each string I want to get the last character and, if it equals a certain character, echo that string. But there seems to be a problem with the comparison, and my "if" is not working properly. I guess some extra invisible characters are being added because of the encoding. Here is the code:

Code:

$tekst = fopen("text.txt", "r");
while (!feof($tekst)):
    $str = fgets($tekst);
    $tmp1 = UTF8::strlen($str); // gets the length of the string
    $tmp = UTF8::substr($str, $tmp1-3, $tmp1-2); // gets the last character of the string
    if ($tmp == "?") {
        echo $str;
    }
endwhile;
fclose($tekst);

When I echo $tmp it shows me "?", but when I compare it in the "if" it doesn't work as I want. Any idea why PHP is not handling the encoding as it should? Are the line breaks the problem? Are the bytes of the encoding the problem?

Thanks for the answer in advance...

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 6:26 am
by Mark Baker
Not without seeing your UTF8 class and the methods in it, like substr.
Why can't you simply use PHP's own functions?

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 8:10 am
by guosheng1987
Maybe you can use the iconv functions.

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 10:21 am
by martinpmf
Because the normal PHP 5 functions don't work correctly for UTF-8 the way they do for the plain alphabet :S

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 10:48 am
by Mark Baker
martinpmf wrote:Because the normal PHP 5 functions don't work correctly for UTF-8 the way they do for the plain alphabet :S
The correct built-in functions do work, e.g. mb_strlen() and mb_substr(), or iconv_strlen() and iconv_substr().
We can't see what your UTF8 Class does, so it's very difficult for us to offer any other help.
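For illustration, here is a minimal sketch of reading the last character of a line with mb_substr() instead of a custom class. The helper name last_char_utf8() is hypothetical and not from this thread, and the sketch assumes the script itself is saved as UTF-8 and that the line ends with the line break that fgets() leaves in place.

```php
<?php
// Hypothetical helper (not from the thread): return the last character
// of a UTF-8 line, ignoring the trailing line break kept by fgets().
function last_char_utf8($line)
{
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        return '';
    }
    // A negative start counts characters from the end of the string.
    return mb_substr($line, -1, 1, 'UTF-8');
}

echo last_char_utf8("мартин\r\n"), "\n"; // prints "н"
```

Passing the encoding explicitly means the result does not depend on the configured internal encoding.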

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 11:02 am
by martinpmf
It's a UTF8 class that I found on the net. It works with the mb functions and does some of the things I need. I can't post it because it's 800 lines :s

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 11:34 am
by martinpmf
Here is a simplified version:
<?php
$tekst = fopen("text.txt", "r");
while (!feof($tekst)) {
    $tmp = fgets($tekst);
    $dolzina = mb_strlen($tmp);
    echo $dolzina.$tmp."<br />";
    if ($tmp == "мартин") {
        echo "ok"."<br />";
    } else {
        echo "tapa"."<br />";
    }
}
fclose($tekst);
?>
and the output:
17мартин // where 17 is the length, and it should be 6
tapa
8martin
tapa
10горги
tapa

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 12:55 pm
by Mark Baker
And what do you get if you use

Code:

$dolzina=mb_strlen($tmp,'UTF-8');
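To show why the encoding argument matters, here is a small sketch comparing byte and character counts for the same word. It assumes the script is saved as UTF-8; the variable names are only illustrative.

```php
<?php
$word = "мартин"; // 6 Cyrillic letters, 2 bytes each in UTF-8

$bytes  = strlen($word);                  // counts bytes
$chars  = mb_strlen($word, 'UTF-8');      // counts characters
$single = mb_strlen($word, 'ISO-8859-1'); // treats every byte as one character

echo "$bytes $chars $single\n"; // prints "12 6 12"
```

If mb_strlen() is called without an encoding and the internal encoding is a single-byte one, it behaves like the third call and reports bytes rather than letters.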

Re: utf 8 characters problem

Posted: Sun Oct 18, 2009 1:22 pm
by martinpmf
Thanks Mark,
I just needed to add mb_internal_encoding("UTF-8") at the beginning and everything works well :)
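A hedged sketch of the fix, with one extra precaution: the thread never says whether text.txt starts with a byte order mark or uses Windows line endings, but a 3-byte BOM plus 12 bytes of text plus "\r\n" would account for the reported 17, so this example assumes both and strips them before comparing. The helper clean_line() is hypothetical, and the first line of the file is simulated with a string.

```php
<?php
mb_internal_encoding('UTF-8'); // mb_* functions now default to UTF-8

// Hypothetical helper: drop a leading UTF-8 byte order mark and the
// trailing line break so the line can be compared with a literal.
function clean_line($line)
{
    if (strncmp($line, "\xEF\xBB\xBF", 3) === 0) {
        $line = substr($line, 3);
    }
    return rtrim($line, "\r\n");
}

// Simulated first line of the file (assumption: BOM + text + CRLF).
$raw = "\xEF\xBB\xBF" . "мартин" . "\r\n";
echo strlen($raw), "\n"; // prints 17

$line = clean_line($raw);
echo mb_strlen($line), "\n"; // prints 6

$result = ($line === "мартин") ? "ok" : "tapa";
echo $result, "\n"; // prints "ok"
```

Without the cleanup step, the comparison with "мартин" would still fail on such a line even after the internal encoding is set, because the line break and any BOM remain part of the string.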