get number of bytes ??

Posted: Mon Aug 16, 2010 3:08 pm
by ddragas
I'm in BIG doubt which one is correct

from mysql:

Code:

SELECT (BIT_LENGTH('some message čćžÅ*Đ') * 0.125) AS bytes;

returns 34 bytes

(where 'some message čćžÅ*Đ' stands for 'some message č枊Đ')

from PHP

Code:


$str = "some message č枊Đ";

mb_strlen(utf8_encode($str), 'latin1');

returns 33 bytes


from perl

Code:


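use utf8;                 # assumption: the script itself is saved as UTF-8
use Encode qw(encode);    # provides encode()

my $str = "some message č枊Đ";

my $Text = encode('UTF-16', $str);

## get number of bytes
my $byte_size;
{
        use bytes;
        $byte_size = length($Text);
}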

returns 38 bytes


Which one is correct?

Re: get number of bytes ??

Posted: Mon Aug 16, 2010 7:39 pm
by requinix
They're all correct.
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_bit-length wrote:
BIT_LENGTH(str)

Returns the length of the string str in bits.

Divide by 8 to get the number of bytes.

http://php.net/mb-strlen wrote:
Return Values

Returns the number of characters in string str having character encoding encoding. A multi-byte character is counted as 1.

Counts characters, not bytes.

http://perldoc.perl.org/functions/length.html wrote:
Like all Perl character operations, length() normally deals in logical characters, not physical bytes.

By default counts characters, not bytes.
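
To make the characters-versus-bytes distinction concrete, here is a minimal sketch. It is in Python purely for illustration (none of the three snippets in the question use it), and the variable names are made up for the example. The sample string is written with escapes so the exact code points are unambiguous.

```python
# "some message č枊Đ", spelled with escapes to avoid any encoding ambiguity
text = "some message \u010d\u0107\u017e\u0160\u0110"

char_count = len(text)                  # logical characters (code points)
utf8_bytes = len(text.encode("utf-8"))  # physical bytes once encoded as UTF-8

print(char_count, utf8_bytes)  # 18 23
```

The 13 ASCII characters take one byte each in UTF-8, and each of the five accented letters takes two, hence 23 bytes for 18 characters.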


Also,

Code:

mb_strlen(utf8_encode($str), 'latin1');

How do you expect to get the right answer if you count a string in one encoding as if it were in another?
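
A rough Python sketch of where the 33 probably comes from, assuming the PHP file was saved as UTF-8 so that $str already held UTF-8 bytes. utf8_encode() treats its input as ISO-8859-1 and converts it to UTF-8, so every non-ASCII byte gets encoded a second time; the decode/encode pair below only imitates that behaviour, it is not the PHP implementation.

```python
# "some message č枊Đ", spelled with escapes to avoid any encoding ambiguity
text = "some message \u010d\u0107\u017e\u0160\u0110"

utf8 = text.encode("utf-8")  # what $str would hold in a UTF-8 source file

# Imitate utf8_encode(): read each byte as a Latin-1 character, then
# encode the result as UTF-8 again (double encoding).
double_encoded = utf8.decode("latin-1").encode("utf-8")

print(len(utf8), len(double_encoded))  # 23 33
```

Counting the double-encoded string as 'latin1' (one byte per character) then gives 33, which matches the PHP result but is not the size of the original string in any real encoding.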

Code:

encode('UTF-16', $str);
use bytes;

You encoded the string in UTF-16 (which uses more bytes than UTF-8 does for this string) and explicitly told Perl to count bytes, not characters.
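
The 38 can be reproduced with a short Python sketch, again only as an illustration. It assumes the string holds 18 characters and that the encoder prepends a 2-byte byte-order mark (BOM), which Python's "utf-16" codec does and which, as far as I know, Perl's encode('UTF-16', ...) does as well.

```python
# "some message č枊Đ", spelled with escapes to avoid any encoding ambiguity
text = "some message \u010d\u0107\u017e\u0160\u0110"

with_bom = text.encode("utf-16")        # 2-byte BOM + 2 bytes per character
without_bom = text.encode("utf-16-le")  # explicit byte order, so no BOM
as_utf8 = text.encode("utf-8")          # for comparison

print(len(with_bom), len(without_bom), len(as_utf8))  # 38 36 23
```

So 38 is the UTF-16 size including the BOM, 36 is the UTF-16 size without it, and 23 is the UTF-8 size of the same 18 characters.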


In all three cases you used a different function on a different string. Of course the results are different!