UTF-8

Posted: Sat Oct 02, 2010 1:23 am
by abalfazl
UTF-8 uses two bytes.

I want to know the purpose of each byte.

Re: UTF-8

Posted: Sat Oct 02, 2010 6:19 am
by Eran

Re: UTF-8

Posted: Sat Oct 02, 2010 9:15 am
by abalfazl
For any character equal to or below 127 (hex 0x7F), the UTF-8 representation is one byte. It is just the lowest 7 bits of the full Unicode value. This is also the same as the ASCII value.

For characters equal to or below 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes. The first byte will have the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The second byte will have the top bit set and the second bit clear (i.e. 0x80 to 0xBF).
What does it mean by "high bits", "top bit", and "bit clear"?

Re: UTF-8

Posted: Sat Oct 02, 2010 2:25 pm
by Eran
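Top two bits: bits #1 and #2, counting from the left (the most significant end of the byte). "Set" means the bit is 1 and "clear" means it is 0.
It says that for characters above 127 and equal to or below 2047, you need two bytes to represent the value fully: 5 bits of it go in the first byte and 6 bits in the second.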
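
To make the bit terms concrete, here is a minimal Python sketch that looks at the two bytes of one character. The character U+00E9 is only an example picked for illustration; any code point from 128 to 2047 should show the same pattern.

```python
# Illustrative sketch: inspect the bits of a two-byte UTF-8 sequence.
ch = "\u00e9"  # example character, code point 233 (between 128 and 2047)
encoded = ch.encode("utf-8")
first, second = encoded[0], encoded[1]

print(format(first, "08b"))   # 11000011
print(format(second, "08b"))  # 10101001

# "High" or "top" bits are the leftmost (most significant) bits.
# "Set" means the bit is 1, "clear" means the bit is 0.
assert first >> 5 == 0b110   # two high bits set, third bit clear
assert second >> 6 == 0b10   # top bit set, second bit clear

# The remaining 5 + 6 bits carry the character value itself.
payload = ((first & 0b00011111) << 6) | (second & 0b00111111)
assert payload == ord(ch)
```

The fixed leading bits are markers that tell a decoder which byte starts a character and which byte continues one; only the remaining 11 bits hold the value.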

Re: UTF-8

Posted: Sat Oct 02, 2010 3:22 pm
by DigitalMind
abalfazl wrote: UTF-8 uses two bytes.
UTF-8 is a variable-length character encoding for Unicode.
UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes). The first 128 characters of the Unicode character set (which correspond directly to the ASCII) use a single octet with the same binary value as in ASCII...
http://en.wikipedia.org/wiki/UTF-8
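
As a rough sketch of how the 1 to 4 octet lengths come about, the Python function below encodes a single code point by hand and compares the result with Python's built-in encoder. The name `utf8_encode` is a hypothetical helper made up for this example, and it skips some validation (it does not reject surrogate code points), so treat it as an illustration rather than a complete encoder.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8 (hypothetical helper, for illustration)."""
    if cp < 0:
        raise ValueError("code point must not be negative")
    if cp < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        # Note: surrogates (0xD800 to 0xDFFF) are not rejected in this sketch.
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:   # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")


# Example code points that need 1, 2, 3 and 4 bytes respectively.
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(hex(cp), len(utf8_encode(cp)), "byte(s)")
```

The first byte says how many bytes the character takes, and every following byte starts with the bits 10, so the purpose of each byte depends on its position in the sequence.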