UFT-8

Ye' old general discussion board. Basically, for everything that isn't covered elsewhere. Come here to shoot the breeze, shoot your mouth off, or whatever suits your fancy.
This forum is not for asking programming related questions.

Moderator: General Moderators

Post Reply
abalfazl
Forum Commoner
Posts: 71
Joined: Mon Sep 05, 2005 10:05 pm

UFT-8

Post by abalfazl »

UTF-8 uses two bytes.

I want to know what is the purpose and mission of each bytes?
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: UFT-8

Post by Eran »

abalfazl
Forum Commoner
Posts: 71
Joined: Mon Sep 05, 2005 10:05 pm

Re: UFT-8

Post by abalfazl »

For any character equal to or below 127 (hex 0x7F), the UTF-8 representation is one byte. It is just the lowest 7 bits of the full unicode value. This is also the same as the ASCII value.

For characters equal to or below 2047 (hex 0x07FF), the UTF-8 representation is spread across two bytes. The first byte will have the two high bits set and the third bit clear (i.e. 0xC2 to 0xDF). The second byte will have the top bit set and the second bit clear (i.e. 0x80 to 0xBF).
What doesit mean by high bits?top bit? bit clear?
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: UFT-8

Post by Eran »

top two bits - bits #1 and #2
It says that for characters equal to or below 2047, you need 1 byte and 2 bits from the second byte to represent it fully.
User avatar
DigitalMind
Forum Contributor
Posts: 152
Joined: Mon Sep 27, 2010 2:27 am
Location: Ukraine, Kharkov

Re: UFT-8

Post by DigitalMind »

abalfazl wrote:UTF-8 uses two bytes.
UTF-8 is a variable-length character encoding for Unicode.
UTF-8 encodes each character (code point) in 1 to 4 octets (8-bit bytes). The first 128 characters of the Unicode character set (which correspond directly to the ASCII) use a single octet with the same binary value as in ASCII...
http://en.wikipedia.org/wiki/UTF-8
Post Reply