Quick summary of QP encoding:
In a UTF-8 string, the number of bytes per character can vary, but QP encoding works on bytes: each byte that is not permitted as a literal is turned into =XX, where XX is its hexadecimal value. Lines cannot exceed 76 characters in QP encoding, so a line that has to be chopped ends with "=" followed by CRLF. An "=" followed by CRLF is simply disregarded when decoding, so the string comes out as it was.
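To make the byte-wise behaviour concrete, here is a minimal sketch. The function name qp_encode_bytes is made up for illustration, and it assumes that "permitted" means printable ASCII other than "=". It ignores line length, trailing whitespace and CRLF handling entirely.

```php
<?php
// Hypothetical helper: encode every byte outside printable ASCII,
// plus "=" itself, as =XX. Line wrapping is deliberately left out.
function qp_encode_bytes($string)
{
    $result = '';
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $ord = ord($string[$i]);
        // Printable ASCII except "=" (0x3D) passes through unchanged
        if ($ord >= 0x20 && $ord <= 0x7E && $ord !== 0x3D) {
            $result .= $string[$i];
        } else {
            $result .= sprintf('=%02X', $ord);
        }
    }
    return $result;
}

// "é" is the two bytes C3 A9 in UTF-8, so one character becomes two =XX groups
echo qp_encode_bytes("caf\xC3\xA9"), "\n"; // prints caf=C3=A9
```

Note that the encoder never needs to know where a character starts or ends. That only matters once lines have to be wrapped.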
Because of the line-length limit I need to break the line and add the "=" to the end (known as a soft break). However, unknown to me until now, you cannot split an individual character across multiple lines. So how can I determine which bytes make up a whole character in a multibyte string?
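Taking that constraint as given, the wrapping step can be sketched on its own, assuming the string has already been cut into tokens where each token is the encoded form of one whole character (for example "a" or "=C3=A9"). The name qp_wrap_tokens and the token array are hypothetical. Finding the character boundaries is the part still in question.

```php
<?php
// Hypothetical helper: join already-encoded characters into lines of at
// most $maxLength columns, inserting soft breaks ("=" + CRLF) only
// between tokens, never inside one.
function qp_wrap_tokens($tokens, $maxLength = 76)
{
    $lines = array();
    $line = '';
    foreach ($tokens as $token) {
        // Reserve one column for the trailing "=" of a soft break
        if (strlen($line) + strlen($token) > $maxLength - 1) {
            $lines[] = $line . '=';
            $line = '';
        }
        $line .= $token;
    }
    $lines[] = $line;
    return implode("\r\n", $lines);
}

// 74 literal bytes followed by an encoded "é": the =C3=A9 group would
// not fit, so the whole character moves to the next line.
$tokens = array_merge(array_fill(0, 74, 'a'), array('=C3=A9'));
echo qp_wrap_tokens($tokens), "\n";
```

This is slightly conservative: the final line is also kept to 75 columns even though it carries no trailing "=".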
I cannot use the multibyte functions, by the way, since this has to work on a vanilla PHP installation.
Hopefully someone can shed some light on the structure of UTF-8?
EDIT | Where's Ambush Commander when ya need him?
EDIT | w00t! I found the Ambush Commander article I seemed to remember and have been following links from it. I'm less in the dark than I was, but I still do not know how to tell what is a whole character and what is just one byte of a multibyte character. I'll explain further what I do:
I currently do an extremely basic:
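Code:
$result = '';
for ($i = 0; $i < strlen($string); $i++)
{
    $ord = ord($string[$i]);
    //Check if it's a permitted byte, then either append it to $result, or append sprintf("=%02X", $ord) instead
}
What I'd want instead is something like this, which I can't use because it needs the mb_* functions:
Code: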
while (($char = mb_substr($string, 0, 1)) !== '') //Compare against '' so a "0" character doesn't end the loop
{
    //I have a character, but it could be any number of bytes
    $string = mb_substr($string, 1); //Move along the string
}
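For comparison, here is a rough sketch of that loop without mb_*. It relies on the way UTF-8 marks the length of a sequence in its first byte: 0xxxxxxx is a single byte, while 110xxxxx, 1110xxxx and 11110xxx start sequences of 2, 3 and 4 bytes, and every following byte looks like 10xxxxxx. The function names utf8_sequence_length and utf8_split_chars are made up for illustration, and the input is assumed to be well-formed UTF-8. No validation is attempted.

```php
<?php
// Hypothetical helper: how many bytes the UTF-8 sequence starting with
// byte value $ord occupies. Stray continuation bytes (10xxxxxx) and
// invalid lead bytes return 1 so the caller always makes progress.
function utf8_sequence_length($ord)
{
    if ($ord < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if (($ord & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($ord & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($ord & 0xF8) === 0xF0) return 4; // 11110xxx
    return 1;
}

// Hypothetical helper: split a byte string into whole UTF-8 characters
// using only strlen(), ord() and substr().
function utf8_split_chars($string)
{
    $chars = array();
    $length = strlen($string);
    $i = 0;
    while ($i < $length) {
        $n = utf8_sequence_length(ord($string[$i]));
        $chars[] = substr($string, $i, $n); // one character, 1 to 4 bytes
        $i += $n;
    }
    return $chars;
}

// "a", "é" (C3 A9) and "€" (E2 82 AC) come back as three array elements
print_r(utf8_split_chars("a\xC3\xA9\xE2\x82\xAC"));
```

Each element can then be QP-encoded byte by byte as before, with soft breaks placed only between elements.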