I can use mb_substr() to read one character at a time, but I for one do not have mb_* compiled into my PHP installation, so I need a simple fallback.
If mb_* is installed, I'll use that; otherwise I need my own implementation of just one tiny little thing: the ability to scan a string byte-for-byte until I have one complete character.
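The "use mb_* if it's there" check might look roughly like this. It's only a sketch: firstCharacterSketch() is a made-up name, and I'm assuming UTF-8 as the default charset purely for illustration.

```php
<?php
/**
 * Hypothetical helper (name invented for this post).
 * Returns the first character of $string via mbstring when available,
 * or null to signal that the byte-wise fallback has to be used instead.
 */
function firstCharacterSketch($string, $charset = 'UTF-8')
{
    if (function_exists('mb_substr')) {
        // mbstring is compiled in: let it find the character boundary
        return mb_substr($string, 0, 1, $charset);
    }
    // No mbstring: the caller needs its own byte-by-byte scan
    return null;
}
```

The null return is just one way to flag "no mbstring here"; the fallback itself is the part I'm asking about.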
I need a character set guru to tell me whether the following interface will work, though (read the doc comment above the method):
Code:
/**
 * Analyzes characters for a specific character set.
 * @package Swift
 * @subpackage Encoder
 * @author Chris Corbyn
 */
interface Swift_CharacterSetValidator
{
    /**
     * Returns an integer which specifies how many more bytes to read.
     * A positive integer indicates how many more bytes to fetch before
     * invoking this method again.
     * A value of zero means this is already a valid character.
     * A value of -1 means this cannot possibly be a valid character.
     * @param string $partialCharacter
     * @return int
     */
    public function validateCharacter($partialCharacter);
}

If I have a string and I want to split it into an array of characters, I'd use this sort of algorithm ($string stands in for a simple stream-like wrapper to keep this example simpler).
Code:
$chars = array();
$currentChar = '';
$byteCount = 1;
while (strlen($string) > 0) {
    // Shift $byteCount bytes off the start of the string
    for ($i = 0; $i < $byteCount; $i++) {
        $currentChar .= substr($string, 0, 1);
        $string = substr($string, 1);
    }
    // See what the validator says for the number of more bytes to fetch
    $byteCount = $validator->validateCharacter($currentChar);
    if (-1 == $byteCount) {
        // Error: bail out, otherwise the loop would never advance
        throw new Exception('Invalid byte sequence in input');
    } elseif (0 == $byteCount) {
        // This is a valid character in this charset
        $chars[] = $currentChar;
        $currentChar = '';
        $byteCount = 1;
    }
}
if ('' !== $currentChar) {
    // Input ended part-way through a character
    throw new Exception('Truncated character at end of input');
}
var_dump($chars);

Can anyone offer a faster algorithm than this? Can anyone pick holes in the viability of this approach?
I'm no character set expert, so I'm all ears.
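To make the question more concrete, here's a rough sketch of how a UTF-8 implementation of the interface might look. Utf8ValidatorSketch is a made-up name, and the checks are deliberately incomplete: it only looks at the lead byte and at the 10xxxxxx pattern of continuation bytes, so it does not reject overlong forms, surrogates, or code points above U+10FFFF. Treat it as an illustration of the return values, not a finished validator.

```php
<?php
// Interface repeated from above so this snippet runs on its own.
interface Swift_CharacterSetValidator
{
    public function validateCharacter($partialCharacter);
}

// Hypothetical class name, for illustration only.
class Utf8ValidatorSketch implements Swift_CharacterSetValidator
{
    public function validateCharacter($partialCharacter)
    {
        $have = strlen($partialCharacter);
        if (0 === $have) {
            return 1; // nothing read yet, so at least one byte is needed
        }
        // The lead byte determines the total length of the sequence
        $lead = ord($partialCharacter[0]);
        if ($lead < 0x80) {
            $need = 1; // ASCII
        } elseif ($lead >= 0xC2 && $lead <= 0xDF) {
            $need = 2;
        } elseif ($lead >= 0xE0 && $lead <= 0xEF) {
            $need = 3;
        } elseif ($lead >= 0xF0 && $lead <= 0xF4) {
            $need = 4;
        } else {
            return -1; // stray continuation byte or invalid lead byte
        }
        if ($have > $need) {
            return -1; // more bytes than this lead byte allows
        }
        // Every byte after the lead must look like 10xxxxxx
        for ($i = 1; $i < $have; $i++) {
            if ((ord($partialCharacter[$i]) & 0xC0) !== 0x80) {
                return -1;
            }
        }
        return $need - $have; // 0 means the character is complete
    }
}
```

With this, the splitting loop would fetch one byte, learn from the lead byte how many more it needs, and fetch exactly that many before asking again.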
NOTE: All I need to be able to do is read a string (or file stream) one character at a time. If I can do that, everything else will fall into place.
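For the file stream case, the same idea might be sketched like this. readCharacterSketch() is a hypothetical helper, and I'm assuming a stream where fread() hands back the requested number of bytes unless it hits end of file (true for local files and php://memory; sockets may return short reads and would need extra handling).

```php
<?php
/**
 * Hypothetical helper: reads exactly one character from $stream.
 * $validator is any object with the validateCharacter() method
 * described in the interface above.
 * Returns the character, or false at a clean end of stream.
 */
function readCharacterSketch($stream, $validator)
{
    $char = '';
    $need = 1;
    while ($need > 0) {
        $bytes = fread($stream, $need);
        if (false === $bytes || strlen($bytes) < $need) {
            if ('' === $char && (false === $bytes || '' === $bytes)) {
                return false; // end of stream between characters
            }
            throw new RuntimeException('Stream ended in the middle of a character');
        }
        $char .= $bytes;
        // Ask how many more bytes this character needs
        $need = $validator->validateCharacter($char);
        if (-1 === $need) {
            throw new RuntimeException('Invalid byte sequence');
        }
    }
    return $char;
}
```

Calling it in a loop until it returns false would walk the stream one character at a time.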
EDIT | I'm away at a music festival until Monday, so if there's a lack of response before then, that's why.