Processing a UTF-8 string
Posted: Wed Dec 07, 2011 4:38 am
I'd like to ask a general question about the efficiency of processing a UTF-8 string when each character in the string has to be processed in turn.
Am I right in assuming that
mb_substr($s, $i, 1, 'UTF-8')
can only get to the $ith character by working through from the start of $s?
Whereas
mb_substr($s, $i, 1, 'UTF-16')
will calculate where the bytes for the $ith character are, and go directly to them?
If so, it would seem better to do a one-off mb_convert_encoding of the string from UTF-8 to UTF-16 before repeatedly examining individual characters.
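To make the comparison concrete, here is a minimal sketch of the two approaches. The function names are made up for illustration. One assumption to flag: UTF-16 is itself variable-width (code points above U+FFFF take a surrogate pair), so the sketch substitutes UTF-32BE, where every character occupies exactly four bytes, as the fixed-width target. It only shows that both approaches yield the same characters; it does not measure which one is faster.

```php
<?php
// Approach 1: ask mb_substr for each character of the UTF-8 string in turn.
// (Hypothetical helper name, for illustration only.)
function chars_by_mb_substr(string $s): array
{
    $out = [];
    $n = mb_strlen($s, 'UTF-8');
    for ($i = 0; $i < $n; $i++) {
        $out[] = mb_substr($s, $i, 1, 'UTF-8');
    }
    return $out;
}

// Approach 2: convert once to a fixed-width encoding, then index by byte offset.
// Assumption: UTF-32BE is used instead of UTF-16 so that character $i
// always starts at byte 4 * $i.
function chars_by_fixed_width(string $s): array
{
    $s32 = mb_convert_encoding($s, 'UTF-32BE', 'UTF-8');
    $out = [];
    $n = intdiv(strlen($s32), 4);
    for ($i = 0; $i < $n; $i++) {
        // Take the 4 bytes of character $i and convert them back to UTF-8.
        $out[] = mb_convert_encoding(substr($s32, 4 * $i, 4), 'UTF-8', 'UTF-32BE');
    }
    return $out;
}
```

Timing both functions on a long string with microtime(true) would be one way to check whether the one-off conversion actually pays off.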
Is that reasoning valid, or am I missing something?
Thanks for any advice.