
Re: php & Unicode 5.1 characters

Posted: Thu Apr 02, 2009 11:46 am
by Apollo
Chris Corbyn wrote: Yours will be a lot slower for larger strings BTW due to the repeated ord() usage. Some of the verbosity of mine is because it needs to be fast (it's part of Swift Mailer).
Are you sure that ord($s[$i]) is slower than $_byteMap[$s[$i]]? I'd say the latter involves 128 string comparisons on average for every char lookup (since it's using each char from $s as a key in a string-indexed array with 256 elements). I have no clue how well PHP's ord() function is implemented, but it wouldn't make sense to me if it took significantly more time.

Either way, I'd say the whole conversion shouldn't take much time anyway (a few seconds, tops?) even for very long strings (as in, several megabytes) on any random cheap-ass machine.

BTW it might be interesting to see if your method gets faster when you move commonly used characters (0x65 or 'e' is probably the most frequent in common textual data) to the beginning of the $_byteMap array :) (or would PHP recognize that the keys in your array are sorted, and perform some binary search?)

Re: php & Unicode 5.1 characters

Posted: Thu Apr 02, 2009 6:08 pm
by Chris Corbyn
You're right, I just benchmarked mine against yours and they take the same amount of time. I hadn't thought that accessing values in an array would require a search through the array (surely it has to be more efficient than that? :()
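
For anyone who wants to repeat the comparison, here is a minimal benchmark sketch. The function names and the test string are made up for illustration, and buildByteMap() is only a stand-in for the real $_byteMap. Note that PHP arrays are implemented as hash tables, so a keyed lookup is a hash lookup rather than a scan through the elements, which would explain why the two approaches end up close. Actual timings will depend on the PHP version and the machine.

```php
<?php
// Hypothetical stand-in for $_byteMap: maps each single-byte string
// to its integer value (0..255).
function buildByteMap()
{
    $map = [];
    for ($i = 0; $i < 256; $i++) {
        $map[chr($i)] = $i;
    }
    return $map;
}

// Sums the byte values of $s using ord() on every byte.
function sumWithOrd($s)
{
    $sum = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $sum += ord($s[$i]);
    }
    return $sum;
}

// Sums the byte values of $s using an array lookup on every byte.
function sumWithMap($s, array $map)
{
    $sum = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $sum += $map[$s[$i]];
    }
    return $sum;
}

$map = buildByteMap();
$s = str_repeat("The quick brown fox jumps over the lazy dog\r\n", 20000);

$t0 = microtime(true);
$a = sumWithOrd($s);
$t1 = microtime(true);
$b = sumWithMap($s, $map);
$t2 = microtime(true);

printf(
    "ord(): %.4fs, map: %.4fs, same result: %s\n",
    $t1 - $t0,
    $t2 - $t1,
    $a === $b ? 'yes' : 'no'
);
```

Since lookups are hashed, the order of the keys in the map should make no measurable difference either, but the same script can be used to check that by building the map in a different order.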

It would be interesting to move the vowels, " ", "\n" and "\r" to the start of the map, yes.

To be honest, there's one horribly slow part in my code that I need to get rid of:

Code:

while (list(,$b) = each($bytes)) {
  ...
}

I did that because the byte array is passed in by reference and iterating must pick up where it last left off (i.e. the array pointer must not be reset). I think tracking the key position and using a for loop will speed that up though:

http://www.phpbench.com/
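
A rough sketch of the position-tracking idea, assuming the byte array is a zero-indexed list and the caller keeps the current position in its own variable. The readBytes() helper and the $pos parameter are hypothetical names used for illustration; this is not the actual Swift Mailer code.

```php
<?php
// Hypothetical helper: consumes up to $count bytes from $bytes starting
// at $pos, and advances $pos so the next call resumes where this one
// stopped. The array's internal pointer is never used or reset.
function readBytes(array &$bytes, &$pos, $count)
{
    $out = [];
    $end = min($pos + $count, count($bytes));
    for (; $pos < $end; $pos++) {
        $out[] = $bytes[$pos];
    }
    return $out;
}

$bytes = [0xE2, 0x82, 0xAC, 0x41]; // UTF-8 bytes for U+20AC, then 'A'
$pos = 0;

$first = readBytes($bytes, $pos, 3);  // consumes the three-byte sequence
$second = readBytes($bytes, $pos, 3); // only one byte is left
```

Because the position lives in an ordinary integer passed by reference, a plain for loop can continue from the last offset without relying on each() and the internal array pointer.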