utf8_strtolower() Feedback

Posted: Sat Sep 05, 2009 8:19 am
by lkjkorn19
Hi,

I have written a sort of utf8_strtolower function. Basically, it replaces any numeric HTML entity (in the form of &#228;, for example) with its lowercase equivalent if one exists (e.g. &#203; (Ë) becomes &#235; (ë)).

I have mapped all characters manually and ended up with about 650 array elements. Following my example with ë, I assign the array elements as follows:

Code:

$utf8_strtox[203] = 235;
where the entity numbered 203 is replaced with the entity numbered 235.

The function works, and I can guarantee that every non-ASCII character in the input will already be in the form of a numeric HTML entity.
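For illustration, here is a minimal sketch of how a map like this might be applied. It is not the code from the pastebin link: the function name `entity_strtolower_sketch`, the three sample entries, and the choice of `preg_replace_callback` are all assumptions, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical sketch, NOT the pastebin code.
// Only three sample entries; the real table has about 650.
function entity_strtolower_sketch(string $text): string
{
    // Code point of the uppercase character => code point of the lowercase one.
    static $map = [
        202  => 234,   // Ê => ê
        203  => 235,   // Ë => ë
        1064 => 1096,  // Ш => ш
    ];

    // Assumption: plain ASCII letters should be lowercased as well.
    // Decimal entities contain no letters, so they pass through unchanged.
    // Hex entities such as &#xCB; are NOT handled by this sketch.
    $text = strtolower($text);

    // Swap each decimal entity that has an entry in the map.
    $result = preg_replace_callback(
        '/&#(\d+);/',
        function (array $m) use ($map): string {
            $code = (int) $m[1];
            return isset($map[$code]) ? '&#' . $map[$code] . ';' : $m[0];
        },
        $text
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo entity_strtolower_sketch('&#202; &#1064; ABC'), "\n";
```

Entities without an entry in the map are left untouched, so unknown or already-lowercase characters pass through as they are.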

Here's the full code: http://pastebin.com/fbd61bb7

Example of use:

Code:

echo utf8_strtolower('&#202; &#1064;');
// will echo &#234; &#1096;
I ran some tests: creating the array (with utf8_strtox_init()) takes about 0.002023 seconds (average over 50 runs).

Additionally, I created a random string of 250 HTML entities for uppercase characters, and it took PHP an average of 0.002347 seconds (over 50 runs) to replace them with lowercase entities. So, in theory, lowercasing a string of 250 HTML entities will take roughly 0.005 seconds, array creation included.
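A timing like this could be reproduced with something along the following lines. This is only a sketch: `average_runtime` is a hypothetical helper, the workload is a stand-in `str_replace` call rather than the real function, and the test string repeats one entity instead of being random.

```php
<?php
// Hypothetical helper: average wall-clock time of a callable over $runs runs.
function average_runtime(callable $fn, int $runs = 50): float
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);      // current time in seconds, as a float
        $fn();
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

// Stand-in input: 250 copies of one uppercase entity (not a random string).
$input = str_repeat('&#203;', 250);

// Stand-in workload: a single str_replace() call instead of the real function.
$avg = average_runtime(function () use ($input) {
    str_replace('&#203;', '&#235;', $input);
});

printf("average over 50 runs: %.6f seconds\n", $avg);

// Peak memory used by the script so far, in bytes.
printf("peak memory: %d bytes\n", memory_get_peak_usage());
```

Timings this small vary a lot between runs and machines, so the memory figure and the relative difference between two implementations are probably more telling than the absolute numbers.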

I'm not very knowledgeable about PHP efficiency or what eats up its memory, and I intend to implement these functions in a popular CMS script. Can someone tell me if there is anything I should be aware of, anything I should change, etc.?

Thank you. :)

Re: utf8_strtolower() Feedback

Posted: Fri Oct 02, 2009 5:52 pm
by pickle
I take it you've done this because strtolower() isn't UTF-8 compatible?

If you have to create the matchup manually (i.e. Ë -> ë), I'd do it this way:

Code:

$string = "This is the character: &#203;";
$updated_string = utf8_strtolower($string);

function utf8_strtolower($string)
{
  // Uppercase entities and their lowercase counterparts, index-aligned.
  // 'etc' stands in for the remaining entries.
  $search = array('&#203;','etc');
  $replace = array('&#235;','etc');

  return str_replace($search,$replace,$string);
}
Unless of course I'm missing something.
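If the numeric map from the first post already exists, the replacement pairs could be derived from it rather than typed out twice. A rough sketch, where `entity_pairs` is a hypothetical helper and the two map entries are samples only:

```php
<?php
// Hypothetical helper: turn a numeric map (203 => 235) into
// entity pairs ('&#203;' => '&#235;').
function entity_pairs(array $codeMap): array
{
    $pairs = [];
    foreach ($codeMap as $upper => $lower) {
        $pairs['&#' . $upper . ';'] = '&#' . $lower . ';';
    }
    return $pairs;
}

// Sample entries only.
$pairs = entity_pairs([203 => 235, 202 => 234]);

// strtr() with an array replaces each key with its value in a single pass
// and does not re-scan text it has already replaced.
echo strtr('This is the character: &#203;', $pairs), "\n";
```

Keeping one associative array avoids the risk of the search and replace arrays drifting out of alignment when entries are added.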