Page 2 of 2

Re: PHP, MySQL, UTF-8

Posted: Mon Aug 18, 2008 7:59 am
by sirfragalot
Hi,

Just go it working. It seems that the DELIMTER statement is only for the MySQL commandline. I now simply have this ($this->db->query() is my code):

Code: Select all

        $this->db->query('CREATE FUNCTION is_invalid_utf8(s BLOB) 
        RETURNS int 
        DETERMINISTIC 
              BEGIN 
                DECLARE i INT DEFAULT 1; 
                DECLARE c INT; 
                DECLARE num_expected_highs INT DEFAULT 0; 
                WHILE (i <= length(s)) DO 
                  SET c = ord(mid(s, i, 1)); 
                  IF num_expected_highs > 0 THEN 
                    IF (c & 192)<>128 THEN  
                      RETURN 1; 
                    END IF; 
                    SET num_expected_highs = num_expected_highs - 1; 
                  ELSE 
                    IF (c & 224) = 192 THEN  
                      SET num_expected_highs = 1; 
                    ELSEIF (c & 240) = 224 THEN  
                      SET num_expected_highs = 2; 
                    ELSEIF (c & 248) = 240 THEN  
                      SET num_expected_highs = 3; 
                    ELSEIF (c & 128 = 128) THEN 
                      RETURN 1; 
                    END IF; 
                  END IF; 
                  SET i = (i+1); 
                END WHILE; 
                RETURN num_expected_highs; 
              END');
and that works fine.

I've never used a stored procedure before. Is it normally so slow? Testing with and without there's a difference of 0.6 seconds (a total PHP script run of 0.2 to 0.8 seconds). Other than that, I'm pretty satisfied that the problem I started off with on this thread has now been solved. Thank you very, very much for your help and patience.

Re: PHP, MySQL, UTF-8

Posted: Mon Aug 18, 2008 8:56 am
by dml
Great! I've never used stored procedures in production myself, so I don't know very much about their performance. This one loops through every byte in every text field, so it wouldn't be surprising if it ran slowly. Now that I think about it, the stored function doesn't do anything that a regex couldn't do better. How about converting the php regex you had in your first post into mysql format, and getting mysql to do a match against it - select * where field regexp "...".

Re: PHP, MySQL, UTF-8

Posted: Mon Aug 18, 2008 9:29 am
by sirfragalot
I'll give that a try although currently

Code: Select all

SELECT `text` FROM `WKX_resource_note` WHERE `text` REGEXP 
'%(\\?:
[\xC2-\xDF][\x80-\xBF]
|\xE0[\xA0-\xBF][\x80-\xBF]
|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
|\xED[\x80-\x9F][\x80-\xBF]
|\xF0[\x90-\xBF][\x80-\xBF]{2}
|[\xF1-\xF3][\x80-\xBF]{3}
|\xF4[\x80-\x8F][\x80-\xBF]{2}
)+%xs'
does not produce the expected result (i.e. it doesn't find what your stored procedure finds). I'm not familiar enough with multibyte character coding to say whether the above is correct.

Mark