Page 1 of 2

Random 12 char codes

Posted: Thu Mar 13, 2008 2:33 am
by miro_igov
Hi guys, long time not posted here.

I'm trying to create a script which produces large amount of 12 char unique codes. The chars are in A-Z,0-9 set. To verify that a code is unique script saves the generated code in MySQL table (the code is UNIQUE index and if mysql_affected_rows <= 0 attempts to generate new code). Unfortunately this process is extremely slow when the DB table is filled with more than 2 000 000 codes, it is hard to generate unique code which does not exists already in the table. The purpose of the script is to generate about 20 000 000 unique codes and permutation calculation shows that it is possible -
36! / (36-12)! = 599 555 620 984 320 000 uniques.

Is there some good approach in doing that task? Php randomization seems very bad and it often hits codes that are already in the db.

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 4:02 am
by onion2k
20,000,000 12 character strings would only take up 228MB of RAM.. I'd put PHP's memory limit up and put them in an array. That way you can write a much more efficient search in order to check they're unique. When they're all generated you can feed them into the database using extended inserts.

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 5:14 am
by miro_igov
Thank you. If you mean checking for uniques with in_array this method is even worse, it was very slow and stopped after 57 000 codes. No error was reported (i use ini_set('display_errors',1) and other errors are reported fine). With DB uniqueness verification it takes 26 seconds for 100 000 codes.

Any better way?

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 9:38 am
by hawleyjr
Is it slow making queries to the database to see if the string is taken or is it slow creating a 'random' string (I use the word random loosely) string?

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 10:01 am
by miro_igov
Nothing of these is slow - queries are fast, string generation too. The slowness comes because randomization process often returns strings already in the database and the step is repeated until a unique string is returned. So when the db entries are over 1 000 000 the chance of returning string that is already in the db is too high :(

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 10:12 am
by hawleyjr
How are your indexes setup in your table? Please post your table information.

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 10:19 am
by miro_igov
Thank you for the interest, here is the table:

Code: Select all

CREATE TABLE `codes` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `code` varchar(20) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `unique_code` (`code`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1542737 ;
 
I use INSERT query for new code insertion and if mysql_affected_rows is less than 1 i consider it duplicate.

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 10:47 am
by hawleyjr
Have you tried an Insert If statement or a ON DUPLICATE KEY UPDATE statement?

Another option is to insert more than one row at a time using ON DUPLICATE KEY UPDATE

From http://dev.mysql.com/doc/refman/5.0/en/ ... icate.html

Code: Select all

 INSERT INTO table (a,b,c,d,e) VALUES (1,2,3,4,5), (6,7,8,9,10) ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c), d=VALUES(d), e=VALUES(e);INSERT INTO table (a,b,c,d,e) VALUES (1,2,3,4,5) ON DUPLICATE KEY UPDATE VALUES(b,c,d,e); 

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 12:19 pm
by miro_igov
Sorry hawleyjr, my issue is not the DB query at all. I do not see how ON DUPLICATE KEY UPDATE may help, i do not want any duplicate entries in the database, so if there is already such string i must not save it in the db but generate another one, test if it exists in the DB and if no save, if yes generate new string, etc.

The problem is that when the strings are too much in the database it is hard to pick an unique 12chars string that does not exist already, most of the strings are already there.

Ok, let's explain the task - i hope it will clarify the situation. I want to generate 20 000 000 unique 12char strings, they will be printed on scratch cards and entered on a web site.

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 12:37 pm
by onion2k
miro_igov wrote:Thank you. If you mean checking for uniques with in_array this method is even worse, it was very slow and stopped after 57 000 codes. No error was reported (i use ini_set('display_errors',1) and other errors are reported fine). With DB uniqueness verification it takes 26 seconds for 100 000 codes.

Any better way?
57,000? Weird.. my quick test code did a million in a little over 30 seconds.

I think it's your database inserts that's slowing you down.

Code: Select all

<?php
 
    $start = microtime_float();
    register_shutdown_function("finished", $start);
 
    $codes = array();
 
    for ($x = 0; $x < 1000000; $x++) {
 
        $code = genCode();
        if ($codes[$code]>0) { next; }
        $codes[$code]++;
 
    }
 
    echo count($codes)."<br>";
 
    function genCode() {
 
        static $ac = array("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","0","1","2","3","4","5","6","7","8","9");
        static $l = 36; //Number of characters
 
        $code = "";
 
        for ($x = 0; $x < 12; $x++) {
 
            $code .= $ac[rand(0,$l)];
 
        }
 
        return $code;
 
    }
 
    function finished($start) {
 
        $finish = microtime_float();
        echo $finish-$start;
 
    }
 
    function microtime_float() {
        list($usec, $sec) = explode(" ", microtime());
        return ((float)$usec + (float)$sec);
    }

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 12:38 pm
by hawleyjr
I see.

What is the code that you are using to create the 'random' strings?

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 12:40 pm
by hawleyjr
Onion,

Good use of register_shutdown_function( );

I rarely see developers using that function call.

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 12:48 pm
by miro_igov

Code: Select all

function generate_code($length) {
 
    $allowed_chars = '';
    
    for($i=0;$i<=9;$i++) $allowed_chars .= $i.',';
    
    for($i='A';( $i<='Z' && strlen($i)<=1);$i++) $allowed_chars .= $i.',';
    
    $allowed_chars = rtrim($allowed_chars,',');
    
    
    $ac_arr = explode(',',$allowed_chars);
    
    $ac_count = count($ac_arr);
 
    
    $code = '';
    
 
    for($i=0;$i<$length;$i++) {
        $index = mt_rand(0,$ac_count-1);
        
        $code .= $ac_arr[$index];
    }
    
    return $code;
}

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 1:05 pm
by anjanesh
Apparently pure SQL is slower. (Didnt cross-check with a script though, based on OP's 26secs for the 1st 100,000 inserts).

Code: Select all

DROP PROCEDURE IF EXISTS `generateCodes`;
DELIMITER $$$
CREATE PROCEDURE generateCodes (IN n INT, IN max INT)
BEGIN
        DECLARE cnt BOOL DEFAULT 0;
        DECLARE newCode CHAR(255);
        DECLARE i, j INT DEFAULT 0;
        DECLARE ascii, r INT;
 
        REPEAT
            REPEAT
                  SET newCode = '', j = 0;
                  WHILE j < n DO
                       SET r = FLOOR(RAND() * 35); -- 10 + 26 => 0:35
                       SET ascii = r + IF(r < 10, 48, 55);
                       SET newCode = CONCAT(newCode, CHAR(ascii));
                       SET j = j + 1;
                  END WHILE;
                  SELECT COUNT(`id`) INTO cnt FROM `codes` WHERE `code` = newCode;
            UNTIL cnt = 0 END REPEAT;
            INSERT DELAYED INTO `codes` VALUES('', newCode);
            SET i = i + 1;
        UNTIL i = max END REPEAT;
END $$$
DELIMITER ;
This is the result of adding 10 x 100,000 codes.

Code: Select all

mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (51.14 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (47.70 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (49.81 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (55.55 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (59.75 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 3.58 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 1.17 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 2.50 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 2.94 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 2.94 sec)
 
mysql>
Even though it shows 65535 warnings (where does it get logged anyway ?), 100,000 rows got inserted each time.

miro_igov - how much time does this take for you ?

Can this procedure be optimized further ?

Re: Random 12 char codes

Posted: Thu Mar 13, 2008 1:20 pm
by miro_igov
Thanks anjanesh, with few 100,000 codes it takes from 21 to 26 seconds per 100,000 when i use the mysql_query() and INSERT query. I prefer to use PHP to handle this instead of MySQL procedure. This time increases a lot when i have more than 2,000,000 codes already in the database. Now i'm testing the o2k's code, it seems quicker in generating unique codes in array, i like how he uses the code as array key, much better than testing with in_arrray. It will produce notices on that line if ($codes[$code]>0) { next; } but is workaround. I do not care how long it will take to insert these in mysql, much important is to generate lot of unique values quickly.