Random 12 char codes

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Random 12 char codes

Post by miro_igov »

Hi guys, long time not posted here.

I'm trying to create a script which produces large amount of 12 char unique codes. The chars are in A-Z,0-9 set. To verify that a code is unique script saves the generated code in MySQL table (the code is UNIQUE index and if mysql_affected_rows <= 0 attempts to generate new code). Unfortunately this process is extremely slow when the DB table is filled with more than 2 000 000 codes, it is hard to generate unique code which does not exists already in the table. The purpose of the script is to generate about 20 000 000 unique codes and permutation calculation shows that it is possible -
36! / (36-12)! = 599 555 620 984 320 000 uniques.

Is there some good approach in doing that task? Php randomization seems very bad and it often hits codes that are already in the db.
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Random 12 char codes

Post by onion2k »

20,000,000 12 character strings would only take up 228MB of RAM.. I'd put PHP's memory limit up and put them in an array. That way you can write a much more efficient search in order to check they're unique. When they're all generated you can feed them into the database using extended inserts.
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Re: Random 12 char codes

Post by miro_igov »

Thank you. If you mean checking for uniques with in_array this method is even worse, it was very slow and stopped after 57 000 codes. No error was reported (i use ini_set('display_errors',1) and other errors are reported fine). With DB uniqueness verification it takes 26 seconds for 100 000 codes.

Any better way?
User avatar
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Re: Random 12 char codes

Post by hawleyjr »

Is it slow making queries to the database to see if the string is taken or is it slow creating a 'random' string (I use the word random loosely) string?
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Re: Random 12 char codes

Post by miro_igov »

Nothing of these is slow - queries are fast, string generation too. The slowness comes because randomization process often returns strings already in the database and the step is repeated until a unique string is returned. So when the db entries are over 1 000 000 the chance of returning string that is already in the db is too high :(
User avatar
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Re: Random 12 char codes

Post by hawleyjr »

How are your indexes setup in your table? Please post your table information.
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Re: Random 12 char codes

Post by miro_igov »

Thank you for the interest, here is the table:

Code: Select all

CREATE TABLE `codes` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `code` varchar(20) NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `unique_code` (`code`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=1542737 ;
 
I use INSERT query for new code insertion and if mysql_affected_rows is less than 1 i consider it duplicate.
User avatar
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Re: Random 12 char codes

Post by hawleyjr »

Have you tried an Insert If statement or a ON DUPLICATE KEY UPDATE statement?

Another option is to insert more than one row at a time using ON DUPLICATE KEY UPDATE

From http://dev.mysql.com/doc/refman/5.0/en/ ... icate.html

Code: Select all

 INSERT INTO table (a,b,c,d,e) VALUES (1,2,3,4,5), (6,7,8,9,10) ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c), d=VALUES(d), e=VALUES(e);INSERT INTO table (a,b,c,d,e) VALUES (1,2,3,4,5) ON DUPLICATE KEY UPDATE VALUES(b,c,d,e); 
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Re: Random 12 char codes

Post by miro_igov »

Sorry hawleyjr, my issue is not the DB query at all. I do not see how ON DUPLICATE KEY UPDATE may help, i do not want any duplicate entries in the database, so if there is already such string i must not save it in the db but generate another one, test if it exists in the DB and if no save, if yes generate new string, etc.

The problem is that when the strings are too much in the database it is hard to pick an unique 12chars string that does not exist already, most of the strings are already there.

Ok, let's explain the task - i hope it will clarify the situation. I want to generate 20 000 000 unique 12char strings, they will be printed on scratch cards and entered on a web site.
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Random 12 char codes

Post by onion2k »

miro_igov wrote:Thank you. If you mean checking for uniques with in_array this method is even worse, it was very slow and stopped after 57 000 codes. No error was reported (i use ini_set('display_errors',1) and other errors are reported fine). With DB uniqueness verification it takes 26 seconds for 100 000 codes.

Any better way?
57,000? Weird.. my quick test code did a million in a little over 30 seconds.

I think it's your database inserts that's slowing you down.

Code: Select all

<?php
 
    $start = microtime_float();
    register_shutdown_function("finished", $start);
 
    $codes = array();
 
    for ($x = 0; $x < 1000000; $x++) {
 
        $code = genCode();
        if ($codes[$code]>0) { next; }
        $codes[$code]++;
 
    }
 
    echo count($codes)."<br>";
 
    function genCode() {
 
        static $ac = array("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","0","1","2","3","4","5","6","7","8","9");
        static $l = 36; //Number of characters
 
        $code = "";
 
        for ($x = 0; $x < 12; $x++) {
 
            $code .= $ac[rand(0,$l)];
 
        }
 
        return $code;
 
    }
 
    function finished($start) {
 
        $finish = microtime_float();
        echo $finish-$start;
 
    }
 
    function microtime_float() {
        list($usec, $sec) = explode(" ", microtime());
        return ((float)$usec + (float)$sec);
    }
User avatar
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Re: Random 12 char codes

Post by hawleyjr »

I see.

What is the code that you are using to create the 'random' strings?
User avatar
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Re: Random 12 char codes

Post by hawleyjr »

Onion,

Good use of register_shutdown_function( );

I rarely see developers using that function call.
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Re: Random 12 char codes

Post by miro_igov »

Code: Select all

function generate_code($length) {
 
    $allowed_chars = '';
    
    for($i=0;$i<=9;$i++) $allowed_chars .= $i.',';
    
    for($i='A';( $i<='Z' && strlen($i)<=1);$i++) $allowed_chars .= $i.',';
    
    $allowed_chars = rtrim($allowed_chars,',');
    
    
    $ac_arr = explode(',',$allowed_chars);
    
    $ac_count = count($ac_arr);
 
    
    $code = '';
    
 
    for($i=0;$i<$length;$i++) {
        $index = mt_rand(0,$ac_count-1);
        
        $code .= $ac_arr[$index];
    }
    
    return $code;
}
User avatar
anjanesh
DevNet Resident
Posts: 1679
Joined: Sat Dec 06, 2003 9:52 pm
Location: Mumbai, India

Re: Random 12 char codes

Post by anjanesh »

Apparently pure SQL is slower. (Didnt cross-check with a script though, based on OP's 26secs for the 1st 100,000 inserts).

Code: Select all

DROP PROCEDURE IF EXISTS `generateCodes`;
DELIMITER $$$
CREATE PROCEDURE generateCodes (IN n INT, IN max INT)
BEGIN
        DECLARE cnt BOOL DEFAULT 0;
        DECLARE newCode CHAR(255);
        DECLARE i, j INT DEFAULT 0;
        DECLARE ascii, r INT;
 
        REPEAT
            REPEAT
                  SET newCode = '', j = 0;
                  WHILE j < n DO
                       SET r = FLOOR(RAND() * 35); -- 10 + 26 => 0:35
                       SET ascii = r + IF(r < 10, 48, 55);
                       SET newCode = CONCAT(newCode, CHAR(ascii));
                       SET j = j + 1;
                  END WHILE;
                  SELECT COUNT(`id`) INTO cnt FROM `codes` WHERE `code` = newCode;
            UNTIL cnt = 0 END REPEAT;
            INSERT DELAYED INTO `codes` VALUES('', newCode);
            SET i = i + 1;
        UNTIL i = max END REPEAT;
END $$$
DELIMITER ;
This is the result of adding 10 x 100,000 codes.

Code: Select all

mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (51.14 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (47.70 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (49.81 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (55.55 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (59.75 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 3.58 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 1.17 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 2.50 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 2.94 sec)
 
mysql> CALL generateCodes(12, 100000);
Query OK, 1 row affected, 65535 warnings (1 min 2.94 sec)
 
mysql>
Even though it shows 65535 warnings (where does it get logged anyway ?), 100,000 rows got inserted each time.

miro_igov - how much time does this take for you ?

Can this procedure be optimized further ?
miro_igov
Forum Contributor
Posts: 485
Joined: Fri Mar 31, 2006 5:06 am
Location: Bulgaria

Re: Random 12 char codes

Post by miro_igov »

Thanks anjanesh, with few 100,000 codes it takes from 21 to 26 seconds per 100,000 when i use the mysql_query() and INSERT query. I prefer to use PHP to handle this instead of MySQL procedure. This time increases a lot when i have more than 2,000,000 codes already in the database. Now i'm testing the o2k's code, it seems quicker in generating unique codes in array, i like how he uses the code as array key, much better than testing with in_arrray. It will produce notices on that line if ($codes[$code]>0) { next; } but is workaround. I do not care how long it will take to insert these in mysql, much important is to generate lot of unique values quickly.
Post Reply