Page 1 of 1

Update Query Performance

Posted: Mon Feb 21, 2011 4:48 am
by JakeJ
I'm having a big problem with the performance of an update query.

I've got about 135,000 records I need to check, then update if a change is needed. Currently, it's changing about half of the records and takes about 6 minutes per 1000 records to update. If you notice my query, I'm only doing 1000 records at a time (this is for testing purposes).

The query takes about 6 minutes per 1000 records to run (it's an old WinXP laptop) though I'm having the same problem on a MUCH newer dual core HP laptop with twice as much RAM (the supposedly faster laptop is significantly slower it seems, It's in the process of testing but it's taken at least 10 minutes so far for 1000 records).

Here's the code:

Code: Select all

<?php

$bgtime=time();
#MySQL Connection String
#Local connection - Uncomment before using locally
mysql_connect("localhost", "root", "root") or die(mysql_error());
@ mysql_select_db("testing") or die(mysql_error());     

set_time_limit(14000);

$addresspattern[1] = '/ [sS][tT]\Z/';
$addressreplace[1] = ' Street';

$addresspattern[2] = '/ [dD][rR]\Z/';
$addressreplace[2] = ' Drive';

$addresspattern[3] = '/ [sS][tT]\.\Z/';
$addressreplace[3] = ' Street';

$addresspattern[4] = '/ [rR][dD]\Z/';
$addressreplace[4] = ' Road';

$addresspattern[5] = '/ [rR][dD]\.\Z/';
$addressreplace[5] = ' Road';

$addresspattern[6] = '/ [pP][lL] /';
$addressreplace[6] = ' Place ';

$addresspattern[7] = '/ [aA][vV][eE]\Z/';
$addressreplace[7] = ' Avenue';

$addresspattern[8] = '/ [nN]\. /';
$addressreplace[8] = '/ North /';

$addresspattern[9] = '/ [sS]\. /';
$addressreplace[9] = ' South ';

$addresspattern[10] = ' /[eE]\. /';
$addressreplace[10] = ' East ';

$addresspattern[11] = ' /[wW]\. /';
$addressreplace[11] = ' West ';

$addresspattern[12] = '/ [nN] /';
$addressreplace[12] = ' North ';

$addresspattern[13] = '/ [sS] /';
$addressreplace[13] = ' South ';

$addresspattern[14] = '/ [eE] /';
$addressreplace[14] = ' East ';

$addresspattern[15] = '/ [wW] /';
$addressreplace[15] = ' West ';

$addresspattern[16] = '/ [sS][tT] /';
$addressreplace[16] = ' Street ';

$addresspattern[17] = '/ [sS][Tt]\. /';
$addressreplace[17] = ' Street ';

$addresspattern[18] = '/ [sS][wW] /';
$addressreplace[18] = ' Southwest ';

$addresspattern[19] = '/ [lL][nN]( |\Z)/';
$addressreplace[19] = ' Lane ';

$addresspattern[20] = '/\A[sS] /';
$addressreplace[20] = 'South ';

$addresspattern[21] = '/\A[nN] /';
$addressreplace[21] = 'North ';

$addresspattern[22] = '/\A[eE] /';
$addressreplace[22] = 'East ';

$addresspattern[23] = '/\A[wW] /';
$addressreplace[23] = 'West ';

$addresspattern[24] = '/ [cC][tT] /';
$addressreplace[24] = ' Court ';

$addresspattern[25] = '/ [cC][tT]\. /';
$addressreplace[25] = ' Court ';

$addresspattern[26] = '/ [dD][rR] /';
$addressreplace[26] = ' Drive ';

$addresspattern[27] = '/ [dD][rR]\. /';
$addressreplace[27] = ' Drive ';



$addresspattern[29] = '/ [bB][lL][vV][dD] /';
$addressreplace[29] = ' Blvd. ';

$addresspattern[30] = '/ [dD][rR]\.\Z/';
$addressreplace[30] = ' Drive';

$addresspattern[31] = '/ [cC][tT]\.\Z/';
$addressreplace[31] = ' Court';

$addresspattern[32] = '/ [pP][lL]\Z/';
$addressreplace[32] = ' Place';

$addresspattern[33] = '/ [aA][vV][eE]\.\Z/';
$addressreplace[33] = ' Avenue';

$addresspattern[34] = '/ [aA][vV][eE]\. /';
$addressreplace[34] = ' Avenue ';

$addresspattern[35] = '/ [aA][vV][eE] /';
$addressreplace[35] = ' Avenue ';

$addresspattern[36] = '/ [bB][lL][vV][dD]\Z/';
$addressreplace[36] = ' Blvd.';

$addresspattern[37] = '/ [bB][lL][vV][dD]\.\Z/';
$addressreplace[37] = ' Blvd.';

$addresspattern[38] = '/ [bB][oO][uU][lL][eE][vV][aA][rR][dD] /';
$addressreplace[38] = ' Blvd. ';

$addresspattern[39] = '/ [bB][oO][uU][lL][eE][vV][aA][rR][dD]\Z/';
$addressreplace[39] = ' Blvd.';

$addresspattern[40] = '/ [sS][wW]\Z/';
$addressreplace[40] = ' Southwest';

$addresspattern[41] = '/ [nN][wW]\Z/';
$addressreplace[41] = ' Northwest';

$addresspattern[42] = '/ [nN][wW] /';
$addressreplace[42] = ' Northwest ';

$addresspattern[43] = '/ [nN][eE]\Z/';
$addressreplace[43] = ' Northeast';

$addresspattern[44] = '/ [nN][eE] /';
$addressreplace[44] = ' Northeast ';

$addresspattern[45] = '/ [sS][eE]\Z/';
$addressreplace[45] = ' Southeast';

$addresspattern[46] = '/ [sS][eE] /';
$addressreplace[46] = ' Southeast ';


$external = mysql_query("SELECT id, address FROM step1 where id > 2000 and id  < 3000" ) or die(mysql_error());
$total = 1;
While($row = mysql_fetch_array($external)) {
$address = addslashes($row['address']);
$id = $row['id'];
$newaddress = preg_replace($addresspattern, $addressreplace, $address);
if($newaddress == $address) { $change = 'SAME'; } ELSE { $change = 'CHANGED'; }
//echo '##'.$change.'## '.$address."  ***  ".$newaddress.'</br>';
if($newaddress == $address) {
    //do nothing
    }
    else {
    $address = $newaddress;
   mysql_query("UPDATE step1 set address = '$address' where id = '$id'") or die(mysql_error());
    //echo $address.'</br>';
    $total++;
    }
   
}
echo 'Total Records Changed: '.$total.'</br>';
Print "Execution Time: ".(time()-$bgtime); 

?>

Re: Update Query Performance

Posted: Mon Feb 21, 2011 6:03 am
by Darhazer
Would you please post the step1 table structure (with it's indexes)
If there is no index on the id field, this statement will perform full table scan for each update!
If address is indexed field, it will have to rebuild the index on each update and you can gain performance if you disable indexes in the beginning of the loop and re-enable them in the end.
Also, make sure that the update is the slow part, and not the nearly 50 regular expression you are running on the address

Re: Update Query Performance

Posted: Mon Feb 21, 2011 2:14 pm
by JakeJ
The only index is the id field which is the primary key.

Also, I'm quite sure the regex is not the problem. When I run the script and there are no records to be updated, it finishes in < 1 sec.

I tried disabling keys, that didn't seem to make a difference.

I also created a key on the address column to see if that would make a difference and it made a small difference. 5 minutes per 1000 records compared to just over 6.

I'm at a loss here. I even put this on a rackspace cloud server to make sure it wasn't my computing resources that was the issue. It's exactly the same thing.

Wow, with or without the indexing and changing to InnoDB, it cut the time to less than half. This is progress!

Even so, at 114 seconds per 1000, that still seems like a long time (over 4 hours). But at least it's not all day!

Any other tips are appreciated.

Re: Update Query Performance

Posted: Mon Feb 21, 2011 2:22 pm
by crazycoders
I don't see anything wrong with your code off the start, what i suggest is you try to print out the different between bgtime and time() each time you do an operation that could result in doing a lag.

For example:

echo (time()-$current).'ms'; $current = time();
mysql_query(do something here)
echo (time()-$current).'ms'; $current = time();

You will be able to find your lag time there.

If you are sure that you have no index on the address field and have a primary on the id, then i too don't see what could be your problem... I use databases with 3 million rows of stats data and query and update them relatively easily when i use the correct indexes. It could also be a performance issue, your mysql server is maybe not well tuned?

Sorry to not be of more help...

Re: Update Query Performance

Posted: Tue Feb 22, 2011 4:22 am
by Darhazer
I told you to post the table structure. MyISAM uses table-level lock, while InnoDB uses row-level.
lock/unlock is a slow operation. You can put the entire operation in transaction (InnoDB is required) - it should run faster.
Just execute
START TRANSACTION
before the loop, and
COMMIT
after the loop

Re: Update Query Performance

Posted: Tue Feb 22, 2011 6:34 am
by Mordred
Mysql queries are synchronous, and AFAIK there is no asynchronous interface. As such you're mostly waiting for the query to reach the server, execute and confirm, 1000 times. With row-level locking as Darhazer suggested, you can run this in several threads (generate some IFRAMES each working on a different range of IDs) and further reduce the query transport overhead.

Re: Update Query Performance

Posted: Tue Feb 22, 2011 7:02 am
by Weirdan
Mordred wrote:Mysql queries are synchronous, and AFAIK there is no asynchronous interface.
There's async interface for insert/replace: INSERT/REPLACE DELAYED

Alternatively, in this case it's possible to run something like this:

Code: Select all

start transaction;
select * from someTable where id between 1000 and 2000 for update;
-- process 1000 rows in memory
replace into someTable values
(/*data for first row, including id*/),
(/*data for second row, including id*/),
--- ..........
;
commit;