Page 1 of 1

avoid sending a large packet to mysql

Posted: Fri Nov 17, 2006 2:34 pm
by s.dot
I have an array that consists of a few hundred thousand elements... all IDs from a table.

Code: Select all

$valid_ids = array(1,2,3, a few hundred thousand other ids);
Here's what I need to do...

Code: Select all

mysql_query("DELETE FROM `some_other_table` WHERE `msg_id` NOT IN($valid_ids)") or die(mysql_error());
Since there are so many IDs being sent to the query, this exceeds mysql's max packet size, so it can't be done.

Can anyone propose a solution to delete all records from this other table where the msg_id is NOT IN that extremely large array?

I'm stumped.

[edit] This is on a shared server, and I don't have the luxury of changing mysql's configuration. So I need a php solution.

Posted: Fri Nov 17, 2006 2:40 pm
by feyd
I'm going to guess that this array of valid IDs is completely arbitrary?

Pull the entire table for that column. Exclude "manually" with PHP then execute in batches using a normal IN (not negated.)

Posted: Fri Nov 17, 2006 2:45 pm
by s.dot
The array of ids is a list of valid messages in the database from my messages table. These messages still exist in other peoples message box, therefore they should not be deleted.

However, it's built up for a while, and lots of the messages in my actual message table should not be there because the records in my other table pointing to them are gone.

So I looped through the record table and gathered $valid_ids. All these messages still exist in the actual message table. I need to delete the messages from the actual messages table that don't have a messageid that's in $valid_ids.

...... i explained that because I don't know what you mean by "arbitrary"


And as far as your proposed solution, could you elaborate?

Posted: Fri Nov 17, 2006 2:49 pm
by John Cartwright
fyi, you want to implement foreign keys to keep all your tables in sync

Posted: Fri Nov 17, 2006 2:50 pm
by s.dot
I suppose I could do something like this:

Code: Select all

$result = mysql_query("SELECT `id` FROM `messages`");

while($array = mysql_fetch_assoc($result))
{
   if(!in_array($array['id'], $extremely_large_array))
   {
      //delete query
   }
}
Would this work? It would be extremely intensive in the number of queries though. And time consuming searching that large array. 8O

Posted: Fri Nov 17, 2006 2:52 pm
by John Cartwright
break your large list into manageable sizes that myqsl can handle. No need to run through each id specifically, maybe a couple hundreds id's at once.

Posted: Fri Nov 17, 2006 2:54 pm
by feyd
Looking a bit more at the DELETE syntax, a multiple table format is supported. .. and it doesn't necessarily delete records from all the tables used.

Look here for the part containing

Code: Select all

DELETE FROM t1, t2 USING t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;

Posted: Sat Nov 18, 2006 5:33 pm
by s.dot
well i've ran a few hack-ish type scripts and i got the array down to 38, 930 IDs. Do you still think this would be too large to send to a mysql query? I believe the max packet size is 1MB. (isn't that the default?)

Posted: Sat Nov 18, 2006 6:35 pm
by timvw
scottayy wrote: So I looped through the record table and gathered $valid_ids. All these messages still exist in the actual message table. I need to delete the messages from the actual messages table that don't have a messageid that's in $valid_ids.
How exactly do you determine wether an id is valid or not? If you can write that logic in sql, you don't have to transfer the ids back and forth...

DELETE FROM tableB WHERE message_id NOT IN (SELECT valid_ids FROM .... )

Posted: Sun Nov 19, 2006 2:30 pm
by s.dot
I tried that timvw. Here's the query I used

Code: Select all

DELETE FROM `privatemessages` WHERE `id` NOT IN(SELECT UNIQUE `message_id` FROM `privatemessages_data`)
It said error in the syntax near SELECT UNIQUE.. so i'm guessing my mysql version doesn't support subqueries.

Anyhoot, i got it done. Now if I run my script about once a day, it won't have a packet too big for mysql to handle. =]

Posted: Sun Nov 19, 2006 2:32 pm
by s.dot
Here's the script I used.

Code: Select all

<?php
set_time_limit(0);
require 'includes/db_connect.php';

$valid_result = mysql_query("SELECT `message_id` FROM `privatemessages_data`") or die(mysql_error());
$valid_ids = array();
while($valid_array = mysql_fetch_assoc($valid_result))
{
	$valid_ids[] = $valid_array['message_id'];
}

$valid_ids = array_unique($valid_ids);

//gather all the messages from the messages table
$msg_result = mysql_query("SELECT `id` FROM `privatemessages`") or die(mysql_error());
$msgs = array();
while($msg_array = mysql_fetch_assoc($msg_result))
{
	$msgs[] = $msg_array['id'];
}

$to_delete = array_chunk(array_diff($msgs, $valid_ids), 300);

//loop through and delete
foreach($to_delete AS $k => $v)
{
	$v = implode(',', $v);
	mysql_query("DELETE FROM `privatemessages` WHERE `id` IN($v)") or die(mysql_error());
}

echo 'DONE!';

?>

Posted: Sun Nov 19, 2006 2:40 pm
by John Cartwright
scottayy wrote:I tried that timvw. Here's the query I used

Code: Select all

DELETE FROM `privatemessages` WHERE `id` NOT IN(SELECT UNIQUE `message_id` FROM `privatemessages_data`)
It said error in the syntax near SELECT UNIQUE.. so i'm guessing my mysql version doesn't support subqueries.

Anyhoot, i got it done. Now if I run my script about once a day, it won't have a packet too big for mysql to handle. =]

Code: Select all

DELETE FROM `privatemessages` WHERE `id` NOT IN(SELECT DISTINCT `message_id` FROM `privatemessages_data`)
Also, implement foreign keys and you'll avoid the problem completely.