avoid sending a large packet to mysql

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

avoid sending a large packet to mysql

Post by s.dot »

I have an array that consists of a few hundred thousand elements... all IDs from a table.

Code: Select all

$valid_ids = array(1,2,3, a few hundred thousand other ids);
Here's what I need to do...

Code: Select all

mysql_query("DELETE FROM `some_other_table` WHERE `msg_id` NOT IN($valid_ids)") or die(mysql_error());
Since there are so many IDs being sent to the query, this exceeds mysql's max packet size, so it can't be done.

Can anyone propose a solution to delete all records from this other table where the msg_id is NOT IN that extremely large array?

I'm stumped.

[edit] This is on a shared server, and I don't have the luxury of changing mysql's configuration. So I need a php solution.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I'm going to guess that this array of valid IDs is completely arbitrary?

Pull the entire table for that column. Exclude "manually" with PHP then execute in batches using a normal IN (not negated.)
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

The array of ids is a list of valid messages in the database from my messages table. These messages still exist in other peoples message box, therefore they should not be deleted.

However, it's built up for a while, and lots of the messages in my actual message table should not be there because the records in my other table pointing to them are gone.

So I looped through the record table and gathered $valid_ids. All these messages still exist in the actual message table. I need to delete the messages from the actual messages table that don't have a messageid that's in $valid_ids.

...... i explained that because I don't know what you mean by "arbitrary"


And as far as your proposed solution, could you elaborate?
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

fyi, you want to implement foreign keys to keep all your tables in sync
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I suppose I could do something like this:

Code: Select all

$result = mysql_query("SELECT `id` FROM `messages`");

while($array = mysql_fetch_assoc($result))
{
   if(!in_array($array['id'], $extremely_large_array))
   {
      //delete query
   }
}
Would this work? It would be extremely intensive in the number of queries though. And time consuming searching that large array. 8O
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

break your large list into manageable sizes that myqsl can handle. No need to run through each id specifically, maybe a couple hundreds id's at once.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Looking a bit more at the DELETE syntax, a multiple table format is supported. .. and it doesn't necessarily delete records from all the tables used.

Look here for the part containing

Code: Select all

DELETE FROM t1, t2 USING t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

well i've ran a few hack-ish type scripts and i got the array down to 38, 930 IDs. Do you still think this would be too large to send to a mysql query? I believe the max packet size is 1MB. (isn't that the default?)
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

scottayy wrote: So I looped through the record table and gathered $valid_ids. All these messages still exist in the actual message table. I need to delete the messages from the actual messages table that don't have a messageid that's in $valid_ids.
How exactly do you determine wether an id is valid or not? If you can write that logic in sql, you don't have to transfer the ids back and forth...

DELETE FROM tableB WHERE message_id NOT IN (SELECT valid_ids FROM .... )
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I tried that timvw. Here's the query I used

Code: Select all

DELETE FROM `privatemessages` WHERE `id` NOT IN(SELECT UNIQUE `message_id` FROM `privatemessages_data`)
It said error in the syntax near SELECT UNIQUE.. so i'm guessing my mysql version doesn't support subqueries.

Anyhoot, i got it done. Now if I run my script about once a day, it won't have a packet too big for mysql to handle. =]
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Here's the script I used.

Code: Select all

<?php
set_time_limit(0);
require 'includes/db_connect.php';

$valid_result = mysql_query("SELECT `message_id` FROM `privatemessages_data`") or die(mysql_error());
$valid_ids = array();
while($valid_array = mysql_fetch_assoc($valid_result))
{
	$valid_ids[] = $valid_array['message_id'];
}

$valid_ids = array_unique($valid_ids);

//gather all the messages from the messages table
$msg_result = mysql_query("SELECT `id` FROM `privatemessages`") or die(mysql_error());
$msgs = array();
while($msg_array = mysql_fetch_assoc($msg_result))
{
	$msgs[] = $msg_array['id'];
}

$to_delete = array_chunk(array_diff($msgs, $valid_ids), 300);

//loop through and delete
foreach($to_delete AS $k => $v)
{
	$v = implode(',', $v);
	mysql_query("DELETE FROM `privatemessages` WHERE `id` IN($v)") or die(mysql_error());
}

echo 'DONE!';

?>
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

scottayy wrote:I tried that timvw. Here's the query I used

Code: Select all

DELETE FROM `privatemessages` WHERE `id` NOT IN(SELECT UNIQUE `message_id` FROM `privatemessages_data`)
It said error in the syntax near SELECT UNIQUE.. so i'm guessing my mysql version doesn't support subqueries.

Anyhoot, i got it done. Now if I run my script about once a day, it won't have a packet too big for mysql to handle. =]

Code: Select all

DELETE FROM `privatemessages` WHERE `id` NOT IN(SELECT DISTINCT `message_id` FROM `privatemessages_data`)
Also, implement foreign keys and you'll avoid the problem completely.
Post Reply