INSERT Query via CSV taking ages

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

INSERT Query via CSV taking ages

Post by simonmlewis »

I am trying to upload a 350,000 line CSV file.
Each row is checking that it doesn't exist already (to avoid duplicates), and then remove the " at the end of the run.
I thought it would take maybe a minute or two. It's taking a lot longer. Is my script badly done, causing it to be slow?

Code: Select all

if ($update == "addcsv")
{
if ($_FILES[csv][size] > 0) 
{
    //get the csv file
    $file = $_FILES[csv][tmp_name];
    $handle = fopen($file,"r");
    //loop through the csv file and insert into database
    do {
        if ($data[0]) {
        $newdata = mysql_real_escape_string($data[0]);
        
        $result = mysql_query ("SELECT email FROM subscribed_upload WHERE email = '$newdata'");
        $num_result = mysql_num_rows($result);
        if ($num_result == 0)
        {
            mysql_query("INSERT INTO subscribed_upload (email) VALUES
                (
                     '$newdata'
                )
            ")or die(mysql_error());
        }
            $count = $count + 1;
        }
    } while ($data = fgetcsv($handle,1000,",","'"));
}
mysql_query ("UPDATE subscribed_upload SET email = REPLACE(email,'\"','')");

}
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

It's definitely wrong, as I stripped it back to this:

Code: Select all

if ($update == "addcsv")
{
if ($_FILES[csv][size] > 0) 
{
    //get the csv file
    $file = $_FILES[csv][tmp_name];
    $handle = fopen($file,"r");
    //loop through the csv file and insert into database
    do {
        if ($data[0]) {
        $newdata = mysql_real_escape_string($data[0]);
        
 
            mysql_query("INSERT INTO subscribed_upload (email) VALUES
                (
                     '$newdata'
                )
            ")or die(mysql_error());

            $count = $count + 1;
        }
    } while ($data = fgetcsv($handle,1000,",","'"));
}
mysql_query ("UPDATE subscribed_upload SET email = REPLACE(email,'\"','')");

}
And it ran in under a minute. The previous version took 2 hours and hadn't finished.

I need to do the INSERT query, but to check the email "newdata" being passed through, doesn't already exist.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

I don't think I can use DUPLICATE KEY because the field, "email" is not a primary key.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: INSERT Query via CSV taking ages

Post by Celauran »

Is the email field indexed as unique? Probably makes sense as that's a constraint you're clearly looking to enforce. That would also allow you to use INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

Yeah - it wasn't, but is now. I recreate the table with only "email", and set it as a primary key. The script now works, and quickly.
Cool.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

Hi there

I still have this issue, but we cannot have the email field as primary, as we use the id for something else.

So is there another way to check for duplicates and update or ignore?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: INSERT Query via CSV taking ages

Post by Celauran »

It doesn't need to be primary. Unique should suffice.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

Ok two questions then:
how do I find all duplicoates and delete so only one remains (to tidy it before changing to Unique)
how do I set the email field to be unique?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
User avatar
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: INSERT Query via CSV taking ages

Post by Celauran »

Finding the duplicates should be pretty straightforward. Determining which one(s) to delete may be a little more difficult.

Code: Select all

SELECT id, email
FROM subscribed_upload AS subs
INNER JOIN (SELECT email FROM subscribed_upload GROUP BY email HAVING COUNT(id) > 1) AS dupes ON subs.email = dupes.email
CREATE INDEX
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

#1052 - Column 'email' in field list is ambiguous
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435
Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:

Re: INSERT Query via CSV taking ages

Post by simonmlewis »

Got it.
Used something slightly different, and then set the field as Unique and then my script worked.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
Post Reply