Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.
Moderator: General Moderators
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Thu Apr 10, 2014 5:39 am
I am trying to upload a 350,000 line CSV file.
Each row is checking that it doesn't exist already (to avoid duplicates), and then remove the " at the end of the run.
I thought it would take maybe a minute or two. It's taking a lot longer. Is my script badly done, causing it to be slow?
Code: Select all
if ($update == "addcsv")
{
if ($_FILES[csv][size] > 0)
{
//get the csv file
$file = $_FILES[csv][tmp_name];
$handle = fopen($file,"r");
//loop through the csv file and insert into database
do {
if ($data[0]) {
$newdata = mysql_real_escape_string($data[0]);
$result = mysql_query ("SELECT email FROM subscribed_upload WHERE email = '$newdata'");
$num_result = mysql_num_rows($result);
if ($num_result == 0)
{
mysql_query("INSERT INTO subscribed_upload (email) VALUES
(
'$newdata'
)
")or die(mysql_error());
}
$count = $count + 1;
}
} while ($data = fgetcsv($handle,1000,",","'"));
}
mysql_query ("UPDATE subscribed_upload SET email = REPLACE(email,'\"','')");
}
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Thu Apr 10, 2014 6:55 am
It's definitely wrong, as I stripped it back to this:
Code: Select all
if ($update == "addcsv")
{
if ($_FILES[csv][size] > 0)
{
//get the csv file
$file = $_FILES[csv][tmp_name];
$handle = fopen($file,"r");
//loop through the csv file and insert into database
do {
if ($data[0]) {
$newdata = mysql_real_escape_string($data[0]);
mysql_query("INSERT INTO subscribed_upload (email) VALUES
(
'$newdata'
)
")or die(mysql_error());
$count = $count + 1;
}
} while ($data = fgetcsv($handle,1000,",","'"));
}
mysql_query ("UPDATE subscribed_upload SET email = REPLACE(email,'\"','')");
}
And it ran in under a minute. The previous version took 2 hours and hadn't finished.
I need to do the INSERT query, but to check the email "newdata" being passed through, doesn't already exist.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Thu Apr 10, 2014 7:11 am
I don't think I can use DUPLICATE KEY because the field, "email" is not a primary key.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
Celauran
Moderator
Posts: 6427 Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada
Post
by Celauran » Thu Apr 10, 2014 8:08 am
Is the email field indexed as unique? Probably makes sense as that's a constraint you're clearly looking to enforce. That would also allow you to use INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE.
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Thu Apr 10, 2014 8:40 am
Yeah - it wasn't, but is now. I recreate the table with only "email", and set it as a primary key. The script now works, and quickly.
Cool.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Wed Apr 23, 2014 6:44 am
Hi there
I still have this issue, but we cannot have the email field as primary, as we use the id for something else.
So is there another way to check for duplicates and update or ignore?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
Celauran
Moderator
Posts: 6427 Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada
Post
by Celauran » Wed Apr 23, 2014 6:47 am
It doesn't need to be primary. Unique should suffice.
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Wed Apr 23, 2014 6:50 am
Ok two questions then:
how do I find all duplicoates and delete so only one remains (to tidy it before changing to Unique)
how do I set the email field to be unique?
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
Celauran
Moderator
Posts: 6427 Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada
Post
by Celauran » Wed Apr 23, 2014 7:00 am
Finding the duplicates should be pretty straightforward. Determining which one(s) to delete may be a little more difficult.
Code: Select all
SELECT id, email
FROM subscribed_upload AS subs
INNER JOIN (SELECT email FROM subscribed_upload GROUP BY email HAVING COUNT(id) > 1) AS dupes ON subs.email = dupes.email
CREATE INDEX
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Wed Apr 23, 2014 7:19 am
#1052 - Column 'email' in field list is ambiguous
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.
simonmlewis
DevNet Master
Posts: 4435 Joined: Wed Oct 08, 2008 3:39 pm
Location: United Kingdom
Contact:
Post
by simonmlewis » Wed Apr 23, 2014 8:17 am
Got it.
Used something slightly different, and then set the field as Unique and then my script worked.
Love PHP. Love CSS. Love learning new tricks too.
All the best from the United Kingdom.