Best way to pull 100,000 lines from a .txt into a MySQL table?
Posted: Sat Apr 01, 2006 3:54 pm
by Deseree
Hi again, does anyone know the best way to pull 100,000 lines of data from a .txt file into a MySQL table, quickly and with no duplicates?
Code: Select all
$url_file_to_read = "100000lines.txt";
$urls_file = file($url_file_to_read);
for ($k = 0; $k < sizeof($urls_file); $k++)
{
    // escape the value so URLs containing quotes don't break the query
    $current_url = mysql_real_escape_string(trim($urls_file[$k]));
    $sql = "INSERT `$_table` SET `url` = '$current_url'";
    $result = mysql_query($sql) or die('WTF: ' . mysql_error());
}
Somehow I doubt this is the fastest or most efficient way of doing it, and it will insert duplicates....

Posted: Sat Apr 01, 2006 4:39 pm
by timvw
Avoiding duplicates is done by adding a constraint to your schema, e.g. add a UNIQUE constraint to the url column, or make it the primary key (which implies the same UNIQUE constraint).
And for best performance, I've found you're best off loading the file from within the mysql shell (I believe it's called LOAD DATA INFILE; look it up in the manual).
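As a sketch of what that constraint looks like (the table and column names here are made up, since the thread never gives them):

```sql
-- Enforce uniqueness at the schema level. The column must be short
-- enough to index; VARCHAR(255) works for most URL lists.
ALTER TABLE urls ADD UNIQUE KEY uniq_url (url);

-- With the constraint in place, INSERT IGNORE skips rows that would
-- violate it instead of aborting the whole statement.
INSERT IGNORE INTO urls (url) VALUES ('http://example.com/');
```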
Posted: Sat Apr 01, 2006 4:43 pm
by John Cartwright
Alternatively, you could do:
Code: Select all
$url_file_to_read = "100000lines.txt";
$urls_file = file($url_file_to_read);
// trim urls
$urls_file = array_map('trim', $urls_file);
// remove duplicates (note: array_unique leaves gaps in the keys,
// so iterate with foreach rather than a counter)
$urls_file = array_unique($urls_file);
foreach ($urls_file as $current_url)
{
    $current_url = mysql_real_escape_string($current_url);
    $sql = "INSERT `$_table` SET `url` = '$current_url'";
    $result = mysql_query($sql) or die('WTF: ' . mysql_error());
}

Posted: Sat Apr 01, 2006 5:05 pm
by Deseree
jcart, that will work, but what about quotes that are already in the db and appear only once in the .txt? I didn't think of that.

....
PRIMARY
UNIQUE
INDEX > I already have ID set to this.
what are the differences between the three anyways? This is very important...
Also, if I inserted, say, 100 lines per query instead of one per query, and I set the quotes column to UNIQUE... wouldn't the whole call fail when a duplicate is found, since that's a unique column? I think it returns a MySQL error. Or does it silently skip that row and move on to the next? Because that's what I need...
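For reference, MySQL's behaviour here: with a UNIQUE column, a plain multi-row INSERT aborts with an error at the first duplicate key, while INSERT IGNORE turns those errors into warnings, skips the offending rows, and inserts the rest. A sketch (table and column names are hypothetical):

```sql
-- Aborts at the first duplicate in a UNIQUE column:
INSERT INTO quotes (quote) VALUES ('a'), ('b'), ('c');

-- Silently skips duplicate rows and inserts the rest:
INSERT IGNORE INTO quotes (quote) VALUES ('a'), ('b'), ('c');
```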
Posted: Sat Apr 01, 2006 5:44 pm
by Christopher
The fastest way is:
Code: Select all
LOAD DATA INFILE '100000lines.txt' REPLACE INTO TABLE mytable;
You just need to create a table that matches your data.
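A minimal sketch of such a table, assuming one URL per line in the file (the column name and length are guesses, not given in the thread):

```sql
-- One column per field in the file; the PRIMARY KEY makes REPLACE
-- overwrite colliding rows (LOAD DATA ... IGNORE would skip them).
CREATE TABLE mytable (
  url VARCHAR(255) NOT NULL,
  PRIMARY KEY (url)
);
```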
Posted: Sat Apr 01, 2006 7:05 pm
by Deseree
I have something that works until there are no more matches. It "uses" 10 lines at a time, and I've got 21 lines in there right now; after all of them are "used" up (setting the USED var to 1), I want it to run this:
Code: Select all
if ($sql == "" || !$sql)
{
    // Clears the quoted column, resets all rows to zero so everything
    // can be quoted again (granted, if not quoted in the last 3 or whatever days set).
    $sql = "UPDATE `$table` SET `quoted` = '0'";
    $sql = mysql_query($sql) or die('WTF: ' . mysql_error());
    echo "YOU FINISHED QUOTING THE ENTIRE LIST, RESETTING LIST NOW!!!<br>\n";
}
else
{
    echo "SQL = [$sql]<br>\n";
}
It returns:
Resource id #3
when the list is all finished.
How can I fix this so it runs my if statement and clears the quoted column?
I doubt [...] will work like I want it to; I think I've tried it without success....
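(For what it's worth: mysql_query() on a SELECT returns a result resource even when zero rows match, which is why echoing $sql prints "Resource id #3". The usual empty-result check is mysql_num_rows(). A sketch, with the table/column names guessed from the code above:)

```php
<?php
// mysql_query() never returns "" for a successful SELECT, so the
// "if ($sql == '' || !$sql)" test above can never fire on an empty
// result set. Count the rows instead:
$result = mysql_query("SELECT * FROM `$table` WHERE `used` = '1'")
    or die('WTF: ' . mysql_error());

if (mysql_num_rows($result) == 0) {
    // no unused lines left: reset the quoted column
    mysql_query("UPDATE `$table` SET `quoted` = '0'")
        or die('WTF: ' . mysql_error());
    echo "YOU FINISHED QUOTING THE ENTIRE LIST, RESETTING LIST NOW!!!<br>\n";
}
```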
Posted: Sun Apr 02, 2006 5:50 am
by timvw
Deseree wrote:
PRIMARY
UNIQUE
INDEX > I already have ID set to this.
what are the differences between the three anyways? This is very important...
If it's really important, why don't you look it up then?
http://dev.mysql.com/doc/refman/5.1/en/constraints.html seems like a good place to start.
Posted: Sun Apr 02, 2006 10:03 am
by Deseree
timvw wrote:
Deseree wrote:
PRIMARY
UNIQUE
INDEX > I already have ID set to this.
what are the differences between the three anyways? This is very important...
If it's really important, why don't you look it up then?
http://dev.mysql.com/doc/refman/5.1/en/constraints.html seems like a good place to start.
Thanks, that was what I was looking for and couldn't find easily...