Best way to pull 100,000 lines from a .txt into a mysql tb?

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

Post Reply
Deseree
Forum Commoner
Posts: 84
Joined: Mon Feb 13, 2006 11:35 pm

Best way to pull 100,000 lines from a .txt into a mysql tb?

Post by Deseree »

Hi again, anyone know the best way to pull 100,000 lines of data from a .txt file into a mysql table and no duplicates, and quickly?

Code: Select all

$url_file_to_read = "100000lines.txt";

	$urls_file = file("$url_file_to_read");
	
	for($k=0;$k<sizeof($urls_file);$k++)
	{
		$current_url = trim($urls_file[$k]);

		$sql = "INSERT `$_table` SET `url` = '$current_url';";$sql = mysql_query($sql) or die('WTF:'.mysql_error());
	}
somehow I don't know if this is going to be the best fastest most efficient way of doing this... and this will have duplicates.... :(
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Avoiding duplicates is done by adding a constraint on your schema.. Eg: add a UNIQUE constraint to the URL column. Or make it the primary key (which results in the same UNIQUE constraint)

And for best performance i've found that you best can load the file from within the mysql shell.. (I believe it was called load from file / look it up in the manual).
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

alternatively you may do

Code: Select all

$url_file_to_read = "100000lines.txt";
    $urls_file = file("$url_file_to_read");

    //trim urls
    $urls_file = array_map('trim', $urls_file);    
    // remove duplicates
    $urls_file = array_unique($urls_file);

    for($k=0;$k<sizeof($urls_file);$k++)
    {
        $sql = "INSERT `$_table` SET `url` = '$current_url';";$sql = mysql_query($sql) or die('WTF:'.mysql_error());
    }
;)
Deseree
Forum Commoner
Posts: 84
Joined: Mon Feb 13, 2006 11:35 pm

Post by Deseree »

jcart, that will work , but what about inserting quotes that are already in the db and only once in the .txt ? didn't think of that :(....


PRIMARY
UNIQUE
INDEX > I already have ID set to this.

what are the differences between the three anyways? This is very important...

Also, IF I inserted like 100 lines a query instead of ONE per query, and I set the column quotes to UNIQUE..... wouldn't the call fail since there was a duplicate found and it was a unique row, I think it returns a failed mysql right ??? Or does it silenlty SKIP that row and move to the next, cuz that's what I need...
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

The fastest way is:

Code: Select all

LOAD DATA INFILE '100000lines.txt' REPLACE INTO TABLE mytable;
You just need to create a table that matches your data.
(#10850)
Deseree
Forum Commoner
Posts: 84
Joined: Mon Feb 13, 2006 11:35 pm

Post by Deseree »

I have a something that works until there are no more matches, it "uses" 10 lines per time and i've got 21 lines in there right now, and after all are "used" up, setting the USED var to 1, I want it to run this:

Code: Select all

if($sql == "" || !$sql)
	{
		$sql = "UPDATE `$table` SET `quoted` = '0'";$sql = mysql_query($sql) or die('WTF:'.mysql_error());	// CLEARS quoted column, resets ALL to zero, can quot all again, granted if not quoted in last 3 or whatever days set.
		echo "YOU FINISHED QUOTING THE ENTIRE LIST, RESETING LIST NOW!!!<br>\n";
	}
	else
	{
		echo "SQL = [$sql]<br>\n";
	}
It returns:
Resource id #3

when the list is all finished.

how can i fix this so it runs my if statement and clears the quoted column?

I doubt

Code: Select all

if($sql == "Resource id #3")
will work like I want it to, I think I've tried without success....
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Deseree wrote: PRIMARY
UNIQUE
INDEX > I already have ID set to this.

what are the differences between the three anyways? This is very important...
If it's really important, why don't you look it up then? http://dev.mysql.com/doc/refman/5.1/en/constraints.html seems like a good place to start.
Deseree
Forum Commoner
Posts: 84
Joined: Mon Feb 13, 2006 11:35 pm

Post by Deseree »

timvw wrote:
Deseree wrote: PRIMARY
UNIQUE
INDEX > I already have ID set to this.

what are the differences between the three anyways? This is very important...
If it's really important, why don't you look it up then? http://dev.mysql.com/doc/refman/5.1/en/constraints.html seems like a good place to start.
Thanks, that was what I was looking for and couldn't find easily...
Post Reply