Download images from url list
Posted: Sat Jul 15, 2006 4:18 am
by idotcom
Hi
I'm trying to find the best way to download about 25,000 images from a text-file URL list.
Example: about 25K of these:
http://www.somewebsite.com/images/download/98348024.jpg
I'm not stealing these images; I just need to download all of them as an affiliate.
All the images are from the same website, and the average image size is probably 30-45 KB.
Anyhow, does anyone have any ideas on how to do this automatically and without swamping the site?
I imagine it'll come to about 1 GB of data.
Thank you.

Posted: Sat Jul 15, 2006 4:45 am
by Ollie Saunders
idotcom wrote: text file url list
If the URLs are each on their own line:
Code:
define('SAVE_DIR', 'C:/images/');
$urls = file('list_of_urls.txt');
for ($i = 0, $j = count($urls); $i < $j; $i++) {
    $url = trim($urls[$i]); // file() keeps the trailing newline; strip it
    if ($url === '') {
        echo $i . '] line empty, skipped<br />';
        continue;
    }
    $img = file_get_contents($url);
    if ($img === false) {
        echo $i . '] download failed for ' . $url . '<br />';
        continue;
    }
    $saveLocation = SAVE_DIR . basename($url);
    if (!$h = fopen($saveLocation, 'w')) {
        echo $i . '] couldn\'t open ' . $saveLocation . ' for writing<br />';
        continue;
    }
    fwrite($h, $img);
    fclose($h);
    echo $i . '] success!<br />';
}
Posted: Sat Jul 15, 2006 5:48 am
by idotcom
Hi
Thanks for the info. But how do you think this would do without some kind of pause? It looks like it will just run continuously until the list is done. Wouldn't the script time out? And I think that would put a strain on my server and on the other site, no?
Thanks

Posted: Sat Jul 15, 2006 6:23 am
by aerodromoi
idotcom wrote: Hi
Thanks for the info. But how do you think this would do without some kind of pause? It looks like it will just run continuously until the list is done. Wouldn't the script time out? And I think that would put a strain on my server and on the other site, no?
Thanks

1 GB of data always puts some strain on the server.
A few options to ease this problem:
a) restricting the number of downloads per call (e.g. 300 images at a time, deleting those URLs from the "to-do" file as you go)
b) using cron jobs
c) using a combination of both (between 3 and 5 am the load on the servers should be minimal)
d) asking your affiliate to send you a DVD with the images.
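Option (a) could be sketched something like this. This is just a rough example, not a finished script: the file name todo.txt, the batch size of 300, and the half-second pause are all placeholder choices you'd tune yourself.

```php
<?php
// Sketch of option (a): download one fixed-size batch per run,
// then rewrite the to-do file with only the URLs that remain.
define('SAVE_DIR', 'C:/images/');
define('BATCH_SIZE', 300);        // images per run (placeholder value)
define('TODO_FILE', 'todo.txt');  // one URL per line (placeholder name)

$urls  = file(TODO_FILE, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$batch = array_slice($urls, 0, BATCH_SIZE); // this run's work
$rest  = array_slice($urls, BATCH_SIZE);    // everything still to do

foreach ($batch as $url) {
    $img = file_get_contents($url);
    if ($img === false) {
        echo 'download failed: ' . $url . "<br />\n";
        continue;
    }
    file_put_contents(SAVE_DIR . basename($url), $img);
    usleep(500000); // half-second pause so the remote site isn't hammered
}

// Save what's left; when the file comes back empty, the job is done.
file_put_contents(TODO_FILE, implode("\n", $rest));
echo count($rest) . " URLs remaining<br />\n";
```

Point a cron job at this and it chews through the list 300 images per run with no single long-running request to time out.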
Posted: Sat Jul 15, 2006 7:19 am
by Ollie Saunders
Yeah, put a flush() at the top of the for loop, then run it for as long as it will run. Because it prints out $i, you can see where it stops. Then update the script to start from where it left off, and repeat.
idotcom wrote: And I think that would put a strain on my server and the other site, no?
It'll only go as fast as your internet connection anyway, and how else are you going to do it?
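That restart step could be sketched like this (just an illustration: $start is whatever $i the previous run last printed, and the one-second pause is an arbitrary choice):

```php
<?php
// Sketch: resume from where the last run stopped, with a pause per image.
set_time_limit(0);  // lift PHP's default 30-second execution limit
define('SAVE_DIR', 'C:/images/');
$start = 0; // set this to the last $i the previous run printed

$urls = file('list_of_urls.txt');
for ($i = $start, $j = count($urls); $i < $j; $i++) {
    flush(); // push progress output to the browser as it happens
    $url = trim($urls[$i]);
    if ($url === '') {
        continue; // skip blank lines
    }
    $img = file_get_contents($url);
    if ($img === false) {
        echo $i . '] download failed<br />';
        continue;
    }
    file_put_contents(SAVE_DIR . basename($url), $img);
    echo $i . '] success!<br />';
    sleep(1); // one-second pause between downloads eases the load on both ends
}
```

With set_time_limit(0) and the pause in place, a single run may well finish the whole list; if it dies anyway, bump $start and rerun.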