MASS Downloading...

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

Ummm... I don't think the owner of the free clip art server is going to like you fetching 11,000 images at once. He most likely logs all activity and would probably ban your IP when he sees thousands of requests a minute... How about you email him and ask for his permission, or for a .zip file or something? Just because something says it's free doesn't mean they want you to waste their bandwidth downloading everything the website has to offer.

If you already have permission to do this just ignore everything above...

You could possibly use ereg to find all links and check if they are on the same domain... Using this method you could get a database of all pages on a given site. Then load the list of pages into an array and go through a foreach loop; inside the loop you would request each page and find all the links on that page. Make the loop go through itself x times, where x = the number of pages deep to go through the site. This is called a recursive function:

Code:

<?php
function loop($a) {
  foreach ($a as $key => $value) {
      // Do some code with $value here
      $needtoloop = is_array($value); // keep going while there is deeper data
      if ($needtoloop) {
         loop($value);
      } else {
           break;
      }
  }
}
?>

See how the function calls itself while $needtoloop is true? Use the same code structure, except requesting pages; this would "spider" the site. Then just find all the image tags, use file_get_contents($imageurl), and fwrite() the contents to a local file.
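A minimal sketch of those extraction steps, using preg_match_all in place of the older ereg family. The function names are made up for this example, and the regexes are deliberately simple (a real spider would also need to resolve relative URLs):

```php
<?php
// Pull every href out of <a> tags, keeping only links on $domain.
// Relative URLs have no host, so they stay on the same domain.
function extract_links($html, $domain) {
    preg_match_all('/<a[^>]+href=["\']([^"\']+)["\']/i', $html, $m);
    $links = array();
    foreach ($m[1] as $url) {
        $host = parse_url($url, PHP_URL_HOST);
        if ($host === null || $host === $domain) {
            $links[] = $url;
        }
    }
    return $links;
}

// Pull every src out of <img> tags.
function extract_image_urls($html) {
    preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $m);
    return $m[1];
}

// Fetch one image and fwrite the contents to a local file.
function save_image($imageurl, $localpath) {
    $contents = file_get_contents($imageurl);
    if ($contents === false) {
        return false;
    }
    $fp = fopen($localpath, 'wb');
    fwrite($fp, $contents);
    fclose($fp);
    return true;
}
?>
```

You would call extract_links() on each downloaded page to grow the crawl list, and extract_image_urls() plus save_image() to grab the pictures.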
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Just use a spider to copy the entire website and then delete everything that isn't an image. You shouldn't do that, though. How are you planning on getting a PHP script to run for 4 hours anyway, lol? It will take a while to download all those images.
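As an aside on the four-hour point: PHP's default execution time limit can be lifted with set_time_limit(0) (this is a side note about a real PHP function, not something suggested in the thread):

```php
<?php
// Remove PHP's execution time cap so a long-running script
// isn't killed after the default 30 seconds.
set_time_limit(0);

// Optionally give the script more memory headroom as well.
ini_set('memory_limit', '256M');
?>
```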
jclarkkent2003
Forum Contributor
Posts: 123
Joined: Sat Dec 04, 2004 9:14 pm

Post by jclarkkent2003 »

See, I don't know all the image URLs, otherwise I would download them with FlashGet.
Getting all the image URLs is what the SPIDER I need to write does.

Are there any PHP scripts that will crawl a page, extract all the links from that page, then crawl all those links, and so on? An option to restrict the crawl to the original domain, which I could turn on and off, would also be nice.

Anyone know of any?

thanks.
jclarkkent2003
Forum Contributor
Posts: 123
Joined: Sat Dec 04, 2004 9:14 pm

Post by jclarkkent2003 »

http://curl.haxx.se/programs/curlmirror.txt

Does this COMPLETELY spider the ENTIRE WEBSITE? EVERYthing in the domain?

I actually need another script that completely gets THE ENTIRE website. I am going to buy a laptop next week, and I will be doing a lot of traveling, with a lot of places where I am bored to death, like in class. I would like to write PHP scripts, so I need to DOWNLOAD the entire php.net site to my laptop so that when I don't know what a function does, I can look it up on my PC.
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Download the CHM version, all the way to the right. It's a very nice manual with search features.

http://www.php.net/download-docs.php
jclarkkent2003
Forum Contributor
Posts: 123
Joined: Sat Dec 04, 2004 9:14 pm

Post by jclarkkent2003 »

oh tight, many thanks for that.

Is there one of those for MySQL, Perl, Apache, etc.?
patrikG
DevNet Master
Posts: 4235
Joined: Thu Aug 15, 2002 5:53 am
Location: Sussex, UK

Post by patrikG »

Thread locked. Reason: viewtopic.php?p=176489#176489
Locked