Posted: Sun Dec 05, 2004 7:47 pm
by josh
Ummm... I don't think the owner of the free clip art server is going to like you fetching 11,000 images at once. He most likely logs all activity and would probably ban your IP when he sees thousands of requests a minute. How about you email him and ask for his permission, or for a .zip file or something? Just because something says it's free does not mean they want you to waste their bandwidth downloading everything the website has to offer.

If you already have permission to do this just ignore everything above...

You could possibly use ereg() to find all links and check whether they are on the same domain. Using this method you could build a list of all links on a given page, then load that list into an array and go through a foreach loop. Inside the loop you would request each page and find all the links on that page. Make the loop call itself x times, where x is the number of levels deep to go through the site. This is what's known as a recursive function:

Code:

<?php
function loop($a) {
  foreach ($a as $key => $value) {
      // Do some code on $value here
      if ($needtoloop) {      // placeholder: "is there another level to crawl?"
         loop($somedata);     // the function calls itself with the next batch
      } else {
           break;
      }
  }
}
?>

See how the function calls itself while $needtoloop is true? Use the same code structure, except requesting pages; this would "spider" the site. Then just find all the image tags, use file_get_contents($imageurl); and write the contents out to a local file.
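To make that last step concrete, here is a minimal sketch of the image-grabbing part. The regex and the function name are my own assumptions (a quick quoted-attribute match, not something tested against the clip art site), and a real spider would resolve relative URLs against the page's own URL before fetching:

```php
<?php
// Naive image-URL extractor: grabs the src attribute of every <img> tag.
// Assumes the attributes are quoted.
function get_image_urls($html) {
    preg_match_all('/<img[^>]+src=["\']([^"\']+)["\']/i', $html, $matches);
    return $matches[1];
}

// Hypothetical save step, as described above (not run here):
// foreach (get_image_urls($html) as $src) {
//     file_put_contents(basename($src), file_get_contents($src));
// }
```

Note that file_get_contents() over HTTP needs allow_url_fopen enabled, and naming files with basename() will silently overwrite images that share a filename.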

Posted: Sun Dec 05, 2004 8:14 pm
by Benjamin
Just use a spider to copy the entire website and then delete everything that isn't an image. You shouldn't do that, though. How are you planning on getting a PHP script to run for 4 hours anyway? lol. It will take a while to download all those images.
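On the 4-hour question: PHP kills a web request after max_execution_time (30 seconds by default), so a long crawl has to lift that limit, or better, run from the command line. A minimal sketch, assuming the host doesn't forbid it:

```php
<?php
// Lift the default 30-second execution limit so a long crawl can finish.
// Better still, run the script from a shell: php spider.php
set_time_limit(0);        // 0 means no time limit
ignore_user_abort(true);  // keep running if the browser closes the connection
```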

Posted: Sun Dec 05, 2004 8:45 pm
by jclarkkent2003
See, I don't know all the image URLs, otherwise I would download them with FlashGet.
Getting all the image URLs is exactly what the SPIDER I need to write is for.

Are there any PHP scripts that will crawl a page, extract all the links from that page, then crawl all those links, and so on? A script that can restrict itself to the original domain, with a way to turn that on and off, would also be nice.

Anyone know of any?

thanks.
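The link-extraction and domain-restriction parts asked about above could be sketched like this. The function names and the naive quoted-attribute regex are my own assumptions, not from any existing script:

```php
<?php
// Pull every href out of a page's HTML (assumes quoted attributes).
function extract_links($html) {
    preg_match_all('/<a[^>]+href=["\']([^"\']+)["\']/i', $html, $matches);
    return $matches[1];
}

// The on/off domain restriction: when $restrict is true, only URLs whose
// host matches the starting page's host are allowed.
function allowed($url, $base, $restrict = true) {
    if (!$restrict) {
        return true;
    }
    return parse_url($url, PHP_URL_HOST) === parse_url($base, PHP_URL_HOST);
}
```

A spider would fetch the start page, run extract_links() on it, filter with allowed(), and recurse into each surviving link, keeping a visited list so it never fetches the same page twice.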

Posted: Sun Dec 05, 2004 9:03 pm
by jclarkkent2003
http://curl.haxx.se/programs/curlmirror.txt

Does this COMPLETELY spider the ENTIRE WEBSITE? EVERYthing in the domain?

I actually need another script that completely gets THE ENTIRE website. I am going to buy a laptop next week, and I will be doing a lot of traveling, with a lot of places where I am bored to death, like in class. I would like to write PHP scripts, so I need to DOWNLOAD the entire php.net site to my laptop, so that when I don't know what a function does, I can look it up on my PC.

Posted: Sun Dec 05, 2004 9:05 pm
by Benjamin
Download the CHM version, all the way to the right. It's a very nice manual with search features.

http://www.php.net/download-docs.php

Posted: Sun Dec 05, 2004 10:05 pm
by jclarkkent2003
oh tight, many thanks for that.

Is there one of those for MySQL, Perl, Apache, etc.?

Posted: Sat May 28, 2005 3:00 pm
by patrikG
Thread locked. Reason: viewtopic.php?p=176489#176489