
file_get_contents to CURL

Posted: Fri Dec 10, 2004 1:28 pm
by EricS
I've got an application that retrieves hundreds of web pages when it runs. I was using file_get_contents() to retrieve each page, and the script would take several minutes to run. But even though it took several minutes, the browser would never time out; it would just wait for the application to finish running.

I decided to take a stab at using CURL in the hope that it would make things more efficient. Namely, I wanted the application to stop trying to retrieve a site after a specified period of time and go on to the next site. Since I couldn't find a way to do that with file_get_contents(), I started writing CURL code in its place.

What's happening now is that the browser times out when I use the CURL code, and I can't find a way to keep that from happening. Here is the code I'm currently using.


<?php
$curlResource = curl_init();
// set CURL options
curl_setopt($curlResource, CURLOPT_URL, 'http://'.$protocolessURL); // set destination.
curl_setopt($curlResource, CURLOPT_FOLLOWLOCATION, true); // allow CURL to follow redirect headers.
curl_setopt($curlResource, CURLOPT_MAXREDIRS, 2); // allow at most two redirects before failure.
// curl_setopt($curlResource, CURLOPT_MUTE, true); // run without errors being displayed.
curl_setopt($curlResource, CURLOPT_RETURNTRANSFER, true); // return output as string rather than to screen.
curl_setopt($curlResource, CURLOPT_CONNECTTIMEOUT, $connectionTimeOut); // set time limit on connection
curl_setopt($curlResource, CURLOPT_LOW_SPEED_TIME, $attemptTimeOut); // seconds the transfer may stay below CURLOPT_LOW_SPEED_LIMIT before aborting (no effect unless that limit is also set)
curl_setopt($curlResource, CURLOPT_TIMEOUT, $maxTimeOut); // maximum time a curl function can run
curl_setopt($curlResource, CURLOPT_USERAGENT, $userAgent); // the user agent to be sent in http requests
curl_setopt($curlResource, CURLOPT_NOSIGNAL, true);
// execute CURL operation
$contents = curl_exec($curlResource);
print curl_error($curlResource);
// close CURL session
curl_close($curlResource);
?>
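
The goal described above (give up on a slow site and move on to the next one) could be sketched roughly as follows. This is only an illustration, not the poster's final code: the helper names `buildFetchOptions()` and `fetchAll()`, the default timeout values, and the 1 byte/sec low-speed limit are all assumptions made for the example. It also sets CURLOPT_LOW_SPEED_LIMIT, since CURLOPT_LOW_SPEED_TIME only takes effect together with it.

```php
<?php
// Hypothetical helper: builds the option set for a single fetch.
function buildFetchOptions(string $url, int $connectTimeout, int $maxTimeout): array
{
    return [
        CURLOPT_URL             => $url,
        CURLOPT_RETURNTRANSFER  => true,            // return the body as a string
        CURLOPT_FOLLOWLOCATION  => true,            // follow redirect headers
        CURLOPT_MAXREDIRS       => 2,               // at most two redirects
        CURLOPT_CONNECTTIMEOUT  => $connectTimeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT         => $maxTimeout,     // seconds allowed for the whole transfer
        CURLOPT_LOW_SPEED_LIMIT => 1,               // bytes/sec (assumed value)
        CURLOPT_LOW_SPEED_TIME  => $maxTimeout,     // abort if below the limit this long
        CURLOPT_NOSIGNAL        => true,
    ];
}

// Hypothetical helper: fetches each URL in turn and moves on after a failure.
// Failed fetches are recorded as null instead of stopping the loop.
function fetchAll(array $urls, int $connectTimeout = 5, int $maxTimeout = 15): array
{
    $results = [];
    foreach ($urls as $url) {
        $ch = curl_init();
        curl_setopt_array($ch, buildFetchOptions($url, $connectTimeout, $maxTimeout));
        $body = curl_exec($ch);
        if ($body === false) {
            error_log("Fetch failed for $url: " . curl_error($ch));
            $results[$url] = null;
        } else {
            $results[$url] = $body;
        }
        // The handle is released when $ch is reassigned or goes out of scope.
    }
    return $results;
}
```

Something like `fetchAll(['http://example.com/', 'http://example.org/'])` would then return an array keyed by URL. This only bounds how long each fetch can take; it does not by itself address the browser-side timeout asked about here.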

Posted: Fri Dec 10, 2004 3:15 pm
by Joe

Posted: Fri Dec 10, 2004 4:14 pm
by EricS
So does a call to sleep() signal the browser to restart its connection timeout? I don't see anything in the PHP Manual that says this.

Posted: Fri Dec 10, 2004 5:02 pm
by rehfeld
take a look at the headers your web page is sending when using curl and compare them with the ones it sent before; there must be something different.

this forum sends this header for example:

Keep-Alive: timeout=25, max=100
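
One way to do that comparison is to capture the raw response headers of both versions of the page (for example from `curl -i` or a browser's network inspector) and diff them. A minimal sketch, assuming the headers are available as plain text; `parseHeaderBlock()` and `diffHeaders()` are hypothetical helper names, and the sample header blocks are made up apart from the Keep-Alive line quoted above.

```php
<?php
// Hypothetical helper: turns a raw header block into a lowercase name => value map.
// If a header name repeats, the last value wins.
function parseHeaderBlock(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r\n|\n/', trim($raw)) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // status line (e.g. "HTTP/1.1 200 OK") or blank line
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

// Hypothetical helper: lists headers whose values differ between two responses.
// Each entry is [value in $a, value in $b]; null means the header is absent.
function diffHeaders(array $a, array $b): array
{
    $diff = [];
    foreach (array_unique(array_merge(array_keys($a), array_keys($b))) as $name) {
        $left  = $a[$name] ?? null;
        $right = $b[$name] ?? null;
        if ($left !== $right) {
            $diff[$name] = [$left, $right];
        }
    }
    return $diff;
}

// Made-up sample data for illustration only.
$before = parseHeaderBlock("HTTP/1.1 200 OK\r\nKeep-Alive: timeout=25, max=100\r\nContent-Type: text/html\r\n\r\n");
$after  = parseHeaderBlock("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n");
print_r(diffHeaders($before, $after));
```

Any header that shows up in the diff is a candidate for explaining why the browser behaves differently with the two versions.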

Posted: Sat Dec 11, 2004 2:58 pm
by ol4pr0
And no, sleep() does not signal the browser to restart its timeout.

sleep() means sleep :)
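
On the original question of giving up on a slow site: on PHP versions that support HTTP context options, file_get_contents() can also be given a timeout through a stream context, so switching to CURL is not the only route. A minimal sketch, assuming such a PHP version; `timeoutContext()` and `fetchWithTimeout()` are hypothetical helper names, and the 5-second value in the usage note is just an example.

```php
<?php
// Hypothetical helper: builds a stream context with an HTTP timeout in seconds.
function timeoutContext(float $seconds)
{
    return stream_context_create([
        'http' => ['timeout' => $seconds],
    ]);
}

// Hypothetical helper: returns the page body, or false if the fetch fails
// or times out. The @ suppresses the warning file_get_contents() would emit.
function fetchWithTimeout(string $url, float $seconds)
{
    return @file_get_contents($url, false, timeoutContext($seconds));
}
```

A call such as `fetchWithTimeout('http://example.com/', 5.0)` would return false for a site that does not respond in time, letting the loop move on to the next one.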