
Is bookmark a valid link?

Posted: Fri Aug 06, 2004 11:33 am
by hawleyjr
I have an array of about 50 URLs. All I'm trying to do is check whether each link is valid. Is there a more efficient way to handle this? It works, I just think it takes too long. (By valid I mean the link actually works. I'm not concerned with whether it contains "http", "https" or "www".)

Code: Select all

<?php
$a_bookmarks = array(
'http://www.yahoo.com',
'http://google.com',
'http://www.mysite.com',
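// ... (remaining URLs omitted)
);

$a_link_status = array();

foreach ($a_bookmarks as $val) {
	// fopen() returns FALSE when the URL cannot be opened
	$handle = @fopen($val, "r");
	if (!$handle) {
		$a_link_status[] = FALSE;
	} else {
		$a_link_status[] = TRUE;
		// only close handles that were actually opened
		fclose($handle);
	}
}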
      
?>

Posted: Fri Aug 06, 2004 11:38 am
by feyd
I'd use curl and just ask it to return the headers. I'd set the timeout pretty low and add any URLs that fail to a retry array, so you can retry those once or twice before deciding they don't exist.
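
A rough sketch of the retry idea, in case it helps. The function name checkWithRetries() and the $check callback are made up for illustration; $check stands in for whatever test you end up using (fopen, curl, etc.) and only needs to return TRUE when a URL responds. It also needs PHP 5.4 or later for the callable type hint.

```php
<?php
// Hypothetical helper (not from this thread): run $check on every URL,
// then re-run it up to $maxRetries times on the ones that failed.
function checkWithRetries(array $urls, callable $check, $maxRetries = 2)
{
    $status = array();
    $retry  = array();

    // First pass: record successes, queue failures for another attempt.
    foreach ($urls as $url) {
        if ($check($url)) {
            $status[$url] = true;
        } else {
            $retry[] = $url;
        }
    }

    // Retry passes: only URLs that are still failing are checked again.
    for ($attempt = 0; $attempt < $maxRetries && $retry; $attempt++) {
        $stillFailing = array();
        foreach ($retry as $url) {
            if ($check($url)) {
                $status[$url] = true;
            } else {
                $stillFailing[] = $url;
            }
        }
        $retry = $stillFailing;
    }

    // Anything left in the retry list failed every attempt.
    foreach ($retry as $url) {
        $status[$url] = false;
    }

    return $status;
}
```

The result is keyed by URL rather than by position, so a link's status can be looked up directly.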

Posted: Fri Aug 06, 2004 12:18 pm
by hawleyjr
Thanks for the response. A couple more questions...

Here is what I have now:

Code: Select all

<?php
foreach($a_bookmarks  as $val ){ 
	// create a new curl resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $val);
	curl_setopt($ch, CURLOPT_HEADER, 1);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);

	if(!curl_exec($ch))
		$this->a_link_status[] = FALSE;
	else
		$this->a_link_status[] = TRUE;		
	// close curl resource, and free up system resources
	curl_close($ch);
}
?>
My questions:
1. I'm able to get the headers, but the whole page is still being fetched. How do I request the headers only?
2. There are a couple of different timeout options. Which one should I use? I'm thinking CURLOPT_TIMEOUT is the correct one.
3. With the code above, curl_exec does not return false when the link is bad.

Posted: Fri Aug 06, 2004 1:00 pm
by feyd
I believe CURLOPT_NOBODY would do it.

As for timeouts, I'd think CURLOPT_TIMEOUT would be the one to use, but that may need some experimenting.

As for finding out whether the page existed, you'll need to analyze the header returned.
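
One possible alternative to reading the header text yourself is to ask curl for the numeric status code with curl_getinfo(). This is only a sketch: the function names fetchStatusCode() and isLiveStatus() are made up here, and treating every 2xx and 3xx code as "link works" is an assumption you may want to tighten. It sets both CURLOPT_CONNECTTIMEOUT (limit on connecting) and CURLOPT_TIMEOUT (limit on the whole request), since which one matters more probably depends on the sites being checked.

```php
<?php
// Hypothetical helper: treat 2xx and 3xx responses as "link works".
function isLiveStatus($code)
{
    return $code >= 200 && $code < 400;
}

// Hypothetical helper: header-only request that returns the HTTP status
// code, or 0 when no response was received (bad host, timeout, ...).
function fetchStatusCode($url, $timeout = 2)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, 1);                // ask for headers only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // do not echo the response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit on connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit on the whole request
    curl_exec($ch);

    // curl reports 0 here when it never got an HTTP response.
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // The handle is released when $ch goes out of scope.
    return $code;
}
```

Usage would be along the lines of `$a_link_status[] = isLiveStatus(fetchStatusCode($val));` inside the foreach loop.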

Posted: Sat Aug 07, 2004 12:09 pm
by hawleyjr
Alright, here is what I have and so far I'm having trouble getting it to work.

Code: Select all

<?php
foreach ($a_bookmarks as $val) {
	// create a new curl resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $val);
	curl_setopt($ch, CURLOPT_HEADER, 1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 2);
	curl_setopt($ch, CURLOPT_NOBODY, 1);

	// What do I do here? How do I verify if the headers are legit?
	curl_exec($ch);

	// close curl resource, and free up system resources
	curl_close($ch);
}
?>

Posted: Sat Aug 07, 2004 12:35 pm
by feyd

Code: Select all

<?php

function getHeader($url)
{
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, 		$url);
	curl_setopt($ch, CURLOPT_HEADER, 	1);
	curl_setopt($ch, CURLOPT_TIMEOUT,	2);
	curl_setopt($ch, CURLOPT_NOBODY,	1);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
	
	return curl_exec($ch);
}

echo getHeader('http://forums.devnetwork.net');

?>
outputs

Code: Select all

HTTP/1.1 200 OK
Date: Sat, 07 Aug 2004 17:32:24 GMT
Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.18 OpenSSL/0.9.7a
X-Powered-By: PHP/4.3.8
Set-Cookie: phpbb2mysql_data=s%3A0%3A%22%22%3B; expires=Sun, 07-Aug-05 17:32:24 GMT; path=/; domain=devnetwork.net
Set-Cookie: phpbb2mysql_sid=61b110b6461ddb7e728c7d4f692b1da6; path=/; domain=devnetwork.net
Cache-Control: private, pre-check=0, post-check=0, max-age=0
Expires: Sat, 07 Aug 2004 17:32:24 GMT
Last-Modified: Sat, 07 Aug 2004 17:32:24 GMT
Content-Type: text/html
You want the success codes: HTTP/1.X 2XX and HTTP/1.X 3XX.

The 3XX codes are redirections, so you may need to look at where they want to send you. :)
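
A minimal sketch of picking those pieces out of the header text that getHeader() returns. The names parseStatusCode() and parseLocation() are made up for this example, and it assumes the string starts with the status line (which is the case when curl is not told to follow redirects).

```php
<?php
// Hypothetical helper: pull the three-digit status code out of a raw
// header block such as "HTTP/1.1 200 OK\r\n...". Returns 0 if none is found.
function parseStatusCode($headers)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $headers, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Hypothetical helper: return the value of the Location header
// (where a redirect wants to send you), or NULL when there is none.
function parseLocation($headers)
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}
```

With these, a link could be counted as working when the code is 2XX, and for a 3XX you could run the same check again on whatever parseLocation() returns.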

Posted: Sat Aug 07, 2004 12:36 pm
by hawleyjr
Feyd you rock. Thanks.