Is bookmark a valid link?

hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Post by hawleyjr »

I have an array of about 50 URLs. All I'm trying to do is check whether each link is valid. Is there a more efficient way to handle this? It works, I just think it takes too long. (By valid I mean that the link works; I'm not worried about whether it contains "http", "https" or "www".)

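<?php
$a_bookmarks = array(
	'http://www.yahoo.com',
	'http://google.com',
	'http://www.mysite.com',
	// ...
);

$a_link_status = array();

foreach ($a_bookmarks as $val) {
	// Suppress warnings; fopen() returns FALSE if the URL cannot be opened
	$handle = @fopen($val, "r");
	if (!$handle) {
		$a_link_status[] = FALSE;
	} else {
		$a_link_status[] = TRUE;
		// Only close the handle when it was actually opened
		fclose($handle);
	}
}
?>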
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I'd use curl and just ask it to return the headers. I'd set the timeout pretty low and add any URLs that fail to a retry array, so you can retry those once or twice before deciding they don't exist.
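
The retry idea could be sketched roughly like this. It is only an illustration: checkWithRetries() and its $check callback are hypothetical names, not something from the thread, and $check stands in for whatever function actually tests a single URL (for example a curl request with a low timeout).

```php
<?php
// Hypothetical helper (not from the thread): check every URL, then re-check
// only the failures up to $maxRetries more times before marking them dead.
// $check is any callable that takes a URL and returns true when it responds.
function checkWithRetries(array $urls, callable $check, int $maxRetries = 2): array
{
    $status  = array();
    $pending = $urls;

    // Attempt 0 is the first pass; attempts 1..$maxRetries are the retries.
    for ($attempt = 0; $attempt <= $maxRetries && $pending; $attempt++) {
        $retry = array();
        foreach ($pending as $url) {
            if ($check($url)) {
                $status[$url] = true;
            } else {
                $retry[] = $url; // try this one again on the next pass
            }
        }
        $pending = $retry;
    }

    // Anything still pending failed every attempt.
    foreach ($pending as $url) {
        $status[$url] = false;
    }

    return $status;
}
```

Keeping the URL check as a callback means the retry loop does not care whether the check uses fopen() or curl, and slow or flaky hosts only cost extra time on the retry passes.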

Post by hawleyjr »

Thanks for the response. A couple more questions...

Here is what I have now:

<?php
foreach($a_bookmarks  as $val ){ 
	// create a new curl resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $val);
	curl_setopt($ch, CURLOPT_HEADER, 1);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);

	if(!curl_exec($ch))
		$this->a_link_status[] = FALSE;
	else
		$this->a_link_status[] = TRUE;		
	// close curl resource, and free up system resources
	curl_close($ch);
}
?>

My questions:
1. I'm able to get the headers, but the whole page is still being fetched. How do I request the headers only?
2. There are a couple of different timeout options. Which one should I use? I'm thinking CURLOPT_TIMEOUT is the correct one?
3. With the code above, curl_exec() does not return false when the link is bad.

Post by feyd »

I believe CURLOPT_NOBODY would do it; it makes curl send a HEAD request, so the body is not downloaded.

As for timeouts, I'd think CURLOPT_TIMEOUT is the one to use (it limits the whole request, in seconds), but that may need some experimenting.

As for whether the page exists, you'll need to analyze the header that is returned.
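
One way to do that analysis without parsing the header text yourself is to ask curl for the status code after the request. This is a minimal sketch, not tested against the sites in this thread; getStatusCode() and isLinkAlive() are made-up names, and treating 2XX and 3XX as "alive" is an assumption you may want to tighten.

```php
<?php
// Sketch: send a HEAD-style request and return the HTTP status code,
// or 0 if curl could not complete the request (DNS failure, timeout, ...).
function getStatusCode(string $url, int $timeout = 2): int
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // seconds for the whole request
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the response

    // curl_exec() returns false only on a transport error; a 404 is still a
    // completed request, so compare strictly against false.
    $result = curl_exec($ch);
    if ($result === false) {
        return 0;
    }

    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

// Assumption: any 2XX or 3XX response counts as a working link.
function isLinkAlive(int $code): bool
{
    return $code >= 200 && $code < 400;
}
```

Usage would be something like `$a_link_status[] = isLinkAlive(getStatusCode($val));` inside the foreach loop. It also explains question 3: a bad page still produces a response, so curl_exec() does not return false for it.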

Post by hawleyjr »

Alright, here is what I have and so far I'm having trouble getting it to work.

<?php
foreach ($a_bookmarks as $val) {
	// create a new curl resource
	$ch = curl_init();

	// set URL and other appropriate options
	curl_setopt($ch, CURLOPT_URL, $val);
	curl_setopt($ch, CURLOPT_HEADER, 1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 2);
	curl_setopt($ch, CURLOPT_NOBODY, 1);

	// What do I do here? How do I verify if the headers are legit?
	curl_exec($ch);

	// close curl resource, and free up system resources
	curl_close($ch);
}
?>

Post by feyd »

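<?php

// Fetch only the response headers for $url (no body), with a 2 second timeout.
function getHeader($url)
{
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);
	curl_setopt($ch, CURLOPT_HEADER, 1);          // include the headers in the output
	curl_setopt($ch, CURLOPT_TIMEOUT, 2);         // give up after 2 seconds
	curl_setopt($ch, CURLOPT_NOBODY, 1);          // don't download the body
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the result as a string

	return curl_exec($ch);
}

echo getHeader('http://forums.devnetwork.net');

?>

outputs:
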
HTTP/1.1 200 OK
Date: Sat, 07 Aug 2004 17:32:24 GMT
Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.18 OpenSSL/0.9.7a
X-Powered-By: PHP/4.3.8
Set-Cookie: phpbb2mysql_data=s%3A0%3A%22%22%3B; expires=Sun, 07-Aug-05 17:32:24 GMT; path=/; domain=devnetwork.net
Set-Cookie: phpbb2mysql_sid=61b110b6461ddb7e728c7d4f692b1da6; path=/; domain=devnetwork.net
Cache-Control: private, pre-check=0, post-check=0, max-age=0
Expires: Sat, 07 Aug 2004 17:32:24 GMT
Last-Modified: Sat, 07 Aug 2004 17:32:24 GMT
Content-Type: text/html

You want the 2XX status codes (success), and probably the 3XX ones as well: HTTP/1.X 2XX, HTTP/1.X 3XX

3XX codes are redirections, so you may need to look at where they want to send you (the Location header). :)
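
If you want to work from the raw header text that getHeader() returns, pulling out the status code and the redirect target could look something like the sketch below. parseStatusCode() and parseLocation() are hypothetical helper names, and the regular expressions assume a single header block (no followed redirects).

```php
<?php
// Sketch: extract the three-digit status code from the first line of a raw
// header block such as "HTTP/1.1 200 OK". Returns 0 if no status line is found.
function parseStatusCode(string $header): int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Sketch: return the value of the Location header, or null if there is none.
// Header names are case-insensitive, hence the "i" flag.
function parseLocation(string $header): ?string
{
    if (preg_match('/^Location:\s*(.+?)\s*$/mi', $header, $m)) {
        return $m[1];
    }
    return null;
}
```

For a 3XX response you could then run the same check again on the URL from parseLocation() to see whether the redirect target itself works.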

Post by hawleyjr »

Feyd you rock. Thanks.