Is bookmark a valid link?
Posted: Fri Aug 06, 2004 11:33 am
by hawleyjr
I have an array of about 50 URLs. All I'm trying to do is check whether each link is valid. Is there a more efficient way to handle this? It works, I just think it takes too long. (By "valid" I mean the link actually works. I'm not worried about whether it contains "http", "https" or "www".)
Code: Select all
<?php
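$a_bookmarks = array(
    'http://www.yahoo.com',
    'http://google.com',
    'http://www.mysite.com',
    // ... (remaining URLs omitted)
);
$a_link_status = array();
foreach ($a_bookmarks as $val) {
    // fopen() on a URL needs allow_url_fopen enabled; @ hides the warning on failure
    $handle = @fopen($val, "r");
    if (!$handle) {
        $a_link_status[] = FALSE;
    } else {
        $a_link_status[] = TRUE;
        fclose($handle); // only close a handle that was actually opened
    }
}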
?>
Posted: Fri Aug 06, 2004 11:38 am
by feyd
I'd use cURL and just ask it to return the headers. I'd set the timeout pretty low and add any URLs that fail to a retry array, so you can retry those once or twice before deciding they don't exist.
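Here is a rough sketch of the retry-array idea, in case it helps. The names `checkWithRetries` and `$probe` and the default of three attempts are placeholders of mine, not part of cURL; `$probe` stands for whatever function performs the real request and returns TRUE or FALSE. It also assumes a PHP version with closures and the `callable` type hint.

```php
<?php
/**
 * Check every URL with $probe and retry the failures, up to $maxAttempts
 * passes in total. $probe is a hypothetical callback: it takes a URL and
 * returns TRUE if the URL responded.
 */
function checkWithRetries(array $urls, callable $probe, $maxAttempts = 3)
{
    $status = array();
    $retry  = $urls; // on the first pass every URL is pending

    for ($attempt = 1; $attempt <= $maxAttempts && count($retry) > 0; $attempt++) {
        $stillFailing = array();
        foreach ($retry as $url) {
            if (call_user_func($probe, $url)) {
                $status[$url] = TRUE;
            } else {
                $status[$url] = FALSE;   // provisional; may flip on a later pass
                $stillFailing[] = $url;  // queue it for the next pass
            }
        }
        $retry = $stillFailing;
    }
    return $status;
}
```

The result is keyed by URL, so only the links that failed on every pass end up as FALSE.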
Posted: Fri Aug 06, 2004 12:18 pm
by hawleyjr
Thanks for the response. A couple more questions...
Here is what I have now:
Code: Select all
<?php
foreach ($a_bookmarks as $val) {
    // create a new curl resource
    $ch = curl_init();
    // set URL and other appropriate options
    curl_setopt($ch, CURLOPT_URL, $val);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    if (!curl_exec($ch)) {
        $this->a_link_status[] = FALSE;
    } else {
        $this->a_link_status[] = TRUE;
    }
    // close curl resource, and free up system resources
    curl_close($ch);
}
?>
My questions:
1. I'm able to get the headers, but the whole page is still being fetched. How do I request the headers only?
2. There are a couple of different timeout options. Which one should I use? I'm thinking CURLOPT_TIMEOUT is the correct one?
3. With the code above, curl_exec() does not return false when the link is bad.
Posted: Fri Aug 06, 2004 1:00 pm
by feyd
I believe CURLOPT_NOBODY would do it.
As for timeouts, I'd think CURLOPT_TIMEOUT is the one to use, but that may need some experimenting.
As for finding out whether the page exists, you'll need to analyze the header that comes back.
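A minimal sketch of how those three answers might fit together. The helper names `headRequestOptions` and `fetchStatusCode` are hypothetical, the 2-second default is only a guess at "pretty low", and it assumes a PHP build where `curl_setopt_array()` is available. Note that `curl_exec()` only returns FALSE on transport problems (DNS failure, refused connection, timeout); a 404 is still a completed transfer, which is why the status code has to be checked separately.

```php
<?php
// Hypothetical helper: the cURL options for a headers-only request.
function headRequestOptions($url, $timeoutSeconds = 2)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true, // issue a HEAD request, so no body is downloaded
        CURLOPT_HEADER         => true, // include the response headers in the output
        CURLOPT_RETURNTRANSFER => true, // return the output instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit for establishing the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole request
    );
}

// Returns the HTTP status code, or 0 if no response arrived at all.
function fetchStatusCode($url, $timeoutSeconds = 2)
{
    $ch = curl_init();
    curl_setopt_array($ch, headRequestOptions($url, $timeoutSeconds));
    $result = curl_exec($ch);
    $code   = ($result === false) ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}
```

CURLOPT_CONNECTTIMEOUT only covers the connection phase, while CURLOPT_TIMEOUT caps the entire request, so setting both is probably the safest starting point for experimenting.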
Posted: Sat Aug 07, 2004 12:09 pm
by hawleyjr
Alright, here is what I have and so far I'm having trouble getting it to work.
Code: Select all
<?php
foreach ($a_bookmarks as $val) {
    // create a new curl resource
    $ch = curl_init();
    // set URL and other appropriate options
    curl_setopt($ch, CURLOPT_URL, $val);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    //What do I do here? How do I verify if the headers are legit?
    curl_exec($ch);
    // close curl resource, and free up system resources
    curl_close($ch);
}
?>
Posted: Sat Aug 07, 2004 12:35 pm
by feyd
Code: Select all
<?php
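// Fetch only the response headers for $url; returns FALSE if the request fails.
function getHeader($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);         // include headers in the output
    curl_setopt($ch, CURLOPT_TIMEOUT, 2);        // give up after 2 seconds
    curl_setopt($ch, CURLOPT_NOBODY, 1);         // HEAD request: skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the output as a string
    $header = curl_exec($ch);
    curl_close($ch);                             // free the handle before returning
    return $header;
}
echo getHeader('http://forums.devnetwork.net');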
?>
outputs
Code: Select all
HTTP/1.1 200 OK
Date: Sat, 07 Aug 2004 17:32:24 GMT
Server: Apache/1.3.31 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.3.8 FrontPage/5.0.2.2634a mod_ssl/2.8.18 OpenSSL/0.9.7a
X-Powered-By: PHP/4.3.8
Set-Cookie: phpbb2mysql_data=s%3A0%3A%22%22%3B; expires=Sun, 07-Aug-05 17:32:24 GMT; path=/; domain=devnetwork.net
Set-Cookie: phpbb2mysql_sid=61b110b6461ddb7e728c7d4f692b1da6; path=/; domain=devnetwork.net
Cache-Control: private, pre-check=0, post-check=0, max-age=0
Expires: Sat, 07 Aug 2004 17:32:24 GMT
Last-Modified: Sat, 07 Aug 2004 17:32:24 GMT
Content-Type: text/html
You want the success codes: HTTP/1.X 2XX and HTTP/1.X 3XX.
The 3XX codes are redirections, so you may need to look at where they want to send you (the Location header).
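One way to check those codes is sketched below. The helper names `statusFromHeader`, `redirectTarget` and `isLinkAlive` are made up for illustration, and the sketch assumes the header text is the raw string returned by a headers-only request like getHeader() above.

```php
<?php
// Hypothetical helper: pull the status code out of a raw header block.
// Returns 0 when there is no status line (e.g. curl_exec() returned FALSE).
function statusFromHeader($header)
{
    if (is_string($header) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $header, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Hypothetical helper: the Location target of a redirect, or NULL if none is present.
function redirectTarget($header)
{
    if (is_string($header) && preg_match('/^Location:\s*(\S+)/mi', $header, $m)) {
        return $m[1];
    }
    return null;
}

// Treat 2XX and 3XX as "the link works".
function isLinkAlive($header)
{
    $code = statusFromHeader($header);
    return $code >= 200 && $code < 400;
}
```

Inside the loop this would be something like `$a_link_status[] = isLinkAlive(getHeader($val));`. For a 3XX you could feed `redirectTarget()` back into getHeader() to confirm the destination itself responds.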

Posted: Sat Aug 07, 2004 12:36 pm
by hawleyjr
Feyd you rock. Thanks.