Link tester
-
Shendemiar
- Forum Contributor
- Posts: 404
- Joined: Thu Jan 08, 2004 8:28 am
Does anyone know of a ready-made class or useful snippets for an app that checks all the links under a certain domain and reports broken links to a file or by email?
Do you have your links in an array already? If not, it can be done in two lines of code. Hint: preg_match_all.
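To make the hint concrete, here is a minimal sketch of pulling links out of a page with a regular expression. The function name extract_links and the sample HTML are made up for illustration, and a regex like this will miss unusual markup (unquoted attributes, for instance), so treat it as a starting point rather than a full HTML parser.

```php
<?php
// Hypothetical helper: collect the href values of <a> tags from an HTML string.
function extract_links($html) {
    // Match href="..." or href='...' inside <a> tags, case-insensitively.
    preg_match_all('@<a\s[^>]*href\s*=\s*(["\'])(.*?)\1@is', $html, $matches);
    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($matches[2]));
}

// Made-up sample page with one repeated link.
$html = '<p><a href="http://example.com/">Home</a> '
      . "<a class='x' href='/about.html'>About</a> "
      . '<a href="http://example.com/">Home again</a></p>';
print_r(extract_links($html));
```

Relative links such as /about.html come back as-is, so they would still need the site's host prepended before they can be checked.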
Then all you need is a foreach loop and sockets. Send a HEAD request to each URL and preg_match the response headers for 200 OK. Of course, this will throw other response types into the "broken link" category; if you only want to test for 404s, just preg_match a "404" in the response instead. Make sure you're looking at the right line, though: the status code is on the first line of the response (the status line), and you don't want to match 404 or 200 out of Content-Length or some other numerically valued header.
If you're completely lost I might write something up for you later, because I need it myself as well.
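As a rough sketch of the "look at the right header" point, the status code can be read from the first line of the response only, so numbers elsewhere in the headers are never considered. The function name http_status_code is an assumption for this example, and the response string is hand-written rather than fetched from a server.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a raw HTTP response.
// Only the first line (the status line) is inspected, so a value such as
// "Content-Length: 404" can never be mistaken for the status.
function http_status_code($response) {
    // Split off the first line, whatever line-ending style was used.
    $lines = preg_split('@\r\n|\r|\n@', ltrim($response), 2);
    if (preg_match('@^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b@', $lines[0], $match)) {
        return (int) $match[1];
    }
    return null; // Not a recognisable HTTP response.
}

// Hand-written response whose Content-Length happens to be 404.
$response = "HTTP/1.1 200 OK\r\nContent-Length: 404\r\n\r\n";
var_dump(http_status_code($response) === 200);
```

With the code as an integer, "broken" can be defined however you like, for example only 404, or anything outside the 2xx and 3xx ranges.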
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
The snippet I posted for Heavy a long time ago would work as a base for doing this.
viewtopic.php?t=29312&highlight=curl+src+background
-
Shendemiar
- Forum Contributor
- Posts: 404
- Joined: Thu Jan 08, 2004 8:28 am
Does it have to be a php script?
I use Xenu: http://home.snafu.de/tilman/xenulink.html, a great little program. In a few seconds it checks hundreds of links, and reports all broken links, ordered by page and by link. Gives a complete sitemap with all valid urls, etc etc.
(sorry if it's not what you need)
-
Shendemiar
- Forum Contributor
- Posts: 404
- Joined: Thu Jan 08, 2004 8:28 am
Does not have SSL support, but it also does not require cURL. It fully supports header redirects, but there's no check for infinite recursion so you might want to add that.
Code:
<?php
$link_checker = new link_checker();

// This is how you add a URL to be checked
$link_checker->add_url('http://google.com');
$link_checker->add_url('http://www.yahoo.com/');
$link_checker->add_url('http://www.yahoo.com/admin');
$link_checker->add_url('http://foobar.com/admin');

// This checks all your links and returns an array of results
$result = $link_checker->check() or die($link_checker->get_error());

// I'm just outputting the array here, do whatever you want for this
?>
<table>
    <tr>
        <th>Page</th>
        <th>Exists</th>
    </tr>
<?php
foreach ($result as $current) {
    list($page, $exists) = $current;
?>
    <tr>
        <td><?php echo htmlentities($page, ENT_QUOTES, 'UTF-8'); ?></td>
        <td><?php echo ($exists ? 'yes' : 'no'); ?></td>
    </tr>
<?php
}
?>
</table>
<?php
class link_checker {
    /******************************************
    Really simple link checking class,
    uses sockets and supports following
    header redirects. This script has no
    license and comes with no warranty.
    By jshpro2 @ http://www.devnetwork.net
    Email jshpro2 [{at}] gmail.com
    ******************************************/
    public $urls = array();
    public $error = NULL;

    function __construct() {
        $this->urls = array();
    }

    function add_url($url) {
        /* Add a URL to the array as a (url, exists) pair */
        $this->urls[] = array($url, false);
    }

    function check() {
        /* Check all the URLs */
        if (!$c = count($this->urls)) {
            $this->set_error('No URLs to check');
            return false;
        }
        for ($i = 0; $i < $c; $i++) {
            $this->urls[$i][1] = $this->check_page($this->urls[$i][0]);
        }
        return $this->urls;
    }
    function check_page($url) {
        $exists = false;

        /* Get the host and path from this URL */
        if (!preg_match('@^(http://)?([^/]+)(.+)?@i', trim($url), $match)) {
            $this->set_error('No host');
            return false;
        }
        $host = $match[2];
        $path = isset($match[3]) ? $match[3] : '/';

        /* Get the port for the WWW service, falling back to 80 */
        $service_port = getservbyname('www', 'tcp');
        if (!$service_port) {
            $service_port = 80;
        }

        /* Get the IP address for the target host. */
        $address = gethostbyname($host);
        if ($address == $host && ip2long($host) === false) {
            $this->set_error('Could not find IP for hostname');
            return false;
        }

        /* Create a TCP/IP socket. */
        if (!$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP)) {
            $this->set_error("socket_create() failed: reason: " . socket_strerror(socket_last_error()) . "\n");
            return false;
        }

        /* Connect to the host */
        if (!socket_connect($socket, $address, $service_port)) {
            $this->set_error("socket_connect() failed.\nReason: " . socket_strerror(socket_last_error($socket)) . "\n");
            socket_close($socket);
            return false;
        }

        /* Send a HEAD request and read the whole response */
        $in  = "HEAD " . $path . " HTTP/1.1\r\n";
        $in .= "Host: " . $host . "\r\n";
        $in .= "Connection: Close\r\n\r\n";
        $out = '';
        socket_write($socket, $in, strlen($in));
        while ($read = socket_read($socket, 2048)) {
            $out .= $read;
        }
        socket_close($socket);

        $headers = preg_split('@\r\n|\r|\n@', $out);
        foreach ($headers as $header) {
            /* Break down the header into name and value */
            $header = explode(':', $header, 2);
            if (strtolower(trim($header[0])) == 'location' && isset($header[1])) {
                /* Follow the redirect and report the target's status */
                return $this->check_page(trim($header[1]));
            }
            /* Only the status line may mark the page as existing */
            $exists = (bool) preg_match('@^HTTP/\d\.\d\s+200\b@i', $header[0]);
            if ($exists) break;
        }
        return $exists;
    }

    function set_error($error) {
        echo $error;
        $this->error = $error;
    }

    function get_error() {
        return $this->error;
    }
}
?>
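On the missing check for infinite recursion: one possible way to add it is to follow redirects in a loop, remember every URL already visited, and cap the number of hops. The sketch below is not part of the class above. The names resolve_status and $fetch are made up for illustration, and $fetch is a stub standing in for the socket code so that the example runs without a network.

```php
<?php
// Hypothetical helper: follow "Location" redirects, giving up after $max_hops.
// $fetch is an assumed callback that takes a URL and returns
// array(status_code, location_or_null); in real use it would wrap the
// socket code, here it is stubbed so the sketch runs without a network.
function resolve_status($url, $fetch, $max_hops = 5) {
    $seen = array();
    for ($hop = 0; $hop <= $max_hops; $hop++) {
        if (isset($seen[$url])) {
            return false; // Redirect loop: we have already visited this URL.
        }
        $seen[$url] = true;
        list($status, $location) = $fetch($url);
        if ($status >= 300 && $status < 400 && $location) {
            $url = $location; // Follow the redirect on the next iteration.
            continue;
        }
        return $status; // Final, non-redirect answer.
    }
    return false; // Too many redirects.
}

// Stub responses: /a and /b redirect to each other, /old redirects to /new.
$pages = array(
    'http://example.com/a'   => array(302, 'http://example.com/b'),
    'http://example.com/b'   => array(302, 'http://example.com/a'),
    'http://example.com/old' => array(301, 'http://example.com/new'),
    'http://example.com/new' => array(200, null),
);
$fetch = function ($url) use ($pages) {
    return isset($pages[$url]) ? $pages[$url] : array(404, null);
};

var_dump(resolve_status('http://example.com/old', $fetch)); // int(200)
var_dump(resolve_status('http://example.com/a', $fetch));   // bool(false)
```

The same idea could be folded into check_page() by passing a hop counter or the list of visited URLs along with each recursive call.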