Link tester

Posted: Sun Feb 12, 2006 12:18 pm
by Shendemiar
Does anyone know of a ready-made class or useful snippets for an app that checks all the links under a certain domain and reports broken links to a file or by email?

Posted: Sun Feb 12, 2006 12:42 pm
by feyd
Code Snippets is not a place to request snippets. :?


Moved to PHP - Code.

Posted: Sun Feb 12, 2006 1:05 pm
by josh
Do you have your links in an array already? If not, it can be done with two lines of code. Hint: preg_match.
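
As a rough illustration of that hint, here is a minimal sketch that pulls href values out of a page. It uses preg_match_all rather than plain preg_match so that every link is collected in one call. The function name extract_links and the sample HTML are made up for this example, and a regex like this will miss unquoted attributes.

```php
<?php
// Hypothetical helper (not from this thread): collect the href values of
// all <a> tags in an HTML string. Only quoted attribute values are handled.
function extract_links($html) {
	preg_match_all('@<a\s[^>]*href\s*=\s*(["\'])(.*?)\1@is', $html, $matches);
	// $matches[2] holds the text between the quotes; drop duplicates.
	return array_values(array_unique($matches[2]));
}

// Sample markup, assumed for illustration only.
$html = '<p><a href="http://example.com/a">A</a> <a class="x" href=\'/b\'>B</a></p>';
print_r(extract_links($html));
```

Relative links such as /b would still need to be resolved against the page's own URL before they can be requested.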


Then all you need is a foreach loop and sockets: send a HEAD request to each URL and preg_match the response headers for 200 OK. Of course, this will throw other response types into the "broken link" category; if you only want to test for 404s, just preg_match a "404" in the response header. You have to make sure you're looking at the right header, though, because you don't want to match 404 or 200 in the Content-Length or some other numerically valued header.
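
One way to avoid matching the wrong header is to look at the status line only. The sketch below assumes the raw response headers are already in a string; the function name response_status and the sample response are invented for illustration.

```php
<?php
// Hypothetical helper: return the status code from the status line only,
// so numbers in other headers (e.g. "Content-Length: 200") are never matched.
function response_status($raw_headers) {
	$lines = preg_split('@\r\n|\r|\n@', $raw_headers);
	if (preg_match('@^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b@i', $lines[0], $m)) {
		return (int) $m[1];
	}
	return null; // the first line is not an HTTP status line
}

// Sample response, assumed for illustration: a 404 whose Content-Length is 200.
$raw = "HTTP/1.1 404 Not Found\r\nContent-Length: 200\r\nConnection: close\r\n\r\n";
$status = response_status($raw);
echo ($status === 404) ? "broken link\n" : "status: $status\n";
```

Comparing the returned integer against 200 or 404 then decides which category the link falls into.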


If you're completely lost I might write something up for you later, because I need it myself as well.

Posted: Sun Feb 12, 2006 1:09 pm
by feyd
The snippet I posted for Heavy a long time ago would work as a base for doing this.

viewtopic.php?t=29312&highlight=curl+src+background

Posted: Sun Feb 12, 2006 1:15 pm
by Shendemiar
jshpro2 wrote:
If you're completely lost I might write something up for you later, because I need it myself as well.
Notify me if you do. I bet yours will be better...

Posted: Sun Feb 12, 2006 1:31 pm
by matthijs
Does it have to be a php script?

I use Xenu: http://home.snafu.de/tilman/xenulink.html, a great little program. In a few seconds it checks hundreds of links, and reports all broken links, ordered by page and by link. Gives a complete sitemap with all valid urls, etc etc.

(sorry if it's not what you need)

Posted: Sun Feb 12, 2006 1:37 pm
by Shendemiar
Preferably a PHP script, so I can cron it weekly. I also need similar code to manipulate links and provide extra info about them, so the code would be doubly useful.

Posted: Sun Feb 12, 2006 4:04 pm
by josh
It does not have SSL support, but it also does not require cURL. It fully supports header redirects, but there's no check for infinite recursion, so you might want to add that.
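
A guard against endless redirects could look roughly like the sketch below. It is not part of the class that follows: url_exists, the $fetch callable and the fake page table are all assumptions made so the example runs without a network, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical sketch: $fetch is any callable that takes a URL and returns
// array($status_code, $location_or_null). It stands in for the socket code.
function url_exists($url, $fetch, $max_redirects = 5) {
	$seen = array();
	for ($hops = 0; $hops <= $max_redirects; $hops++) {
		if (isset($seen[$url])) {
			return false; // redirect loop: this URL was already visited
		}
		$seen[$url] = true;
		list($status, $location) = call_user_func($fetch, $url);
		if ($status >= 300 && $status < 400 && $location) {
			$url = $location; // follow the redirect
			continue;
		}
		return ($status == 200);
	}
	return false; // gave up after too many redirects
}

// Fake responses for illustration: /a and /b redirect to each other.
$pages = array(
	'/a'  => array(302, '/b'),
	'/b'  => array(302, '/a'),
	'/c'  => array(301, '/ok'),
	'/ok' => array(200, null),
);
$fetch = function ($url) use ($pages) {
	return isset($pages[$url]) ? $pages[$url] : array(404, null);
};

var_dump(url_exists('/a', $fetch));
var_dump(url_exists('/c', $fetch));
```

Tracking visited URLs catches loops early, while the hop limit covers long chains that never repeat.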

Code:

<?php

$link_checker = new link_checker();

// This is how you add a URL to be checked
$link_checker -> add_url('http://google.com');
$link_checker -> add_url('http://www.yahoo.com/');
$link_checker -> add_url('http://www.yahoo.com/admin');
$link_checker -> add_url('http://foobar.com/admin');

// This checks all your links and returns an array of results
$result = $link_checker -> check() or die($link_checker -> get_error());


// I'm just outputting the array here, do whatever you want for this
?>
<table>
	<tr>
		<th>Page</th>
		<th>Exists</th>
	</tr>
<?php
foreach($result as $current) {
	list($page,$exists)=$current;
	?>
	<tr>
		<td><?php echo htmlentities($page,ENT_QUOTES,'UTF-8'); ?></td>
		<td><?php echo ( ($exists) ? 'yes' : 'no' ); ?></td>
	</tr>
	<?php
}
?>
</table>
<?php






class link_checker {
	/******************************************
	
		Really simple link checking class,
		uses sockets and supports following
		header	redirects. This script has no
		license and comes with no warranty.
		
		By jshpro2 @ http://www.devnetwork.net
		Email jshpro2 [{at}] gmail.com
			
	******************************************/
	var $urls;
	var $error = NULL;
	
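	/* __construct() works on PHP 5 and later; a method named after the
	   class is no longer treated as a constructor in PHP 8 */
	function __construct() {
		$this -> urls           = array();
	}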
	
	function add_url($url) {
		/* Add a URL to the array */
		$this -> urls[]     = array(
			$url, false
		);
	}
	
	function check() {
		/* Check all the URLs */
		if (!$c=count($this->urls)) {
			$this -> set_error('No URLs to check');
			return(false);
		} else {
			for ($i=0;$i<$c;$i++) {
				$this->urls[$i][1] = $this -> check_page($this->urls[$i][0]);
			}
			return($this->urls);
		}
	}
	
	function check_page($url) {
		
		$exists=false;
		
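		/* Get the host and path from this URL (https:// is not supported) */
		preg_match('@(http://)?([^/]+)(.+)?@i',trim($url),$match);
		/* Unmatched groups are left out of $match, so guard each index */
		$host = isset($match[2]) ? $match[2] : '';
		$url  = isset($match[3]) ? $match[3] : '';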

		if (!$host) {
			$this -> set_error('No host');
			return(false);
		}
		
		/* Get the port for the HTTP service, falling back to 80 if the lookup fails. */
		$service_port = getservbyname('http', 'tcp');
		if (!$service_port) {
			$service_port = 80;
		}
		
		/* Get the IP address for the target host. */
		$address = gethostbyname($host);
		if ($address == $host) {
			$this->set_error('Could not find IP for hostname');
			return(false);		
		}

		/* Create a TCP/IP socket. */
		if (!$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP)) {
		   /* socket_strerror() expects an error code, not the failed return value */
		   $this -> set_error("socket_create() failed: reason: " . socket_strerror(socket_last_error()) . "\n");
		   return(false);
		}
		
		/* Connect to the host */
		if (!socket_connect($socket, $address, $service_port)) {
		   $this -> set_error("socket_connect() failed.\nReason: " . socket_strerror(socket_last_error($socket)) . "\n");
		   socket_close($socket);
		   return(false);
		}

		$in   = "HEAD ".(($url) ? $url : "/")." HTTP/1.1\r\n";
		$in  .= "Host: ".$host."\r\n";
		$in  .= "Connection: Close\r\n\r\n";
		$out  = '';
		$read = '';
		socket_write($socket, $in, strlen($in));
		
		while($read = socket_read($socket, 2048)) {
			$out .= $read;
		}
		/* Release the socket once the whole response has been read */
		socket_close($socket);
		$headers = preg_split('@\r\n|\r|\n@',$out);
		
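		foreach($headers as $header) {
			
			/* Break down the header */
			$header = explode(':',$header,2);
			
			switch(strtolower(trim($header[0]))) {
				case 'location':
					/* Follow the redirect; the target's result is this URL's result */
					return($this->check_page(trim($header[1])));
			}
			
			/* Only a status line such as "HTTP/1.1 200 OK" counts as a match */
			$exists = preg_match('@^HTTP/\d\.\d\s+200\b@i',$header[0]);
			if ($exists) break;

		}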
		
		return($exists);
	}
	
	function set_error($error) {
		echo $error;
		$this -> error = $error;
	}
	
	function get_error() {
		return($this -> error);
	}
}
?>
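
For the file report asked about at the top of the thread, the array returned by check() could be filtered down to the broken links. This is only a sketch: broken_link_report, the sample results and the output file name are made up, and mail() could be used in place of the file write.

```php
<?php
// Hypothetical helper: $results uses the same shape that check() returns,
// a list of array($url, $exists) pairs.
function broken_link_report($results) {
	$lines = array();
	foreach ($results as $row) {
		list($url, $exists) = $row;
		if (!$exists) {
			$lines[] = $url; // keep only the links that failed the check
		}
	}
	return implode("\n", $lines);
}

// Sample results, assumed for illustration only.
$results = array(
	array('http://example.com/', true),
	array('http://example.com/missing', false),
);
$report = broken_link_report($results);
if ($report !== '') {
	// The file name is a placeholder; pick a path that suits your cron job.
	file_put_contents(sys_get_temp_dir() . '/broken-links.txt', $report . "\n");
}
```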

Posted: Sun Feb 12, 2006 5:12 pm
by Shendemiar
Cool. I'll give more feedback when I've tried it.