Link tester

Shendemiar
Forum Contributor
Posts: 404
Joined: Thu Jan 08, 2004 8:28 am

Post by Shendemiar »

Does anyone know of a ready-made class or useful snippets for an app that checks all the links under a certain domain and reports broken links to a file or by email?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Code Snippets is not a place to request snippets. :?


Moved to PHP - Code.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

Do you have your links in an array already? If not, it can be done with two lines of code. Hint: preg_match
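
A minimal sketch of that preg_match hint, assuming the page's HTML is already in a string. The function name extract_links is made up for illustration, and a regex like this only catches quoted href attributes, so treat it as a starting point rather than a full HTML parser:

```php
<?php
// Hypothetical helper (not part of any library): collect href values
// from <a> tags in an HTML string. Only quoted attributes are handled.
function extract_links($html) {
    preg_match_all('@<a\s[^>]*href\s*=\s*(["\'])(.*?)\1@is', $html, $matches);
    // $matches[2] holds the captured URLs; drop duplicates and reindex
    return array_values(array_unique($matches[2]));
}

$html  = '<p><a href="http://example.com/">Home</a> <a class="x" href=\'/about\'>About</a></p>';
$links = extract_links($html);
print_r($links);
```

The resulting array can be fed straight into a foreach loop that checks each URL.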


Then all you need is a foreach loop and sockets: send a HEAD request to each URL and preg_match the response headers for 200 OK. Of course, this will throw other response types into the "broken link" category; if you only want to test for 404s, just preg_match a "404" in the response header. You have to make sure you're looking at the right header, though (the status line), because you don't want to match 404 or 200 out of the Content-Length or some other numerically valued header.
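
To illustrate the "right header" point, here is a rough sketch that reads the status code from the first line of a raw response only, so a value such as Content-Length: 200 cannot be mistaken for the status. The name http_status_code is an assumption for this example, and the response string is hand-written rather than fetched:

```php
<?php
// Hypothetical helper: pull the status code out of a raw HTTP response.
function http_status_code($response) {
    // Only the first line is the status line, e.g. "HTTP/1.1 404 Not Found"
    $lines = preg_split('@\r\n|\r|\n@', $response, 2);
    if (preg_match('@^HTTP/\d+(?:\.\d+)?\s+(\d{3})@', $lines[0], $m)) {
        return (int) $m[1];
    }
    return null; // not a recognisable HTTP response
}

// Hand-written sample response; note the misleading Content-Length value
$response = "HTTP/1.1 404 Not Found\r\nContent-Length: 200\r\nConnection: close\r\n\r\n";
var_dump(http_status_code($response)); // int(404)
```

Comparing the returned integer against 200 or 404 then replaces the loose preg_match on the whole header block.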


If you're completely lost I might write something up for you later, because I need it myself as well.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

The snippet I posted for Heavy a long time ago would work as a base for doing this.

viewtopic.php?t=29312&highlight=curl+src+background
Shendemiar
Forum Contributor
Posts: 404
Joined: Thu Jan 08, 2004 8:28 am

Post by Shendemiar »

jshpro2 wrote:
If you're completely lost I might write something up for you later, because I need it myself as well.
Notify me if you do, I bet yours will be better...
matthijs
DevNet Master
Posts: 3360
Joined: Thu Oct 06, 2005 3:57 pm

Post by matthijs »

Does it have to be a php script?

I use Xenu (http://home.snafu.de/tilman/xenulink.html), a great little program. In a few seconds it checks hundreds of links and reports all broken links, ordered by page and by link. It also gives a complete sitemap with all valid URLs, etc.

(sorry if it's not what you need)
Shendemiar
Forum Contributor
Posts: 404
Joined: Thu Jan 08, 2004 8:28 am

Post by Shendemiar »

Preferably a PHP script, so I can cron it weekly. I also need similar code to manipulate links and provide extra info about them, so the code would be doubly useful.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Post by josh »

This does not have SSL support, but it also does not require cURL. It fully supports header redirects, but there's no check for infinite recursion (a redirect loop), so you might want to add that.

Code:

<?php

$link_checker = new link_checker();

// This is how you add a URL to be checked
$link_checker -> add_url('http://google.com');
$link_checker -> add_url('http://www.yahoo.com/');
$link_checker -> add_url('http://www.yahoo.com/admin');
$link_checker -> add_url('http://foobar.com/admin');

// This checks all your links and returns an array of results
$result = $link_checker -> check() or die($link_checker -> get_error());


// I'm just outputting the array here, do whatever you want for this
?>
<table>
	<tr>
		<th>Page</th>
		<th>Exists</th>
	</tr>
<?php
foreach($result as $current) {
	list($page,$exists)=$current;
	?>
	<tr>
		<td><?php echo htmlentities($page,ENT_QUOTES,'UTF-8'); ?></td>
		<td><?php echo ( ($exists) ? 'yes' : 'no' ); ?></td>
	</tr>
	<?php
}
?>
</table>
<?php






class link_checker {
	/******************************************
	
		Really simple link checking class,
		uses sockets and supports following
		header redirects. This script has no
		license and comes with no warranty.
		
		By jshpro2 @ http://www.devnetwork.net
		Email jshpro2 [{at}] gmail.com
			
	******************************************/
	var $urls = array();
	var $error = NULL;
	
	/* PHP 8 no longer treats a method named after the class as a constructor */
	function __construct() {
		$this -> urls           = array();
	}
	
	function add_url($url) {
		/* Add a URL to the array */
		$this -> urls[]     = array(
			$url, false
		);
	}
	
	function check() {
		/* Check all the URLs */
		if (!$c=count($this->urls)) {
			$this -> set_error('No URLs to check');
			return(false);
		} else {
			for ($i=0;$i<$c;$i++) {
				$this->urls[$i][1] = $this -> check_page($this->urls[$i][0]);
			}
			return($this->urls);
		}
	}
	
	function check_page($url) {
		
		$exists=false;
		
		/* Get the host from this URL */
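		preg_match('@(http://)?([^/]+)(.+)?@i',trim($url),$match);
		/* Trailing groups are absent from $match when the URL has no path */
		$host = isset($match[2]) ? $match[2] : '';
		$url  = isset($match[3]) ? $match[3] : '';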

		if (!$host) {
			$this -> set_error('No host');
			return(false);
		}
		
		/* Get the port for the WWW service. */
		$service_port = getservbyname('www', 'tcp');
		
		/* Get the IP address for the target host. */
		$address = gethostbyname($host);
		if ($address == $host) {
			$this->set_error('Could not find IP for hostname');
			return(false);		
		}

		/* Create a TCP/IP socket. */
		if (!$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP)) {
		   /* $socket is false here, so ask for the last error code instead */
		   $this -> set_error("socket_create() failed: reason: " . socket_strerror(socket_last_error()) . "\n");
		   return(false);
		}
		
		/* Connect to the host */
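		if (!$result = socket_connect($socket, $address, $service_port)) {
		   /* $result is false here, so read the error code from the socket */
		   $this -> set_error("socket_connect() failed.\nReason: " . socket_strerror(socket_last_error($socket)) . "\n");
		   socket_close($socket);
		   return(false);
		}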

		$in   = "HEAD ".(($url) ? $url : "/")." HTTP/1.1\r\n";
		$in  .= "Host: ".$host."\r\n";
		$in  .= "Connection: Close\r\n\r\n";
		$out  = '';
		$read = '';
		socket_write($socket, $in, strlen($in));
		
		while($read = socket_read($socket, 2048)) {
			$out .= $read;
		}
		/* Release the socket once the full response has been read */
		socket_close($socket);
		$headers = preg_split('@\r\n|\r|\n@',$out);
		
		foreach($headers as $header) {
			
			/* Break down the header */
			$header = explode(':',$header,2);
			
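			switch(strtolower(trim($header[0]))) {
				case 'location':
					/* Trim the redirect URL, not the boolean result */
					return($this->check_page(trim($header[1])));
				break;
			}
			
			/* Match 200 only in the status line, e.g. "HTTP/1.1 200 OK" */
			$exists = preg_match('@^HTTP/\S+\s+200\b@',$header[0]);
			if ($exists) break;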

		}
		
		return($exists);
	}
	
	function set_error($error) {
		echo $error;
		$this -> error = $error;
	}
	
	function get_error() {
		return($this -> error);
	}
}
?>
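
The missing redirect check mentioned above could be sketched roughly like this. Everything here is an assumption for illustration: check_with_limit, the $fetch callable (which stands in for the socket code and returns a status code plus an optional Location value), and the default limit of 5 hops are not part of the class above.

```php
<?php
// Hypothetical sketch: follow redirects with a hop limit and loop detection.
// $fetch is any callable that takes a URL and returns array($status, $location),
// where $location is null when the response has no Location header.
function check_with_limit($url, $fetch, $max_redirects = 5) {
    $seen = array();
    for ($hops = 0; $hops <= $max_redirects; $hops++) {
        if (isset($seen[$url])) {
            return false; // redirect loop
        }
        $seen[$url] = true;
        list($status, $location) = call_user_func($fetch, $url);
        if ($status >= 300 && $status < 400 && $location !== null) {
            $url = $location; // follow the redirect
            continue;
        }
        return $status === 200;
    }
    return false; // too many redirects
}

// Stand-in for the network: /a and /b redirect to each other, /c redirects to /ok
$fake = function ($url) {
    $map = array(
        '/a'  => array(302, '/b'),
        '/b'  => array(302, '/a'),
        '/c'  => array(301, '/ok'),
        '/ok' => array(200, null),
    );
    return isset($map[$url]) ? $map[$url] : array(404, null);
};

var_dump(check_with_limit('/c', $fake)); // bool(true)
var_dump(check_with_limit('/a', $fake)); // bool(false)
```

Using a loop instead of recursion keeps the hop counter and the list of visited URLs in one place; the same idea could be added to check_page() by passing a depth argument along with each recursive call.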
Shendemiar
Forum Contributor
Posts: 404
Joined: Thu Jan 08, 2004 8:28 am

Post by Shendemiar »

Cool. I'll give more feedback when I've tried it.