follow a url?

Posted: Sun Jul 23, 2006 7:55 pm
by GeXus
Is it possible, with cURL or something else, to type in a URL and see all of the pages that URL redirects to?

Posted: Sun Jul 23, 2006 7:57 pm
by feyd
Yes. Can you be more specific?

Posted: Sun Jul 23, 2006 8:54 pm
by GeXus
Well, basically like how tracking URLs will redirect several times before you actually get to the "real" URL... this would show each step, or each page being accessed, until it reaches the "real" or last URL, the one with no further redirects.

Posted: Sun Jul 23, 2006 8:58 pm
by John Cartwright
hmm.. this is kind of a long shot (I know feyd has something better in mind) but try

1. turning off CURLOPT_FOLLOWLOCATION to cancel any auto redirects
2. record the Location header
3. send a new request to that location
4. record any further headers sent
5. repeat until no Location headers are present

This is assuming they are not using meta redirects.
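The steps above can be sketched with PHP's cURL extension roughly as follows. This is only an illustration, not anything posted in the thread: the function name `trace_redirects` and the `$max_hops` limit are made up for the example, and it assumes the cURL extension is loaded and PHP is 5.3.7 or newer (for `CURLINFO_REDIRECT_URL`).

```php
<?php
// Hypothetical helper: walks a redirect chain one hop at a time.
// Returns an array of URLs, or false if a request fails.
function trace_redirects($url, $max_hops = 10)
{
    $chain = array($url);
    for ($i = 0; $i < $max_hops; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // step 1: no auto redirects
        curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        if (curl_exec($ch) === false) {
            return false;
        }
        // step 2: the URL cURL would have followed, empty when there is none
        $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
        if (!$next) {
            return $chain; // step 5: no more Location headers
        }
        $chain[] = $url = $next; // steps 3 and 4: record it and request it next
    }
    return $chain; // gave up after $max_hops redirects
}
```

Calling `trace_redirects('http://example.com/')` would return the starting URL followed by each redirect target in order. As noted above, meta redirects are not seen by this approach.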

Posted: Mon Jul 24, 2006 3:46 pm
by GeXus
Jcart wrote:3. send a new request to that location
Hmm.. the only problem is that the redirect could potentially behave differently if you hit the URL directly, i.e. by sending a new request to the new location instead of being redirected there...

Posted: Mon Jul 24, 2006 3:48 pm
by feyd
no, it's not.

Posted: Mon Jul 24, 2006 4:00 pm
by jamiel
You can pass options to wget to get it to follow links and URLs for specified domains.

Posted: Mon Jul 24, 2006 5:12 pm
by bokehman
Sorry if this is a bit untidy but I only had 5 minutes to spare. Returns an array of URLs on success or false on failure.

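<?php

// Follows HTTP redirects using HEAD requests.
// Returns an array of URLs on success or false on failure.
function list_redirects($url, $max_redirects = 10)
{
	// A local array, so repeated calls do not accumulate old results
	$location = array();
	while (count($location) <= $max_redirects)
	{
		// Assume plain http when no scheme is given
		if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url))
		{
			$url = 'http://'.$url;
		}
		$location[] = $url;
		$parts = parse_url($url);
		if (!$parts || empty($parts['host']))
		{
			return false;
		}
		$host = $parts['host'];
		$ssl = strtolower($parts['scheme']) == 'https';
		$port = isset($parts['port']) ? $parts['port'] : ($ssl ? 443 : 80);
		$path = isset($parts['path']) ? $parts['path'] : '/';
		if (isset($parts['query'])) $path .= '?'.$parts['query'];
		$out = "HEAD $path HTTP/1.0\r\n";
		$out .= "Host: $host\r\n";
		$out .= "Connection: Close\r\n\r\n";
		if (!$fp = @fsockopen(($ssl ? 'ssl://' : '').$host, $port, $errno, $errstr, 5))
		{
			return false;
		}
		fwrite($fp, $out);
		$status = 0;
		$next = '';
		// Read whole header lines until the blank line that ends the headers
		while (($s = fgets($fp)) !== false && trim($s) !== '')
		{
			if (preg_match('#^HTTP/\S+\s+(\d{3})#i', $s, $m)) $status = (int) $m[1];
			// Header names are case-insensitive
			if (preg_match('/^Location:\s*(.*)$/i', $s, $m)) $next = trim($m[1]);
		}
		fclose($fp);
		if ($status >= 300 && $status < 400 && $next !== '')
		{
			// A Location starting with "/" is resolved against the current host
			if ($next[0] == '/') $next = $parts['scheme'].'://'.$host.(isset($parts['port']) ? ':'.$port : '').$next;
			$url = $next;
			continue;
		}
		return $status == 200 ? $location : false;
	}
	return false; // too many redirects
}

?>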
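The header-scanning part of the function above can also be pulled out into a small pure function, which makes it easier to try against canned responses without touching the network. This is just a sketch; the name `parse_redirect_headers` is invented for the example and is not part of bokehman's code.

```php
<?php
// Hypothetical helper: splits a raw HTTP response header block into
// a status code and the Location value (null when there is none).
// Header names are matched case-insensitively.
function parse_redirect_headers($raw)
{
    $status = 0;
    $location = null;
    foreach (preg_split('/\r?\n/', $raw) as $line) {
        if ($line === '') {
            break; // a blank line ends the header block
        }
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        } elseif (preg_match('/^Location:\s*(.*)$/i', $line, $m)) {
            $location = trim($m[1]);
        }
    }
    return array($status, $location);
}
```

A response is a redirect worth following when the status is in the 3xx range and the Location value is not null.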

Posted: Mon Jul 24, 2006 10:23 pm
by GeXus
bokehman wrote:Sorry if this is a bit untidy but I only had 5 minutes to spare. Returns an array of URLs on success or false on failure.

Nice! Thanks, I wasn't expecting all that.. seems to work great.... except it won't catch JavaScript redirects.. hmm.
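For client-side redirects, one rough option is to fetch the page body with a GET request and scan it for a meta refresh tag or a literal `location` assignment. The sketch below is only a heuristic and does not execute any JavaScript, so anything that builds the URL dynamically will be missed. The function name `find_client_side_redirect` is made up for the example.

```php
<?php
// Hypothetical helper: returns the target of a meta refresh or a
// literal JavaScript location assignment found in $html, else null.
function find_client_side_redirect($html)
{
    // <meta http-equiv="refresh" content="0; url=http://example.com/">
    if (preg_match('/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*>/i', $html, $tag)
        && preg_match('/url\s*=\s*["\']?([^"\'\s>;]+)/i', $tag[0], $m)) {
        return $m[1];
    }
    // window.location = "..." or location.href = '...'
    if (preg_match('/\blocation(?:\.href)?\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        return $m[1];
    }
    // location.replace("...") or location.assign("...")
    if (preg_match('/\blocation\.(?:replace|assign)\(\s*["\']([^"\']+)["\']/i', $html, $m)) {
        return $m[1];
    }
    return null;
}
```

The returned URL could then be fed back into the header-based redirect check to continue the chain.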