
Conversion of any relative path to the full URL

Posted: Mon Nov 20, 2006 12:02 am
by dibyendrah
Dear all,
I'm working on PDF indexing and have run into a small problem with relative paths.

Suppose the URL I am scanning is http://xyz.com/docs/manaul/main.html .
When scanning this file, suppose it contains a link like

Code: Select all

<a href="../pdfs/manual_en.pdf">English Manual </a>
I want to convert this relative href to a full URL like http://xyz.com/docs/pdfs/manual_en.pdf

Please suggest a way to do this. If this is not clear, I'll try to explain further.

With Best Regards,
Dibyendra

Posted: Mon Nov 20, 2006 12:12 am
by John Cartwright
Use a simple str_replace(), or preferably preg_match_all() to be safe in case there is a stray ".." somewhere, and simply replace the relative path with the full path.

Here's a start. I suck at regex, but hopefully this will inspire you.

Code: Select all

preg_replace('/a href="[\.]{1,2}\/([^"])/i', 'a href="http://domain.com/\\1', $file);
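
As a rough sketch of the preg_match_all() idea mentioned above, the snippet below only collects the href values from a page so that each one can be inspected before it is rewritten. The function name extract_hrefs() is made up for illustration, and a regex like this is an assumption that the HTML is reasonably well formed; it is not a full HTML parser.

```php
<?php
// Hypothetical helper (not from the thread): collect the raw href values
// of all <a> tags found in an HTML string.
function extract_hrefs($html)
{
    $matches = array();
    // Match href="..." or href='...' inside an <a ...> tag, case-insensitively.
    // Group 1 is the quote character, group 2 is the href value.
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/i', $html, $matches);
    return $matches[2];
}

$html = '<a href="../pdfs/manual_en.pdf">English Manual </a> '
      . '<a class="x" href=\'index.html\'>Home</a>';
print_r(extract_hrefs($html));
```

Each value returned this way still has to be resolved against the URL of the page it came from.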

Posted: Mon Nov 20, 2006 12:25 am
by dibyendrah
Thank you, Jcart, for your reply. The main problem is that if the URL is http://domain.com/1/2/3/4/main.html and this file contains the relative path ../../somefile.pdf, the output should be http://domain.com/1/2/somefile.pdf. Is there another similar solution, Jcart?

Thank you for your quick response.

Dibyendra

Posted: Mon Nov 20, 2006 1:37 am
by dibyendrah
I'm thinking of writing a function that takes a full URL and the relative path of a file found at that URL as parameters, walks up the full URL according to the relative path, and returns the actual full URL of the file.

The proposed function would be:

Code: Select all

function get_actual_url($url, $relative_path){
	// e.g. get_actual_url('http://xyz.com/a/b/c/d/d.html', '../../pdfs/b.pdf')
	// .....
	return $actual_url_of_file;
}
Any help will be appreciated.

Dibyendra

Posted: Mon Nov 20, 2006 3:54 am
by dibyendrah
Dear All,
Finally, I arrived at this solution the long, hard way. I hope somebody can post an easier way to solve this problem.

Code: Select all

<?php

function get_actual_URL($URL, $relative_path_to_URL){ 
	
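	// Strip the file name: keep everything up to and including the last slash.
	// (str_replace(basename($URL), ...) would also remove the same text elsewhere in the URL.)
	$URL = substr($URL, 0, strrpos($URL, "/") + 1);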
	$URL_array = parse_url($URL);
	$scheme = $URL_array["scheme"];
	$domain = $URL_array["host"];
	
	$directory_structure = $URL_array['path'];
	$directory_structure_parts = explode("/", $directory_structure);

	foreach ($directory_structure_parts as $key=>$value) {
		if(empty($directory_structure_parts[$key])){
			unset($directory_structure_parts[$key]);	
		}
	}
	$count_main_URL_dirs = count($directory_structure_parts);
	
	$file_URL_array = explode("/", $relative_path_to_URL);
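	$array_value_count = array_count_values($file_URL_array); //count how often each segment occurs

	// Number of ".." segments; 0 if the relative path contains none
	$count_relative_path_dirs = isset($array_value_count[".."]) ? $array_value_count[".."] : 0;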
	$directory_structure_parts = array_values($directory_structure_parts);
	
	for($i=1; $i<=$count_relative_path_dirs; $i++){
		unset($directory_structure_parts[$count_main_URL_dirs-$i]);
	}

	$directory_structure_remake = implode("/", $directory_structure_parts);
	$append = str_replace("../", "", $relative_path_to_URL);
	$full_url = $scheme."://".$domain."/".$directory_structure_remake."/".$append;

	return ($full_url);
	
}

$URL = "http://xyz.com/a/b/c/d/d.html";
$relative_path_to_URL = "../../pdfs/b.pdf";

print get_actual_URL($URL, $relative_path_to_URL);
?>
Output :

Code: Select all

http://xyz.com/a/b/pdfs/b.pdf
With Regards,
Dibyendra

Posted: Mon Nov 20, 2006 6:06 am
by dibyendrah
This solution applies if you are reading the page from a different domain rather than staying on the same domain. If you are staying on the same domain, realpath() will help you out.

Hope this will help somebody.

With Best Regards,
Dibyendra

Posted: Mon Nov 20, 2006 7:18 am
by Ollie Saunders
I haven't examined the code you just posted, dibyendrah, but this is something I wrote to do pretty much exactly what you are after:

You might want to start reading from the row of asterisks I've put in for you:

Code: Select all

/**
 * Perform an HTTP(S) redirect. Supports relative or absolute path.
 *
 * Portability features untested.
 *
 * @param string $input
 * @todo Modify this so that it is testable
 */
function OSIS_Redirect($input)
{
    static $protocol = null;
    if ($protocol === null) {
        if (empty($_SERVER['HTTPS']) || strtolower($_SERVER['HTTPS']) == 'off') {
            $protocol = 'http';
        } else {
            $protocol = 'https';
        }
    }
    static $port = null;
    if ($port === null) {
        $ports = array('https' => 443, 'http' => 80);
        if ($_SERVER['SERVER_PORT'] != $ports[$protocol]) {
            $port = ':' . $_SERVER['SERVER_PORT'];
        } else {
            $port = '';
        }
    }
    /* **************************** */
    if ($input[0] == '/') { // absolute
        $path = array('');
    } else { // relative
        $path = explode('/', dirname($_SERVER['SCRIPT_NAME']));
    }
    $input = explode('/', $input);

    foreach ($input as $part) { // resolve to absolute
        if ($part == '.' || $part == '') {
            continue;
        }
        if ($part == '..') {
            array_pop($path);
            continue;
        }
        $path[] = $part;
    }

    $path = implode('/', $path);
    header("Location: $protocol://{$_SERVER['HTTP_HOST']}$port$path");
    exit;
}
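
The part after the row of asterisks can be adapted to the original question by taking the base from an arbitrary page URL instead of $_SERVER. The following is only a sketch under some assumptions: the function name resolve_relative_url() is invented here, the base URL is assumed to have a scheme and host, and query strings, fragments and trailing slashes on the relative path are not handled.

```php
<?php
// Hypothetical helper, not from the thread: resolve $relative against $base_url
// using the same push/pop approach as the loop above.
function resolve_relative_url($base_url, $relative)
{
    // Already a full URL (starts with a scheme): return it unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $relative)) {
        return $relative;
    }

    $base = parse_url($base_url);
    $root = $base['scheme'] . '://' . $base['host'];
    if (isset($base['port'])) {
        $root .= ':' . $base['port'];
    }

    if ($relative !== '' && $relative[0] == '/') {
        // Root-relative link: start from an empty directory stack.
        $stack = array();
    } else {
        // Directory of the base document: keep everything up to the last slash.
        $base_path = isset($base['path']) ? $base['path'] : '/';
        $dir = substr($base_path, 0, strrpos($base_path, '/') + 1);
        $stack = array_values(array_filter(explode('/', $dir), 'strlen'));
    }

    foreach (explode('/', $relative) as $part) {
        if ($part == '' || $part == '.') {
            continue;          // empty or "current directory" segment
        }
        if ($part == '..') {
            array_pop($stack); // go up one level; does nothing at the root
            continue;
        }
        $stack[] = $part;
    }

    return $root . '/' . implode('/', $stack);
}

print resolve_relative_url('http://xyz.com/a/b/c/d/d.html', '../../pdfs/b.pdf');
```

Because ".." segments are popped one at a time as they are met, a path such as pdfs/../b.pdf is also handled, which simply counting the ".." entries would not do.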

Posted: Tue Nov 21, 2006 12:35 am
by dibyendrah
Thank you so much, ole, for your response. I have tested your script and it helped me with a similar purpose.

I have written several functions to convert a relative path to an absolute path, an absolute path to a URL, and a URL to an absolute path. They are not very robust and may fail when given different parameters, but I hope they will be useful to others who are looking for similar solutions.

Code: Select all

//string function conv_relative_path_to_absoulute_path(string $relative_path)
//to return the absolute path from the relative path provided as parameter
function conv_relative_path_to_absoulute_path($relative_path)
{
	$realpath = realpath($relative_path);

	if(file_exists($realpath)){
		$absolute_path = str_replace($_SERVER['DOCUMENT_ROOT'], '', $realpath);
		if ($absolute_path == $realpath) {
			$server_root = str_replace('/', '\\', $_SERVER['DOCUMENT_ROOT']);
			$absolute_path = str_replace($server_root, '', $realpath);
			$absolute_path = str_replace("\\", "/", $absolute_path);
		}
		return $absolute_path;
	}

}

//string function conv_absoulute_path_to_url(string $absolute_path)
//to return the full url of the absolute path provided as parameter
function conv_absoulute_path_to_url($absolute_path){

	$server_name = $_SERVER['SERVER_NAME'];
	// Collapse duplicate slashes before adding the scheme, so "http://" stays intact
	$full_url = str_replace("//", "/", $server_name . "/" . $absolute_path);
	$full_url = "http://" . $full_url;
	return($full_url);
}

function conv_url_to_absolute_path($url){

	$url_parts = parse_url($url);
	$domain_url = $url_parts["host"];
	$path = $url_parts["path"];
	$OS = getUserOS();
	if ($OS == "Windows") {
		$real_path = str_replace("\\", "/", $_SERVER['DOCUMENT_ROOT'] . $path);
	} elseif ($OS == "Linux") {
		$real_path = $_SERVER['DOCUMENT_ROOT'] . $path;
	}
	$real_path = str_replace("//", "/", $real_path);

	if(file_exists($real_path)){
		return($real_path);
	}else{
		return(null);
	}
}

Cheers,
Dibyendra

Posted: Tue Nov 21, 2006 2:47 am
by Ollie Saunders
Cool, I'm glad it helped.

To avoid conditions like if (windows) and if (linux) use the constant DIRECTORY_SEPARATOR which always contains the correct slash for the current OS.
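
A minimal sketch of what that could look like, assuming a made-up helper name url_path_to_file_path() and assuming the document root is passed in rather than read from $_SERVER:

```php
<?php
// Hypothetical helper: map a URL path such as "/docs/b.pdf" to a filesystem
// path under $document_root, using the separator of the current OS.
function url_path_to_file_path($document_root, $url_path)
{
    $slashes = array('/', '\\');
    // Normalise both kinds of slash to the native separator, then trim the
    // separators at the join so that exactly one ends up between the parts.
    $root = rtrim(str_replace($slashes, DIRECTORY_SEPARATOR, $document_root), DIRECTORY_SEPARATOR);
    $rel  = ltrim(str_replace($slashes, DIRECTORY_SEPARATOR, $url_path), DIRECTORY_SEPARATOR);
    return $root . DIRECTORY_SEPARATOR . $rel;
}

echo url_path_to_file_path('/var/www/html/', '/docs/pdfs/b.pdf'), "\n";
```

The same code runs on Windows and Linux without an OS check; only the separator in the result differs. One caveat of this sketch is that it treats every backslash as a separator, which would be wrong for a Linux file name that really contains one.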

Posted: Tue Nov 21, 2006 3:34 am
by dibyendrah
ole wrote:Cool, I'm glad it helped.

To avoid conditions like if (windows) and if (linux) use the constant DIRECTORY_SEPARATOR which always contains the correct slash for the current OS.
Could you please provide some samples to make your comment clearer?

Thank you.

With Best Regards,
Dibyendra