Comparing paths : Part 2 (super challenge)

alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

Ok so it shouldn't be that difficult but I've been stuck on it long enough and have given up.

Given a file:

Code: Select all

/opt/lampp/htdocs/test_12.html
Given a path:

Code: Select all

/opt/lampp/htdocs/articles/some_terrible_page.html
What is the *best* algorithm for determining whether the file sits inside the path? Ideally:

1) It's easy to read and understand
2) Fast and efficient

I've considered both comparing the strings directly and breaking them into arrays and comparing them iteratively.

I've been working on this for going on three hours now, so maybe I'm mush, but I've learned the following.

1) If the file path string length (minus the file name) is less than the path string length, you can assume the file IS NOT inside the directory
2) If you stare at code for too long you begin to go crazy and chew your nails.

So given my assessment above, how would you best implement this simple determination?

Cheers :)
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Is something like this what you mean?

Code: Select all

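// preg_quote() keeps characters such as "." in the path from being read as regex syntax
if (preg_match('#^' . preg_quote(dirname($sub_path), '#') . '.*#i', $main_path))
{
	
}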
You would really only need to check against the FIRST folder though if I understand you correctly.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

I'm failing to see why this needed a second thread... :?
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

I threw this together real quick, if nothing else it will give you some ideas. Sorry no unit tests. :P

Code: Select all

function in_path($path_one, $path_two)
{
    // Folder names of the first path, without the filename or the outer slashes
    $path_tree = explode('/', trim(dirname($path_one), '/'));

    // A path with no folders at all is treated as a match
    if ($path_tree[0] == '' && count($path_tree) < 2) return true;

    // Only the first two folders are checked against the start of $path_two
    foreach (array_slice($path_tree, 0, 2) as $folder)
    {
        if (preg_match('#^' . preg_quote($folder, '#') . '.*#i', $path_two)) return true;
    }

    return false;
}
Edit: I think I misunderstood you and wrote the code backwards. I'm tired. I would strip off the root paths from each path, then you only need to compare the first remaining directory, which should work with the function above.
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

feyd wrote:I'm failing to see why this needed a second thread... :?
Well, because I am requesting an actual implementation, not so much a theoretical discussion. :)

astions, the problem is that I really dislike regex. I would like the code to be a little more explicit in its rules, as in plain PHP conditionals if possible.

1) Ease of reading
2) Speed in execution

Native PHP I think would win hands down in both. As for just checking the first folder, yes that is correct, but if they are the same, you would need to check the next one. Actually that just made me think. :?

If I iteratively check each folder and bail out the minute there is a discrepancy before the filename is reached (the last element in the file array), I believe that should work...arrrrgh....I'm so tired of this one little problem. That's what sucks about working alone... :(
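
A minimal sketch of that folder-by-folder idea, with no regex. The function names `split_path()` and `file_in_dir_segments()` are made up for illustration and do not come from the thread. It assumes Unix-style paths, PHP 7 or later, and that "inside" also covers nested subdirectories.

```php
<?php
// Hypothetical helper: folder names of a path, ignoring outer slashes.
function split_path(string $path): array
{
    $trimmed = trim($path, '/');
    return $trimmed === '' ? [] : explode('/', $trimmed);
}

// Hypothetical helper: true when $file sits in $dir or in one of its subdirectories.
function file_in_dir_segments(string $file, string $dir): bool
{
    $file_parts = split_path(dirname($file)); // drop the filename first
    $dir_parts  = split_path($dir);

    // A file whose directory has fewer folders than $dir cannot be inside it.
    if (count($file_parts) < count($dir_parts)) {
        return false;
    }

    // Compare folder by folder and stop at the first discrepancy.
    foreach ($dir_parts as $i => $name) {
        if ($file_parts[$i] !== $name) {
            return false;
        }
    }
    return true;
}
```

With the paths from the first post, `file_in_dir_segments('/opt/lampp/htdocs/test_12.html', '/opt/lampp/htdocs/articles')` fails the folder-count check straight away, which is the length shortcut applied to whole folders rather than characters.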
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

Stripping off the folders you DO NOT need to check from the beginning of the strings and then only comparing the first DIR left in each string with the function I wrote is probably the fastest way to do it. Is that possible?
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Post by Benjamin »

I'm not sure what you're really after so I'm kind of just throwing stuff out here.

This code will return a path up to the last matching directory in each string... without any regex.

Code: Select all

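$len_one = strlen($path_one);
$len_two = strlen($path_two);

$temp = '';
$dir  = '';

// Both bounds must hold: with a comma only the last condition is tested
for ($i = 0; $i < $len_one && $i < $len_two; $i++)
{
    // Square brackets replace the {$i} offset syntax, which PHP 8 no longer accepts
    if ($path_one[$i] == $path_two[$i])
    {
        $temp .= $path_one[$i];

        // A slash means a whole directory matched, so commit the buffer
        if ($path_one[$i] == '/')
        {
            $dir .= $temp;
            $temp = '';
        }
    } else {
        break;
    }
}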
So..

/home/user_name/site/something/blah compared to /home/user_name/www/something would return /home/user_name/
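
One way the shared prefix could answer the original question, as a rough sketch: the file is inside the directory when the common directory of the two strings is the whole directory. The names `common_dir()` and `file_in_dir_prefix()` are made up for this example, and it assumes Unix-style paths and PHP 7 or later.

```php
<?php
// Hypothetical helper: longest shared directory prefix of two paths.
// The result ends in '/' or is empty when nothing is shared.
function common_dir(string $path_one, string $path_two): string
{
    $limit      = min(strlen($path_one), strlen($path_two));
    $last_slash = -1;

    for ($i = 0; $i < $limit; $i++) {
        if ($path_one[$i] !== $path_two[$i]) {
            break;
        }
        if ($path_one[$i] === '/') {
            $last_slash = $i; // end of the last fully matched directory
        }
    }
    return substr($path_one, 0, $last_slash + 1);
}

// Hypothetical helper: the file is inside $dir when the shared prefix is all of $dir.
function file_in_dir_prefix(string $file, string $dir): bool
{
    $dir = rtrim($dir, '/') . '/'; // force exactly one trailing slash
    return common_dir($file, $dir) === $dir;
}
```

Tracking only the position of the last matching slash avoids building the result character by character, but the idea is the same as the loop above.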
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

astions wrote:I'm not sure what you're really after so I'm kind of just throwing stuff out here.
Cool, thanks for that...I'll certainly look at it better tomorrow morning when I awake refreshed...I'm beat :P
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Post by Weirdan »

Wouldn't something simple work?

Code: Select all

$file = '/usr/local/apache/htdocs/files/file.html';
$dir = '/some/dir/';
$real_dir = realpath($dir);
if (substr(realpath(dirname($file)), 0, strlen($real_dir)) == $real_dir) {
   // in dir
} else {
   // out of the dir
}
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Post by alex.barylski »

I did consider that...and I like its simplicity but...

Something is making me think there is a caveat to using this simple technique:

Code: Select all

$file = '/var/www/htdocs/somedir/text.dat'; // Chop the filename using dirname()

$path = '/var/www/htdocs/'; // Inside directory
$path = '/var/www/htdocs/somedir2'; // Treated as inside directory - despite not being
I would have to ensure that after chopping the filename from $file the directory had a trailing slash to indicate the directory name stops there.

Otherwise the above may be TRUE which is obviously not valid.

Code: Select all

$file = '/var/www/htdocs/somedir'; // Match: which is incorrect
$file = '/var/www/htdocs/somedir/'; // No match: which is correct

$path = '/var/www/htdocs/somedir2/';

// strpos() returns false (not -1) when there is no match, so test for a prefix with === 0
var_dump(strpos($path, $file) === 0);
So yes, this technique should indeed work. I just need to ensure the trailing slash is present on $file at least, so that the above doesn't occur.

Any other caveats anyone can think of to using this approach?
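
For what it's worth, a small sketch of the prefix check with the trailing slash forced on both sides. The name `file_in_dir()` is invented for this example, and it assumes Unix-style paths and PHP 7 or later. It is a pure string comparison, so `..` segments and symlinks are not resolved. `realpath()` could do that first, but it returns false for paths that do not exist, so that result would need checking.

```php
<?php
// Hypothetical helper: prefix test with a forced trailing slash on both sides.
function file_in_dir(string $file, string $dir): bool
{
    // Normalise both directories so each ends in exactly one '/'.
    $file_dir = rtrim(dirname($file), '/') . '/';
    $dir      = rtrim($dir, '/') . '/';

    // $file_dir must start with $dir. The trailing slash is what stops
    // "somedir" from matching "somedir2".
    return strncmp($file_dir, $dir, strlen($dir)) === 0;
}
```

Using `strncmp()` (or `strpos(...) === 0`) sidesteps the false versus 0 confusion, since a prefix match is an explicit comparison against zero.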