Comparing paths : Part 2 (super challenge)
Posted: Mon Jun 18, 2007 12:08 am
by alex.barylski
Ok so it shouldn't be that difficult but I've been stuck on it long enough and have given up.
Given a file:
Given a path:
Code: Select all
/opt/lampp/htdocs/articles/some_terrible_page.html
What is the *best* algorithm for determining whether a file sits inside the path? Ideally:
1) It's easy to read and understand
2) Fast and efficient
I've considered both comparing the strings directly and breaking them into arrays and comparing them iteratively.
I've been working on this for going on three hours now, so maybe I'm mush, but I've learned the following.
1) If the file path string length (minus the file name) is less than the path string length, you can assume the file IS NOT inside the directory
2) If you stare at code for too long you begin to go crazy and chew your nails.
So given the assessment, or my understanding above, how would you best implement this simple determination?
Cheers

Posted: Mon Jun 18, 2007 12:19 am
by Benjamin
Is something like this what you mean?
Code: Select all
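// preg_quote() keeps regex metacharacters in the path literal
if (preg_match('#^' . preg_quote(dirname($sub_path), '#') . '#i', $main_path))
{
    // $main_path starts with the directory of $sub_path (case-insensitive)
}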
You would really only need to check against the FIRST folder though if I understand you correctly.
Posted: Mon Jun 18, 2007 12:25 am
by feyd
I'm failing to see why this needed a second thread...

Posted: Mon Jun 18, 2007 12:35 am
by Benjamin
I threw this together real quick, if nothing else it will give you some ideas. Sorry no unit tests.
Code: Select all
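function in_path($path_one, $path_two)
{
    // Split the directory part of $path_one into its folder names
    $path_tree = explode('/', trim(dirname($path_one), '/'));
    if (count($path_tree) > 0)
    {
        // Nothing left after trimming slashes: the directory was the root
        if ($path_tree[0] == '' && count($path_tree) < 2) return true;
        // preg_quote() keeps regex metacharacters in folder names literal
        if (preg_match('#^' . preg_quote($path_tree[0], '#') . '#i', $path_two)) return true;
        if (isset($path_tree[1]))
        {
            if (preg_match('#^' . preg_quote($path_tree[1], '#') . '#i', $path_two)) return true;
        }
    }
    return false;
}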
Edit: I think I misunderstood you and wrote the code backwards. I'm tired. I would strip off the root paths from each path, then you only need to compare the next first directory, which should work with the function above.
Posted: Mon Jun 18, 2007 12:41 am
by alex.barylski
feyd wrote:I'm failing to see why this needed a second thread...

Well, because I am requesting an actual implementation, not so much a theoretical discussion.
astions, the problem is that I really dislike regex. I would like the rules to be a little more explicit in the code, as plain PHP conditionals if possible.
1) Ease of reading
2) Speed in execution
Native PHP I think would win hands down in both. As for just checking the first folder, yes that is correct, but if they are the same, you would need to check the next one. Actually that just made me think.
If all I did was iteratively check each folder and stop the minute there is a discrepancy before the filename is reached (last element in the file array), I believe that should work... arrrrgh... I'm so tired of this one little problem. That's what sucks about working alone...
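The folder-by-folder idea could be sketched roughly like this. It is only a sketch: the name file_in_dir() is made up for illustration, and both paths are assumed to be absolute, already normalized (no ".." segments or symlinks) and '/'-separated.

```php
<?php
// Hypothetical helper, not from the thread: returns true when $file sits
// inside $dir (at any depth), comparing one folder at a time.
// Assumes absolute, normalized, '/'-separated paths.
function file_in_dir($file, $dir)
{
    // Folders leading to the file (dirname() drops the file name).
    $file_parts = explode('/', trim(dirname($file), '/'));
    $dir_parts  = explode('/', trim($dir, '/'));

    // Trimming the root directory leaves '', which explode() turns into [''].
    if ($file_parts === array('')) $file_parts = array();
    if ($dir_parts === array(''))  $dir_parts  = array();

    // The file cannot be inside a directory that is deeper than its own.
    if (count($file_parts) < count($dir_parts)) {
        return false;
    }

    // Bail out on the first folder that differs.
    foreach ($dir_parts as $i => $part) {
        if ($file_parts[$i] !== $part) {
            return false;
        }
    }
    return true;
}

var_dump(file_in_dir('/var/www/htdocs/somedir/text.dat', '/var/www/htdocs/'));         // bool(true)
var_dump(file_in_dir('/var/www/htdocs/somedir/text.dat', '/var/www/htdocs/somedir2')); // bool(false)
```

Because whole folder names are compared, "somedir" and "somedir2" can never be confused, and no regex is involved.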

Posted: Mon Jun 18, 2007 12:44 am
by Benjamin
Stripping off the folders you DO NOT need to check from the beginning of the strings and then only comparing the first DIR left in each string with the function I wrote is probably the fastest way to do it. Is that possible?
Posted: Mon Jun 18, 2007 12:54 am
by Benjamin
I'm not sure what you're really after so I'm kind of just throwing stuff out here.
This code will return a path up to the last matching directory in each string... without any regex.
Code: Select all
$len_one = strlen($path_one);
$len_two = strlen($path_two);
$temp = '';
$dir = '';
// Both bounds must hold; with a comma only the last condition is tested
for ($i = 0; $i < $len_one && $i < $len_two; $i++)
{
    if ($path_one[$i] == $path_two[$i])
    {
        $temp .= $path_one[$i];
        if ($path_one[$i] == '/')
        {
            // A whole directory name matched, so commit it to $dir
            $dir .= $temp;
            $temp = '';
        }
    } else {
        break;
    }
}
So..
/home/user_name/site/something/blah compared to /home/user_name/www/something would return /home/user_name/
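For what it's worth, the same character loop could be wrapped in a function and used to answer the original question: a file is inside a directory when the shared directory prefix of the two paths is the directory itself. A rough sketch, assuming normalized '/'-separated paths; common_dir() and is_inside() are made-up names, not part of PHP.

```php
<?php
// Hypothetical wrapper around the character-by-character loop:
// returns the longest shared prefix of the two paths that ends in '/'.
function common_dir($path_one, $path_two)
{
    $len  = min(strlen($path_one), strlen($path_two));
    $temp = '';
    $dir  = '';
    for ($i = 0; $i < $len; $i++) {
        if ($path_one[$i] !== $path_two[$i]) {
            break;
        }
        $temp .= $path_one[$i];
        if ($path_one[$i] === '/') {
            // A full directory name matched; commit it.
            $dir .= $temp;
            $temp = '';
        }
    }
    return $dir;
}

// Hypothetical check built on top: $file is inside $dir when the shared
// prefix of the two is $dir itself (with a trailing slash forced on).
function is_inside($file, $dir)
{
    $dir = rtrim($dir, '/') . '/';
    return common_dir($file, $dir) === $dir;
}

echo common_dir('/home/user_name/site/something/blah', '/home/user_name/www/something'), "\n";
// prints /home/user_name/
```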
Posted: Mon Jun 18, 2007 1:08 am
by alex.barylski
Cool, thanks for that...I'll certainly look at it better tomorrow morning when I awake refreshed...I'm beat

Posted: Mon Jun 18, 2007 1:16 am
by Weirdan
Wouldn't something simple work:
Code: Select all
$file = '/usr/local/apache/htdocs/files/file.html';
$dir = '/some/dir/';
if (substr(realpath(dirname($file)), 0, strlen(realpath($dir))) == realpath($dir)) {
    // in dir
} else {
    // out of the dir
}
Posted: Mon Jun 18, 2007 1:52 pm
by alex.barylski
I did consider that...and I like its simplicity but...
Something is making me think there is a caveat to using this simple technique
Code: Select all
$file = '/var/www/htdocs/somedir/text.dat'; // Chop the filename using dirname()
$path = '/var/www/htdocs/'; // Inside directory
$path = '/var/www/htdocs/somedir2'; // Inside directory - despite not being
I would have to ensure that after chopping the filename from $file the directory had a trailing slash to indicate the directory name stops there.
Otherwise the above may be TRUE which is obviously not valid.
Code: Select all
$file = '/var/www/htdocs/somedir'; // Match: which is incorrect
$file = '/var/www/htdocs/somedir/'; // No match: which is correct
$path = '/var/www/htdocs/somedir2/';
echo strpos($path, $file, 0); // No match returns false (not -1); a match at the start returns 0
So yes, this technique should work; I just need to make sure the trailing slash is present on $file at least, so that the above doesn't occur.
Any other caveats anyone can think of to using this approach?
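Two caveats come to mind with the substr()/realpath() approach: the trailing slash discussed above, and the fact that realpath() returns false for a path that does not exist, so as far as I can tell two missing paths end up comparing as a match. Here is a hedged sketch that guards against both. path_contains() is a made-up name, and the example directories are created under sys_get_temp_dir() purely for the demo.

```php
<?php
// Hypothetical helper: true when $file lies somewhere under $dir.
function path_contains($dir, $file)
{
    $real_dir  = realpath($dir);
    $real_file = realpath(dirname($file));

    // realpath() returns false when the path does not exist.
    if ($real_dir === false || $real_file === false) {
        return false;
    }

    // Force a trailing separator on both sides so that "somedir" is not
    // mistaken for a prefix of "somedir2".
    $real_dir  = rtrim($real_dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    $real_file = rtrim($real_file, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;

    return strncmp($real_file, $real_dir, strlen($real_dir)) === 0;
}

// Demo with throwaway directories (names are arbitrary).
$base = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'path_demo_' . uniqid();
mkdir($base . '/somedir', 0777, true);
mkdir($base . '/somedir2', 0777, true);

var_dump(path_contains($base . '/somedir', $base . '/somedir/text.dat'));  // bool(true)
var_dump(path_contains($base . '/somedir', $base . '/somedir2/text.dat')); // bool(false)
var_dump(path_contains($base . '/missing', $base . '/missing/text.dat'));  // bool(false)
```

Note that the file itself does not have to exist for this check, only its directory, because realpath() is applied to dirname($file).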