Browsing remote folders
Posted: Fri Oct 01, 2004 9:29 pm
by evilmonkey
Hello. I would like to make a bot that would go to a website, walk through all the directories under a given path, and display a list of all the files that are there. Can PHP do that? I know it can connect to remote hosts and do file_get_contents(), but can it get the list of files?
Thanks.
(P.S. Before I get accused of being a hacker, or something of the sort, this is in purely good intentions.)
Posted: Fri Oct 01, 2004 9:31 pm
by m3mn0n
Yep.

Posted: Fri Oct 01, 2004 9:34 pm
by evilmonkey
How?

Posted: Sat Oct 02, 2004 12:02 am
by feyd
you have to look at all the file references that the pages and other returned data give.. it's potentially a very time-consuming endeavour.
If you look through my "recent" posts, you'll find something about a roll-your-own proxy, where I roughly detailed what's involved in creating a quasi firewall-bypassing script.
Posted: Sat Oct 02, 2004 10:41 am
by Getran
hmm, i'm not sure, but is this kinda like what you're looking for?
http://www.spoono.com/php/tutorials/tutorial.php?id=10
Posted: Sat Oct 02, 2004 11:09 am
by evilmonkey
Hello Getran,
That doesn't seem to work if I set $path to an http://.../somefolder; it only seems to work for local directories...
Feyd, I'll take a look at your posts (assuming I find them)
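What evilmonkey ran into can be sketched quickly. PHP's directory functions (opendir(), readdir(), scandir()) work on local paths, but the http:// stream wrapper has no directory-listing support, so the same call against a URL fails. The URL below is a placeholder, not one from the thread:

```php
<?php
// Local path: scandir() returns an array of entry names.
$local = scandir(sys_get_temp_dir());
var_dump(is_array($local));   // bool(true)

// Remote URL (placeholder host): the http:// wrapper does not implement
// directory operations, so this fails rather than listing anything.
$remote = @scandir('http://example.com/somefolder/');
var_dump($remote);            // bool(false)
```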
Posted: Sat Oct 02, 2004 11:14 am
by evilmonkey
Hello feyd,
I found your thread on bypassing a firewall, but I don't really understand how it applies to me, nor do I understand what you mean by "use regex to switch all the external file references in the page to usable ones". What are "usable ones"? And once again, I don't really get how it applies to my situation.
Thanks for your help.

Posted: Sat Oct 02, 2004 11:22 am
by feyd
the "usable ones" bit was referring to creating the proxy. But that part doesn't apply to you. The parts I was talking about were the regular expression matching bits, mostly. Basically, you create a search engine of sorts that reads a given page, finds all the links in it, stores them, and continues going through its list until it hits a dead end (all links on the page have already been spidered). When you find a set of links that apply to the folder you want, you store those off in a separate "results" list.
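The spider feyd describes can be reduced to its core step: pull every href out of a fetched page so the links can be queued and crawled in turn. A rough sketch; extract_links() is an illustrative name, not from the thread, and the regex is deliberately naive (no relative-URL resolution):

```php
<?php
// Extract all href targets from a chunk of HTML. A real spider would
// resolve relative URLs against the page's base URL before queuing them.
function extract_links(string $html): array
{
    preg_match_all('/href\s*=\s*["\']([^"\']+)["\']/i', $html, $m);
    return array_unique($m[1]);
}

$page = '<a href="report.txt">report</a> <a href="sub/">sub</a>';
print_r(extract_links($page)); // report.txt and sub/
```

A full crawler would push these onto a to-visit queue, skip URLs it has already seen, and keep any that fall under the target folder in the "results" list, exactly as the post describes.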
Posted: Sat Oct 02, 2004 11:30 am
by evilmonkey
Ooooh, no, that's unfortunately not what I need. The content I want to find is not linked anywhere. It's a lone file sitting somewhere within a directory, not linked to anything, not linked by anything, and I want to find it. For that, I need to open the directory that it's in, get its listing, then if there are directories in there, get their listing, and so forth, until I find that file.
Posted: Sat Oct 02, 2004 11:46 am
by feyd
if the server doesn't provide a directory listing, then you are probably out of luck.
Posted: Sat Oct 02, 2004 2:30 pm
by m3mn0n
How about FTP?
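m3mn0n's FTP suggestion would look roughly like this, assuming the server even runs an FTP service and allows anonymous read access (neither is given in the thread); the host and path are placeholders:

```php
<?php
// Hypothetical FTP listing using PHP's ftp extension. ftp_nlist()
// returns just the entry names in a directory, nothing more.
$conn = ftp_connect('ftp.example.com');        // placeholder host
if ($conn && ftp_login($conn, 'anonymous', '')) {
    $names = ftp_nlist($conn, '/somefolder');  // placeholder path
    print_r($names);
    ftp_close($conn);
}
```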
Posted: Sat Oct 02, 2004 4:40 pm
by Breckenridge
I think evilmonkey is talking about http w/o any access to the server file system through a user account. If this is the case, I don't think it can be done.
Posted: Sat Oct 02, 2004 9:19 pm
by evilmonkey
Yeah, through a script running on one server, get the directory listing of a folder on a totally different computer. Are you sure this can't be done?
Sami, FTP is out of the question. I want the script to know nothing about the server that it is accessing. Just read the names of the files and folders, nothing else. Modify nothing, delete nothing, open nothing. Just the filenames.
Posted: Sat Oct 02, 2004 9:30 pm
by feyd
it's not possible unless the server allows a directory listing of the files in it.
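When a server does allow listing, a request for the bare directory URL typically returns an auto-generated "Index of ..." page (the common Apache/nginx style), which is what you would then parse for links. A hedged sketch of detecting that case; it only recognizes that common title format, and the URL is a placeholder:

```php
<?php
// Detect the common auto-index page style. Custom index pages or other
// server software may format the listing differently.
function looks_like_autoindex(string $html): bool
{
    return stripos($html, 'Index of ') !== false;
}

// Usage idea (placeholder URL): fetch the directory itself, then check.
// $body = @file_get_contents('http://example.com/somefolder/');
// if ($body !== false && looks_like_autoindex($body)) { /* parse links */ }
var_dump(looks_like_autoindex('<title>Index of /files</title>')); // bool(true)
```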
Posted: Sat Oct 02, 2004 9:34 pm
by evilmonkey
Okay, if there's an index.htm/l/.php/.asp/whatever file, does that mean directory listing is disallowed?