Determining the file type of a remote file
Posted: Wed Sep 24, 2003 8:07 am
I have a reciprocal-link testing class that examines every anchor tag on a page, looking for a return link. If the link isn't found on that page, it follows every href it found that points to the same domain, and so on, down to a defined nesting depth.
This all works fine, with some extra code to handle <base> tags, mailto: links and framesets. It follows the correct pages and stops after a set number of pages or once the link back is found.
I realized, though, that it would also try to fetch things like JPEGs, EXEs and ZIPs whenever they were the href of an anchor tag.
My quick workaround was to build an array of unwanted extensions, e.g. array('.jpg', '.exe', '.zip'), and test each URL with strpos(); if one of those extensions appears in the remote URL, the link is skipped.
Is there a better way? Perhaps by checking the returned file type (text/html, text/xml, etc.) so the script only scans web pages? Can the type of a remote file be tested, and does that depend on the servers involved?
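One possible refinement of the extension list, offered as a sketch rather than a fix to the original class: strpos() matches anywhere in the string, so a URL such as page.php?img=a.jpg would be skipped even though it returns an HTML page. Checking only the extension of the URL's path component avoids that. The function name hasSkippedExtension() and the SKIP_EXTENSIONS list are hypothetical examples.

```php
<?php
// Hypothetical list of extensions that are almost never HTML pages.
// Illustrative only, not exhaustive.
const SKIP_EXTENSIONS = ['jpg', 'jpeg', 'gif', 'png', 'exe', 'zip', 'pdf', 'mp3'];

// Hypothetical helper: guess from the URL alone whether a link is a non-HTML file.
function hasSkippedExtension(string $url): bool
{
    // Look only at the path, so query strings and fragments
    // (e.g. "show.php?file=a.zip") do not cause a false match.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return false; // no path at all, e.g. "http://example.com"
    }
    // pathinfo() gives the text after the last dot of the final path segment.
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($ext, SKIP_EXTENSIONS, true);
}

var_dump(hasSkippedExtension('http://example.com/images/photo.JPG'));    // bool(true)
var_dump(hasSkippedExtension('http://example.com/show.php?file=a.zip')); // bool(false)
var_dump(hasSkippedExtension('http://example.com/links/'));              // bool(false)
```

This is still a guess based on the URL: a script like download.php can return a ZIP file, so the list only filters out the obvious cases cheaply.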
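To check the actual type instead of guessing from the URL, one option is an HTTP HEAD request, which returns the response headers without the body, followed by a look at the Content-Type header. A minimal sketch, assuming PHP 7.1 or later (for the context argument to get_headers()) and allow_url_fopen enabled; isHtmlContentType() and remoteContentType() are hypothetical helper names, and only text/html and application/xhtml+xml are treated as pages here.

```php
<?php
// Hypothetical helper: true if a Content-Type value denotes an HTML page.
function isHtmlContentType(?string $contentType): bool
{
    if ($contentType === null || $contentType === '') {
        return false; // header missing: type unknown
    }
    // Drop parameters such as "; charset=iso-8859-1" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return in_array($mime, ['text/html', 'application/xhtml+xml'], true);
}

// Hypothetical helper: fetch only the headers (HEAD) and return Content-Type, or null.
function remoteContentType(string $url): ?string
{
    $context = stream_context_create(['http' => ['method' => 'HEAD', 'timeout' => 10]]);
    $headers = @get_headers($url, true, $context);
    if ($headers === false) {
        return null; // request failed (DNS error, connection refused, ...)
    }
    // Header names keep the server's casing, so normalise before the lookup.
    $headers = array_change_key_case($headers, CASE_LOWER);
    $type = $headers['content-type'] ?? null;
    // After redirects the value is an array with one entry per response
    // that sent the header; the last entry is usually the final page's.
    if (is_array($type)) {
        $type = end($type);
    }
    return is_string($type) ? $type : null;
}

// Usage inside the crawler loop ($url is a hypothetical variable):
// if (isHtmlContentType(remoteContentType($url))) { /* fetch and scan for the link */ }
```

This does depend on the server: Content-Type is whatever the server chooses to send, so a misconfigured server may mislabel pages or omit the header, and some servers reject HEAD requests. Keeping the extension list as a cheap first filter and using the header check as a second step costs one extra request per link but avoids downloading large binaries.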