fsockopen website scanner help

Posted: Wed Mar 18, 2009 9:04 pm
by Jezza
G'day, I've been trying to make a website scanner that grabs the HTML code and fidgets with it, which has been a success, BUT for some webpages I don't get the proper HTML code; I get errors like "Permission denied", "Forbidden", "Page not found" etc.

How do I, with PHP, get the HTML from a website in a way identical to an internet browser, so the site treats my request the same, know what I mean? At the moment I'm opening an fsockopen to, just say, http://www.google.com and I would

Code:

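    // Connect using the bare host name (no "http://"); port and timeout are assumed here
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    $out = "GET / HTTP/1.1\r\n";
    // The Host header takes the host name only and must end with \r\n
    $out .= "Host: www.google.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);

please help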
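
For reference, a minimal sketch of the same fsockopen request with a couple of browser-like headers added. One common reason for a "Forbidden" or "Page not found" reply is a malformed Host header or a missing User-Agent, though that is only an assumption about these particular sites. The function names `build_request` and `fetch_raw` and the User-Agent string are hypothetical, chosen for illustration.

```php
<?php
// Build a plain HTTP/1.1 GET request. $host is the bare host name, no scheme.
function build_request(string $host, string $path = '/'): string
{
    $out  = "GET {$path} HTTP/1.1\r\n";
    $out .= "Host: {$host}\r\n";
    // Illustrative User-Agent; some servers reject requests that send none.
    $out .= "User-Agent: Mozilla/5.0 (compatible; ExampleScanner/0.1)\r\n";
    $out .= "Accept: text/html,*/*;q=0.8\r\n";
    $out .= "Connection: Close\r\n\r\n";
    return $out;
}

// Send the request over a socket and return the raw response
// (status line, headers and body together), or null on failure.
// Note: the body may arrive chunked; this sketch does not decode it.
function fetch_raw(string $host, string $path = '/'): ?string
{
    $fp = @fsockopen($host, 80, $errno, $errstr, 30);
    if (!$fp) {
        return null; // $errstr holds the reason
    }
    fwrite($fp, build_request($host, $path));
    $response = stream_get_contents($fp);
    fclose($fp);
    return $response === false ? null : $response;
}

// Usage (needs network access):
// $raw = fetch_raw('www.google.com');
// echo strtok($raw, "\r\n"); // first line is the status line
```

The first line of the raw response is the status line, so checking it tells you whether the server actually sent the page or an error.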

Re: fsockopen website scanner help

Posted: Thu Mar 19, 2009 4:07 am
by php_east
that would be using cURL.

Re: fsockopen website scanner help

Posted: Thu Mar 19, 2009 6:22 am
by Chris Corbyn
I should note that this is probably illegal with most sites (check their terms of use). And creating such a tool to scrape a full web page and modify it slightly (or fidget with it as you put it) is questionable as to your intentions. Such tools are used by scammers to steal bank account login details etc.

What's your intended usage, and do you have permission from the website owners to do this?

Re: fsockopen website scanner help

Posted: Thu Mar 19, 2009 6:33 am
by php_east
i don't think there is anything illegal about scanning. that would be what google yahoo msn and all other search engines do, by way of bots. the term used is crawling. same thing. that is what the guy is asking.

Re: fsockopen website scanner help

Posted: Thu Mar 19, 2009 6:39 am
by php_east
on perhaps a related subject, google announced a few days ago that they will commence what they call interest-based advertising. the web pages you browse will be recorded and taken as your personal preferences, and adverts of that type will be shown to you. this forms a fingerprint of your personal likings. now what can i do about *that* ?

Re: fsockopen website scanner help

Posted: Thu Mar 19, 2009 6:01 pm
by Jezza
First off, by fidget I mean it replaces < with &lt; and > with &gt; so you can see the code on the page, and it replaces \r\n with <BR> so it's easier to read. It tells me the number of links and images on the page, whether the page wants to plant cookies onto my computer, whether it thinks the site is dangerous, and whether it thinks the webpage is controlled by PHP. Also, if it finds a link like "/folder/whatever.htm" it will correct it to make the link absolute, and display them.

It shouldn't be illegal, I'd say. Would cURL help me go through websites grabbing their source code better than fsockopen? I have no intention of illegal activity; it's more like a site adviser. This thing is a work in progress and will soon scan an entire site from one address and give statistics on whether it thinks it's safe.
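
The fidgeting described in this post could be sketched roughly as below. This is not the poster's actual code; the function names (`show_source`, `count_links_and_images`, `make_absolute`) are hypothetical, and the sketch assumes PHP 7 or later with the DOM extension enabled.

```php
<?php
// Make raw HTML readable on a page: escape < > & and quotes,
// then put a <br /> in front of every line break.
function show_source(string $html): string
{
    return nl2br(htmlspecialchars($html, ENT_QUOTES, 'UTF-8'));
}

// Count <a href="..."> links and <img> tags. $html must not be empty.
function count_links_and_images(string $html): array
{
    $previous = libxml_use_internal_errors(true); // real pages are rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = 0;
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links++;
        }
    }
    return [
        'links'  => $links,
        'images' => $doc->getElementsByTagName('img')->length,
    ];
}

// Turn a link such as "/folder/whatever.htm" into an absolute URL.
// Simplified: does not resolve "../", "#fragment" or "mailto:" links.
function make_absolute(string $href, string $baseUrl): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href; // already absolute
    }
    $parts  = parse_url($baseUrl);
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href; // protocol-relative link
    }
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    if (strpos($href, '/') === 0) {
        return $origin . $href; // root-relative link
    }
    // Otherwise resolve against the directory of the base page.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

Using a DOM parser for the counting rather than string matching is a design choice here; it copes better with attributes in odd order or mixed-case tags.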

Re: fsockopen website scanner help

Posted: Fri Mar 20, 2009 1:08 am
by php_east
Jezza wrote: would curl help me go through websites grabbing their source code better than fsockopen?
whether it is better is something you decide. cURL was intended to be an easier method to fetch html, and do posts and gets and file uploads. it is easier to use in the sense that you no longer deal with sockets, you deal with a wrapper. in fact, you should think of it as a sockets wrapper.
it saves many lines of code.

basically it provides you with a scriptable browser.
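
As a rough illustration of the "scriptable browser" idea, here is a minimal sketch of a cURL fetch that follows redirects and sends a browser-like User-Agent. It assumes the PHP cURL extension is installed; the function name `fetch_like_browser` and the User-Agent string are hypothetical, and which headers a given site actually checks is not something this thread establishes.

```php
<?php
// Fetch a URL roughly the way a browser would.
// Returns the HTTP status, the body (or null), and an error message (or null).
function fetch_like_browser(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, as a browser does
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',   // accept any compression cURL supports
        // Illustrative User-Agent; identify your scanner honestly in real use.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScanner/0.1)',
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,*/*;q=0.8',
            'Accept-Language: en',
        ],
    ]);

    $body = curl_exec($ch); // false on failure
    return [
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'body'   => $body === false ? null : $body,
        'error'  => $body === false ? curl_error($ch) : null,
    ];
}

// Usage (needs network access):
// $page = fetch_like_browser('http://www.google.com/');
// if ($page['status'] === 200) { echo strlen($page['body']); }
```

Compared with a hand-written socket request, the redirect handling, compression and status parsing all come from cURL, which is where the saved lines of code come from.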