fsockopen website scanner help

Jezza
Forum Newbie
Posts: 6
Joined: Fri Feb 06, 2009 9:41 pm

fsockopen website scanner help

Post by Jezza »

G'day, I've been trying to make a website scanner that grabs the HTML code and fidgets with it, which has been a success, BUT some web pages don't return the proper HTML. Instead I get an error such as "Permission denied", "Forbidden", "Page not found", etc.

How do I, with PHP, get HTML from a website in a way identical to an internet browser, so the site treats my request the same? Know what I mean? At the moment I'm opening a connection with fsockopen, say to http://www.google.com, and I would send:

Code: Select all

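    // Open a TCP connection to the web server (port 80, plain HTTP).
    $fp = fsockopen("www.google.com", 80, $errno, $errstr, 30);
    $out = "GET / HTTP/1.1\r\n";
    // Host takes the bare hostname, with no "http://" scheme.
    $out .= "Host: www.google.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);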
please help
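
One common reason for "Forbidden" style responses is that the request carries none of the headers a browser normally sends. The sketch below builds a raw GET request with a few browser-like headers for use with fsockopen. The function name `build_browser_request` and the User-Agent string are made up for illustration, and which headers a given site actually checks is an assumption; this is not a guaranteed fix.

```php
<?php
// Hypothetical helper: builds a raw HTTP/1.1 GET request with browser-like headers.
function build_browser_request(string $host, string $path = '/'): string
{
    $headers = [
        // Host is the bare hostname: no scheme, no path.
        'Host'            => $host,
        // Assumed example value; some servers refuse requests with no User-Agent.
        'User-Agent'      => 'Mozilla/5.0 (compatible; ExampleScanner/0.1)',
        'Accept'          => 'text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
        'Accept-Language' => 'en',
        'Connection'      => 'close',
    ];

    $out = "GET {$path} HTTP/1.1\r\n";
    foreach ($headers as $name => $value) {
        $out .= "{$name}: {$value}\r\n";
    }
    return $out . "\r\n"; // the blank line ends the header section
}

// Usage (needs network access, so it is left commented out):
// $fp = fsockopen('www.google.com', 80, $errno, $errstr, 30);
// if ($fp === false) { die("$errstr ($errno)\n"); }
// fwrite($fp, build_browser_request('www.google.com'));
// while (!feof($fp)) { echo fgets($fp, 1024); }
// fclose($fp);
```

The response read back this way still includes the status line and headers, so the status code (200, 403, 404 and so on) can be checked before touching the HTML.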
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: fsockopen website scanner help

Post by php_east »

That would be done using cURL.
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Re: fsockopen website scanner help

Post by Chris Corbyn »

I should note that this is probably illegal with most sites (check their terms of use). Creating a tool that scrapes a full web page and modifies it slightly (or fidgets with it, as you put it) also raises questions about your intentions. Such tools are used by scammers to steal bank account login details, etc.

What's your intended usage, and do you have permission from the website owners to do this?
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: fsockopen website scanner help

Post by php_east »

I don't think there is anything illegal about scanning. That is what Google, Yahoo, MSN and all the other search engines do, by way of bots. The term used is crawling. Same thing. That is what the guy is asking.
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: fsockopen website scanner help

Post by php_east »

On perhaps a related subject, Google announced a few days ago that they will commence what they call interest-based advertising. The web pages you browse will be recorded and taken as your personal preferences, and adverts of that type will be shown to you. This forms a fingerprint of your personal likings. Now what can I do about *that*?
Jezza
Forum Newbie
Posts: 6
Joined: Fri Feb 06, 2009 9:41 pm

Re: fsockopen website scanner help

Post by Jezza »

First off, by "fidget" I mean that it replaces < with &lt; and > with &gt; so you can see the code on the page, and it replaces \r\n with <BR> so it's easier to read. It tells me the number of links and images on the page, whether the page wants to plant cookies on my computer, whether it thinks the site is dangerous, and whether it thinks the web page is controlled by PHP. If it finds a link like "/folder/whatever.htm", it corrects it to make the link absolute, and displays it.

It shouldn't be illegal, I'd say. Would curl help me go through websites grabbing their source code better than fsockopen? I have no intention of illegal activity; it's more like a site adviser. This thing is a work in progress and will soon scan an entire site from one address and give statistics on whether it thinks it's safe.
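
For reference, the steps described in this post could be sketched roughly as follows. This is not Jezza's actual code: the function names are invented for illustration, link counting assumes PHP's DOM extension is available, and `make_absolute` only handles root-relative and simple relative links (no "../", no ports, no "mailto:" style links).

```php
<?php
// Escape markup so the browser shows the source instead of rendering it.
// nl2br() then inserts a <br /> tag before each line break.
function show_source(string $html): string
{
    return nl2br(htmlspecialchars($html, ENT_QUOTES, 'UTF-8'));
}

// Count <a> and <img> elements. Expects a non-empty HTML string.
function count_links_and_images(string $html): array
{
    // Real-world HTML is often malformed, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    return [
        'links'  => $doc->getElementsByTagName('a')->length,
        'images' => $doc->getElementsByTagName('img')->length,
    ];
}

// Turn a link such as "/folder/whatever.htm" into an absolute URL.
function make_absolute(string $link, string $baseUrl): string
{
    // Already absolute: starts with a scheme such as http:// or https://.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }
    $parts = parse_url($baseUrl);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (substr($link, 0, 1) === '/') {
        return $root . $link; // root-relative link
    }
    // Otherwise resolve against the directory of the base URL's path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo make_absolute('/folder/whatever.htm', 'http://example.com/a/b.htm'), "\n";
```

A regular expression could also count the tags, but a parser copes better with attributes, odd spacing and upper-case tag names.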
php_east
Forum Contributor
Posts: 453
Joined: Sun Feb 22, 2009 1:31 pm
Location: Far Far East.

Re: fsockopen website scanner help

Post by php_east »

Jezza wrote:would curl help me go through websites grabbing their source code better than fsockopen?
Whether it is better is something you decide. cURL was intended to be an easier method to fetch HTML, and to do POSTs, GETs and file uploads. It is easier to use in the sense that you no longer deal with sockets; you deal with a wrapper. In fact, you should think of it as a sockets wrapper.
It saves many lines of code.

Basically, it provides you with a scriptable browser.
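
To give an idea of the "scriptable browser" point, here is a minimal sketch of a fetch with PHP's cURL extension. The helper name `fetch_page` and the User-Agent string are assumptions for illustration, and the options shown are one reasonable browser-like setup, not the only one.

```php
<?php
// Hypothetical helper: fetch a page with cURL using browser-like settings.
function fetch_page(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects, as a browser would
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',   // accept any encoding cURL supports
        // Assumed example value; pick one that identifies your scanner.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScanner/0.1)',
    ]);

    $body   = curl_exec($ch);                        // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // e.g. 200, 403, 404
    $error  = ($body === false) ? curl_error($ch) : null;

    return ['status' => $status, 'body' => $body, 'error' => $error];
}

// Usage (needs network access and the cURL extension):
// $page = fetch_page('http://www.google.com/');
// if ($page['error'] !== null) { die($page['error'] . "\n"); }
// echo $page['status'], "\n";
```

Compared with the fsockopen version, redirects, compressed responses and header parsing are handled by the library, which is where the saved lines come from.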