Socket Programming.Retrieving Webpages. Help please

Posted: Sun Jan 11, 2004 6:07 pm
by senin
I am trying to write a program that needs to connect to multiple webpages. I currently use file() or file_get_contents() in the script.

To increase the speed of this script I'm planning on using socket_set_nonblock, so that I'm able to retrieve multiple pages at the same time.

I've been reading about socket programming with PHP, since it seems the socket functions are the only way I'll be able to use socket_set_nonblock.

However, I have not been able to find many Good resources discussing the SPECIFICS on HTTP commands and replies. I have also been unable to retrieve a full web page with socket_read().

Any help would be appreciated, especially on retrieving a full webpage with socket_read().

Thanks in advance.

Sample Code below

Code:

<?php
$host = "www.google.com";

// Create a TCP/IPv4 socket
$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP) or die("Could Not Create Socket\n");

socket_connect($socket, $host, 80) or die("Could Not Connect to $host");

// Build the request; the Host header takes the bare hostname, not a URL
$output = "GET / HTTP/1.1\r\n";
$output .= "Host: $host\r\n";
$output .= "Connection: Close\r\n\r\n";

socket_write($socket, $output, strlen($output)) or die("Could Not write to socket");

// A single socket_read() returns at most 4096 bytes, so only part of the page comes back
$data = socket_read($socket, 4096);

socket_close($socket);
print $data;
?>

Re: Socket Programming.Retrieving Webpages. Help please

Posted: Sun Jan 11, 2004 6:42 pm
by Pyrite
senin wrote: However, I have not been able to find many Good resources discussing the SPECIFICS on HTTP commands and replies.
The HTTP RFC is what you want to read.
http://www.w3.org/Protocols/rfc2616/rfc2616.html

GET returns the response headers followed by the HTML output; HEAD returns the headers only.

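telnet www.google.com 80
GET / HTTP/1.1
Host: www.google.com
(then hit enter twice)

Will retrieve the output of the index page. HTTP/1.1 requires the Host line; it takes the bare hostname, not a URL.

Likewise HEAD / HTTP/1.1 with the same Host line (enter twice)
Will show you only the headers.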
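
The telnet session above could be expressed in PHP roughly as follows. This is only a sketch: build_request() and split_response() are hypothetical helper names, not part of PHP or of the original post, and the body is returned as received (an HTTP/1.1 server may send it with chunked transfer encoding, which this does not decode).

```php
<?php
// Build the same request that was typed into telnet.
// HTTP/1.1 requires a Host header holding the bare hostname.
function build_request(string $method, string $host, string $path = '/'): string
{
    return "$method $path HTTP/1.1\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n\r\n";
}

// Split a raw response into status code, header lines, and body.
// The headers end at the first blank line (CRLF CRLF).
function split_response(string $raw): array
{
    $parts  = explode("\r\n\r\n", $raw, 2);
    $lines  = explode("\r\n", $parts[0]);
    $status = explode(' ', array_shift($lines), 3); // e.g. "HTTP/1.1 200 OK"

    return [
        'status'  => (int) ($status[1] ?? 0),
        'headers' => $lines,
        'body'    => $parts[1] ?? '',
    ];
}
```

build_request('HEAD', 'www.google.com') gives the HEAD variant; the string it returns is what would be passed to socket_write().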

I haven't done many HTTP requests with PHP, but I hope this info helps.
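
On the original problem of getting a full page from socket_read(): one call returns at most the requested number of bytes, so it has to be repeated until the server closes the connection. A minimal sketch, assuming the sockets are already connected and each request (with Connection: close) has already been written; read_all_sockets() is a hypothetical name, and socket_select() is used here to wait on several sockets at once instead of socket_set_nonblock.

```php
<?php
// Drain several connected sockets concurrently.
// Returns an array mapping each input key to everything read from that socket.
function read_all_sockets(array $sockets, int $timeoutSec = 10): array
{
    $buffers = array_fill_keys(array_keys($sockets), '');

    while ($sockets) {
        $read   = $sockets; // socket_select() overwrites the arrays it is given
        $write  = null;
        $except = null;

        $ready = socket_select($read, $write, $except, $timeoutSec);
        if ($ready === false || $ready === 0) {
            break; // error or timeout: stop waiting on whatever is left
        }

        foreach ($read as $sock) {
            // Look the key up ourselves rather than rely on $read's keys.
            $key   = array_search($sock, $sockets, true);
            $chunk = socket_read($sock, 4096);

            if ($chunk === false || $chunk === '') {
                // Error or end of data: this response is complete.
                socket_close($sock);
                unset($sockets[$key]);
            } else {
                $buffers[$key] .= $chunk;
            }
        }
    }

    return $buffers;
}
```

Called as read_all_sockets(['google' => $socket]), it returns the complete response under the 'google' key; sockets still open after a timeout are left for the caller to close.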