Socket Programming. Retrieving Webpages. Help please



senin
Forum Newbie
Posts: 5
Joined: Fri Jan 02, 2004 5:00 pm


Post by senin

I am trying to write a program that needs to connect to multiple webpages. I currently use file() or file_get_contents() in the script.

To speed up the script, I'm planning to use socket_set_nonblock() so that I can retrieve multiple pages at the same time.

I've been reading about socket programming in PHP, since it seems that the socket functions are the only way I'll be able to use socket_set_nonblock().
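socket_set_nonblock() on its own does not make several downloads run in parallel; the usual pattern pairs it with socket_select(), which waits until at least one of several sockets has data ready. Below is a minimal sketch, assuming the sockets extension is enabled and every host serves plain HTTP on port 80. `fetch_all()` is a hypothetical helper name. To keep it short, the connects are still made one after another (blocking) and only the reading phase overlaps; a fully non-blocking connect needs extra handling of the "operation in progress" error.

```php
<?php
// Hypothetical helper: fetch "/" from several hosts, overlapping the reads.
// Assumes the sockets extension is loaded and plain HTTP on port 80.
function fetch_all(array $hosts): array
{
    $sockets = [];   // host => connected socket still being read
    $buffers = [];   // host => raw response so far (false if connect failed)

    foreach ($hosts as $host) {
        $sock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
        if ($sock === false) {
            $buffers[$host] = false;
            continue;
        }
        // Blocking connect; only the reads below run concurrently.
        if (!@socket_connect($sock, gethostbyname($host), 80)) {
            socket_close($sock);
            $buffers[$host] = false;
            continue;
        }
        // HTTP/1.0 keeps the server from sending a chunked body.
        $request = "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n";
        socket_write($sock, $request);   // short request, assumed to go out in one call
        socket_set_nonblock($sock);
        $sockets[$host] = $sock;
        $buffers[$host] = "";
    }

    while ($sockets) {
        $read = array_values($sockets);
        $write = null;
        $except = null;
        // Wait up to 10 seconds for any socket to become readable.
        if (socket_select($read, $write, $except, 10) < 1) {
            break;   // timeout or error: stop waiting for the rest
        }
        foreach ($read as $sock) {
            $host = array_search($sock, $sockets, true);
            $chunk = socket_read($sock, 8192);
            if ($chunk === false || $chunk === "") {
                // "" means the server closed the connection: this page is done.
                socket_close($sock);
                unset($sockets[$host]);
            } else {
                $buffers[$host] .= $chunk;
            }
        }
    }

    // Close anything left open after a timeout.
    foreach ($sockets as $sock) {
        socket_close($sock);
    }
    return $buffers;
}
```

Usage would look like `$pages = fetch_all(["www.google.com", "www.example.com"]);`, where each value is the raw reply (headers and body) for that host.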

However, I have not been able to find many Good resources discussing the SPECIFICS on HTTP commands and replies. I have also been unable to retrieve a full web page with socket_read().

Any help would be appreciated, especially with retrieving a webpage using socket_read().

Thanks in advance.

Sample code below:


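<?php
$host = "www.google.com";

// For AF_INET sockets the manual documents socket_connect() as taking an
// IPv4 address, so resolve the host name first.
$ip = gethostbyname($host);

$socket = socket_create(AF_INET, SOCK_STREAM, SOL_TCP) or die("Could not create socket\n");

socket_connect($socket, $ip, 80) or die("Could not connect to $host\n");

// The Host header takes the bare host name, not a URL.
// HTTP/1.0 keeps the server from sending a chunked body.
$output  = "GET / HTTP/1.0\r\n";
$output .= "Host: $host\r\n";
$output .= "Connection: close\r\n\r\n";

// socket_write() may send fewer bytes than requested, so loop until done.
$sent = 0;
while ($sent < strlen($output)) {
    $n = socket_write($socket, substr($output, $sent));
    if ($n === false) {
        die("Could not write to socket\n");
    }
    $sent += $n;
}

// One socket_read() call returns at most 4096 bytes, which is why only part
// of the page came back. Keep reading until the server closes the
// connection, at which point socket_read() returns "".
$data = "";
while (($chunk = socket_read($socket, 4096)) !== false && $chunk !== "") {
    $data .= $chunk;
}

socket_close($socket);

print $data;
?>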
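Once the whole reply is in a string such as `$data`, the status line and headers are separated from the body by the first empty line (CRLF CRLF). A minimal sketch of splitting them, assuming the string holds exactly one complete response; `parse_http_response()` is a hypothetical helper, it keeps only the last value of a repeated header, and it does not undo chunked transfer-encoding.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status, headers, body.
function parse_http_response(string $raw): array
{
    // The header block ends at the first blank line.
    $split = strpos($raw, "\r\n\r\n");
    if ($split === false) {
        $head = $raw;   // no blank line: treat it all as headers
        $body = "";
    } else {
        $head = substr($raw, 0, $split);
        $body = substr($raw, $split + 4);
    }

    $lines = explode("\r\n", $head);
    $status = array_shift($lines);   // e.g. "HTTP/1.0 200 OK"

    $headers = [];
    foreach ($lines as $line) {
        $colon = strpos($line, ":");
        if ($colon === false) {
            continue;   // skip malformed lines
        }
        // Header names are case-insensitive, so store them lower-cased.
        // A repeated header overwrites the earlier value in this sketch.
        $name = strtolower(trim(substr($line, 0, $colon)));
        $headers[$name] = trim(substr($line, $colon + 1));
    }

    return ["status" => $status, "headers" => $headers, "body" => $body];
}
```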
Pyrite
Forum Regular
Posts: 769
Joined: Tue Sep 23, 2003 11:07 pm
Location: The Republic of Texas

Re: Socket Programming. Retrieving Webpages. Help please

Post by Pyrite

senin wrote: However, I have not been able to find many Good resources discussing the SPECIFICS on HTTP commands and replies.
RFC 2616, the HTTP/1.1 specification, is what you want to read:
http://www.w3.org/Protocols/rfc2616/rfc2616.html
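One detail from the RFC that matters when reading raw replies: an HTTP/1.1 server may send the body with `Transfer-Encoding: chunked` (section 3.6.1), so the data read from the socket has hex chunk sizes mixed in with the HTML. Below is a minimal sketch of a decoder, assuming the headers have already been stripped and the whole body is in memory. `decode_chunked()` is a hypothetical helper name, and any trailer headers after the last chunk are ignored.

```php
<?php
// Hypothetical helper: decode a "Transfer-Encoding: chunked" body.
// Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a size of 0 ends it.
function decode_chunked(string $body): string
{
    $decoded = "";
    $pos = 0;
    while (true) {
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            break;   // truncated input: return what was decoded so far
        }
        // Drop any chunk extension after ';' and parse the hex size.
        $sizeLine = substr($body, $pos, $eol - $pos);
        $size = hexdec(trim(explode(";", $sizeLine)[0]));
        if ($size === 0) {
            break;   // last chunk
        }
        $decoded .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2;   // skip the data and its trailing CRLF
    }
    return $decoded;
}
```

Sending the request as HTTP/1.0 instead sidesteps this, since RFC 2616 forbids servers from sending transfer-codings to HTTP/1.0 clients.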

GET retrieves the full response (headers plus the HTML body), while HEAD returns the same headers without the body.

telnet www.google.com 80
GET / HTTP/1.1
Host: www.google.com (then hit Enter twice)

This will retrieve the index page: the headers followed by the HTML.

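Likewise, HEAD / HTTP/1.1 plus a "Host: www.google.com" line (then Enter twice) will show you only the headers. HTTP/1.1 requires the Host line, so leave it out and the server should answer with an error.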
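The same request text you type into telnet can be built in PHP and written to the socket. A minimal sketch; `build_request()` is a hypothetical helper, and the `Connection: close` line is an added assumption (not part of the telnet steps) that asks the server to close the connection after replying, so reading until the socket closes yields the complete response.

```php
<?php
// Hypothetical helper: build the request lines you would type into telnet.
function build_request(string $method, string $host, string $path = "/"): string
{
    return "$method $path HTTP/1.1\r\n"
         . "Host: $host\r\n"          // required by HTTP/1.1
         . "Connection: close\r\n"    // server closes the socket when done
         . "\r\n";                    // blank line ends the request
}
```

For example, `socket_write($socket, build_request("HEAD", "www.google.com"));` sends the HEAD request from the telnet example; the reply then contains headers only.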

I haven't made many HTTP requests with PHP, but I hope this info helps.