
file_get_contents() will not open remote file.

Posted: Thu May 25, 2006 8:59 am
by kjcornwell
I can't get file_get_contents('http://www.google.com') to retrieve a remote page. It works fine for local files, e.g. file_get_contents("c:\file.txt"). I have all errors and warnings turned on in php.ini (error_reporting = E_ALL), yet all I get in my browser is "The connection was reset" in Firefox and "The page cannot be displayed" in IE (btw, error reporting IS sent to the browser and does work in other situations). I also get the same "connection to browser was reset" message for fopen("http://www.google.com", "r"). allow_url_fopen is set to "on" and verified in phpinfo(). According to GoDaddy, port 80 is open and the problem is in my script (btw, my Server 2003 firewall is disabled).

What the heck???

Here is my rig...

Server 2003
IIS 6.0
PHP Version 4.4.2
I am using the php4isapi.dll as suggested in php manual install zip.
Virtual Dedicated Account at godaddy.com.

Any suggestions/ideas?


Thanks,
Kevin C

Posted: Thu May 25, 2006 9:11 am
by xpgeek
Check that the "fopen wrappers" option is on.

Your code is wrong.
Try this:

Code:

echo file_get_contents('http://www.google.com');
Read about wrappers here: http://ua2.php.net/manual/en/wrappers.http.php
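
To make the wrapper check above concrete, here is a minimal sketch that reports whether URL wrappers are enabled and whether the http wrapper is registered. It assumes PHP 5.2 or later, since stream_get_wrappers() and filter_var() do not exist in the PHP 4.4.2 used in this thread. url_wrapper_status() is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: report the settings file_get_contents() depends on
// when it is asked to open an http:// URL. Assumes PHP 5.2+.
function url_wrapper_status() {
    return array(
        // ini_get() returns a string such as "1", "On" or ""; normalise it to a boolean.
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // The http wrapper has to be registered for http:// URLs to work at all.
        'http_wrapper'    => in_array('http', stream_get_wrappers(), true),
    );
}

var_dump(url_wrapper_status());
```

If both values come back true and the request still fails, the cause is probably outside the wrapper configuration.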

Posted: Thu May 25, 2006 9:44 am
by kjcornwell
Thanks but no help. :(

Here is the fopen section from php.ini:

Code:

;;;;;;;;;;;;;;;;;;
; Fopen wrappers ;
;;;;;;;;;;;;;;;;;;

; Whether to allow the treatment of URLs (like http:// or ftp://) as files.
allow_url_fopen = 1

; Define the anonymous ftp password (your email address)
;from="john@doe.com"

; Define the User-Agent string
; user_agent="PHP"

; Default timeout for socket based streams (seconds)
default_socket_timeout = 60

; If your scripts have to deal with files from Macintosh systems,
; or you are running on a Mac and need to deal with files from
; unix or win32 systems, setting this flag will cause PHP to
; automatically detect the EOL character in those files so that
; fgets() and file() will work regardless of the source of the file.
; auto_detect_line_endings = Off
This was a message board typo...

Code:

file_get_contents('http:www.google.com');
I used the proper URL in my actual page.

Any other ideas?

Posted: Thu May 25, 2006 2:05 pm
by Chris Corbyn
Moved to Servers as per suggestion by OP.

Posted: Thu May 25, 2006 2:07 pm
by Chris Corbyn
Do you have a software level firewall running that could be denying PHP access?

Posted: Thu May 25, 2006 2:36 pm
by neogeek
I was running into the same issue, here is what I came up with: viewtopic.php?p=266317&highlight=#266317

Posted: Thu May 25, 2006 3:57 pm
by kjcornwell
d11wtq wrote: Do you have a software level firewall running that could be denying PHP access?
Thanks for the help.

No software firewall.

I can ping/nslookup any site from windows command line.
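
ping and nslookup confirm DNS and ICMP from the command line, but not that PHP itself can open an outbound TCP connection on port 80. A minimal sketch of such a check follows. probe_tcp() is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: try to open a TCP connection from PHP and report
// the error number and message if it fails.
function probe_tcp($host, $port = 80, $timeout = 5) {
    $errno  = 0;
    $errstr = '';
    // @ suppresses the warning; the details are captured in $errno/$errstr instead.
    $fs = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fs) {
        fclose($fs);
        return array('ok' => true, 'errno' => 0, 'errstr' => '');
    }
    return array('ok' => false, 'errno' => $errno, 'errstr' => $errstr);
}
```

Running print_r(probe_tcp('www.google.com', 80)) in a page served by the same web server shows whether the PHP process can reach the remote host, and if not, what error the socket layer reported.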

Posted: Thu May 25, 2006 4:00 pm
by kjcornwell
neogeek wrote: I was running into the same issue, here is what I came up with: viewtopic.php?p=266317&highlight=#266317
Thanks, neogeek.

It works! But why?????????

Any PHP experts out there know what the heck is going on? I'd much rather use file_get_contents(), it's much more elegant.

Posted: Thu May 25, 2006 4:04 pm
by neogeek
I have no idea, but it does. I have been able to successfully download everything I have tossed in it except for a cross-domain referer.

Posted: Thu May 25, 2006 4:06 pm
by kjcornwell
neogeek wrote: I have no idea, but it does. I have been able to successfully download everything I have tossed in it except for a cross-domain referer.
Hmmm. Did you do a manual install of PHP? What version are you running? Are you using the ISAPI DLL?

Posted: Thu May 25, 2006 4:13 pm
by neogeek
Well what happened was this. I was looking to download this file:

Code:

http://startupguide.typepad.com/favicon.ico
And when you go to it manually it brings you to this file:

Code:

http://6a.typepad.com/favicon.ico
But when you use the function that I wrote, you get this in the file that it returns:

Code:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://6a.typepad.com/favicon.ico">here</a>.</p>
</body></html>
I'm sure that it would be easy to find a way around this, like looking for 302 Found and doing a regex for a url in the page, but I'm going to see if there is any other more solid way of doing it than that. I see that as a last option.

Posted: Thu May 25, 2006 5:16 pm
by timvw
neogeek wrote: I'm sure that it would be easy to find a way around this, like looking for 302 Found and doing a regex for a url in the page,
Actually, all you need is the header; there is no need to parse the body:
timvw@madoka:~$ telnet startupguide.typepad.com 80
Trying 204.9.178.60...
Connected to typepad.com.
Escape character is '^]'.
GET /favicon.ico HTTP/1.0
Host: startupguide.typepad.com

HTTP/1.0 302 Moved Temporarily
Date: Thu, 25 May 2006 22:15:34 GMT
Server: Apache
Location: http://6a.typepad.com/favicon.ico
Content-Length: 217
Content-Type: text/html; charset=iso-8859-1
X-Cache: MISS from http://www.sixapart.com
Connection: close

[snipped body]
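
As a rough illustration of reading the redirect target from the header rather than the body, here is a minimal sketch. parse_response_head() is a hypothetical helper name, and the sample response is modelled on the telnet session above with most headers left out.

```php
<?php
// Hypothetical helper: extract the status code and Location header from a raw
// HTTP response, looking only at the header section (before the first blank line).
function parse_response_head($response) {
    // Headers and body are separated by an empty line.
    $parts = preg_split('/\r?\n\r?\n/', $response, 2);
    $head  = $parts[0];

    $status = null;
    if (preg_match('/^HTTP\/\d+\.\d+\s+(\d{3})/', $head, $m)) {
        $status = (int) $m[1];
    }

    $location = null;
    // Header names are case-insensitive ("i"); "m" lets ^ match at each line start.
    if (preg_match('/^Location:\s*(\S+)/mi', $head, $m)) {
        $location = $m[1];
    }

    return array('status' => $status, 'location' => $location);
}

// Sample response modelled on the telnet session above.
$sample = "HTTP/1.0 302 Moved Temporarily\r\n"
        . "Location: http://6a.typepad.com/favicon.ico\r\n"
        . "Content-Length: 217\r\n"
        . "\r\n"
        . "[snipped body]";

print_r(parse_response_head($sample));
```

Restricting the match to the header section avoids picking up a stray "Location:" string that happens to appear in the body.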

Posted: Thu May 25, 2006 5:35 pm
by neogeek
Thanks timvw, I didn't know that the URL it was redirecting to was in the header. So I updated the function to include a check to see if it's getting the correct file.

Code:

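function fetch_remote_file($file, $redirects = 5) {

   $path = parse_url($file);

   // Give up on unparsable URLs and on redirect chains that run too long.
   if (empty($path['host']) || $redirects < 0) { return false; }

   $port = isset($path['port']) ? $path['port'] : 80;

   $fs = @fsockopen($path['host'], $port);

   if (!$fs) { return false; }

   // Request target: path plus query string, defaulting to "/".
   $target = isset($path['path']) ? $path['path'] : '/';
   if (isset($path['query'])) { $target .= '?' . $path['query']; }

   // HTTP header lines end in CRLF; an empty line ends the request.
   $header  = 'GET ' . $target . ' HTTP/1.0' . "\r\n";
   $header .= 'Host: ' . $path['host'] . "\r\n";
   $header .= 'Connection: close' . "\r\n\r\n";

   fwrite($fs, $header);

   $buffer = '';

   // Stop on an empty read (end of stream or timeout), not on a chunk equal to "0".
   while (($tmp = fread($fs, 1024)) !== false && $tmp !== '') { $buffer .= $tmp; }

   fclose($fs);

   // Split the response into headers and body at the first empty line.
   $parts = explode("\r\n\r\n", $buffer, 2);

   if (count($parts) < 2) { return false; }

   list($head, $body) = $parts;

   // Follow a redirect only when the headers (not the body) carry a Location.
   if (preg_match('/^Location:\s*(\S+)/mi', $head, $matches) && $file != $matches[1]) {
      return fetch_remote_file($matches[1], $redirects - 1);
   }

   // Only a 200 status means the body is the requested file.
   if (!preg_match('/^HTTP\/\d\.\d\s+200\b/', $head)) { return false; }

   return $body;

}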
Edit: Replaced my lnbr constant with its actual value.

Posted: Thu May 25, 2006 5:43 pm
by timvw
Actually, HTTP/1.0 and the Host header don't go hand in hand very well... Better to use the absolute URL for the request...
GET http://example.com/favicon.ico HTTP/1.0
Connection: close
And the first thing I would do when I receive an HTTP response is look at the status code:
HTTP/$version $status $reason
If it's 200, everything is fine. Otherwise you may need to do something extra...

(But there is no need to reinvent the wheel, since http://www.php.net/curl already does this for us...)
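
For comparison, a minimal sketch of the cURL route might look like the following. It assumes the cURL extension is loaded. fetch_with_curl() is a hypothetical helper name, and the redirect limit and timeout are arbitrary choices.

```php
<?php
// Hypothetical helper: fetch a URL with the cURL extension, following redirects.
// Returns the body on a 200 response, or false on any failure.
function fetch_with_curl($url, $max_redirects = 5) {
    // Bail out early if the cURL extension is not available.
    if (!function_exists('curl_init')) { return false; }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow Location headers such as a 302
    curl_setopt($ch, CURLOPT_MAXREDIRS, $max_redirects);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);

    $body   = curl_exec($ch);
    // After redirects this is the status code of the final response.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // The handle is released when $ch goes out of scope.
    return ($body !== false && $status == 200) ? $body : false;
}
```

With redirects handled by the library, the status check only needs to look at the final response.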

Posted: Thu May 25, 2006 5:48 pm
by neogeek
timvw wrote: (But there is no need to reinvent the wheel, since http://www.php.net/curl already does this for us...)
Very true. Only I wanted a surefire way to fetch files, especially if the server this script was running on didn't have that library installed. (I now know that it is a common PHP library on many servers, but it was still fun trying to find a solution to a problem that I thought to be unsolvable.)