file_get_contents issue

Posted: Tue Aug 15, 2006 4:03 pm
by SidewinderX
I'm having a problem with file_get_contents() on a particular URL. For example:

Code:

<?php
$url = "http://www.google.com";
$content = file_get_contents($url);
echo $content;
?>
works fine, but

Code:

<?php
$url = "https://www.novaworld.com";
$content = file_get_contents($url);
echo $content;
?>
does not. I've had an issue with this before, but that was because https wasn't a registered stream wrapper (now it is). I thought it might be because my IP was blocked by the site or something, but it also doesn't work on a different server I tested it on.

Any ideas?

Posted: Tue Aug 15, 2006 4:13 pm
by feyd
They may filter the user-agent in requests. From an earlier post:
feyd wrote:They've chosen to basically be jerks and filter which user-agents they will allow. Using cURL, I can easily get the page. The following works as well.

Code:

[feyd@home]>php -r "ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6'); var_dump(get_headers('http://www.goalzz.com/main.aspx?region=-1&area=6&update=true'));"
array(11) {
  [0]=>
  string(15) "HTTP/1.1 200 OK"
  [1]=>
  string(17) "Connection: close"
  [2]=>
  string(35) "Date: Wed, 09 Aug 2006 18:16:32 GMT"
  [3]=>
  string(25) "Server: Microsoft-IIS/6.0"
  [4]=>
  string(21) "X-Powered-By: ASP.NET"
  [5]=>
  string(26) "X-AspNet-Version: 1.1.4322"
  [6]=>
  string(62) "Set-Cookie: ASP.NET_SessionId=cpbavi551sri2w453kvvrx45; path=/"
  [7]=>
  string(22) "Cache-Control: private"
  [8]=>
  string(38) "Expires: Tue, 09 Aug 2005 18:16:32 GMT"
  [9]=>
  string(45) "Content-Type: text/html; charset=Windows-1252"
  [10]=>
  string(21) "Content-Length: 78043"
}
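
For file_get_contents() specifically, the same user-agent idea can be applied per request through a stream context instead of a global ini_set(). A minimal sketch, assuming PHP 7.1 or later; the helper names browserLikeContextOptions and fetchWithUserAgent and the 10-second timeout are made up for illustration and are not from the posts above:

```php
<?php
// Hypothetical helper: build stream-context options carrying a custom User-Agent.
// The 'http' key is also used by the https:// wrapper.
function browserLikeContextOptions(string $userAgent, int $timeoutSeconds = 10): array
{
    return [
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent,
            'timeout'    => $timeoutSeconds, // assumed value, tune as needed
        ],
    ];
}

// Hypothetical helper: fetch a URL with that context.
// Returns the response body, or null if the request failed.
function fetchWithUserAgent(string $url, string $userAgent): ?string
{
    $context = stream_context_create(browserLikeContextOptions($userAgent));
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

Usage would look something like `$html = fetchWithUserAgent('https://www.novaworld.com', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6');`. Whether it helps depends on the site actually filtering on user-agent, which is only a guess at this point.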

Posted: Tue Aug 15, 2006 4:30 pm
by SidewinderX
I'll look into that, but what doesn't make sense is this: a filter is a filter. If it doesn't work from my remote server because it is filtered, why would it work from my local server?

Posted: Tue Aug 15, 2006 4:40 pm
by feyd
SidewinderX wrote:If it doesn't work from my remote server because it is filtered, why would it work from my local server?
You didn't say anything about your local server working that I saw.

Posted: Tue Aug 15, 2006 5:32 pm
by SidewinderX
feyd wrote:
SidewinderX wrote:If it doesn't work from my remote server because it is filtered, why would it work from my local server?
You didn't say anything about your local server working that I saw.
Whoops, forgot to add that :oops:

I'll check out cURL and see what that yields.

Posted: Tue Aug 15, 2006 5:56 pm
by SidewinderX

Code:

<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close curl resource, and free up system resources
curl_close($ch);
?>
doesn't work either, and when I run:
#curl https://www.novaworld.com/Players/Stats ... 1&p=616065
on my Linux box I get:
<html><head><title>Object moved</title></head><body><h2>Object moved to <a href='/Players/Search.aspx'>here</a>.</h2></body></html>
Perhaps it is a cookie issue?

Posted: Tue Aug 15, 2006 6:10 pm
by feyd
Have you tried it with a different user-agent header than the default? It could be a cookie issue, but that would require them to send you through at least two pages before they would know whether cookies worked or not.

Posted: Tue Aug 15, 2006 6:19 pm
by SidewinderX
feyd wrote:Have you tried it with a different user-agent header than the default?
Not yet, I have no idea how to do that. I figured I'd try cURL first since I knew what to do. I'll look into the user-agent header stuff now.

Posted: Tue Aug 15, 2006 6:54 pm
by Ambush Commander
curl_setopt() and CURLOPT_USERAGENT

Posted: Wed Aug 16, 2006 12:13 am
by SidewinderX
OK, so setting my own user-agent header didn't work:

Code:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT,  "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); 
curl_exec($ch);
curl_close($ch);
?>
However, using a proxy did:

Code:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '193.194.69.66:8080');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_exec($ch);
curl_close($ch);
?>
But I don't like using the proxy; it takes much longer. I contacted my host and asked whether they block outgoing secure connections. They requested my code, so that reply is pending. All else aside, is there ANY other way than using an ugly proxy connection?

Posted: Wed Aug 16, 2006 2:40 am
by onion2k
SidewinderX wrote:doesn't work either, and when I run:
#curl https://www.novaworld.com/Players/Stats ... 1&p=616065
on my Linux box I get:
<html><head><title>Object moved</title></head><body><h2>Object moved to <a href='/Players/Search.aspx'>here</a>.</h2></body></html>
Looks like they've got a 301 redirect set up. cURL can follow redirects (if you tell it to); file_get_contents() probably can't.
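
Telling cURL to follow redirects comes down to CURLOPT_FOLLOWLOCATION. A rough sketch, assuming PHP 7 or later with the curl extension loaded; the function names and the limit of 5 redirects are invented for the example. Two caveats: the "Object moved" body alone does not say whether the status was 301 or 302, and as far as I know PHP's HTTP stream wrapper also follows redirects by default (the follow_location context option), so redirects may not be what separates the two approaches here.

```php
<?php
// Hypothetical helper: cURL options that follow redirects and
// hand the body back as a string instead of printing it.
function redirectFollowingOptions(string $url, int $maxRedirects = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body from curl_exec()
        CURLOPT_FOLLOWLOCATION => true,          // follow Location: headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // assumed cap, avoids redirect loops
        CURLOPT_HEADER         => false,
    ];
}

// Hypothetical helper: fetch a URL and report where the request ended up.
function fetchFollowingRedirects(string $url): array
{
    $ch = curl_init();
    curl_setopt_array($ch, redirectFollowingOptions($url));

    $body = curl_exec($ch);
    $result = [
        'body'      => $body === false ? null : $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    ];
    curl_close($ch);

    return $result;
}
```

Comparing 'final_url' with the URL you asked for shows whether a redirect happened, e.g. whether the stats page bounced to /Players/Search.aspx.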

Posted: Wed Aug 16, 2006 12:35 pm
by SidewinderX
After some more testing, I found out that I was having an issue connecting to all https websites. I emailed my host and they replied:
Hi,

Please check your script now. We have opened outgoing connections to 443 on the server.
Thank you.
Now I have no trouble connecting to other https sites, but I still have a problem connecting to this one.
Current code:

Code:

<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065");
curl_setopt($ch, CURLOPT_USERAGENT,  "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);
// close curl resource, and free up system resources
curl_close($ch);
?>
Also note, it works fine behind a proxy :? :?:
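
None of the snippets so far check what curl_exec() returned, so a blocked port, a timeout, and an HTTP error all look the same: nothing printed. Here is a sketch of how the failure could be narrowed down, assuming PHP 7 or later with the curl extension; describeCurlOutcome, diagnoseRequest and the 10-second connect timeout are hypothetical choices for illustration. It cannot prove an IP ban, it only separates "never reached the server" from "the server answered with something".

```php
<?php
// Hypothetical helper: turn the raw outcome of a cURL call into a short diagnosis.
function describeCurlOutcome(int $errno, string $error, int $httpStatus): string
{
    if ($errno !== 0) {
        // No HTTP exchange completed: DNS, connect, TLS or timeout problem.
        return "transport failure (cURL error $errno: $error)";
    }
    if ($httpStatus >= 400) {
        return "server answered with HTTP $httpStatus";
    }
    if ($httpStatus >= 300) {
        return "server redirected with HTTP $httpStatus";
    }

    return "request succeeded with HTTP $httpStatus";
}

// Hypothetical helper: run one request and summarise what happened.
function diagnoseRequest(string $url, string $userAgent): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body out of the output
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // assumed value
    curl_exec($ch);

    $summary = describeCurlOutcome(
        curl_errno($ch),
        curl_error($ch),
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
    curl_close($ch);

    return $summary;
}
```

Running `echo diagnoseRequest($url, $userAgent);` once directly and once through the proxy would at least show whether the direct request dies at the connection stage or gets an HTTP response back.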

Posted: Wed Aug 16, 2006 4:58 pm
by SidewinderX
Is there any way I can check whether my server's IP was banned from the site, other than emailing the website's tech support?

Posted: Wed Aug 16, 2006 5:01 pm
by feyd
If it was banned specifically by them, there is no other way than to ask them.