file_get_contents issue

SidewinderX
Forum Contributor
Posts: 407
Joined: Fri Jul 16, 2004 9:04 pm
Location: NY

Post by SidewinderX »

I'm having a problem with file_get_contents on a particular URL. For example:

Code: Select all

<?php
$url = "http://www.google.com";
$content = file_get_contents($url);
echo $content;
?>
works fine, but

Code: Select all

<?php
$url = "https://www.novaworld.com";
$content = file_get_contents($url);
echo $content;
?>
does not. I've had an issue with this before, but that was because https wasn't a registered stream wrapper (now it is). I thought my IP might have been blocked by the site, but it also doesn't work on a different server I tested it on.

Any ideas?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

They may filter the user-agent in requests. From an earlier post:
feyd wrote:They've chosen to basically be jerks and filter which user-agents they will allow. Using cURL, I can easily get the page. The following works as well.

Code: Select all

[feyd@home]>php -r "ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6'); var_dump(get_headers('http://www.goalzz.com/main.aspx?region=-1&area=6&update=true'));"
array(11) {
  [0]=>
  string(15) "HTTP/1.1 200 OK"
  [1]=>
  string(17) "Connection: close"
  [2]=>
  string(35) "Date: Wed, 09 Aug 2006 18:16:32 GMT"
  [3]=>
  string(25) "Server: Microsoft-IIS/6.0"
  [4]=>
  string(21) "X-Powered-By: ASP.NET"
  [5]=>
  string(26) "X-AspNet-Version: 1.1.4322"
  [6]=>
  string(62) "Set-Cookie: ASP.NET_SessionId=cpbavi551sri2w453kvvrx45; path=/"
  [7]=>
  string(22) "Cache-Control: private"
  [8]=>
  string(38) "Expires: Tue, 09 Aug 2005 18:16:32 GMT"
  [9]=>
  string(45) "Content-Type: text/html; charset=Windows-1252"
  [10]=>
  string(21) "Content-Length: 78043"
}
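
For file_get_contents() itself, the same user-agent override can be scoped to a single request with a stream context instead of a global ini_set(). The following is only a sketch: build_ua_context is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and the user-agent string is simply the one from the command above. The 'http' context options are also used for https:// URLs.

```php
<?php
// Hypothetical helper (not from the thread): builds a stream context that
// sends a custom User-Agent header with http:// and https:// requests.
function build_ua_context(string $userAgent, int $timeout = 10)
{
    return stream_context_create([
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            'timeout'       => $timeout,   // seconds; arbitrary value for this sketch
            'ignore_errors' => true,       // still return the body on 4xx/5xx responses
        ],
    ]);
}

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6';
$context = build_ua_context($ua);

// Usage (performs a real network request, so it is left commented out):
// $content = file_get_contents('https://www.novaworld.com', false, $context);
// if ($content === false) {
//     echo "Request failed\n";
// }
```

Passing the context as the third argument leaves the global user_agent setting untouched, so other requests in the same script are unaffected.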

Post by SidewinderX »

I'll look into that, but what doesn't make sense is that a filter is a filter. If it doesn't work from my remote server because it is filtered, why would it work from my local server?

Post by feyd »

SidewinderX wrote:If it doesn't work from my remote server because it is filtered, why would it work from my local server?
You didn't say anything about your local server working that I saw.

Post by SidewinderX »

feyd wrote:
SidewinderX wrote:If it doesn't work from my remote server because it is filtered, why would it work from my local server?
You didn't say anything about your local server working that I saw.
Whoops, forgot to add that. :oops:

I'll check out cURL and see what that yields.

Post by SidewinderX »

Code: Select all

<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);

// close curl resource, and free up system resources
curl_close($ch);
?>
doesn't work either, and when I run:
#curl https://www.novaworld.com/Players/Stats ... 1&p=616065
on my Linux box I get:
<html><head><title>Object moved</title></head><body><h2>Object moved to <a href='/Players/Search.aspx'>here</a>.</h2></body></html>
Perhaps it is a cookie issue?

Post by feyd »

Have you tried it with a different user-agent header than the default? It could be a cookie issue, but that would require them to send you through at least two pages before they would know whether cookies worked.

Post by SidewinderX »

feyd wrote:Have you tried it with a different user-agent header than the default?
Not yet, I have no idea how to do that. I figured I'd try cURL first since I knew how to use it. I'll look into the user-agent header stuff now.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

curl_setopt() and CURLOPT_USERAGENT

Post by SidewinderX »

OK, so setting my own user-agent header didn't work:

Code: Select all

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT,  "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); 
curl_exec($ch);
curl_close($ch);
?>
However, using a proxy did:

Code: Select all

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_PROXY, '193.194.69.66:8080');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); 
curl_exec($ch);
curl_close($ch);
?>
But I don't like using the proxy; it takes much longer. I contacted my host and asked whether they block outgoing secure connections. They requested my code, so their reply is pending. All else aside, is there ANY way other than using an ugly proxy connection?
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

SidewinderX wrote:doesn't work either, and when I run:
#curl https://www.novaworld.com/Players/Stats ... 1&p=616065
on my Linux box I get:
<html><head><title>Object moved</title></head><body><h2>Object moved to <a href='/Players/Search.aspx'>here</a>.</h2></body></html>
Looks like they've got a redirect set up (that "Object moved" body is what ASP.NET sends with a redirect, usually a 302). cURL can follow redirects if you tell it to (CURLOPT_FOLLOWLOCATION); file_get_contents() follows them by default through the http wrapper's follow_location option.
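
cURL does not follow Location headers unless asked, so one way to check the redirect theory is to turn that on and then look at where the request ended up. This is a sketch that assumes the cURL extension is loaded; fetch_following_redirects is a hypothetical helper name, and the redirect limit and timeouts are arbitrary.

```php
<?php
// Hypothetical helper (not from the thread): fetches a URL with cURL,
// following up to $maxRedirects redirects, and reports what happened.
function fetch_following_redirects(string $url, int $maxRedirects = 5): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up on redirect loops
        CURLOPT_CONNECTTIMEOUT => 10,
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body = curl_exec($ch);

    // Collect the details before $ch goes out of scope and is released.
    return [
        'ok'        => $body !== false,
        'body'      => $body === false ? null : $body,
        'error'     => curl_error($ch),
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
    ];
}

// Usage (performs a real network request, so it is left commented out):
// $result = fetch_following_redirects('https://www.novaworld.com/');
// echo $result['status'], ' after ', $result['redirects'], ' redirect(s): ', $result['final_url'], "\n";
```

If final_url turns out to be the Search.aspx page, the site is redirecting the request rather than blocking it, which would point at missing cookies or session state instead of a filter.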

Post by SidewinderX »

After some more testing, I found out that I was having an issue connecting to all https websites. I emailed my host and they replied:
Hi,

Please check your script now. We have opened outgoing connections to 443 on the server.
Thank you.
Now I have no trouble connecting to other https sites, but I still have a problem connecting to this one.
Current code:

Code: Select all

<?php
// create a new curl resource
$ch = curl_init();

// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "https://www.novaworld.com/Players/Stats.aspx?id=33680801261&p=616065");
curl_setopt($ch, CURLOPT_USERAGENT,  "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_HEADER, 0);

// grab URL and pass it to the browser
curl_exec($ch);
// close curl resource, and free up system resources
curl_close($ch);
?>
Also note that it works fine through a proxy. :? :?:
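
The script above prints nothing useful when curl_exec() fails, so it cannot tell a refused connection apart from an HTTP-level rejection. A rough way to narrow it down is to look at curl_errno() and the HTTP status code. The helper below is a hypothetical sketch that assumes the cURL extension is loaded; its messages are only hints, and it cannot prove that an IP address is banned.

```php
<?php
// Hypothetical helper (not from the thread): maps a cURL error number and
// an HTTP status code to a rough hint about where the request failed.
function describe_fetch_result(int $errno, int $httpStatus): string
{
    if ($errno === CURLE_COULDNT_RESOLVE_HOST) {
        return 'DNS lookup failed';
    }
    if ($errno === CURLE_COULDNT_CONNECT || $errno === CURLE_OPERATION_TIMEOUTED) {
        return 'no connection: a firewall or IP block is possible';
    }
    if ($errno === CURLE_SSL_CONNECT_ERROR) {
        return 'TLS handshake failed';
    }
    if ($errno !== 0) {
        return 'other cURL error ' . $errno;
    }
    if ($httpStatus >= 300 && $httpStatus < 400) {
        return 'redirected: check the Location header';
    }
    if ($httpStatus >= 400) {
        return 'server answered but refused: HTTP ' . $httpStatus;
    }
    return 'request succeeded';
}

// Usage against a real handle (network access required, so commented out):
// $ch = curl_init('https://www.novaworld.com/');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_exec($ch);
// echo describe_fetch_result(curl_errno($ch), curl_getinfo($ch, CURLINFO_HTTP_CODE)), "\n";
```

A connection error that disappears when the request goes through a proxy is consistent with filtering by address, either at the host's firewall or at the remote site, while a 3xx or 4xx status means the server was reached and answered.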

Post by SidewinderX »

Is there any way I can check whether my server's IP was banned from the site, other than emailing the website's tech support?

Post by feyd »

If it was banned specifically by them, there is no other way than to ask them.