How to prevent my website from being retrieved by someone else?
Posted: Wed Aug 09, 2006 10:26 am
by christian_phpbeginner
Hi, could you please help me? How can I prevent someone else from using file_get_contents() to retrieve my page?
Thanks a lot,
Chris
Posted: Wed Aug 09, 2006 10:38 am
by feyd
Basic answer: you can't, and you shouldn't.
Posted: Wed Aug 09, 2006 11:28 am
by Chris Corbyn
Yeah, why would you want to do that? If you make a website, you're making something accessible to everyone.
Some servers check the User-Agent string against a set list they have and reject requests that don't match. That's bad practice, though.
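For what it's worth, that kind of filtering is trivial to sketch. Here's a minimal, hypothetical example of a User-Agent whitelist check; `is_allowed_agent()` and the `$allowed` list are made up for illustration, not any particular server's actual rules.

```php
<?php
// Sketch of the kind of User-Agent whitelist check described above.
// The function name and whitelist are hypothetical.
function is_allowed_agent(?string $ua, array $allowed): bool
{
    if ($ua === null || $ua === '') {
        return false; // no User-Agent header at all
    }
    foreach ($allowed as $needle) {
        // case-insensitive substring match against the whitelist
        if (stripos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// A page doing this would check the request header and bail out:
// if (!is_allowed_agent($_SERVER['HTTP_USER_AGENT'] ?? null, ['Mozilla', 'Opera'])) {
//     header('HTTP/1.1 403 Forbidden');
//     exit;
// }
```

As the posts below show, this only keeps out clients that don't bother to set a browser-like User-Agent, which is why it's considered bad practice rather than real protection.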
Posted: Wed Aug 09, 2006 11:35 am
by Luke
What would be your reasoning?
Posted: Wed Aug 09, 2006 12:46 pm
by christian_phpbeginner
Hi everybody...
I was just asking because I found some websites which can't be retrieved with file_get_contents(). What I was actually going to ask is: how do you file_get_contents() a website like that?
For example, the website below can't be retrieved:
http://www.goalzz.com/main.aspx?region= ... pdate=true
I wonder why?
Thanks !
Posted: Wed Aug 09, 2006 1:17 pm
by feyd
They've chosen to basically be jerks and filter which user-agents they will allow. Using cURL, I can easily get the page. The following works as well.
[feyd@home]>php -r "ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6'); var_dump(get_headers('http://www.goalzz.com/main.aspx?region=-1&area=6&update=true'));"
array(11) {
[0]=>
string(15) "HTTP/1.1 200 OK"
[1]=>
string(17) "Connection: close"
[2]=>
string(35) "Date: Wed, 09 Aug 2006 18:16:32 GMT"
[3]=>
string(25) "Server: Microsoft-IIS/6.0"
[4]=>
string(21) "X-Powered-By: ASP.NET"
[5]=>
string(26) "X-AspNet-Version: 1.1.4322"
[6]=>
string(62) "Set-Cookie: ASP.NET_SessionId=cpbavi551sri2w453kvvrx45; path=/"
[7]=>
string(22) "Cache-Control: private"
[8]=>
string(38) "Expires: Tue, 09 Aug 2005 18:16:32 GMT"
[9]=>
string(45) "Content-Type: text/html; charset=Windows-1252"
[10]=>
string(21) "Content-Length: 78043"
}
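The ini_set('user_agent', ...) trick above works because PHP's HTTP stream wrapper sends that value as the request's User-Agent header. A per-request alternative, sketched below under the assumption you'd rather not change a global setting, is to pass a stream context to file_get_contents(); the UA string is just the same Firefox example, not something the site specifically requires.

```php
<?php
// Per-request alternative to ini_set('user_agent', ...): attach the
// User-Agent through a stream context instead of a global setting.
$context = stream_context_create([
    'http' => [
        'user_agent' => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; '
                      . 'rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6',
    ],
]);

// file_get_contents() accepts the context as its third argument:
// $html = file_get_contents('http://www.goalzz.com/main.aspx?region=-1&area=6&update=true',
//                           false, $context);

// The options can be read back, which is handy for debugging:
$opts = stream_context_get_options($context);
echo $opts['http']['user_agent'], "\n";
```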
Posted: Wed Aug 09, 2006 2:17 pm
by christian_phpbeginner
feyd wrote:They've chosen to basically be jerks and filter which user-agents they will allow. Using cURL, I can easily get the page.
Hi feyd....
Thanks for the great info. I'm downloading libcurl now; later I'll read the docs and install it.
Chris
Posted: Wed Aug 09, 2006 2:18 pm
by Luke
is that where you live... in ur avatar? (Sorry... off-topic, but that's cool if it is)
Posted: Wed Aug 09, 2006 2:29 pm
by christian_phpbeginner
The Ninja Space Goat wrote:is that where you live... in ur avatar? (Sorry... off-topic, but that's cool if it is)
Hi Ninja,
Thank you, but I don't live there. That's my cabin. It's not a fancy or expensive one, actually, because we built it with our own hands. I miss it now, because I'm not at my cabin. Can't wait for next summer...
Posted: Wed Aug 09, 2006 2:31 pm
by Luke
that's awesome
Re: How to prevent my website from being retrieved by someone else?
Posted: Fri Mar 28, 2008 6:40 am
by idy
Opening up an old thread.
I tried cURL and the ini_set trick with generic user agents, but I can't seem to retrieve the contents of the following URL - any clues appreciated!
http://www.petitscailloux.com/Follow.as ... detail.htm
Thanks a lot !
Re: How to prevent my website from being retrieved by someone else?
Posted: Fri Mar 28, 2008 1:49 pm
by Barzouk
No problem for me. All is fine.
Re: How to prevent my website from being retrieved by someone else?
Posted: Fri Mar 28, 2008 2:22 pm
by idy
Barzouk - would you mind posting the code you used?
Thanks a lot !
Re: How to prevent my website from being retrieved by someone else?
Posted: Fri Mar 28, 2008 2:25 pm
by Barzouk
idy wrote:Barzouk - would you mind posting the code you used?
Thanks a lot !
Try it again and see what message you get
Re: How to prevent my website from being retrieved by someone else?
Posted: Fri Mar 28, 2008 2:52 pm
by idy
OK, I tried:
$url = "http://www.petitscailloux.com/Follow.aspx?sUrl=http://www.seloger.com/199986/16271207/detail.htm";
ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5.0.6');
echo file_get_contents($url);
and the result was:
I then tried:
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt($ch, CURLOPT_URL, 'http://www.petitscailloux.com/Follow.aspx?sUrl=http://www.seloger.com/199986/16271207/detail.htm');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
echo $file_contents = curl_exec($ch);
curl_close($ch);
which got me the following:
Runtime Error
Description: An application error occurred on the server. The current custom error settings for this application prevent the details of the application error from being viewed remotely (for security reasons). It could, however, be viewed by browsers running on the local server machine.
Details: To enable the details of this specific error message to be viewable on remote machines, please create a <customErrors> tag within a "web.config" configuration file located in the root directory of the current web application. This <customErrors> tag should then have its "mode" attribute set to "Off".
<!-- Web.Config Configuration File -->
<configuration>
<system.web>
<customErrors mode="Off"/>
</system.web>
</configuration>
Notes: The current error page you are seeing can be replaced by a custom error page by modifying the "defaultRedirect" attribute of the application's <customErrors> configuration tag to point to a custom error page URL.
<!-- Web.Config Configuration File -->
<configuration>
<system.web>
<customErrors mode="RemoteOnly" defaultRedirect="mycustompage.htm"/>
</system.web>
</configuration>
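One thing worth noting about idy's cURL snippet: it never sets a user agent, so the request goes out with cURL's default. Below is a sketch of the same fetch with CURLOPT_USERAGENT added, reusing the Firefox string from earlier in the thread. That said, the Runtime Error shown is generated on the server side, so the user agent may not be the problem at all here; this just rules the filter out.

```php
<?php
// Same fetch as idy's snippet, but with a browser-style User-Agent,
// since an empty/default UA is exactly what these filters reject.
// Whether it fixes this particular site is uncertain: the Runtime
// Error above is an ASP.NET error produced on the server.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.petitscailloux.com/Follow.aspx?sUrl=http://www.seloger.com/199986/16271207/detail.htm');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$set = curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) '
    . 'Gecko/20060728 Firefox/1.5.0.6');

// $html = curl_exec($ch);   // actual network call, omitted here
// curl_close($ch);
```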