
Call up a .PHP file from .html

Posted: Sat Feb 11, 2006 4:55 am
by boujin
Sami | Please use [code] and [/code] tags where appropriate when posting code. Read: [url=http://forums.devnetwork.net/viewtopic.php?t=21171]Posting Code in the Forums[/url]


I'm trying to use a user-agent blocker written in PHP on Yahoo web hosting. Unfortunately, Yahoo won't let me use the following line inside an .html file to call my PHP file robots.php (unless I convert my whole site from .html to .php, which I won't do), and it doesn't allow me a .htaccess file either:

<?PHP include "/robots.php"; ?>

So I tried using JavaScript instead. With it I do receive the email warning me that a blocked user agent has accessed my site, but e403.html never shows up afterwards.

<script language="JavaScript" type="text/javascript" src="/robots.php">
</script>

So what can I do so that e403.html will show up after I get the email?

This is the content of robots.php (without the two dashed lines):

------------------------------------------------------

Code:

<?php

// Patterns identifying banned user agents. Entries such as "^crescent"
// and "WebEMailExtrac.*" are regular expressions, so they are matched
// with preg_match() below rather than strstr(), which would treat the
// "^" and ".*" as literal text and never match.
$browsers = array(
    "^crescent", "wbdbot", "Web Downloader", "webauto", "webbandit",
    "WebCapture", "webcollector", "WebCopier", "webdevil",
    "WebEMailExtrac.*", "WebFetch", "webfetcher", "WebFountain",
    "webhook", "webminer", "WebMirror", "webmole", "WebReaper",
    "WebSauger", "WebSense", "website", "websnake", "Webster",
    "WebStripper", "websucker", "webweasel", "WebWhacker",
    "WebZIP", "Wget",
);

// $HTTP_USER_AGENT etc. depend on register_globals; read $_SERVER instead.
$agent  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$punish = false;
foreach ($browsers as $pattern) {
    if (preg_match('/' . $pattern . '/i', $agent)) {
        $punish = true;
        break;
    }
}

if ($punish) {
    $msg  = "robots.php detected the following banned browser agent errors:\n";
    $msg .= "Host: " . $_SERVER['REMOTE_ADDR'] . "\n";
    $msg .= "Agent: " . $agent . "\n";
    $msg .= "Referrer: " . (isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '') . "\n";
    $msg .= "Document: " . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'] . "\n";

    $headers  = "X-Priority: 1\n";
    $headers .= "From: Robots.php <pfs@pfs.net>\n";
    $headers .= "X-Sender: <pfs@pfs.net>\n";

    mail("pfs@pfs.net",
         "robots.php BANNED BROWSER AGENT ERROR from pfs@pfs.net",
         $msg, $headers);

    // Show the 403 page. A leading slash alone would point at the
    // filesystem root, so resolve it against the document root.
    include $_SERVER['DOCUMENT_ROOT'] . "/e403.html";
    exit;
}

?>
------------------------------------------------------------------



Posted: Sat Feb 11, 2006 7:04 am
by onion2k
The problem is with linking to an external PHP file. When you create a link in an HTML file, e.g.:

<script language="JavaScript" type="text/javascript" src="/robots.php">
</script>

.. the user's browser requests that file after it has downloaded the HTML. It then receives whatever robots.php outputs, but it will never display it. HTML link and script elements are for things like JavaScript and CSS, not HTML. If you don't have access to .htaccess and you can't put code in your .html files, then the only option left is to convert all your files to .php.
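A consequence of this: for the <script> approach to have any visible effect, robots.php would have to emit JavaScript rather than HTML. A minimal sketch of that variation (an assumption on my part, not from the thread; note that most download bots never execute JavaScript, so this is unreliable as a blocking mechanism):

```php
<?php
// robots.php, fetched via <script src="/robots.php"> from the HTML page.
$agent  = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$banned = (stripos($agent, 'WebZIP') !== false); // same matching idea as the full list

header('Content-Type: text/javascript');
if ($banned) {
    // Redirect the page to the 403 document. This only works if the
    // client actually runs JavaScript, which most bad bots do not.
    echo 'window.location.replace("/e403.html");';
}
?>
```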

Alternatively, you could move to a better hosting company...

Posted: Sat Feb 11, 2006 9:44 am
by nickvd
Um... it seems that you are intending to block a bunch of robots from accessing your site... try this: http://www.searchengineworld.com/robots ... torial.htm

You don't need PHP to do something that is built into the server...

Posted: Sun Feb 12, 2006 3:17 am
by boujin
They are web downloaders, spammers and the like. I don't really mind about the robots' bandwidth, since Yahoo gives me 400 GB each month! I read your link, but the robots.txt protocol is only followed by "good" robots, not "bad" ones, so it doesn't apply to downloaders and spammers.

Isn't there any other way of banning these user agents with some other language? Maybe .NET?

How about other top web hosting companies? Does Microsoft offer web hosting?

Posted: Sun Feb 12, 2006 3:40 am
by nickvd
If your host doesn't support PHP, it's doubtful that they would support any other server-side scripting... The best suggestion would be to switch hosts...

Posted: Sun Feb 12, 2006 3:54 am
by boujin
It does actually support PHP when it is called from .html with <form METHOD="POST"> to send emails, but when I use the PHP script mentioned in my first post it won't call up e403.html.

It also supports SSI, and that works fine.

The problem seems to be placing PHP script inside .html files.
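Since SSI does work, one possible workaround is to call robots.php from each .html page with an SSI include instead of a <script> tag. A sketch (assuming Yahoo processes SSI in .html files and that its SSI setup lets `#include virtual` invoke a PHP script; many hosts restrict this):

```html
<!--#include virtual="/robots.php" -->
```

Unlike the <script> approach, the include runs on the server before the page is sent, so whatever robots.php prints (for example the contents of e403.html) becomes part of the delivered page.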

Posted: Sun Feb 12, 2006 4:28 am
by AKA Panama Jack
boujin wrote:They are web downloaders, spammers and the like. I don't really mind about the robots' bandwidth, since Yahoo gives me 400 GB each month! I read your link, but the robots.txt protocol is only followed by "good" robots, not "bad" ones, so it doesn't apply to downloaders and spammers.

Isn't there any other way of banning these user agents with some other language? Maybe .NET?

How about other top web hosting companies? Does Microsoft offer web hosting?
Actually it should still block those robots...

Place this inside your main directory and name it robots.txt

Code:

User-agent: ^crescent
Disallow: /

User-agent: wbdbot
Disallow: /

User-agent: Web Downloader
Disallow: /
 
User-agent: webauto
Disallow: /

User-agent: webbandit
Disallow: /

User-agent: WebCapture
Disallow: /

User-agent: webcollector
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: webdevil
Disallow: /

User-agent: WebEMailExtrac.*
Disallow: /

User-agent: WebFetch
Disallow: /

User-agent: webfetcher
Disallow: /

User-agent: WebFountain
Disallow: /

User-agent: webhook
Disallow: /

User-agent: webminer
Disallow: /

User-agent: WebMirror
Disallow: /

User-agent: webmole
Disallow: /

User-agent: WebReaper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebSense
Disallow: /

User-agent: website
Disallow: /

User-agent: websnake
Disallow: /

User-agent: Webster
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: websucker
Disallow: /

User-agent: webweasel
Disallow: /

User-agent: WebWhacker
Disallow: /

User-agent: WebZIP
Disallow: /

User-agent: Wget
Disallow: /
This should keep all of them out of your site.

Posted: Sun Feb 12, 2006 7:39 am
by sheila
boujin is right. Reading and obeying robots.txt is voluntary, and "bad" robots simply ignore it.

Posted: Sun Feb 12, 2006 11:25 am
by onion2k
sheila wrote:boujin is right. Reading and obeying robots.txt is voluntary, and "bad" robots simply ignore it.
Since when has wget been bad?

Posted: Sun Feb 12, 2006 11:38 am
by Chris Corbyn
onion2k wrote:
sheila wrote:boujin is right. Reading and obeying robots.txt is voluntary, and "bad" robots simply ignore it.
Since when has wget been bad?
Hmm, yeah. If wget were blocked by a lot of sites I'd be pretty annoyed :lol: I use it a lot.

I can see how it could be annoying to certain web hosts, though, since it will spider recursively if you tell it to.

Posted: Sun Feb 12, 2006 12:00 pm
by Roja
onion2k wrote:
sheila wrote:boujin is right. Reading and obeying robots.txt is voluntary, and "bad" robots simply ignore it.
Since when has wget been bad?
wget by default respects the robots.txt file. You can tell it to do otherwise, but that's the *person* being bad, not the program. :)

Posted: Sun Feb 12, 2006 11:05 pm
by bimo
If you put an .htaccess file with the line

Code:

AddType application/x-httpd-php .php .html
in your home directory, that should work, unless Yahoo doesn't allow .htaccess files.
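If .htaccess is allowed, the user-agent blocking itself could also move into it, so no PHP is needed at all. A sketch using Apache's mod_setenvif (directive availability depends on the host's configuration, and the agent list is abbreviated here):

```apache
# Map .html onto the PHP handler, as suggested above
AddType application/x-httpd-php .php .html

# Flag known bad agents (case-insensitive match)
SetEnvIfNoCase User-Agent "WebCopier" bad_bot
SetEnvIfNoCase User-Agent "WebZIP"    bad_bot
SetEnvIfNoCase User-Agent "Wget"      bad_bot

# Refuse flagged agents and serve the custom 403 page
Order Allow,Deny
Allow from all
Deny from env=bad_bot
ErrorDocument 403 /e403.html
```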