How do I stop scrapers, and /?=hello urls?

Post by simonmlewis »

We have a url like this:
http://www.testurl.com/?in=what-do-pills-do

It's not damaging us, but how do I make a URL that doesn't exist, such as one with that ?in=***** query string, redirect to our homepage instead?

In theory, I can check for a variable called $in and redirect when it is set. But what stops them using ?out=....... instead?

Also, how do I detect within my code that the site is being shown inside a scraper's page, which looks like an iframe, and redirect those visitors?
Last edited by pickle on Fri May 24, 2013 5:09 pm, edited 1 time in total.
Reason: Removed "viagra" keyword that was causing "grilled spam", as well as note referencing the replacement

Post by simonmlewis »

Can I query in the URL if there is "/?" ??

Post by requinix »

The only thing that knows whether the URL "exists" is your code. Do whatever it takes to determine whether that "in" value is valid, and if it is not, redirect.
Don't know what "query in the URL" means.
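
A minimal sketch of that "validate, otherwise redirect" idea, assuming the valid values of "in" can be listed or looked up. The KNOWN_PAGES list and the redirectTargetFor() helper are hypothetical names made up for illustration; a real site would more likely check the value against its database.

```php
<?php
// Hypothetical list of page slugs that really exist on the site.
const KNOWN_PAGES = ['about-us', 'contact', 'products'];

/**
 * Returns the URL to redirect to, or null when the request is acceptable.
 */
function redirectTargetFor(array $query, array $knownPages): ?string
{
    // No "in" parameter at all: nothing to validate here.
    if (!array_key_exists('in', $query)) {
        return null;
    }

    // "in" must be a string that names a page we know about.
    $in = $query['in'];
    if (is_string($in) && in_array($in, $knownPages, true)) {
        return null;
    }

    // Anything else is treated as a URL that does not exist.
    return '/';
}

// Usage near the top of the script, before any output is sent:
// $target = redirectTargetFor($_GET, KNOWN_PAGES);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Keeping the decision in a function that only returns a target makes it easy to check without sending real headers.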

Breaking out of frames is done with JavaScript. IIRC, something like:

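<script type="text/javascript">
// If this page is not the top-level window, it is being framed:
// replace the framing page with this one.
if (window.top !== window.self) {
    window.top.location = window.location.href;
}
</script>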
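
As a server-side complement to the script above (a different technique, not something suggested in the thread), PHP can send response headers asking browsers not to render the page inside a frame on another site. This is only a sketch: antiFramingHeaders() is a hypothetical helper name, and it does nothing against a scraper that copies the HTML itself rather than framing the real site.

```php
<?php
/**
 * Builds response headers that ask browsers not to display the page
 * inside a frame served from a different origin.
 */
function antiFramingHeaders(): array
{
    return [
        // Older, widely supported header.
        'X-Frame-Options: SAMEORIGIN',
        // Newer equivalent; assumes the site sets no other CSP header,
        // because header() would replace an existing one by default.
        "Content-Security-Policy: frame-ancestors 'self'",
    ];
}

// Usage, before any output is sent:
// foreach (antiFramingHeaders() as $line) {
//     header($line);
// }
```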

Post by Eric! »

You can test whether a non-empty query string exists in PHP using:

if (!empty($_GET)) {
    // Do something, because the query string contains at least one parameter
}
In general, though, a well-written scraper bot can mimic a browser, so you will have a difficult time stopping it.
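
Building on the $_GET check, one way to cover the "what stops them using ?out" worry is to list the parameter names the site actually uses and treat everything else as unexpected. This is a rough sketch under that assumption; unexpectedQueryKeys() and the example key names are hypothetical.

```php
<?php
/**
 * Returns the query-string keys that are not in the allowed list.
 */
function unexpectedQueryKeys(array $query, array $allowedKeys): array
{
    // array_diff keeps the original indexes, so re-index the result.
    return array_values(array_diff(array_keys($query), $allowedKeys));
}

// Usage, before any output is sent ('page' and 'id' are made-up names):
// if (unexpectedQueryKeys($_GET, ['page', 'id']) !== []) {
//     header('Location: /', true, 301);
//     exit;
// }
```

An allow-list like this has to be kept in sync with every parameter the site really accepts, including any tracking parameters added by ad or analytics links, or legitimate visitors will be redirected too.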