How do I stop scrapers, and /?=hello urls?

Posted: Thu May 23, 2013 5:34 am
by simonmlewis
We have a URL like this:
http://www.testurl.com/?in=what-do-pills-do

It's not damaging us, but how do I make a URL that doesn't exist, such as one with that ?in=*****, redirect to our homepage instead?

In theory, I can check for a variable called $in and redirect on it. But what stops them from using ?out-....... instead?

Also, how do I detect within my code that the site is being displayed inside a scraper's page, which looks like an iframe, and redirect the visitor?

Re: How do I stop scrapers, and /?=hello urls?

Posted: Thu May 23, 2013 5:56 am
by simonmlewis
Can I query in the URL if there is "/?" ??

Re: How do I stop scrapers, and /?=hello urls?

Posted: Thu May 23, 2013 12:28 pm
by requinix
The only thing that knows whether the URL "exists" is your code. Do whatever it takes to determine whether that "in" value is valid, and if it isn't, redirect.
I don't know what "query in the URL" means.
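
To make the "validate, and if not redirect" idea concrete, here is a minimal sketch. The KNOWN_PAGES list and the redirectTargetFor() function are hypothetical names made up for illustration; a real site would more likely look the slug up in its database. It also assumes that "in" is the only query parameter the site legitimately uses, so anything else (including ?out=...) is sent to the homepage.

```php
<?php
// Hypothetical whitelist of slugs the site really serves (an assumption;
// in practice this would probably be a database lookup).
const KNOWN_PAGES = ['about-us', 'contact', 'delivery-info'];

/**
 * Decide where to send a request based on its query parameters.
 * Returns null when the request is fine, or the URL to redirect to.
 */
function redirectTargetFor(array $query): ?string
{
    // No parameters at all: a plain homepage request, nothing to do.
    if ($query === []) {
        return null;
    }
    // Exactly one parameter, named "in", holding a known slug: valid page.
    if (array_keys($query) === ['in']
        && is_string($query['in'])
        && in_array($query['in'], KNOWN_PAGES, true)) {
        return null;
    }
    // Anything else (?in=junk, ?out=..., extra parameters) goes home.
    return '/';
}

// Usage: run this before any output is sent.
$target = redirectTargetFor($_GET);
if ($target !== null) {
    header('Location: ' . $target, true, 301);
    exit;
}
```

Because the check is a whitelist rather than a blacklist, it doesn't matter which parameter name someone invents next. The catch is that legitimate extra parameters (tracking tags, for example) would also be redirected unless they are added to the rules. Whether 301 or 302 is the right status code depends on the site.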

Breaking out of frames is done with JavaScript. IIRC, something like:

Code: Select all

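<script type="text/javascript">
// If this page is not the top-level window, it is being shown in a frame.
if (window.top !== window.self) {
    // Replace the framing page with this page's own URL.
    window.top.location = window.self.location.href;
}
</script>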

Re: How do I stop scrapers, and /?=hello urls?

Posted: Thu May 23, 2013 12:59 pm
by Eric!
In PHP, you can test whether the request has a non-empty query string with:

Code: Select all

if (!empty($_GET)) {
    // At least one named query parameter was sent, e.g. ?in=...
}

But in general, a well-written scraper bot can mimic a browser, so you'll have a difficult time stopping it.
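
One caveat on the $_GET test: as far as I know, PHP discards parameters that have no name, so a URL like /?=hello (the one in the thread title) may leave $_GET empty. A rough sketch that looks at the raw request URI instead is below. hasQueryString() is a hypothetical helper name, and it assumes the web server fills in $_SERVER['REQUEST_URI'], which Apache and most other setups do.

```php
<?php
/**
 * True when the request URI has at least one character after a "?",
 * even for input such as "?=hello" that has no parameter name.
 */
function hasQueryString(string $requestUri): bool
{
    $pos = strpos($requestUri, '?');
    // A "?" must be present and must not be the last character.
    return $pos !== false && $pos < strlen($requestUri) - 1;
}

// Usage, for a site that never uses query strings at all:
$uri = $_SERVER['REQUEST_URI'] ?? '/';
if (hasQueryString($uri)) {
    header('Location: /', true, 302);
    exit;
}
```

This only tidies up odd URLs. It does nothing against a scraper that requests the normal pages.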