use iptables to prevent webscraping?

DetroitDan
Forum Newbie
Posts: 1
Joined: Thu Jun 18, 2015 11:15 am

use iptables to prevent webscraping?

Post by DetroitDan »

Hello,
I am just starting out with a PHP-based site which will (hopefully!) have a large database of professionals. The site is hosted on ipage. I want to make sure that no one is able to scrape the info in my database (by iterating through all indexed PHP pages: index.php?id=1,2,3,4, etc.). I do want search engines (e.g. Google) to be able to access/index my site. Does anyone have any good suggestions on how to accomplish this? I see that iptables can limit connections to n/minute, but I don't see how to use this for a site hosted on ipage, nor how to allow the big search engines through without problems.

Thanks for any help,
Dan
Celauran
Moderator
Posts: 6427
Joined: Tue Nov 09, 2010 2:39 pm
Location: Montreal, Canada

Re: use iptables to prevent webscraping?

Post by Celauran »

Don't use auto-incrementing IDs. UUID is a much better option for a number of reasons, including making available IDs less obvious. Even if iptables were available to you, you'd need to rate limit users but not rate limit Google and other search bots. Not an easy proposition.
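To illustrate the point about IDs (in Python rather than PHP, purely for brevity): sequential integers let a scraper enumerate every record, while a random version-4 UUID is drawn from roughly 2^122 possibilities, so guessing valid IDs is infeasible.

```python
import uuid

# With auto-incrementing IDs a scraper just walks 1, 2, 3, ...
sequential_ids = [1, 2, 3]

# A random (version 4) UUID is not enumerable or guessable.
record_id = uuid.uuid4()
print(record_id)  # e.g. 1b9d6bcd-bbfd-4b2d-9b5d-ab8dfbbd4bed (random each run)
assert record_id.version == 4
```

Your URLs would then look like index.php?id=1b9d6bcd-... instead of index.php?id=4, and an attacker can no longer sweep the ID space.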