use iptables to prevent webscraping?

Posted: Thu Jun 18, 2015 11:24 am
by DetroitDan
Hello,
I am just starting out with a php-based site which will have a (hopefully!) have a large database of professionals. The site is hosted on ipage. I want to make sure that no one would be able to web scrape the info in my database (by accessing all indexed php pages, index.php?id=1,2,3,4 etc.). I do want search engines (e.g. Google) to be able to access/index my site. Does anyone have any good suggestions as to how to accomplish this? I see that iptables can limit connections to n/minute, but I don't see how to use this for a site hosted on ipage, nor how to allow the big search engines to get through without problem.

Thanks for any help,
Dan

Re: use iptables to prevent webscraping?

Posted: Thu Jun 18, 2015 11:32 am
by Celauran
Don't use auto-incrementing IDs. UUIDs are a much better option for a number of reasons, one being that valid IDs become effectively unguessable, so a scraper can't just walk id=1, 2, 3. And even if iptables were available to you, you'd need to rate-limit ordinary users while leaving Google and the other search bots alone. Not an easy proposition.
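
To illustrate the idea (shown in Python for brevity; in PHP you'd do the same with `random_bytes()` or a library such as ramsey/uuid):

```python
import uuid

# A version-4 UUID contains 122 random bits, so enumerating
# index.php?id=1,2,3,... no longer works: valid IDs are
# effectively unguessable.
record_id = uuid.uuid4()
print(record_id)

# Store str(record_id) as the record's key instead of an
# AUTO_INCREMENT integer.
```

Your pages would then be addressed like index.php?id=9f1b2c34-... and only reachable via the links you (and the search engines that followed them) already have.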