Best way to set up URL Blacklist?

conquesimo
Forum Newbie
Posts: 10
Joined: Mon May 26, 2008 3:30 pm

Post by conquesimo »

Ahoy,

I'm creating a simple site that allows people to submit links, which then get published for everyone else to visit. To preempt links leading to malware and the like, I want to implement a blacklist URL filter. I downloaded the lists from URLblacklist.com and was going to just insert each URL into a giant MySQL table, but it seems the table would be obscenely large.

Is there a fast way to compare a URL to all the URLs listed in a file? Or would I be better off just creating a giant table?

ConQuesimo
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Post by requinix »

Go on: put them all into a table. Are they hostnames or actual URLs? If the latter, why not just keep track of domain names and IP addresses of "bad" sites?
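
To illustrate the idea of tracking domain names instead of full URLs, here is a rough Python sketch that reduces a submitted link to its hostname and lists the parent domains worth checking. The function names host_of and candidates are made up for this example, and the rule of stopping before the bare top-level label is an assumption, not something the blacklist requires.

```python
import ipaddress
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a submitted link, or '' if none."""
    # A link pasted without a scheme would otherwise be parsed as a path.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host.rstrip(".")


def candidates(host):
    """List the host and each parent domain, most specific first."""
    try:
        # IP addresses have no parent domains; check them as-is.
        ipaddress.ip_address(host)
        return [host]
    except ValueError:
        pass
    labels = host.split(".")
    # Stop before the last label alone (e.g. "com"), assumed never listed.
    return [".".join(labels[i:]) for i in range(len(labels) - 1)] or [host]
```

Each candidate can then be compared against the stored domains, so a link to a subdomain of a listed site is caught as well.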
conquesimo
Forum Newbie
Posts: 10
Joined: Mon May 26, 2008 3:30 pm

Post by conquesimo »

Somehow it just seemed wrong to have roughly two million strings in a single-column table. I went and did it anyway. Matching a domain against the table only takes 0.0003 sec. So, I guess it works just fine.
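
For anyone wondering why a lookup like this can stay fast at a couple of million rows: an exact-match query on an indexed column is a tree search, not a scan. Below is a minimal sketch of that setup. It uses Python's built-in sqlite3 with an in-memory database as a stand-in for the MySQL table, and the table name blacklist, the column name domain, and both function names are assumptions for illustration only.

```python
import sqlite3


def build_blacklist(domains):
    """Create an in-memory table with one indexed column of domains."""
    conn = sqlite3.connect(":memory:")
    # PRIMARY KEY creates a unique index, so equality lookups avoid a full scan.
    conn.execute("CREATE TABLE blacklist (domain TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT OR IGNORE INTO blacklist (domain) VALUES (?)",
        [(d.lower(),) for d in domains],
    )
    conn.commit()
    return conn


def is_blacklisted(conn, domain):
    """Return True if the domain is present in the table."""
    # Parameterized query: the user-submitted value is never spliced into SQL.
    row = conn.execute(
        "SELECT 1 FROM blacklist WHERE domain = ? LIMIT 1",
        (domain.lower(),),
    ).fetchone()
    return row is not None
```

The same shape should carry over to MySQL with a primary key or unique index on the domain column, though actual timings will depend on the server and data.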
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Post by VladSun »

Well, you may wish to do it yourself, using "expert" techniques like hashing or a dictionary tree (a trie). A DB solution gives you very good performance (as you noticed). Also, I've downloaded the blacklists, and each domain is assigned a category. It makes sense to use a DB if you wish to block only some of the categories.
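
As a rough illustration of the do-it-yourself route, here is a small Python sketch of a trie keyed on domain labels in reverse order (TLD first), with the category stored at the node where a listed domain ends. The class name DomainTrie, the function is_blocked, and the category names used with it are hypothetical; the real lists may organize categories differently.

```python
_END = object()  # sentinel key marking "a listed domain ends at this node"


class DomainTrie:
    """Trie over reversed domain labels, e.g. example -> bad for bad.example."""

    def __init__(self):
        self.root = {}

    def add(self, domain, category):
        node = self.root
        for label in reversed(domain.lower().split(".")):
            node = node.setdefault(label, {})
        node.setdefault(_END, set()).add(category)

    def categories(self, host):
        """Collect categories of every listed domain that host equals or sits under."""
        found = set()
        node = self.root
        for label in reversed(host.lower().split(".")):
            node = node.get(label)
            if node is None:
                break
            found |= node.get(_END, set())
        return found


def is_blocked(trie, host, blocked_categories):
    """True if host falls under a listed domain in one of the chosen categories."""
    return bool(trie.categories(host) & set(blocked_categories))
```

Walking the labels from the right means a subdomain of a listed domain matches without storing it separately. A plain hash set of domains would handle exact matches just as well, and the database keeps the advantage of filtering by category with an ordinary query.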