Ahoy,
I'm creating a simple site that allows people to submit links which then get published for everyone else to visit. To try to preempt links leading to malware and the like, I want to implement a blacklist URL filter. I downloaded the lists from URLblacklist.com and was going to just insert each URL into a giant MySQL table, but it seems the table would be obscenely large.
Is there a fast way to compare a URL to all the URLs listed in a file? Or would I be better off to just create a giant table?
ConQuesimo
Best way to setup URL Blacklist?
-
conquesimo
- Forum Newbie
- Posts: 10
- Joined: Mon May 26, 2008 3:30 pm
Re: Best way to setup URL Blacklist?
Go on: put them all into a table. Are they hostnames or actual URLs? If the latter, why not just keep track of domain names and IP addresses of "bad" sites?
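If you go the hostname route, each submitted link has to be reduced to its hostname before you compare it against the list. A rough sketch in Python of how that could look (the function name `host_of` is made up for this example, and it assumes submissions may arrive with or without a scheme):

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the lowercased hostname of a submitted link, or None if there is none."""
    url = url.strip()
    parts = urlsplit(url)
    # urlsplit only recognises a network location after "//", so a bare
    # "example.com/page" is parsed again with "//" put in front of it.
    if not parts.netloc:
        parts = urlsplit("//" + url)
    host = parts.hostname  # drops any port and user info
    return host.lower().rstrip(".") if host else None
```

Something like `host_of("http://Bad.Example.com:8080/x?y=1")` should then give a value you can look up directly, without storing every full URL.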
-
conquesimo
Re: Best way to setup URL Blacklist?
Somehow it just seemed wrong to have roughly two million strings in a single-column table. I went and did it anyway. Matching a domain against the table only takes 0.0003 sec. So, I guess it works just fine.
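For anyone wanting to try the same thing, here is a minimal sketch of that kind of lookup. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs on its own; the table name `blacklist`, the column name `domain`, and the sample domains are assumptions for the example, not my actual schema. The point is that the column is a primary key, so the lookup goes through an index instead of scanning two million rows.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL table.
conn = sqlite3.connect(":memory:")

# PRIMARY KEY gives the column a unique index, which keeps lookups fast.
conn.execute("CREATE TABLE blacklist (domain TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT OR IGNORE INTO blacklist (domain) VALUES (?)",
    [("malware.example",), ("phish.example",)],  # placeholder entries
)

def is_blacklisted(conn, domain):
    """Return True if the exact domain is present in the blacklist table."""
    row = conn.execute(
        "SELECT 1 FROM blacklist WHERE domain = ? LIMIT 1",
        (domain.lower(),),
    ).fetchone()
    return row is not None
```

Timings will of course depend on the database and the hardware, so the figure above is only what I measured on my setup.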
Re: Best way to setup URL Blacklist?
Well, you may wish to do it yourself, using "expert" techniques like hashing or a dictionary tree (a trie). A DB solution already gives you very good performance (as you noticed). I also downloaded the blacklists, and each domain in them is categorized. It makes sense to use a DB if you wish to block only some of the categories.
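To give an idea of the hashing approach with categories, here is a small Python sketch. It is only an illustration: the "domain category" line format, the function names and the sample domains are invented for the example and are not the actual layout of the downloaded lists. A Python dict is a hash table, so each lookup takes roughly constant time however many domains are loaded.

```python
def load_blacklist(lines):
    """Build a dict mapping domain -> category from 'domain category' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            domain, category = parts
            table[domain.lower()] = category
    return table

def is_blocked(table, host, blocked_categories):
    """Return True if host or any parent domain is listed in a blocked category."""
    labels = host.lower().split(".")
    # Check the host itself, then each parent: a.b.c -> b.c -> c
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if table.get(candidate) in blocked_categories:
            return True
    return False
```

With a table built from a couple of such lines, `is_blocked(table, "cdn.malware.example", {"malware"})` would match on the parent domain, while a domain listed only under another category would pass. Whether two million entries fit comfortably in memory depends on your server, which is another argument for the DB.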
There are 10 types of people in this world, those who understand binary and those who don't