Best way to setup URL Blacklist?

Posted: Thu Sep 23, 2010 9:14 pm
by conquesimo
Ahoy,

I'm creating a simple site that allows people to submit links, which then get published for everyone else to visit. To try to preempt links leading to malware and the like, I want to implement a URL blacklist filter. I downloaded the lists from URLblacklist.com and was going to just insert each URL into a giant MySQL table, but it seems the table would be obscenely large.

Is there a fast way to compare a URL to all the URLs listed in a file? Or would I be better off just creating a giant table?

ConQuesimo

Re: Best way to setup URL Blacklist?

Posted: Thu Sep 23, 2010 10:20 pm
by requinix
Go on: put them all into a table. Are they hostnames or actual URLs? If the latter, why not just keep track of domain names and IP addresses of "bad" sites?
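
To illustrate the domain-based idea, here is a rough Python sketch (not anything from the actual site) that reduces a submitted link to its hostname and checks that hostname, plus its parent domains, against a set of bad hosts. The names `BAD_HOSTS`, `host_of` and `is_blacklisted` are made up for the example, and the sample entries are placeholders rather than real list data.

```python
from urllib.parse import urlsplit

# Hypothetical sample entries; a real list would be loaded from the downloaded files.
BAD_HOSTS = {"malware.example", "203.0.113.7"}

def host_of(url):
    """Return the lowercased hostname of a URL, or "" if none can be parsed."""
    if "://" not in url:
        url = "http://" + url  # accept submissions typed without a scheme
    try:
        return (urlsplit(url).hostname or "").lower()
    except ValueError:
        return ""  # malformed input such as a broken IPv6 literal

def is_blacklisted(url, bad_hosts=BAD_HOSTS):
    """True if the URL's host, or one of its parent domains, is in bad_hosts."""
    host = host_of(url)
    if not host:
        return False
    if host in bad_hosts:
        return True
    labels = host.split(".")
    # Try parent domains (sub.malware.example -> malware.example),
    # stopping before the bare top-level label.
    return any(".".join(labels[i:]) in bad_hosts
               for i in range(1, len(labels) - 1))
```

Checking parent domains is a design choice: it catches subdomains of a listed site, at the cost of a few extra lookups per link.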

Re: Best way to setup URL Blacklist?

Posted: Fri Sep 24, 2010 1:23 pm
by conquesimo
Somehow it just seemed wrong to have roughly two million strings in a single-column table. I went and did it anyway. Matching a domain against the table only takes 0.0003 sec. So, I guess it works just fine.
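
For anyone following along, a lookup that fast is what you would expect from an exact match on an indexed column. Below is a minimal sketch of that kind of table and query. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere, and the table name `blacklist`, the column name `domain` and the sample rows are assumptions for illustration, not the real schema.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")

# A single-column table; PRIMARY KEY gives the column a unique index,
# so exact-match lookups do not have to scan every row.
conn.execute("CREATE TABLE blacklist (domain TEXT PRIMARY KEY)")

# Placeholder rows; the real data would come from the downloaded lists.
sample_domains = [("malware.example",), ("phish.example",)]
conn.executemany(
    "INSERT OR IGNORE INTO blacklist (domain) VALUES (?)", sample_domains
)
conn.commit()

def domain_is_listed(conn, domain):
    """Return True if the (lowercased) domain has a row in the blacklist table."""
    row = conn.execute(
        "SELECT 1 FROM blacklist WHERE domain = ? LIMIT 1",
        (domain.lower(),),
    ).fetchone()
    return row is not None
```

The parameterized query (`?`) matters here since the domain comes from user-submitted links.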

Re: Best way to setup URL Blacklist?

Posted: Fri Sep 24, 2010 1:35 pm
by VladSun
Well, you may wish to do it yourself, using "expert" techniques like hashing or a dictionary tree (that is, a trie). A DB solution gives you very good performance (as you noticed). Also, I've downloaded the blacklists, and each domain is assigned a category. It does make sense to use a DB if you wish to block only some of the categories.
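
As a rough idea of what the do-it-yourself route could look like, here is a small Python sketch of a trie keyed on domain labels in reverse order (TLD first), with a set of categories stored at each listed domain. The class name `DomainTrie`, the helper `is_blocked` and the category names "malware" and "gambling" are all made up for the example; the real lists use their own category names.

```python
_CATS = object()  # sentinel key holding the categories of a listed domain

class DomainTrie:
    """Trie over domain labels, stored right to left (TLD first)."""

    def __init__(self):
        self.root = {}

    def add(self, domain, category):
        node = self.root
        for label in reversed(domain.lower().strip(".").split(".")):
            node = node.setdefault(label, {})
        node.setdefault(_CATS, set()).add(category)

    def categories(self, host):
        """Collect categories of the host and of every listed parent domain."""
        found = set()
        node = self.root
        for label in reversed(host.lower().strip(".").split(".")):
            node = node.get(label)
            if node is None:
                break  # nothing listed under this branch
            found |= node.get(_CATS, set())
        return found

def is_blocked(trie, host, blocked_categories):
    """True if the host falls into at least one category we chose to block."""
    return bool(trie.categories(host) & blocked_categories)

# Placeholder data for illustration only.
trie = DomainTrie()
trie.add("malware.example", "malware")
trie.add("casino.example", "gambling")

print(is_blocked(trie, "cdn.malware.example", {"malware"}))  # subdomain of a listed site
print(is_blocked(trie, "casino.example", {"malware"}))       # listed, but category not blocked
```

Walking the labels from the TLD down means a subdomain matches its listed parent in a single pass, and the per-domain category set covers the "block only some categories" case without a database.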