Page 1 of 2
Mass blocking suggestions
Posted: Tue Aug 16, 2005 11:26 am
by evilmonkey
Hello everyone,
I'm writing a user-driven website that may reach a peak of 100,000-200,000 users. It's a highly social site, so there will be private messaging. Where there's private messaging, there have to be massive ignore/block lists. I got the individual blocking down: every user has a row in a table called `prefs` (in a mysql db) and the `prefs` table has a "blocked users" colums. The blocked users colum holds comma-seperated id's of the users who are not allowed to send private messages to this person. That works. Now, I want to create a mass filtering system. In the website preferences, a user may opt not to recieve private messages from people under 18 years of age. Or from people of the male gender. Or from people who don't live in the same city. Etc. At first, I though that after a user selects his preference, I would run a mass query, get the id's of all the users that fit the description, and put the id's in the "blocked_users" column. Then I realized that I'm dealing with 100k-200k users, and doing this is unfeasable because it will seriously slow down the database. How else can I design this system in a way that minimizes the amount of queries I have to run? To give a hint of what happens when a user is on the recipient's blocked list, here's the code for sending a private message:
Code: Select all
<?php
//more code above
$to = $_POST["to"]; //comma seperated usernames explode()'d later
$subject = $_POST["subject"];
$message = $_POST["message"];
if(($subject == "") && ($message == "")) {
print("<font color = \"red\">Please enter either a subject or a
message since a blank private message cannot be sent.</font>
<br />");
print($form);
}
else {
$user_names = explode(", ", $to); //who are we sending to? explode the comma-separated usernames from the "TO" feild
for ($i = 0; $i < sizeof($user_names); $i++) {
$userid = sql_pull("SELECT `id` FROM `users` WHERE `username` = '"
. $user_names[$i] . "'", $stats["db"]);
//make sure this user isn't blocked
$get_blocked_users = sql_pull("SELECT `blocked_users` FROM `prefs` WHERE `id`=".$userid[0]['id'], $stats['db']);
$blockedusers = explode(",",$get_blocked_users[0]['blocked_users']);
for ($i2=0; $i2<sizeof($blockedusers); $i2++){
print_r ($stats);
if ($stats['userid']==$blockedusers[$i2]){
$blocked[] = $user_names[$i];
}
else{
$result = $user->send_message($userid[0]["id"], $subject, $message);
}
}
//more code
?>
Thank for the help!

Posted: Tue Aug 16, 2005 11:40 am
by nielsene
Just a few thoughts...
1st: I would probably not use a comma array in a field. I suspect just having a "users_blocked_user" table (userid, blocked_id) would work/scale better. (If your block list gets too long the comma seperated version might mess up. This would also make filtering the list possible completely with a query and not needing to loop seperately in PHP.
2nd. Is all blocking black-list or is their a white list as well? (It changes some of other possibilities)
3rd:How often do you think people will change their mass blocking criteria? How often do you think people would change their profile (ie change the data that matches against the blocking criteria)
4th. What is your full list of possible blocking criteria?
Posted: Tue Aug 16, 2005 11:50 am
by evilmonkey
Hi Eric,
Thanks for your response. I'll address #1 first. I don't really understand what you're suggesting. You're suggesting moving the blocked_users column to a seperate table. Wouldn't it still be comma seperated ID's though? For instance, if user #56 blocked users 85,78,20,54, and 98 the table would still look like this:
Same comma-seperated values. Or did you have something else in mind?
To briefly address #2, I generally want to avoid whitelists, although I may have no choice but impliment one. This is more of a human factor, having whitelists would defeat the purpose of a social website IMO. I'll have to conference with the people I'm working with about that, but for now, I'm assuming there won't be.
#3: Mass blocking will have to change on the fly. Today this group is blocked, tommorrow it isn't. Maybe a person hit the wrong button. Blocking without unblocking is not an option, these will be ever-changing.
#4: People will be able to block individuals, age groups, gender groups, geographic groups, popularity groups, and possibly more. It's pretty extensive.
Posted: Tue Aug 16, 2005 11:50 am
by feyd
I find it better to store a black/white list in a seperate table on it's own. A white list is important so you can automatically greenlight a message from someone that would otherwise be filtered out by their preferences.
A black/white filter table would probably be good as well. Like creating a filter in your email, it can give a simple way of creating a testing guantlet for a message to pass through. (A filter table would also given you a simple way of automatically sorting messages on its own)
Posted: Tue Aug 16, 2005 11:54 am
by evilmonkey
Feyd,
The second part of your response is exactly what I'm asking. How would you suggest designing a filter table and what should happen when a user tries to send a message (e.g. what's the best way to write a checking system).
Thanks for the help.
Posted: Tue Aug 16, 2005 12:02 pm
by feyd
filter type (table) : (individual, age group, whatever)
the filter table would be something like:
for user id
type see above
argument 1 (from experience, most filter sets only need 3 inputs at maximum, the type would dictate what they actually mean and do)
argument 2
argument 3
color black|white should probably be a numeric value so SQL can sort them into the form you will easily deal with
Posted: Tue Aug 16, 2005 12:10 pm
by nielsene
evilmonkey wrote:Hi Eric,
Thanks for your response. I'll address #1 first. I don't really understand what you're suggesting. You're suggesting moving the blocked_users column to a seperate table. Wouldn't it still be comma seperated ID's though? For instance, if user #56 blocked users 85,78,20,54, and 98 the table would still look like this:
Same comma-seperated values. Or did you have something else in mind?
No it would look like:
The primary key would be a multi-column (id,blocked). Not only is it now trivial to do the filtering in the DB, adding/removing blocks won't require reloading the whole list into memory first.
To briefly address #2, I generally want to avoid whitelists, although I may have no choice but impliment one. This is more of a human factor, having whitelists would defeat the purpose of a social website IMO. I'll have to conference with the people I'm working with about that, but for now, I'm assuming there won't be.
I wasn't envisioning a strict white-list, more of the "allow the following people to write me, even if they are catagorized into a category I blocked" style white list.
More to come after I think for a bit.
Posted: Tue Aug 16, 2005 12:16 pm
by evilmonkey
nielsene wrote:evilmonkey wrote:Hi Eric,
Thanks for your response. I'll address #1 first. I don't really understand what you're suggesting. You're suggesting moving the blocked_users column to a seperate table. Wouldn't it still be comma seperated ID's though? For instance, if user #56 blocked users 85,78,20,54, and 98 the table would still look like this:
Same comma-seperated values. Or did you have something else in mind?
No it would look like:
The primary key would be a multi-column (id,blocked). Not only is it now trivial to do the filtering in the DB, adding/removing blocks won't require reloading the whole list into memory first.
Thanks for suggestion, however, I'm considering the implications of this: assuming I have 100,000 users, each user blocks 20 people (in reality it may be much more than that), that's 2,000,000 rows in this table. The rows will probably not be grouped in any way (for instance, user 56 can block user 85 today and user 78 3 months later, and the two rows will be thousands of rows apart by then). Is this really more efficient than finding one row, pulling a feild and exploding it using PHP?
Posted: Tue Aug 16, 2005 12:17 pm
by nielsene
feyd wrote:filter type (table) : (individual, age group, whatever)
the filter table would be something like:
for user id
type see above
argument 1 (from experience, most filter sets only need 3 inputs at maximum, the type would dictate what they actually mean and do)
argument 2
argument 3
color black|white should probably be a numeric value so SQL can sort them into the form you will easily deal with
I think I disagree with most of this. It makes sense, I think to have the following three tables:
1. users_blocked
2. users_whitelisted
3. categories_blocked
I don't think there is a need for a categories_whitelisted, unless you want to start dealing with much greater complexity. I also extremely dislike having a "type" field in a database table, it makes the relation predicate very messy.
While I agree that typically only 3 criteria for any needed for any given restriction, I'm not sure the OP was allowing that style of blocking. Your system would I think be for
"Block everyone who is under 18 and not from my hometown" I think the OP wanted more of a "block everyone who is under 18 and everyone who is not from my hometwon." (Yours is for ANDs, the OPs wanted more ORs I think)
Yours can of course degrade to simple ORs however, but you'll have to allow multiple filters per user (which I think you meant to, but the listed schema doesnt).
Posted: Tue Aug 16, 2005 12:22 pm
by nielsene
evilmonkey wrote:nielsene wrote:evilmonkey wrote:Hi Eric,
Thanks for your response. I'll address #1 first. I don't really understand what you're suggesting. You're suggesting moving the blocked_users column to a seperate table. Wouldn't it still be comma seperated ID's though? For instance, if user #56 blocked users 85,78,20,54, and 98 the table would still look like this:
Same comma-seperated values. Or did you have something else in mind?
No it would look like:
The primary key would be a multi-column (id,blocked). Not only is it now trivial to do the filtering in the DB, adding/removing blocks won't require reloading the whole list into memory first.
Thanks for suggestion, however, I'm considering the implications of this: assuming I have 100,000 users, each user blocks 20 people (in reality it may be much more than that), that's 2,000,000 rows in this table. The rows will probably not be grouped in any way (for instance, user 56 can block user 85 today and user 78 3 months later, and the two rows will be thousands of rows apart by then). Is this really more efficient than finding one row, pulling a feild and exploding it using PHP?
Well its indexed, and id is the first column in the index, so the index will be clustered around id, so it should still be fast and efficient. This also makes it easy to do search for "who blocked a given user", something your admin's might need at times for debugging, etc.
Posted: Tue Aug 16, 2005 12:23 pm
by evilmonkey
Yes, definitly "or". And really doesn't concern me, if a user fits any category, he'll be blocked.
Posted: Tue Aug 16, 2005 12:26 pm
by evilmonkey
Hmm, Eric, you bring up a very good point. I think I'll create that blacklist table.

My other question about filtering still stands though...
Posted: Tue Aug 16, 2005 12:27 pm
by nielsene
nielsene wrote:feyd wrote:filter type (table) : (individual, age group, whatever)
the filter table would be something like:
for user id
type see above
argument 1 (from experience, most filter sets only need 3 inputs at maximum, the type would dictate what they actually mean and do)
argument 2
argument 3
color black|white should probably be a numeric value so SQL can sort them into the form you will easily deal with
I think I disagree with most of this. It makes sense, I think to have the following three tables:
1. users_blocked
2. users_whitelisted
3. categories_blocked
I don't think there is a need for a categories_whitelisted, unless you want to start dealing with much greater complexity. I also extremely dislike having a "type" field in a database table, it makes the relation predicate very messy.
OK have to disagree with myself now too
categories_blocked needs to be seperate tables. (I mistakenly thought feyd's type and color fields were synomous/redundant. I think that the color should definitely go away.)
type doesn't below and should be split out to a collections of tables. --
ages_blocked,
location_blocked,
etc.
from a normalization perspective. Now this starts to make the query to determine whether a person can send to another person much more complicated. Which DBMS are you using? (MySQL I assume?), which version? Does it have Views or Stored Procedures?
The main reason for splitting the categories_blocked to seperate tables is to allow the selections within to be a proper foriegn key back to the listing of things in that category for categories with discrete selections, etc as well as keeping the predicate simple.
Posted: Tue Aug 16, 2005 12:35 pm
by nielsene
Are all messages point-to-point or do you allow broadcast? Ie, i the "To" line always one or more usernames, or is there an option to mail to all people of a given category, etc?
Posted: Tue Aug 16, 2005 12:45 pm
by feyd
it should be quite simple to create a combining system to help filter out things. I've done it in the past for bug systems.. I was going for generics mostly though, to allow for mostly general objects that can be created with a factory based on the type involved. A logical operation can be incorporated, thus creating a list chain that a message creation would need to pass... although if you don't plan on (infrequently) growing the types filterable and/or decide to not support logical congruence among filters, then seperated is perfectly fine. Since my work has to be very scalable, I have to make the single filter table choice most often.