Weighted Random select (with a twist)
Moderator: General Moderators
Weighted Random select (with a twist)
Alright...I'm not even sure if I can explain what I am trying to do in words, but I'll try.
I have two tables involved in this select. Table 1 has a column that sets the weight of each row in it. Then table 2 has the rows that will be selected, each being relationally connected to one of the rows in table 1. Now, if it was one to one relationship, and both tables were small, that wouldnt be very tough to set up a weighted random select by expanding each row into an array and selecting randomly from it.
There are at least two conditions, though, that complicate the situation. First, table 2 can have more than one row associated with one row in table 1. But, the weight cannot change. IE, if a row in table 1 has a weight of 5, and there are two rows associated with it (in table 2) I dont want the weight to be 10...it needs to remain 5. So, each of the two rows in table 2 have a weight of 2.5.
The next problem is that the tables will not be small, so I cannot expand each row into an array every time a selection needs to be made, nor can I use sort by rand(), since that too would take a huge amount of time -- possibly seconds. On a table where selections will be made on the order of a few per second, that is simply not acceptable.
Since this query is going to be executed very frequently, it needs to be FAST.
I have written some code to try to do all of this...but I have changed it and re-changed it so many times to try to get this working, that the code is just a big mess now. I think I need to start over with some fresh ideas. Anything is welcome!
There are a few more complications, but I think I have solved them already, so I wont include them here and make my question even worse.
I have two tables involved in this select. Table 1 has a column that sets the weight of each row in it. Then table 2 has the rows that will be selected, each being relationally connected to one of the rows in table 1. Now, if it was one to one relationship, and both tables were small, that wouldnt be very tough to set up a weighted random select by expanding each row into an array and selecting randomly from it.
There are at least two conditions, though, that complicate the situation. First, table 2 can have more than one row associated with one row in table 1. But, the weight cannot change. IE, if a row in table 1 has a weight of 5, and there are two rows associated with it (in table 2) I dont want the weight to be 10...it needs to remain 5. So, each of the two rows in table 2 have a weight of 2.5.
The next problem is that the tables will not be small, so I cannot expand each row into an array every time a selection needs to be made, nor can I use sort by rand(), since that too would take a huge amount of time -- possibly seconds. On a table where selections will be made on the order of a few per second, that is simply not acceptable.
Since this query is going to be executed very frequently, it needs to be FAST.
I have written some code to try to do all of this...but I have changed it and re-changed it so many times to try to get this working, that the code is just a big mess now. I think I need to start over with some fresh ideas. Anything is welcome!
There are a few more complications, but I think I have solved them already, so I wont include them here and make my question even worse.
I dont think I'm following what you said...
What is the separate table? Are you saying to expand the data once and store all of the values in another table? I cant really do that, because table 2 is always growing, so I would have to regenerate the "separate table" far too often. The weights in table 1 will be changing frequently as well.
What is the separate table? Are you saying to expand the data once and store all of the values in another table? I cant really do that, because table 2 is always growing, so I would have to regenerate the "separate table" far too often. The weights in table 1 will be changing frequently as well.
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
If you wrote the code right a "regeneration" pass wouldn't require much: records to remove and records to add. Literally all that would require is checking how much the weight has changed when that record it being updated.
You're building a cache of the expansion. All the meat should be taken care of when the weight is changed. Nothing about the composition of the weight -- how many records are associated with it -- would affect the expanded weight cache. Because the data associated with the weights are uniform it's a simple matter of randomly selecting amongst them if there are more than one records of information associated with the weighted record.
You're building a cache of the expansion. All the meat should be taken care of when the weight is changed. Nothing about the composition of the weight -- how many records are associated with it -- would affect the expanded weight cache. Because the data associated with the weights are uniform it's a simple matter of randomly selecting amongst them if there are more than one records of information associated with the weighted record.
Ok, so the new table would basically be a table version of the array with the expanded weights. But, if some rows have a very high weight, say 500, the new table could get very very large. And, how would I write 2.5 rows to a table (assuming a weight of 5, with two rows associated with it)?
Then, once the cache table is created, how would I select something from it at random without using sort by rand() which creates way too much load on the server. The cache table may have millions of rows in it (since each row in table 1 would be multiplied by its weight).
Then, once the cache table is created, how would I select something from it at random without using sort by rand() which creates way too much load on the server. The cache table may have millions of rows in it (since each row in table 1 would be multiplied by its weight).
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
The cache has no concern whatsoever for the records associated with the weights. So there is no 2.5. You would simply have five for the main record. The associated records have uniform distribution against that weighted record.
I have not had any performance issues with performing a RAND() against a very simple query. Alternately, you could have store a cache of that pool already shuffled. Then all you have to do is perform a limited select based on an offset kept somewhere. There are race conditions involved in that potentially, but that may not matter.
I have not had any performance issues with performing a RAND() against a very simple query. Alternately, you could have store a cache of that pool already shuffled. Then all you have to do is perform a limited select based on an offset kept somewhere. There are race conditions involved in that potentially, but that may not matter.
Alright, thanks feyd...I think that's going to work. It took a while to think through everything (including the other complications) but I think your idea will be the best way to go about this.
I try to remember to post my final solution once I get to it.
P.S...if everyone spells and pronounces your name wrong, and since its spelled and pronounced "fade", why dont you just spell it f-a-d-e?
I try to remember to post my final solution once I get to it.
P.S...if everyone spells and pronounces your name wrong, and since its spelled and pronounced "fade", why dont you just spell it f-a-d-e?
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
feyd is the name of a character in a series of books you may have heard of: Dune.

http://en.wikipedia.org/wiki/Feyd_Rautha
http://en.wikipedia.org/wiki/Feyd_Rautha
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
- Contact:
- RobertGonzalez
- Site Administrator
- Posts: 14293
- Joined: Tue Sep 09, 2003 6:04 pm
- Location: Fremont, CA, USA