Weighted Random select (with a twist)
Posted: Fri Dec 29, 2006 9:55 am
Alright...I'm not even sure if I can explain what I am trying to do in words, but I'll try.
I have two tables involved in this select. Table 1 has a column that sets the weight of each row in it. Then table 2 has the rows that will be selected, each being relationally connected to one of the rows in table 1. Now, if it was one to one relationship, and both tables were small, that wouldnt be very tough to set up a weighted random select by expanding each row into an array and selecting randomly from it.
There are at least two conditions, though, that complicate the situation. First, table 2 can have more than one row associated with one row in table 1. But, the weight cannot change. IE, if a row in table 1 has a weight of 5, and there are two rows associated with it (in table 2) I dont want the weight to be 10...it needs to remain 5. So, each of the two rows in table 2 have a weight of 2.5.
The next problem is that the tables will not be small, so I cannot expand each row into an array every time a selection needs to be made, nor can I use sort by rand(), since that too would take a huge amount of time -- possibly seconds. On a table where selections will be made on the order of a few per second, that is simply not acceptable.
Since this query is going to be executed very frequently, it needs to be FAST.
I have written some code to try to do all of this...but I have changed it and re-changed it so many times to try to get this working, that the code is just a big mess now. I think I need to start over with some fresh ideas. Anything is welcome!
There are a few more complications, but I think I have solved them already, so I wont include them here and make my question even worse.
I have two tables involved in this select. Table 1 has a column that sets the weight of each row in it. Then table 2 has the rows that will be selected, each being relationally connected to one of the rows in table 1. Now, if it was one to one relationship, and both tables were small, that wouldnt be very tough to set up a weighted random select by expanding each row into an array and selecting randomly from it.
There are at least two conditions, though, that complicate the situation. First, table 2 can have more than one row associated with one row in table 1. But, the weight cannot change. IE, if a row in table 1 has a weight of 5, and there are two rows associated with it (in table 2) I dont want the weight to be 10...it needs to remain 5. So, each of the two rows in table 2 have a weight of 2.5.
The next problem is that the tables will not be small, so I cannot expand each row into an array every time a selection needs to be made, nor can I use sort by rand(), since that too would take a huge amount of time -- possibly seconds. On a table where selections will be made on the order of a few per second, that is simply not acceptable.
Since this query is going to be executed very frequently, it needs to be FAST.
I have written some code to try to do all of this...but I have changed it and re-changed it so many times to try to get this working, that the code is just a big mess now. I think I need to start over with some fresh ideas. Anything is welcome!
There are a few more complications, but I think I have solved them already, so I wont include them here and make my question even worse.