Weighted Random select (with a twist)

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
paladaxar
Forum Commoner
Posts: 85
Joined: Fri Jun 18, 2004 11:50 pm

Weighted Random select (with a twist)

Post by paladaxar »

Alright...I'm not even sure if I can explain what I am trying to do in words, but I'll try.

I have two tables involved in this select. Table 1 has a column that sets the weight of each row in it. Then table 2 has the rows that will be selected, each being relationally connected to one of the rows in table 1. Now, if it was one to one relationship, and both tables were small, that wouldnt be very tough to set up a weighted random select by expanding each row into an array and selecting randomly from it.

There are at least two conditions, though, that complicate the situation. First, table 2 can have more than one row associated with one row in table 1. But, the weight cannot change. IE, if a row in table 1 has a weight of 5, and there are two rows associated with it (in table 2) I dont want the weight to be 10...it needs to remain 5. So, each of the two rows in table 2 have a weight of 2.5.

The next problem is that the tables will not be small, so I cannot expand each row into an array every time a selection needs to be made, nor can I use sort by rand(), since that too would take a huge amount of time -- possibly seconds. On a table where selections will be made on the order of a few per second, that is simply not acceptable.

Since this query is going to be executed very frequently, it needs to be FAST.

I have written some code to try to do all of this...but I have changed it and re-changed it so many times to try to get this working, that the code is just a big mess now. I think I need to start over with some fresh ideas. Anything is welcome!

There are a few more complications, but I think I have solved them already, so I wont include them here and make my question even worse.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

So what I would likely do is perform the basic weighted random via a separate table that stored the expanded set. Then perform another random if the record was associated with more than one source.
paladaxar
Forum Commoner
Posts: 85
Joined: Fri Jun 18, 2004 11:50 pm

Post by paladaxar »

I dont think I'm following what you said...

What is the separate table? Are you saying to expand the data once and store all of the values in another table? I cant really do that, because table 2 is always growing, so I would have to regenerate the "separate table" far too often. The weights in table 1 will be changing frequently as well.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

If you wrote the code right a "regeneration" pass wouldn't require much: records to remove and records to add. Literally all that would require is checking how much the weight has changed when that record it being updated.

You're building a cache of the expansion. All the meat should be taken care of when the weight is changed. Nothing about the composition of the weight -- how many records are associated with it -- would affect the expanded weight cache. Because the data associated with the weights are uniform it's a simple matter of randomly selecting amongst them if there are more than one records of information associated with the weighted record.
paladaxar
Forum Commoner
Posts: 85
Joined: Fri Jun 18, 2004 11:50 pm

Post by paladaxar »

Ok, so the new table would basically be a table version of the array with the expanded weights. But, if some rows have a very high weight, say 500, the new table could get very very large. And, how would I write 2.5 rows to a table (assuming a weight of 5, with two rows associated with it)?

Then, once the cache table is created, how would I select something from it at random without using sort by rand() which creates way too much load on the server. The cache table may have millions of rows in it (since each row in table 1 would be multiplied by its weight).
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

The cache has no concern whatsoever for the records associated with the weights. So there is no 2.5. You would simply have five for the main record. The associated records have uniform distribution against that weighted record.

I have not had any performance issues with performing a RAND() against a very simple query. Alternately, you could have store a cache of that pool already shuffled. Then all you have to do is perform a limited select based on an offset kept somewhere. There are race conditions involved in that potentially, but that may not matter.
paladaxar
Forum Commoner
Posts: 85
Joined: Fri Jun 18, 2004 11:50 pm

Post by paladaxar »

Alright, thanks feyd...I think that's going to work. It took a while to think through everything (including the other complications) but I think your idea will be the best way to go about this.

I try to remember to post my final solution once I get to it.

P.S...if everyone spells and pronounces your name wrong, and since its spelled and pronounced "fade", why dont you just spell it f-a-d-e?
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

feyd is the name of a character in a series of books you may have heard of: Dune.

:)

http://en.wikipedia.org/wiki/Feyd_Rautha
paladaxar
Forum Commoner
Posts: 85
Joined: Fri Jun 18, 2004 11:50 pm

Post by paladaxar »

Oh ok. Cool. Never heard of the books, but they sound good. Thanks again for your help. This forum really needs more people like you.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

paladaxar wrote:This forum really needs more people like you.
I think were doing o-k, but a couple more feyd's wouldn't hurt :)
User avatar
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

Jcart wrote:
paladaxar wrote:This forum really needs more people like you.
I think were doing o-k, but a couple more feyd's wouldn't hurt :)
Of course, if we had a couple more feyd's we could probably do away with all the mainframe computers, internet server farms and about 90% of wikipedia. :wink:
Post Reply