Page 1 of 1
Help optimizing array comparison function...
Posted: Mon Feb 22, 2010 5:49 pm
by tymlls05
Each user has 2 sets of arrays with with 123 keys in each array. These arrays are stored in a table in MySQL. Each key has a numerical value ranging from 0-5. I am ranking users based on their array vs. others (each key compare to every other users). So for every user I have there are 226 keys challenging each users 226. These keys are not updated often after being set the first time so I could set certain rules like only running the challenge if the challenging user has actually updated their array.
What I am doing works fine on a small scale but I generated 100,00 users and upon generating the rankings it timed out after 30 seconds. I knew this would be a problem but i was antsy to get my algorithm for generating ranking in action. So now here I sit trying to figure out how to better store and compare the data while keeping it as close to real time as possible.
Re: Help optimizing array comparison function...
Posted: Mon Feb 22, 2010 6:13 pm
by JakeJ
You might need to rethink your strategy. Talk to a statistician (maybe find a statistics geek in college) and see if there is a better way to perform the rankings than a direct comparison between each users ranking in each of the keys. I suspect there is but I couldn't even begin to tell you where to start.
Re: Help optimizing array comparison function...
Posted: Mon Feb 22, 2010 8:41 pm
by Benjamin
Assuming the end result of your calculations simply generate an integer:
A vs B = 256
A vs C = 64
B vs C = 128
You can simply create an additional table:
Code: Select all
CREATE TABLE `scores` (
`user_id_a` INT UNSIGNED NOT NULL ,
`user_id_b` INT UNSIGNED NOT NULL ,
`score` INT NOT NULL
) ENGINE = InnoDB;
ALTER TABLE `scores` ADD UNIQUE `user_combination` ( `user_id_a` , `user_id_b` );
The table is InnoDB since it will have a significant number of records and so we desire row level locking. user_id_a and user_id_b are indexed together as type unique which will ensure that there is only one record for each user combination. Depending on the queries you run, you may need to index user_id_a and/or user_id_b separately as well.
The query to insert records could be something like:
Code: Select all
INSERT INTO scores (user_id_a, user_id_b, score) VALUES ('1','2','65') ON DUPLICATE KEY UPDATE score = VALUES(score);
So now to retrieve the scores of one user in relation to all of the others you can execute a simply query, which will be very fast, pretty much instantaneous.
Code: Select all
SELECT user_id_b, score FROM scores WHERE user_id_a = '1' ORDER BY score DESC
You'll need up populate this table by running comparisons each time a user saves their data. This process may take some time, however since you are only comparing a single user against every other user, rather than every user against every other user it should be somewhat quick. If in the end that does not work, you can flag accounts to be scored and score them using a cron job running in the background.
This implementation does have a few issues, ones that you'll need to sort out when creating your system.
Hope that helps.
Re: Help optimizing array comparison function...
Posted: Mon Feb 22, 2010 9:04 pm
by mikosiko
without knowing how the ranking is determined is not easy to suggest some solution.
per example:
User A : 2,1,3,5,5 (just to show some of the elements)
User B : 3,0,5,5,1
in this case, how you determine which one has a higher rank?
either way, if the logic to determine the higher is clear and it is possible to implement some math function over each key to produce a factorized final number probably I'll implement something like this :
- Define a new field in the table...let say "Rank"
- Create table triggers (After Insert/update) to calculate the "Rank" over the record affected.
- The final ranking will be determined automatic without further calculations... just select the records order by "Rank"
just an idea
Miko
Re: Help optimizing array comparison function...
Posted: Wed Feb 24, 2010 10:32 am
by tymlls05
Thank you all for your replies. They have opened my eyes to some techniques I wouldn't have known of otherwise. The ranking is not based on score. There is no score for any one user.
Here is exactly what is happening...
Where keys 1,2,3,4,5,6 are unique id's within the array.
User One:
Array1([1]=>3 [2]=>4 [3]=>5 [4]=>2 [5]=>1 [6]=>5 )
Array2([1]=>2 [2]=>5 [3]=>5 [4]=>2 [5]=>3 [6]=>2 )
User Two:
Array1([1]=>3 [2]=>2 [3]=>1 [4]=>3 [5]=>4 [6]=>5 )
Array2([1]=>3 [2]=>4 [3]=>5 [4]=>2 [5]=>1 [6]=>5 )
The Comparison:
If User1Array1[1]==User2Array1[1] then $count++ to User1 & User2
If User1Array1[1]!=User2Array1[1] BUT if (User1Array2[1] or User2Array2[1]) plus or minus (User1Array1[1] or User2Array1[1]) then $count++ to User1 & User2
If neither are true. Regardless, count every key ($totalkeys) that was compared for the ranking.
So, after every key has been compared we simply find the percentage of User1 and User2 based on matching keys. $count/$totalkeys.
Let's say User1 & User2 came out to 12%. We have that saved, but I still have to find where user1 stands with each user registered. Then move on to user2 compared with everyone else, and so on, and so on, and yeah...
I hope this helps you all help me. I'll be more descriptive in my opening topic from now on.