Calculating percent match

GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

Okay, I have a theoretical question which I can't seem to figure out... maybe a math whiz might be able to help :)

Let's pretend you asked 100 people 100 questions. The answer to each question is either 'A lot' (value: 2), 'A Little' (value: 1), or 'Don't Care' (value: 0). Based on those values, you would then want to show which people have similar personalities, expressed as a percent match (assuming that these questions are geared towards personality). Any idea how I would do that?
Oren
DevNet Resident
Posts: 1640
Joined: Fri Apr 07, 2006 5:13 am
Location: Israel

Post by Oren »

Well, I think you need to use OOP (it can be done without OOP, but I'll explain the OOP way).
So what you need, I think, is some "person" object, and you create a new instance of it for each one of the 100 people. The "person" object really just holds the person's ID, a list of traits (kind, neat, smart) and possibly some other data you will want to hold - that's up to you. You pass each person's answers in, and a second object uses those answers to initialize the traits, which gives you a profile for that person.
Finally, you will need a third object that takes all the "person" objects and tells you how they are related, based on some rules that you will decide.

I know it's not a formula but just an abstract idea, but this is the power of OOP and programming - you solve a difficult problem by breaking it into smaller sub-problems.
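
To make the idea a bit more concrete, here is a minimal sketch of those three objects. Everything named in it is hypothetical and not from this thread: the class names (Person, ProfileBuilder, Matcher), the question-to-trait mapping, and the rule that a smaller total trait difference means two people are more closely related.

```php
<?php

// Holds a person's ID and their trait scores, e.g. ['kind' => 3, 'neat' => 0].
class Person
{
    public $id;
    public $traits = [];

    public function __construct($id)
    {
        $this->id = $id;
    }
}

// Turns raw answers into trait scores using an assumed mapping of
// question ID => trait name.
class ProfileBuilder
{
    private $questionTraits;

    public function __construct(array $questionTraits)
    {
        $this->questionTraits = $questionTraits;
    }

    public function build(Person $person, array $answers): void
    {
        foreach ($answers as $question => $value) {
            if (!isset($this->questionTraits[$question])) {
                continue; // question is not tied to any trait
            }
            $trait = $this->questionTraits[$question];
            $person->traits[$trait] = ($person->traits[$trait] ?? 0) + $value;
        }
    }
}

// Compares two people. The rule used here (sum of absolute trait
// differences) is only one possible choice.
class Matcher
{
    public function distance(Person $a, Person $b): int
    {
        $total = 0;
        foreach (array_keys($a->traits + $b->traits) as $trait) {
            $total += abs(($a->traits[$trait] ?? 0) - ($b->traits[$trait] ?? 0));
        }
        return $total;
    }
}

// Made-up data: questions 1 and 3 measure 'kind', question 2 measures 'neat'.
$builder = new ProfileBuilder([1 => 'kind', 2 => 'neat', 3 => 'kind']);

$alice = new Person(1);
$builder->build($alice, [1 => 2, 2 => 0, 3 => 1]);

$bob = new Person(2);
$builder->build($bob, [1 => 1, 2 => 2, 3 => 1]);

$distance = (new Matcher())->distance($alice, $bob);
```

A distance of 0 would mean identical trait profiles; how to turn the distance into "related or not" is exactly the set of rules left for you to decide.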

I hope this helps somehow and gives you a general direction :P

P.S Very interesting topic by the way :wink:
RobertGonzalez
Site Administrator
Posts: 14293
Joined: Tue Sep 09, 2003 6:04 pm
Location: Fremont, CA, USA

Post by RobertGonzalez »

What is the rest of the criteria? I think I have an idea of what you are doing, but there has to be more.
Oren
DevNet Resident
Posts: 1640
Joined: Fri Apr 07, 2006 5:13 am
Location: Israel

Post by Oren »

Everah wrote: What is the rest of the criteria? I think I have an idea of what you are doing, but there has to be more.
Yeah I agree, that's what I thought just after I had read the original post.
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

That's pretty much it, I think. There could be 1000 people, where one answers 5 questions, another answers 20, and so on. In the end you could say that persons A, B, and C are close to person X. The weight of each question is equal, but a match on a higher-valued answer would put more weight on that person for that question. Each question would have a question ID, so if 5 people said 'A Little' (value: 1) to question ID 555, those people would be a match. As more questions are answered they may stop being a match, based on the overall percent. Hope that helps.

My question is not so much how this would be done programmatically but, in theory, how you would set it up and what math you would use.
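
One simple way to put numbers on this, offered as a sketch and not as the definitive answer: compare two people only on the questions both have answered, add up how far apart their answers are, and scale that against the largest possible gap (2 per question). The function name percentMatch and the sample data are made up for illustration, and the sketch does not give matching high-value answers any extra weight.

```php
<?php

// Hypothetical helper: percent match between two people, using only the
// questions both have answered. Answers are 0, 1 or 2, keyed by question ID.
// Returns null when the two people share no questions.
function percentMatch(array $a, array $b): ?float
{
    $shared = array_intersect_key($a, $b);
    if (count($shared) === 0) {
        return null; // nothing in common, so no meaningful percentage
    }

    $distance = 0;
    foreach ($shared as $question => $answer) {
        $distance += abs($answer - $b[$question]);
    }

    // Each shared question can differ by at most 2.
    $maxDistance = 2 * count($shared);
    return 100 * (1 - $distance / $maxDistance);
}

// Made-up answers keyed by question ID.
$x = [555 => 1, 556 => 2, 557 => 0];
$a = [555 => 1, 556 => 2];            // answered only two questions
$b = [555 => 2, 556 => 0, 557 => 2];

$matchA = percentMatch($x, $a); // identical on both shared questions
$matchB = percentMatch($x, $b); // differs on all three
```

Because the percentage is recomputed over whatever questions are shared, two people who match early on can drift apart as more answers come in, which is the behaviour described above.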
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

I think you need to define a weighting for each personality for each question. Say you had three personalities: Maddog, Sheepish and Foxy. For question #1 you might say that Maddog has a +1 weighting, Sheepish has a -1 weighting, and Foxy has a 0 weighting. If the user answers 2 for question #1 then the scores are Maddog=2, Sheepish=-2, Foxy=0. Just keep adding up the scores.

You can of course change the algorithm and values to make the weightings work specifically how you need them to work.
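
As a rough sketch of that scoring loop: the weights for question 1 are the ones from the example above, while the weights for question 2, the function name scorePersonalities and the sample answers are invented for illustration.

```php
<?php

// Weights as question ID => personality => weight.
// Question 1 follows the example in the post; question 2 is made up.
$weights = [
    1 => ['Maddog' => 1, 'Sheepish' => -1, 'Foxy' => 0],
    2 => ['Maddog' => 0, 'Sheepish' => 1, 'Foxy' => 1],
];

// Adds answer * weight to each personality's running score.
function scorePersonalities(array $answers, array $weights): array
{
    $scores = [];
    foreach ($answers as $question => $answer) {
        if (!isset($weights[$question])) {
            continue; // no weighting defined for this question
        }
        foreach ($weights[$question] as $personality => $weight) {
            $scores[$personality] = ($scores[$personality] ?? 0) + $answer * $weight;
        }
    }
    return $scores;
}

// The user answers 2 ('A lot') to question 1 and 1 ('A Little') to question 2.
$scores = scorePersonalities([1 => 2, 2 => 1], $weights);
```

The personality with the highest total is the closest fit, and two users could then be compared by how similar their score arrays are.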
bubblenut
Forum Newbie
Posts: 20
Joined: Sat Feb 03, 2007 4:16 am
Location: London

Post by bubblenut »

There are lots of different similarity algorithms out there and it's an incredibly interesting area. I have gone with a very simple distance approach but, as Everah mentioned, without more information on your criteria I can't know if this is what you're looking for.

I've gone with a DB example to make things a little simpler. The first function simply pulls the info we need from the database, either for a particular user or for everyone. The second function finds close matches. The idea behind it is to determine the score distance of each other user from the user we're interested in, then take the n closest and convert each distance to a percentage.

Code:

<?php

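// Returns one user's answers as question => answer, or everyone's answers
// as user => question => answer when no user is given.
function extractAnswers( $user=null )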
{
    global $con;
    $sql = "SELECT user, question, answer FROM user_answers";
    if( $user ) {
        $sql .= " WHERE user=" . (int)$user;
    }
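    // Note: the mysql_* functions were removed in PHP 7; on current PHP use
    // the mysqli_* equivalents with a mysqli connection instead.
    $res = mysql_query( $sql, $con );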
    $return = array();
    while( $row = mysql_fetch_assoc( $res ) ) {
        $return[ $row['user'] ][ $row['question'] ] = $row['answer'];
    }
    if( $user ) {
        if( !isset( $return[ $user ] ) ) {
            return false;
        }
        return $return[ $user ];
    } else {
        return $return;
    }
}

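// Returns the closest matches for $user as other user => percent match,
// best match first, or false when there is nothing to compare.
function findMatchesFor( $user, $number_of_matches=5 )
{
    if(!$user_answers = extractAnswers( $user )) {
        return false;
    }
    if( count( $all_answers = extractAnswers() ) < 2 ) {
        return false;
    }
    $user_distances = array();
    foreach( $all_answers as $other_user => $other_user_answers ) {
        if( $other_user == $user ) continue;
        $user_distances[ $other_user ] = 0;
        foreach( $user_answers as $question => $answer ) {
            // A question the other user skipped counts as 'Don't Care' (0).
            $other_answer = isset( $other_user_answers[ $question ] ) ? $other_user_answers[ $question ] : 0;
            $user_distances[ $other_user ] += abs( $answer - $other_answer );
        }
    }

    // asort() keeps the user keys (sort() would discard them), and the fourth
    // argument stops array_slice() from renumbering them.
    asort( $user_distances );
    $user_distances = array_slice( $user_distances, 0, $number_of_matches, true );

    $max_score = count( $user_answers ) * 2;
    $user_percentages = array();
    foreach( $user_distances as $other_user => $distance ) {
        $user_percentages[ $other_user ] = ( ( $max_score - $distance ) / $max_score ) * 100;
    }
    return $user_percentages;
}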
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

Thanks for all the input... I looked up some of the similarity algorithms, and let me just say that they are WAAAY beyond me. But this is what I've come down to.

Each question will be placed inside a category.

A query will get the sum of all answer values for a user within a particular category.

From that sum I will get the standard deviation.

We now have a range, and others who fall in that same range could be considered compatible for that particular category.

I'll then repeat this process across all categories and combine the results into one percent. Then, based on the overall deviation, people who fall within that range would be considered compatible.
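
One possible reading of this plan, sketched under several assumptions that are not spelled out in the post: the standard deviation is taken over all users' sums within a category, two users count as compatible in a category when their sums differ by no more than that standard deviation, and the combined percent is the share of categories in which they are compatible. The function names and the sample sums are made up.

```php
<?php

// Population standard deviation of a list of numbers.
function standardDeviation(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        return 0.0;
    }
    $mean = array_sum($values) / $n;
    $squares = 0.0;
    foreach ($values as $value) {
        $squares += ($value - $mean) ** 2;
    }
    return sqrt($squares / $n);
}

// $sums is category => [user ID => sum of that user's answer values].
// Returns the percentage of categories in which $b's sum lies within one
// standard deviation of $a's sum (this rule is an assumption).
function categoryCompatibility(array $sums, $a, $b): float
{
    $hits = 0;
    $counted = 0;
    foreach ($sums as $category => $byUser) {
        if (!isset($byUser[$a], $byUser[$b])) {
            continue; // skip categories one of them has not answered
        }
        $counted++;
        $range = standardDeviation(array_values($byUser));
        if (abs($byUser[$a] - $byUser[$b]) <= $range) {
            $hits++;
        }
    }
    return $counted === 0 ? 0.0 : 100 * $hits / $counted;
}

// Made-up per-category sums for three users.
$sums = [
    'work'   => [1 => 8, 2 => 7, 3 => 2],
    'social' => [1 => 3, 2 => 9, 3 => 4],
];

// Users 1 and 2 are close on 'work' but far apart on 'social'.
$percent = categoryCompatibility($sums, 1, 2);
```

Note that summing within a category hides which questions produced the total, so two people with the same sum may still have answered quite differently.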

What do you think? I'm still not 100% on it...