Page 1 of 2

Math degree? Lend a hand....

Posted: Thu Jan 15, 2004 9:26 pm
by ilovetoast
Pretty straightforward problem, hoping someone has a solution so I don't have to code it myself.

First off, I'm working with the Bradley-Terry equation. I need to solve the equation in PHP using a Gauss-Seidel method with successive overrelaxtion.

Yes, I know I could use other iterative solutions. But, for this problem I have to use a Gauss-Seidel method with successive overrelaxtion. It is the best solution method for the data involved.

I have the ML code for the answer, but I need PHP. Looking to see if anyone has before I dig into the ML crud.

FWIW the ML code is:

Code: Select all

function їx, error, iter, flag]  = sor(A, x, b, w, max_it, tol)

%  -- Iterative template routine --
% їx, error, iter, flag]  = sor(A, x, b, w, max_it, tol)
%
% sor.m solves the linear system Ax=b using the 
% Successive Over-Relaxation Method (Gauss-Seidel method when omega = 1 ).
%
% input   A        REAL matrix
%         x        REAL initial guess vector
%         b        REAL right hand side vector
%         w        REAL relaxation scalar
%         max_it   INTEGER maximum number of iterations
%         tol      REAL error tolerance
%
% output  x        REAL solution vector
%         error    REAL error norm
%         iter     INTEGER number of iterations performed
%         flag     INTEGER: 0 = solution found to tolerance
%                           1 = no convergence given max_it

  flag = 0;                                   % initialization
  iter = 0;

  bnrm2 = norm( b );
  if  ( bnrm2 == 0.0 ), bnrm2 = 1.0; end

  r = b - A*x;
  error = norm( r ) / bnrm2;
  if ( error < tol ) return, end

  &#1111; M, N, b ] = split( A, b, w, 2 );          % matrix splitting

  for iter = 1:max_it                         % begin iteration

     x_1 = x;
     x   = M \ ( N*x + b );                   % update approximation

     error = norm( x - x_1 ) / norm( x );     % compute error
     if ( error <= tol ), break, end          % check convergence

  end
  b = b / w;                                  % restore rhs

  if ( error > tol ) flag = 1; end;           % no convergence

% END sor.m
Having never used PHP to work with vectors and matrices, I'm hoping someone can push me in the right direction. I checked out PEAR and some other class libraries on the Web, to no success. Not much higher math out there.

Thanks for any help,

peace

Posted: Thu Jan 15, 2004 9:59 pm
by ilovetoast
Found a refernece... no php, but maybe the pseudo-code or details ring a bell for someone here.

http://www.netlib.org/linalg/html_templ ... 0000000000

peace

Posted: Thu Jan 15, 2004 10:51 pm
by McGruff
I dropped out of my maths/physics degree so I'm not best placed to comment.

Anything you can't do with the built-in php array / math fns (see manual) will need some custom fns / classes.

I don't know ML but I might be able to help you with php equivalents if you can explain exactly how each ML fn operates.

For example, in php this bit would be something like:

Code: Select all

<?php
$flag = 0;                                  
$iter = 0;

$b = ??;
$bnrm2 = norm($b);

if  ( $bnrm2 == 0.0 )
{
    $bnrm2 = 1.0;
}
?>
.. except I don't know what norm(b) represents. If you had defined a custom function for norm, and set a $b var, norm($b) would have meaning in php, ie call the function "norm" with arg $b.

Above has created vars (in the global scope) $flag, $iter, and $bnrm2. Dollar vars are assigned values with a single "=".

"==" is used to test "lazy" equivalence - or use "===" to test equivalence and same type.

Posted: Fri Jan 16, 2004 10:16 am
by ilovetoast
McGruff: I'm looking it up to see what those functions are. I'm not a ML programmer either unfortunately.

I found one web site out there that uses php to do exactly this. I'm going to send off an email to the webmaster and see if he'll release the code to the world. He says "its simple" -- yeah right, matrices, vectors, and linear systems of equations are something everyone plays with for fun.

peace

Posted: Sat Jan 17, 2004 11:28 am
by ilovetoast
Those norm calls are to a linear regression formula if I correct. Therein lies the problem....I'm not sure how to code the linear regression into PHP.

Hehe, the irony. (I have to know what I doing in order to figure out how to do it just like the equations I'm solving.) Maybe if I apply the linear regression to myself....Bad math humor.

Anywhoooo... found some FORTRAN code and some LISP code. The LISP stuff leaves me in the same place as the MathLAB stuff, but I think the FORTRAN code hold some promise. A digging I shall go.

I have gotten no email reply from the website that has the code already. Not even a go away. They're in North Dakota, why did I expect any different - surprised they have the Internet there.

peace

Posted: Sat Jan 17, 2004 1:59 pm
by McGruff
Sorry I can't be much help here - good luck.

Posted: Sat Jan 17, 2004 9:57 pm
by ilovetoast
The FORTRAN code is junk unfortunately.

The LISP code however is perfect. What a different type of programming language that is. I've read extensively about it today as background to translate the code I found.

Does anyone know of a good LISP forum by chance? I'd love to have some place to toss a few translating questions.

FWIW, the potential usage of this type of PHP class is widespread. Check out the simple linear regression tutorials over at phpPatterns(). Those deal with one dependent and one independent variable. This deals with one dependent variable and multiple independent variables.

You could use it to do things like analyze medical treatment efficacies, rate financial instruments, analyze ditributed computing setups, work with neural nets, make your own BCS computer, and much more.

peace

Posted: Wed Jan 21, 2004 8:55 am
by BDKR
Do keep us up to speed on this. Maybe consider writing up your findings. As PHP started as a good tool for dealing with the Web development, just like Perl it has the ability to become a pretty good general purpose scripting language. I've been able to use it to work with tree type structures (using arrays) of incredible depth while developing the solution in a fraction of time it would've taken in C.

And even without the ADT extension, I've been able to create stacks and queues with thanx to the abundance of array operations provided by the language natively.

So it would be great if you can tell us all how it comes out. It would most likely be a great benefit to others.

Cheers,
BDKR

Posted: Fri Jan 23, 2004 6:27 pm
by ilovetoast
Drum roll please....

I figured it out!!

Gave up translating strange engineering code (read LISP and Fortran) and decided to do it with pencil and paper. Too make a long story short, I poked around on the net and found some course notes for a Math class on linear equations at a graduate Math program in New Zealand. I then wrote out, using the aid of those notes, a solution to the problem using a sample data set. I have spent the last 3 days turning those 20+ pages of iterative math into less than 50 lines of PHP code.

And here it is, free for all to use however you want. I hope you can make some money from this (as I am), but give a little credit (to ynoth) where credit is due if you use it. Noone can copyright math, no matter what the crazies behind the DMCA have to say, so this is free for all to use as they choose.

I'm working on a class that incorporates this algorithm into something more substantial. But that's a while (days not weeks) away from an initial release. I want to include a Jacobi Method option in addition to the Gauss-Seidel Method. In addition, functions for matrix validation, matrix modification, error reporting, etc. are in various stages of completion.

I have chosen to not use the Matrix class in the Math_Matrix PEAR package as a foundation. It has some serious bugs/problems regarding Matrix multiplication among other things. While I think you could hack it together, I see no reason or obligation to base this working algorithm on another, buggy, Math package.

In addition, I'm putting together a series of notes on practical applications of this code. Hopefully, some or all of those notes will accompany the class itself. For future work, I'm extending the algorithm to solve for certain variations of the Bradley-Terry model, such as those involving multiple (non-pairwise) comparisons, weighting factors, and non-exclusive comparison results (draws/ties/equivocations/etc.).

And so, here is the just the basic algorithm piece in case anyone wants to see it or use it. (I wish I could get my little sigmas in the comment to show up in the forum, but you'll have settle for SUM instead.)

Code: Select all

/**
 * @Author ynoth
 * Remove author credit and suffer pain of karmic justice.
 *
 * Iterative solution of the Bradley-Terry Model using the Gauss-Seidel Method.
 * Bradley-Terry Model:
 *               Ri    =    Wi  /  (SUMj Nij / (Ri + Rj))
 *                    for i = 1, 2... n
 *                    where i != j
 *                    and where n = the number of columns of the pairwise matrix
 *
 * Gauss-Seidel Method applied to the Bradley-Terry Model:
 *                (k+1)                                (k)    (k)
 *               Ri     =    Wi  /  (SUMj Nij  /  (   Ri  +  Rj   ))
 *                    for i = 1, 2... n
 *                    where i != j
 *                    and where n = the number of columns of the pairwise matrix
 *                    and where k = 1, 2... kmax
 *
 * Convergence is obtained if:
 *                                         (k)       (k)
 *               epsilon    >    max |    Ri     -  Rj    |
 *                    for all i,j = 1, 2... n
 *                    where i != j
 *                    and where n = the number of columns of the pairwise matrix
 *
 * Matrix notes:
 *     Must be a matrix containing the results of pairwise comparisons and
 *     therefore must be square (#rows = #cols).
 *     Not all matrices will converge. See Bradley-Terry assumptions for
 *     specifics.
 *
 *     Sample matrix form (this matrix will converge).
 *     array(0 => array( 0, 3, 1),
 *           1 => array(0, 0, 2),
 *           2 => array(1, 1, 0));
 *
 *     Anticipated results for the above sample matrix:
 *     (2.280469, 0.456094, 0.760156)
 */
function solveModel($matrix,
                    $k_max = 1000,
                    $epsilon = .0000001,
                    $Ri_estimate = array())
&#123;
    $matrix_cols = count($matrix);

    // Populate $Ri_estimate if none was passed.
    if (count($Ri_estimate) == 0) &#123;
        $Ri_estimate = array_fill(0, $matrix_cols, 1);
    &#125;

    // Variables needed for the algorithm.
    $row_sum = array();
    foreach ($matrix as $key=>$val) &#123;
        array_push($row_sum, array_sum($val));
    &#125;
    $Ri = $Ri_estimate;
    $convergence = array_fill(0, $matrix_cols, 0);

    for ($k = 1; $k <= $k_max; $k++) &#123;
        // Maintain two arrays to allow for convergence checks.
        $Ri_last = $Ri;

        for ($i = 0; $i < $matrix_cols; $i++) &#123;
            $sum = 0;
            for ($j = 0; $j < $matrix_cols; $j++) &#123;
                // i != j
                if ($i == $j) &#123;
                    continue;
                &#125;

                // SUMj Nij / (Ri + Rj)
                $sum += ($matrix&#1111;$i]&#1111;$j] + $matrix&#1111;$j]&#1111;$i]) /
                    ($Ri&#1111;$i] + $Ri&#1111;$j]);
           &#125;

            // Wi / (SUMj Nij / (Ri + Rj))
            $Ri&#1111;$i] = $row_sum&#1111;$i] / $sum;

            // Monitor convergence.
            if (abs($Ri_last&#1111;$i] - $Ri&#1111;$i]) < $epsilon) &#123;
                $convergence&#1111;$i] = 1;
            &#125;
        &#125;

        // Test for convergence after each complete iteration.
        if (array_sum($convergence) == $matrix_cols) &#123;
            break;
        &#125;

        // If there is no convergence after max_iterations, return false.
        if ($k == $k_max) &#123;
            return FALSE;
        &#125;
    &#125;
    return $Ri;
&#125;
peace

eat more toast.

Posted: Fri Jan 23, 2004 7:41 pm
by BDKR
ilovetoast wrote: I have spent the last 3 days turning those 20+ pages of iterative math into less than 50 lines of PHP code.
Right on! Now it's time to go to Disneyland!
ilovetoast wrote: For future work, I'm extending the algorithm to solve for certain variations of the Bradley-Terry model, such as those involving multiple (non-pairwise) comparisons, weighting factors, and non-exclusive comparison results (draws/ties/equivocations/etc.).
I am very interested to see what you kick out here. Some of what you're talking about here sound like areas that have provided difficulties for me in the past due to my lack of a math background.

Good show! Bravo, Bravo...

Cheers,
BDKR

Posted: Fri Jan 23, 2004 10:27 pm
by McGruff
Good work.

If you think this could be useful to the php community it would good to stick it up on a website somewhere.

If you'd like any help whatsoever let me know (with hosting etc for a website - not the maths..8O ).

Posted: Sat Jan 24, 2004 12:10 am
by ilovetoast
As an aside, I have been testing the algorithm on various data sets all evening. Here's some background...all of this would be great to bundle with the eventual class on site somewher. Any thoughts on how or where would be a great help, as I really think it should be connected somehow to the larger PHP community, rather than buried on my laptop.

The Bradley-Terry Model was described in 1952, but probably has been around for decades prior. It has been independently "discovered" several times prior to and since. The earliest formulation of the Model that I am familiar with comes from the work of Zermelo in 1929.

I have not studied the original work, as it is difficult to find ouside of a major academic library and it is in German. Lacking access to the former and suitable knowledge of the latter to enable the proper study of a document of such technical nature, my experience with the Zermelo work is mostly secondhand.

Zermelo's work was, to my understanding, the consequence of an interrupted round-robin chess tournament. Again, I caution that as my knowledge is second-hand, I cannot verify that origin. I can however, verify that he formulated this now much-studied model for pairwise comparisons.

Specifically, the model he postulated and which was subsequently "re-discovered" in 1952 by Bradley and Terry, states that when individuals in a group are compared to one another through repeated paired comparisons, the probability that individual i defeats individual j is equal to Gamma(i) / (Gamma(i) + Gamma(j)), where Gamma(i) and Gamma(j) represent positive-valued parameters associated with individuals i and j in pairwise competitions between individuals i and j.

The concrete example of Zermelo's initial work involved the rating of chess players. Based on the results of pairwise competitions, Gamma(i) was derived for each individual chess player i. In simplest terms, a rating based on the players results and the strength of opposition. The collected ratings of each and every chess player, thereby allowed Zermelo to calculate an estimate of the probability that any chess player i would defeat chess player j.

Specifically, consider a chess player with a rating, Gamma(i), of 200 and a chess player with a rating, Gamma(j), of 100. According to the model the probability that player i would defeat player j in pairwise competition is 200/(200 +100) or 2/3. Expressed as a percentage, this is a 66.67% chance of victory.

Bradley and Terry applied the model to sports teams in general. Others since have applied the model to a multitude of other pairwise comparisons, including drug-trials, medical treatments, neural nets, distributed computing, and various economic analysis. Others have gone further to consider the impact of weighting factors, zero-sum comparisons (no winners, only losers), equivocating comparisons (where ties/draws/or other equivocations are possible), and comparisons multiple rather than pairwise.

---

So, to amuse myself, I decided this evening to calculate ratings for each NFL football team based on their performances this regular season and playoff season. I simplified the comparisons by not incorporating a weighting factor of any kind (such as home-field advatage or quality wins). I further made a semi-standard adjustment to the data matrix to ensure convergence.

Through the iterative solution to the model I obtained a rating of 8.16 for the New England Patriots and a rating of 2.34 for the Carolina Panthers. I don't have the error margin code written, so my ratings are lacking in that aspect.

Based on those ratings, my estimate is that the New England Patriots have a 77.7% chance of victory in the Super Bowl. What use is that to me? Well, the oddsmakers have set the moneylines for the Patriots at -240 and for the Panthers at +200. As I don't want to encourage gambling by anyone but me, I'll leave it to any other gamblers here to draw the relevant conclusions from a comparison of those moneylines to the probability estimate.

It's always entertaing to find new and intriguing ways to use PHP beyond the everyday. I won't deny that one reason I took this project (for a research lab studying drug trial results) was that I knew I could use the code to nuture my gambling as well as solve their statistical problems.

As an irritating aside, had I finished this model earlier, I could have used for my playoff betting. It would have shown a 42% total net return (on 4% scaling wagers) over the three weeks of the playoffs. My 9.5% total net return (on 4% scaling wagers) is nothing to scoff at, but 42%... To bad this past performance is not necessarily indicative of future results.

And I'm spent. peace.

toast is burnt

Posted: Sun Jan 25, 2004 2:09 am
by BDKR
Dude! Nice read.

You've given me some drive to once again try to tackle an issue that arose about a year ago. Weighted Round-Robin. Speaking of new uses for PHP, I was messing around with the idea of a load balancer for a pool of slave databases (we are talking about 'SELECT' queries only). I It was certainly cool and I got a long ways, but I never got a chance to figure this portion of the application out before work pushed me into another area.

However, taking a step in the right direction, I've started to suck up a couple of books that have helped IMMENSELY! 'Fundamenal Algorithms Volume 1' by Donald Knuth as an example. '

Anyway, err..... :idea: , I think I"m going to try something.

Is there an application here?

On another topic, PHP classes (phpclasses.org if I'm not mistaken) may be a good place. PEAR is another option (though I'm not a PEAR fan myself).

Good show!

Cheers,
BDKR

Posted: Sun Jan 25, 2004 6:44 pm
by ilovetoast
Linear systems of equations provide a good means of assisting a load-balancing setup. They're at the core of the design of several high end commercial load-balancers (hardware and software) I've toyed with.

Check out the H. David book, The Method of Paired Comparisons if you need more reading material. It's pretty well the definitive book in the area.

One problem with using PHP is this code is sloooooooooooooooow. It took a good 4-5 seconds to execute the function using the 32x32 data matrix for the NFL season. Some of the data from my client is as big as 300x300. I'm seeing slow times on the samples.

You can speed it by going with tolerances greater than .0000001 obviously. But, time is not a concern for my implementations. This is only being run at certain requested/scheduled times, and only once for each data set.

If you had to repeatedly run it (as in a load-balancing scenario) it may be better to convert it to a shell script at least, or even beyond to improve speed.

peace

Posted: Sun Jan 25, 2004 8:11 pm
by BDKR
ilovetoast wrote: Linear systems of equations provide a good means of assisting a load-balancing setup. They're at the core of the design of several high end commercial load-balancers (hardware and software) I've toyed with.

Check out the H. David book, The Method of Paired Comparisons if you need more reading material. It's pretty well the definitive book in the area.
Great! Thanx for the feedback.
ilovetoast wrote: One problem with using PHP is this code is sloooooooooooooooow. It took a good 4-5 seconds to execute the function using the 32x32 data matrix for the NFL season. Some of the data from my client is as big as 300x300. I'm seeing slow times on the samples.

You can speed it by going with tolerances greater than .0000001 obviously. But, time is not a concern for my implementations. This is only being run at certain requested/scheduled times, and only once for each data set.

If you had to repeatedly run it (as in a load-balancing scenario) it may be better to convert it to a shell script at least, or even beyond to improve speed.
Yeah, I know it's slow. I was writing it in PHP at the time as a proof of concept thing that I would then turn around and write in C. It was the kind of situation where there were those that just didn't believe, if you get my drift. It ultimately didn't seem that bad, but I'm positive it would've reached a ceiling long before a solution written in C.

BTW, it was written as a CLI type app that would read from a conf file in /etc before starting.

Ultimately, I believe the newest MySQL API (MySQLi) is capable of handling this type of scenario for you, but don't quote me on that.

Utlimately, I may still write this one day. It doesn't seem that there are any good open source solutions out there for this kind of thing. Piranha may be capable of working in this type of scenario, but I'm not sure.

Later on,
BDKR

PS Do you like toast with orange juice or coffee?