Source code for Page rank algorithm

ved2210
Forum Newbie
Posts: 6
Joined: Sun Nov 01, 2009 2:11 pm

Source code for Page rank algorithm

Post by ved2210 »

Hi friends,

I have been working on the PageRank algorithm for the last month.
I have to implement the algorithm and develop the code for it. Basically, I have the number of inbound and outbound links for the individual pages of a website. I also have the list of pages that link to other pages of the site (I have developed a crawler for that), and now I am stuck on the actual implementation of the algorithm.

Please, geeks, I need your help.

Thanks,
Ved.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Source code for Page rank algorithm

Post by requinix »

There is no "the page rank algorithm". You come up with it by yourself.

You'll need more than just the number of inbound and outbound links to calculate a good ranking.
Weiry
Forum Contributor
Posts: 323
Joined: Wed Sep 09, 2009 5:55 am
Location: Australia

Re: Source code for Page rank algorithm

Post by Weiry »

tasairis wrote:You'll need more than just the number of inbound and outbound links to calculate a good ranking.
From what I understand, most ranking sites (the ones I'm thinking of) use nothing more than inbound and outbound counts to determine ranking.
That said, of course, you could implement some sort of algorithm that takes many variables into account, but I can see the point of doing that when you're looking for the most active (that's my assumption).

The question really is, how are you storing your inbound and outbound traffic numbers?
If they were stored in a database, you could simply select all your entries and order by your inbound field descending, which would give you a list of entries with the highest inbound count first (which I would treat as a good ranking).
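
For illustration, here is a rough sketch of that ordering done in memory with Python instead of a database query. The page names and counts are made up, and the function name `rank_by_inbound` is just a placeholder. It treats every inbound link as equally valuable, so it only shows the raw-count idea, nothing more.

```python
# Hypothetical crawl output: page -> number of inbound links.
inbound_counts = {"/index": 12, "/about": 3, "/blog": 7, "/contact": 3}

def rank_by_inbound(counts):
    """Return pages ordered by inbound-link count, highest first (ties broken by URL)."""
    return sorted(counts, key=lambda page: (-counts[page], page))

ranking = rank_by_inbound(inbound_counts)
print(ranking)
```

The SQL equivalent would be an `ORDER BY` on the inbound column in descending order.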
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Source code for Page rank algorithm

Post by onion2k »

Weiry wrote:If they were stored in a database, you could simply select all your entries and order by your inbound field descending, which would give you a list of entries with the highest inbound count first (which I would treat as a good ranking).
So you think spammers' link farms should be ranked top?

You need to consider the quality of the inbound links. That's the hard bit.
Weiry
Forum Contributor
Posts: 323
Joined: Wed Sep 09, 2009 5:55 am
Location: Australia

Re: Source code for Page rank algorithm

Post by Weiry »

onion2k wrote:You need to consider the quality of the inbound links. That's the hard bit.
Well, I suppose in that case you could also record the IP addresses relating to each inbound/outbound link and only select distinct results. Yes, that may still be prone to spammers, but it should eliminate a majority.

But other than recording IPs, would there actually be a way to stop the spammers, assuming they were to use dynamic IPs?
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Source code for Page rank algorithm

Post by iankent »

Weiry wrote: Well, I suppose in that case you could also record the IP addresses relating to each inbound/outbound link and only select distinct results. Yes, that may still be prone to spammers, but it should eliminate a majority.
There's a reason Google were so successful :P It's largely down to their rather accurate page ranking system, something that other search engines failed miserably at. There are a lot of variables you need to take into account, for example page size, link to text ratio, inbound to outbound ratio, metadata to content comparisons etc.

Basically, anything that can be checked should be checked. And ratings shouldn't be stored directly in the database; rather, 'points' should be awarded, and different points would have different weights. This allows for adjusting the algorithm without dealing with massive database updates.

Unfortunately you won't be able to do most of that using MySQL and would probably struggle to do it in PHP. Something like C would be a more suitable language.
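
To make the points-and-weights idea concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the signal names (`inbound_ratio`, `text_ratio`, `meta_match`), their values, and the weights are invented, and a real system would need far more signals than this.

```python
# Hypothetical stored "points" per page, each normalised to the range 0..1.
signals = {
    "/index": {"inbound_ratio": 0.9, "text_ratio": 0.6, "meta_match": 0.8},
    "/links": {"inbound_ratio": 0.4, "text_ratio": 0.1, "meta_match": 0.3},
}

# Weights are kept apart from the stored points, so tuning them
# does not require rewriting any stored rows.
weights = {"inbound_ratio": 0.5, "text_ratio": 0.3, "meta_match": 0.2}

def score(page_signals, weights):
    """Weighted sum of a page's points; a missing signal counts as 0."""
    return sum(w * page_signals.get(name, 0.0) for name, w in weights.items())

scores = {page: score(s, weights) for page, s in signals.items()}
print(scores)
```

Changing a value in `weights` changes every score the next time it is computed, while the stored points stay untouched.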
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Source code for Page rank algorithm

Post by onion2k »

Weiry wrote:
onion2k wrote:You need to consider the quality of the inbound links. That's the hard bit.
Well, I suppose in that case you could also record the IP addresses relating to each inbound/outbound link and only select distinct results. Yes, that may still be prone to spammers, but it should eliminate a majority.
So you think proper sites on shared hosts should be marked down?

Ranking of unorganised linked data (like websites) is really, really hard. Stop trying to over-simplify the problem. Any "obvious" solution will be so badly flawed it'll be unusable. Google have had some very, very clever people working on the problem for a decade and they still haven't really solved it.
Weiry
Forum Contributor
Posts: 323
Joined: Wed Sep 09, 2009 5:55 am
Location: Australia

Re: Source code for Page rank algorithm

Post by Weiry »

onion2k wrote:Stop trying to over-simplify the problem.
I'm simply trying to provide a direction or give some suggestions; no one else seems to be saying anything other than "it's really hard".
The only thing I am trying to do is provide some sort of direction, however over-simplified it may be. I was grateful when iankent replied because that gave a better understanding of what is really required behind the scenes, which I think is more important than simply saying that "it's really hard".
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Source code for Page rank algorithm

Post by iankent »

Have a look here for some more detailed info:
http://infolab.stanford.edu/~backrub/google.html
and importantly, note who the authors are :)
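
For anyone who wants a starting point, that paper gives the basic formula PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of outbound links on T, and d is a damping factor the authors usually set to 0.85. Below is a rough Python sketch of iterating that formula over a crawler's link list. The function name, the fixed iteration count, the tiny example graph, and the way pages without outbound links are handled (they simply pass nothing on) are my own simplifications, not something taken from the paper.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the basic PageRank formula.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the (1 - d) base value.
        new_rank = {p: 1.0 - d for p in pages}
        for source, targets in links.items():
            if not targets:
                continue  # simplification: pages with no outbound links pass nothing on
            # Each outbound link carries an equal share of the source's damped rank.
            share = d * rank[source] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical three-page site: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph C ends up ranked highest because it receives links from both other pages, and A comes second because its only inbound link is from the highly ranked C. That is the "quality of inbound links" point in miniature.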
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Source code for Page rank algorithm

Post by onion2k »

iankent wrote:Have a look here for some more detailed info:
http://infolab.stanford.edu/~backrub/google.html
and importantly, note who the authors are :)
Also note that that was around 1997, before spammers and "SEO experts" started to work out ways to cheat.
iankent
Forum Contributor
Posts: 333
Joined: Mon Nov 16, 2009 4:23 pm
Location: Wales, United Kingdom

Re: Source code for Page rank algorithm

Post by iankent »

onion2k wrote:
iankent wrote:Have a look here for some more detailed info:
http://infolab.stanford.edu/~backrub/google.html
and importantly, note who the authors are :)
Also note that that was around 1997, before spammers and "SEO experts" started to work out ways to cheat.
Well yes, Google's algorithms have clearly evolved since that was written 12 years ago, but the principles are the same and it does give a good indication of how pages can be sorted. Bear in mind, of course, that even Google's indexes aren't free from spam, and large numbers of spam sites do appear as top results in some search queries. That's an ongoing challenge that's almost certainly never going to be solved (much like piracy: the bad guys are always at least one step ahead and have been since the mid 70s).