Source code for Page rank algorithm
Moderator: General Moderators
Hi friends,
I have been working on the PageRank algorithm for the last month. I have to implement the algorithm and develop the code for it. Basically, I have the number of inbound and outbound links for each individual page of a website, along with the list of pages each page links to (I have already developed a crawler for that), and now I am stuck on the actual implementation of the algorithm.
Please, I need your help.
Thanks,
Ved.
Re: Source code for Page rank algorithm
There is no single "page rank algorithm"; you have to come up with one yourself.
You'll need more than just the number of inbound and outbound links to calculate a good ranking.
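Since the original poster already has, for each page, the list of pages it links to, the classic PageRank iteration can be sketched roughly as follows. This is a minimal illustration, not production code; the link graph and page names below are invented, and real deployments need the anti-spam considerations discussed later in this thread.

```python
# Minimal sketch of the classic PageRank power iteration.
# Assumes the crawler's output is a dict mapping each page to the
# list of pages it links to; the page names here are made up.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}            # start from a uniform rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outbound in links.items():
            if outbound:                           # spread rank over out-links
                share = damping * rank[page] / len(outbound)
                for target in outbound:
                    if target in new_rank:
                        new_rank[target] += share
            else:                                  # dangling page: spread evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(graph)
ordered = sorted(ranks, key=ranks.get, reverse=True)  # most-linked page first
```

Note that the score a page passes on is divided by its number of outbound links, which is exactly why raw inbound counts alone are not enough: a link from a page that links to everything is worth very little.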
Re: Source code for Page rank algorithm
tasairis wrote: You'll need more than just the number of inbound and outbound links to calculate a good ranking.
From what I understand, most ranking sites (the ones I'm thinking of) use nothing more than inbound and outbound counts to determine ranking.
That said, you could of course implement an algorithm that takes many variables into account, but I can't see the point of doing that when you're just looking for the most active pages (that's my assumption).
The real question is: how are you storing your inbound and outbound link counts?
If they were stored in a database, you could simply select all your entries and order by the inbound field descending, which would give you a list of entries with the highest inbound count first (which I would treat as a good ranking).
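The database suggestion above can be sketched like this (a minimal illustration using sqlite3; the table and column names `pages`, `url`, `inbound`, and `outbound` are hypothetical):

```python
# Sketch of the "order by inbound descending" suggestion above.
# The schema and the data are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, inbound INTEGER, outbound INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [("home", 12, 3), ("about", 4, 1), ("blog", 7, 5)],
)

# Highest inbound count first: the crude ranking described above.
rows = conn.execute("SELECT url FROM pages ORDER BY inbound DESC").fetchall()
ranking = [url for (url,) in rows]
```

As the replies below point out, this treats every inbound link as equally valuable, so it is trivially gamed by link farms.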
Re: Source code for Page rank algorithm
Weiry wrote: If they were stored in a database, you could simply select all your entries and order by the inbound field descending, which would give you a list of entries with the highest inbound count first (which I would treat as a good ranking).
So you think spammers' link farms should be ranked top?
You need to consider the quality of the inbound links. That's the hard bit.
Re: Source code for Page rank algorithm
onion2k wrote: You need to consider the quality of the inbound links. That's the hard bit.
Well, I suppose in that case you could also record the IP address for each inbound/outbound link and only select distinct results. Yes, that may still be prone to spammers, but it should eliminate the majority.
But other than recording IPs, would there actually be a way to stop the spammers, assuming they were using dynamic IPs?
- iankent
- Forum Contributor
- Posts: 333
- Joined: Mon Nov 16, 2009 4:23 pm
- Location: Wales, United Kingdom
Re: Source code for Page rank algorithm
Weiry wrote: Well, I suppose in that case you could also record the IP address for each inbound/outbound link and only select distinct results. Yes, that may still be prone to spammers, but it should eliminate the majority.
There's a reason Google were so successful.
Basically, anything that can be checked should be checked. And rankings shouldn't be stored directly in the database; rather, 'points' should be awarded, and different kinds of points would have different weights. This allows you to adjust the algorithm without dealing with massive database updates.
Unfortunately you won't be able to do most of that in MySQL, and you would probably struggle to do it in PHP; something like C would be a more suitable language.
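The points-and-weights idea above could be sketched like this. The signal names, the weights, and the sample data are all invented; the point is only that the stored per-page counts stay fixed while the weights can be tuned in code:

```python
# Sketch of the points/weights idea above: store raw signal counts
# ("points") per page, and compute the score with adjustable weights
# at ranking time. All names and numbers here are invented.

WEIGHTS = {
    "distinct_inbound": 5.0,    # links from distinct hosts count most
    "total_inbound": 0.01,      # raw inbound count barely matters
    "outbound": -0.5,           # heavy outbound linking slightly penalised
}

def score(points, weights=WEIGHTS):
    return sum(weights.get(name, 0.0) * value for name, value in points.items())

pages = {
    "home": {"distinct_inbound": 10, "total_inbound": 25, "outbound": 4},
    "spam": {"distinct_inbound": 1, "total_inbound": 400, "outbound": 2},
}

ranking = sorted(pages, key=lambda p: score(pages[p]), reverse=True)
```

Here the link farm's 400 raw inbound links lose to the real page's 10 distinct sources, and re-ranking after a weight change touches no stored data, which is the adjustability being described.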
Re: Source code for Page rank algorithm
Weiry wrote: Well, I suppose in that case you could also record the IP address for each inbound/outbound link and only select distinct results. Yes, that may still be prone to spammers, but it should eliminate the majority.
So you think proper sites on shared hosts should be marked down?
Ranking unorganised linked data (like websites) is really, really hard. Stop trying to over-simplify the problem. Any "obvious" solution will be so badly flawed it'll be unusable. Google have had some very, very clever people working on the problem for a decade and they still haven't really solved it.
Re: Source code for Page rank algorithm
onion2k wrote: Stop trying to over-simplify the problem.
I'm simply trying to provide a direction and give some suggestions; no one else seems to be saying anything other than "it's really hard". Over-simplified as my suggestions may be, at least they are suggestions. I was grateful when iankent replied, because that gave a better understanding of what is really required behind the scenes, which I think is more useful than simply saying "it's really hard".
Re: Source code for Page rank algorithm
Have a look here for some more detailed info:
http://infolab.stanford.edu/~backrub/google.html
and, importantly, note who the authors are.
Re: Source code for Page rank algorithm
iankent wrote: Have a look here for some more detailed info: http://infolab.stanford.edu/~backrub/google.html and, importantly, note who the authors are.
Also note that that paper is from around 1997, before spammers and "SEO experts" started working out ways to cheat.
Re: Source code for Page rank algorithm
onion2k wrote: Also note that that paper is from around 1997, before spammers and "SEO experts" started working out ways to cheat.
Well, yes, Google's algorithms have clearly evolved since that was written twelve years ago, but the principles are the same, and it does give a good indication of how pages can be sorted. Bear in mind, of course, that even Google's indexes aren't free of spam, and large numbers of spam sites do appear as top results for some search queries. But that's an ongoing challenge that will almost certainly never be fully solved (much like piracy: the bad guys are always at least one step ahead, and have been since the mid-70s).