How do search engines work?

alxkn
Forum Newbie
Posts: 14
Joined: Sat May 20, 2006 5:14 pm

Post by alxkn »

I wonder where search engines get the list of URLs to crawl. Do they read it from a local database, or are they able to crawl the whole web by themselves? Also, is the source code of any search engine publicly available?

Thanks.
A.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

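Both: they start from a stored list of known URLs and discover new ones by following the links on the pages they crawl. The source code is generally not public, but the concepts behind search engines often come from scientific papers.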
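
As a rough illustration of the "both" answer, here is a minimal sketch of a crawler that starts from a stored seed list and finds further URLs by itself. It is not how any particular search engine does it. The names `crawl`, `LinkExtractor` and `FAKE_WEB` are made up for this example, and the "web" is an in-memory dictionary so that the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None on failure."""
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)        # every URL ever queued, so nothing is queued twice
    visited = []             # URLs actually fetched, in fetch order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical three-page "web" standing in for real HTTP requests.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/b">B</a> <a href="/">home</a>',
    "http://example.com/b": '<a href="/a#top">A again</a>',
}

order = crawl(["http://example.com/"], FAKE_WEB.get)
print(order)
```

The seed list plays the role of the "local database", and the link extraction is the "crawl by themselves" part. A real crawler would also need politeness delays, robots.txt handling and persistent storage, all of which are left out here.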
alxkn
Forum Newbie
Posts: 14
Joined: Sat May 20, 2006 5:14 pm

Post by alxkn »

Scientific papers are not a problem for me. Do you know what kind of papers these are and which journals they are published in?

Thanks in advance.
A.
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada

Post by Kieran Huggins »

Google is a particularly fascinating system: they actually model their crawling algorithm on natural phenomena. They use a system not unlike "swarm" or "flock intelligence" to crawl and rank web pages. They did a write-up on it here: http://www.google.com/technology/pigeonrank.html

If you search around (i.e. "Google it"), you'll find that Google have published lots of papers and "tech talks" about their technologies.
deadoralive
Forum Commoner
Posts: 28
Joined: Tue Nov 06, 2007 1:24 pm

Post by deadoralive »

Kieran Huggins wrote: Google is a particularly fascinating system: they actually model their crawling algorithm on natural phenomena. They use a system not unlike "swarm" or "flock intelligence" to crawl and rank web pages. They did a write-up on it here: http://www.google.com/technology/pigeonrank.html

If you search around (i.e. "Google it"), you'll find that Google have published lots of papers and "tech talks" about their technologies.

Ha ha ha :-) Gotta love April Fools' jokes.
alxkn
Forum Newbie
Posts: 14
Joined: Sat May 20, 2006 5:14 pm

Post by alxkn »

I've never read such a stupid paper. :lol:
JellyFish
DevNet Resident
Posts: 1361
Joined: Tue Feb 14, 2006 7:18 pm
Location: San Diego, CA

Post by JellyFish »

Oh yeah, I have my pigeons code for me all the time... :-"
Jonah Bron
DevNet Master
Posts: 2764
Joined: Thu Mar 15, 2007 6:28 pm
Location: Redding, California

Post by Jonah Bron »

Mine makes a great latte. :wink:
bubblenut
Forum Newbie
Posts: 20
Joined: Sat Feb 03, 2007 4:16 am
Location: London

Post by bubblenut »

Check out this Wikipedia page for some crawler examples: http://en.wikipedia.org/wiki/Category:Free_web_crawlers

If you're comfortable with Java, then Nutch has quite a well-developed, PageRank-orientated crawler implementation. The code is quite confusing to follow, though, as it uses Hadoop, Apache's implementation of Google's MapReduce distribution method. It makes your head go 8O
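
To make the map-reduce idea a bit less head-spinning, here is a tiny single-process sketch in plain Python. It does not use Hadoop's or Nutch's API, and all of the names (`map_phase`, `reduce_phase`, `map_reduce`, `LINK_GRAPH`) are invented for this example. It counts the inbound links for each page, which is the kind of raw input a link-based ranking would build on.

```python
from collections import defaultdict


def map_phase(page, outlinks):
    """Emit one (target, 1) pair for every link found on `page`."""
    for target in outlinks:
        yield target, 1


def reduce_phase(target, counts):
    """Sum all the 1s that were emitted for a single target page."""
    return target, sum(counts)


def map_reduce(link_graph):
    # Shuffle step: group every mapped value under its key.
    grouped = defaultdict(list)
    for page, outlinks in link_graph.items():
        for key, value in map_phase(page, outlinks):
            grouped[key].append(value)
    # Reduce step: collapse each group to a single result.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical link graph: page -> pages it links to.
LINK_GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

inlinks = map_reduce(LINK_GRAPH)
print(inlinks)
```

In a real map-reduce framework the map and reduce calls run on many machines and the shuffle happens over the network, but the shape of the computation is the same: map emits key/value pairs, the framework groups them by key, and reduce combines each group.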