Looking for a web crawler


Post by akreider

I'm looking for a multithreaded web crawler, or at least something faster than Sphider (sphider.eu).

I only need it to crawl sites to a specific depth, collect the text from the pages it visits, and store that text in a database.

(I don't need any search or indexing functionality.)
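To make the requirement concrete, here is a rough sketch of the kind of thing I mean, written in Python with only the standard library. It is an illustration under assumptions, not a finished tool: the names `PageParser`, `fetch`, and `crawl`, the `pages` table layout, the SQLite storage, and the rule of staying on the start URL's host are all made up for this example. It also ignores robots.txt and rate limiting, which a real crawler should respect.

```python
import sqlite3
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Collects visible text and href links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1          # text inside these tags is not page text
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def fetch(url):
    """Download a page; returns '' on any error so one bad URL cannot stop the crawl."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def crawl(start_url, max_depth, db_path="pages.db", workers=8, fetcher=fetch):
    """Breadth-first crawl limited to max_depth; returns the number of pages stored."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS pages "
               "(url TEXT PRIMARY KEY, depth INTEGER, body TEXT)")
    host = urlparse(start_url).netloc
    seen, frontier, stored = {start_url}, [start_url], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for depth in range(max_depth + 1):
            if not frontier:
                break
            # Download one whole depth level in parallel threads;
            # parsing and database writes stay on the calling thread.
            pages = list(pool.map(fetcher, frontier))
            next_frontier = []
            for url, html in zip(frontier, pages):
                if not html:
                    continue         # failed or empty download
                parser = PageParser()
                parser.feed(html)
                db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
                           (url, depth, " ".join(parser.text_parts)))
                stored += 1
                for href in parser.links:
                    link = urldefrag(urljoin(url, href)).url
                    # Assumption: only follow links on the same host.
                    if urlparse(link).netloc == host and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    db.commit()
    db.close()
    return stored
```

Something like `crawl("http://example.com/", 2)` would then fetch the start page plus everything up to two links away and leave the text in `pages.db`. The threads only do the downloading, since that is where a single-threaded crawler spends most of its time waiting.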

A solution that doesn't use PHP would be fine too.

I've recently looked at Heritrix (the crawler used by archive.org), but it's several times more complex than what I need.

Suggestions?