Looking for a web crawler

Posted: Wed Jun 03, 2009 10:46 pm
by akreider
I'm looking for a multithreaded web crawler, or at least something faster than Sphider (sphider.eu).

I only need it to crawl sites to a specified depth, collect the text from each page it visits, and store that text in a database.

(I don't need any search or indexing functionality.)
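
To make the requirement concrete, here is a rough sketch of the behaviour I mean, written in Python using only the standard library. It is an illustration under assumptions, not a recommendation: the names `PageParser`, `fetch`, and `crawl`, the `pages` table layout, the SQLite backend, and the 10-second timeout are all made up for this example. Pages at each depth are downloaded in parallel by a thread pool, and the database is written from the main thread only.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects visible text and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text, self.links, self._skip = [], [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore script/style contents and whitespace-only chunks.
        if not self._skip and data.strip():
            self.text.append(data.strip())


def fetch(url):
    """Download one page; returns '' on any error (an assumed policy)."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""


def crawl(start_url, max_depth, db_path="pages.db", workers=8, fetch_page=fetch):
    """Breadth-first crawl to max_depth; returns the number of pages stored."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, depth INTEGER, body TEXT)"
    )
    seen, frontier, stored = {start_url}, [start_url], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for depth in range(max_depth + 1):
            if not frontier:
                break
            # All pages at this depth are fetched concurrently.
            pages = list(pool.map(fetch_page, frontier))
            next_frontier = []
            for url, html in zip(frontier, pages):
                parser = PageParser()
                parser.feed(html)
                parser.close()
                db.execute(
                    "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)",
                    (url, depth, " ".join(parser.text)),
                )
                stored += 1
                for href in parser.links:
                    link = urldefrag(urljoin(url, href)).url
                    if link.startswith("http") and link not in seen:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    db.commit()
    db.close()
    return stored
```

Something like `crawl("http://example.com/", 2)` would then visit the start page plus everything up to two links away. The sketch follows links to other hosts, and it skips robots.txt handling and rate limiting, all of which a real crawler would need to deal with.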

A solution that doesn't use PHP would be fine too.

I recently looked at Heritrix (the crawler used by archive.org), but it's several times more complex than what I need.

Suggestions?