
Fast Spiders

Posted: Mon Dec 26, 2005 1:07 pm
by laviavigdor
I've tried several methods in PHP to speed up fetching data from web pages: fopen, cURL, and sockets.

The best I got was 3 sites per second (and that's just the fopen call on a small HTML page).

How can I speed this up? Any suggestions?

Lavi.

Posted: Mon Dec 26, 2005 1:12 pm
by Ambush Commander
PHP doesn't natively support threading, so it's a bit difficult.

If you're running PHP5, I believe cURL can run several requests in parallel through its multi interface (the curl_multi_* functions). Check the documentation.
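A minimal sketch of the parallel-cURL idea, assuming a current PHP (7.0 or later) with the cURL extension loaded. The curl_multi_* calls are the real extension API; fetch_all() and its $timeout parameter are hypothetical names chosen for illustration. Note that the transfers overlap inside one process rather than running in separate threads.

```php
<?php
// Hypothetical helper: fetch several URLs concurrently with PHP's cURL multi interface.
// Error handling is kept minimal; this is a sketch, not a finished spider.
function fetch_all(array $urls, int $timeout = 10): array
{
    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);    // per-transfer limit in seconds
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0); // wait for socket activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    // Collect bodies under the same keys the caller used.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // may be empty if the transfer failed
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

For example, fetch_all(['a' => 'http://example.com/', 'b' => 'http://example.org/']) returns an array with keys 'a' and 'b' mapped to the page bodies. Because the requests overlap, the total time is roughly that of the slowest page rather than the sum of all of them.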

If you're running Unix on both your development and production servers, you can try the PCNTL extension for its process-forking functions (pcntl_fork() and friends).
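A rough sketch of the forking approach, assuming the PHP CLI on Unix with the PCNTL extension enabled (the PHP manual advises against using it inside a web server). fork_fetch() and the temp-file hand-off are illustrative choices, not a standard API: forked processes don't share memory, so each child writes its page to a temporary file for the parent to collect.

```php
<?php
// Hypothetical helper: fork one worker process per URL (PCNTL, CLI, Unix only).
// Each child saves its page to a temp file; the parent waits for every child and reads the files.
function fork_fetch(array $urls): array
{
    $children = []; // pid => [caller's key, temp file path]
    foreach ($urls as $key => $url) {
        $tmp = tempnam(sys_get_temp_dir(), 'spider');
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('pcntl_fork() failed');
        }
        if ($pid === 0) {
            // Child: fetch one page, save it, then exit so it never re-enters the loop.
            $body = @file_get_contents($url);
            file_put_contents($tmp, $body === false ? '' : $body);
            exit(0);
        }
        $children[$pid] = [$key, $tmp]; // parent: remember the child and keep forking
    }

    $results = [];
    foreach ($children as $pid => [$key, $tmp]) {
        pcntl_waitpid($pid, $status); // block until this particular child has exited
        $results[$key] = file_get_contents($tmp);
        unlink($tmp);
    }
    return $results;
}
```

This starts one child per URL at once; a real spider would cap the number of children running at the same time.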

There really isn't much you can do with PHP though.

Posted: Mon Dec 26, 2005 1:15 pm
by laviavigdor
Jason,

Would you recommend switching to Ruby / JSP / ASP / ASP.NET?
[listed in the order of my preference]

Would I get better results with a different server-side language?
Would I get better results with a different language altogether?

Lavi.

Posted: Mon Dec 26, 2005 2:43 pm
by foobar
Definitely not ASP. ASP is slow, clunky, and not portable in the slightest. JSP/Java is probably your best bet along with ASP.NET. I know the former has "built-in" threading support (nothing in Java is "built-in" per se) which may speed up your code a bit. However, keep in mind that Java has to load thousands of classes on average due to the huge object tree. I'd stick with PHP. If it's lagging, try one of the bytecode cachers such as the Zend Optimizer or mmCache.

Posted: Mon Dec 26, 2005 7:53 pm
by Ambush Commander
It's a spider: I don't think a massive include tree has anything to do with it. Plus, Zend Optimizer isn't even a bytecode cache: it's a code optimizer. And in the end, Zend Optimizer might speed it up, but for the wrong reasons.

I, personally, think you should look at Java (not just JSP) and its multithreading capabilities.

Posted: Mon Dec 26, 2005 8:25 pm
by timvw
As long as the bottleneck is the source you're fetching the data from, it doesn't matter much which technology you use... If you can't speed up the source, try some caching...
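A minimal sketch of the caching suggestion, assuming a simple time-based disk cache is acceptable. cached_fetch(), the cache directory and the one-hour default TTL are hypothetical choices, and file_get_contents() stands in for whichever fetch method you actually use.

```php
<?php
// Hypothetical helper: cache fetched pages on disk so repeat requests skip the slow source.
// Returns the page body, or false if the fetch failed and nothing fresh was cached.
function cached_fetch(string $url, int $ttl = 3600, ?string $dir = null)
{
    $dir = $dir ?? sys_get_temp_dir() . '/spider-cache';
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true); // create the cache directory on first use
    }
    $file = $dir . '/' . md5($url); // one cache file per URL

    // Serve from cache while the stored copy is younger than $ttl seconds.
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);
    }

    // Otherwise fetch from the source and refresh the cache on success.
    $body = @file_get_contents($url);
    if ($body !== false) {
        file_put_contents($file, $body);
    }
    return $body;
}
```

A TTL of 0 always goes back to the source; a longer TTL trades freshness for speed, which is usually the right trade when the remote server is the bottleneck.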

Posted: Mon Dec 26, 2005 11:43 pm
by BDKR
I think forking is a good option. I would look into it myself.

Otherwise, Python would get the nod from me. There is already a ton of ready-made stuff for all kinds of things. Creating a solution in Python or Ruby would be much faster than doing so in Java.