Fast Spiders

laviavigdor
Forum Newbie
Posts: 3
Joined: Mon Dec 26, 2005 12:25 pm

Post by laviavigdor »

I've tried several methods in PHP to speed up fetching data from pages on the web: fopen, cURL, and sockets.

The best rate I got was 3 pages per second (just for the fopen call on a small HTML page).

How can I speed this up? Any suggestions?

Lavi.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

PHP doesn't natively support threading, so it's a bit difficult.

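If you're running PHP5, I believe cURL can run several requests in parallel (the curl_multi_* functions). Strictly speaking that isn't threading, but it lets the transfers overlap. Check the documentation.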
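For what it's worth, here is a rough sketch of what parallel requests through cURL could look like, assuming the curl extension is loaded. fetch_all() and its parameters are names made up for this example, and error handling is left out, so treat it as a starting point rather than a finished spider.

```php
<?php
// Sketch: fetch several URLs in parallel with the curl_multi_* functions.
// fetch_all() is a hypothetical helper name, not part of PHP or cURL.
function fetch_all(array $urls, $timeoutSeconds = 10)
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none of them is still running.
    $running = 0;
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running > 0 && $status === CURLM_OK) {
            // Wait for activity on any handle (at most 1 second).
            if (curl_multi_select($multi, 1.0) === -1) {
                usleep(100000); // avoid a busy loop if select cannot wait
            }
        }
    } while ($running > 0 && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    // Collect the bodies under the same keys as the input array.
    // Failed transfers are not distinguished here; curl_multi_info_read()
    // reports per-handle result codes if you need them.
    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }

    return $results;
}
```

Called as `$pages = fetch_all(array('http://example.com/', 'http://example.org/'));`, the total time should be closer to the slowest single request than to the sum of all of them, since the requests wait on the network at the same time.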

If you're running Unix on both development and production servers, you can try the PCNTL extension for its forking functions (pcntl_fork and friends).
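A minimal sketch of the forking approach, assuming the pcntl extension is available and the script runs from the command line. fetch_in_children(), $workers, and $outputDir are hypothetical names invented for illustration; each child simply writes what it fetched to a file, because a forked child does not share variables with its parent.

```php
<?php
// Sketch: split a URL list across child processes with pcntl_fork().
// fetch_in_children() and its parameters are hypothetical names.
// Requires the pcntl extension (Unix-like systems, CLI).
function fetch_in_children(array $urls, $workers, $outputDir)
{
    if (count($urls) === 0) {
        return 0;
    }
    $workers = max(1, min((int) $workers, count($urls)));
    // Keep the original keys so each output file can be matched to its URL.
    $chunks = array_chunk($urls, (int) ceil(count($urls) / $workers), true);

    $pids = [];
    foreach ($chunks as $chunk) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            break; // fork failed: stop spawning, still wait for started children
        }
        if ($pid === 0) {
            // Child process: fetch its share, save each body, then exit.
            foreach ($chunk as $index => $url) {
                $body = @file_get_contents($url);
                if ($body !== false) {
                    file_put_contents($outputDir . '/' . $index . '.html', $body);
                }
            }
            exit(0);
        }
        $pids[] = $pid; // parent process: remember the child
    }

    // Parent: block until every child has finished.
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }

    return count($pids); // number of children that were started
}
```

Writing to files is only one way to hand results back to the parent; a database or a pipe would work too. The point is that each child waits on its own share of the network requests, so the waits overlap.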

There really isn't much you can do with PHP though.
laviavigdor
Forum Newbie
Posts: 3
Joined: Mon Dec 26, 2005 12:25 pm

Post by laviavigdor »

Jason,

Would you recommend switching to Ruby / JSP / ASP / ASP.NET ?
[listed in the order of my preference]

Would I get better results in a different server-side language, or in a different kind of language altogether?

Lavi.
foobar
Forum Regular
Posts: 613
Joined: Wed Sep 28, 2005 10:08 am

Post by foobar »

Definitely not ASP. ASP is slow, clunky, and not portable in the slightest. JSP/Java is probably your best bet, along with ASP.NET. I know the former has "built-in" threading support (nothing in Java is "built-in" per se), which may speed up your code a bit. However, keep in mind that Java has to load thousands of classes on average due to the huge object tree.

I'd stick with PHP. If it's lagging, try one of the bytecode caches such as the Zend Optimizer or mmCache.
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

It's a spider: a massive include tree has nothing to do with it, I don't think. Plus, Zend Optimizer isn't even a bytecode cache: it's a code optimizer. And in the end, Zend Optimizer might speed it up, for the wrong reasons.

I, personally, think you should look at Java (not just JSP) and its multithreading capabilities.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

As soon as the source you're fetching the data from is the bottleneck, it doesn't matter much which technology you are using... If you can't speed up the source, try some caching...
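To make the caching idea concrete, here is a small sketch of a file-based cache sitting in front of the fetch. cached_fetch(), $cacheDir, $ttlSeconds, and $fetcher are all hypothetical names for this example, and the one-hour lifetime is an arbitrary assumption; pick whatever matches how often the source pages change.

```php
<?php
// Sketch: a tiny file-based cache in front of a page fetch.
// cached_fetch() and its parameters are hypothetical names.
function cached_fetch($url, $cacheDir, $ttlSeconds = 3600, $fetcher = 'file_get_contents')
{
    $cacheFile = $cacheDir . '/' . md5($url) . '.cache';

    // Serve the stored copy while it is younger than the TTL.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        return file_get_contents($cacheFile);
    }

    // Otherwise fetch from the source and store the result for next time.
    $body = call_user_func($fetcher, $url);
    if ($body !== false) {
        file_put_contents($cacheFile, $body, LOCK_EX);
    }

    return $body;
}
```

The $fetcher argument is there only so a different fetch function can be swapped in; by default the URL is read with file_get_contents(). Repeated requests for the same URL within the TTL never touch the network at all.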
BDKR
DevNet Resident
Posts: 1207
Joined: Sat Jun 08, 2002 1:24 pm
Location: Florida

Post by BDKR »

I think forking is a good option. I would look into it myself.

Otherwise, Python would get the nod from me. There is already a ton of ready-made stuff for all kinds of things. Creating a solution in Python or Ruby would be much faster than doing so in Java.