Why does connect time out when the remote server is under heavy load?

PHP programming forum. Ask questions or help people concerning PHP code. Don't understand a function? Need help implementing a class? Don't understand a class? Here is where to ask. Remember to do your homework!

Moderator: General Moderators

infomamun
Forum Contributor
Posts: 102
Joined: Mon Dec 28, 2009 7:48 pm

Why does connect time out when the remote server is under heavy load?

Post by infomamun »

Hi
My site scrapes data from a remote server. Every page on the remote server can be scraped without trouble except one. That page is the response page of an AJAX call, and it returns plain text (not XML). When I scrape it while the remote server's load is relatively low, there is no problem, but when the load gets high, my server cannot connect to the host for that one page only. The other pages scrape fine even when the remote server is under heavy load.

What do you think is happening? Is it the remote server, my server, or my PHP script? I use cURL to scrape all the pages. Is there another method (such as SOAP, or a Perl script) that could connect even at peak load?
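For reference, here is a simplified sketch of the kind of cURL call I am making (the URL, parameter, and timeout values here are placeholders, not my real ones):

```php
<?php
// Placeholder URL and GET parameter -- my real script uses the remote
// server's response-page URL with one value from the drop-down menu.
$url = 'http://www.example.com/response_page.php?code=ABC';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body as a string
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);     // give up connecting after 10s
curl_setopt($ch, CURLOPT_TIMEOUT, 30);            // total request timeout

$body = curl_exec($ch);
if ($body === false) {
    echo 'cURL error: ' . curl_error($ch);        // this is where I see the timeout
}
curl_close($ch);
```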

Regards
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: Why does connect time out when the remote server is under heavy load?

Post by John Cartwright »

When it's under heavy load, can you still navigate to the page with your browser?

From what you described, it's not the transport that's the issue but the remote server's inability to serve the page properly, and if that's the case, there is not much you can do.
infomamun
Forum Contributor
Posts: 102
Joined: Mon Dec 28, 2009 7:48 pm

Re: Why does connect time out when the remote server is under heavy load?

Post by infomamun »

Thanks, John, for your reply. Yes, I can load the page directly in my browser at the very moment cURL fails, and it loads quickly in the browser too.

I previously thought, as you do, that the remote server can't deliver the page when its load is high. But that's not the case: another website scrapes the same page and does so reliably all the time.

You can check the AJAX page yourself at http://www.dsebd.org/mkt_depth_3.php. When you select a company name from the drop-down menu on that page, it calls the response page (the one I want to scrape). When I request the response page's URL directly with one GET value from the drop-down, it also loads quickly in the browser, but cURL or any other server-side script can't fetch it.

I analyzed the AJAX code of the page above. What I found is that it appends a random sid value (created with JavaScript's Math.random()) to each GET value, and then adds that string (GET value + sid) to the response page's URL before making the call. Do you think there is some mechanism tied to this sid that is passed along with the GET value from the calling page?

However, I have tested calling the response page from the browser without the sid value, and it made no difference. The page loads just as quickly from the browser with or without the sid.
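If the sid did matter, I suppose it could be replicated in PHP like this (the base URL and the "inst" parameter name here are just my guesses from reading the page's source, not confirmed):

```php
<?php
// Hypothetical reconstruction of what the page's JavaScript does:
// append a random sid to the GET string before requesting the response page.
function build_request_url($base, $code)
{
    // mt_rand()/mt_getrandmax() gives a float in [0, 1),
    // similar to JavaScript's Math.random()
    $sid = mt_rand() / mt_getrandmax();

    return $base . '?' . http_build_query(array(
        'inst' => $code,   // hypothetical parameter name
        'sid'  => $sid,
    ));
}

echo build_request_url('http://www.example.com/response_page.php', 'ABC');
```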

Would you please look at the source code of the given URL and its AJAX calling code, in case you can spot something I've missed?
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: Why does connect time out when the remote server is under heavy load?

Post by John Cartwright »

Try enabling CURLOPT_VERBOSE and log any errors cURL gives back.
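A minimal sketch of what I mean (the URL is a placeholder):

```php
<?php
$ch  = curl_init('http://www.example.com/response_page.php');
$log = fopen('php://temp', 'w+');                 // capture the verbose trace here

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_VERBOSE, true);          // full protocol trace
curl_setopt($ch, CURLOPT_STDERR, $log);           // send the trace to our stream

$body = curl_exec($ch);
if ($body === false) {
    rewind($log);
    error_log('cURL error: ' . curl_error($ch));  // e.g. "connect() timed out"
    error_log(stream_get_contents($log));         // the verbose trace
}
fclose($log);
curl_close($ch);
```

The trace will show exactly how far the request gets (DNS lookup, TCP connect, request sent) before it stalls, which should narrow down where the problem lies.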

Otherwise, you might want to try emulating a browser more thoroughly. As you mentioned, there is a little logic in the JavaScript that you might need to replicate to look like a legitimate request, along with setting a common user agent and/or using a proxy gateway to reach the destination.
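Something along these lines (the header values are examples; copy the real ones from what your browser sends when it loads the page successfully):

```php
<?php
// Sketch: make the cURL request look more like a browser request.
$ch = curl_init('http://www.example.com/response_page.php?code=ABC');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36'); // common browser UA
curl_setopt($ch, CURLOPT_REFERER, 'http://www.dsebd.org/mkt_depth_3.php');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'X-Requested-With: XMLHttpRequest',  // many AJAX endpoints check this
    'Accept: */*',
));
$body = curl_exec($ch);
curl_close($ch);
```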

However, I won't help you defeat a website's security further than this (if that's what it is) without the express written permission of said website.