
How to scraping?

Posted: Tue Aug 10, 2010 8:04 am
by infomamun
Hi,
For example, my site is http://stockpedia.info/price.php. When you open this page in your browser, it scrapes a page from a remote server and displays it to you. So if hundreds of users open this page in the same second, my server will send thousands of requests to the remote server at the same time. My question is: how does the remote server handle these requests, and how does it tell simultaneous requests apart? Does it use my server's IP or the user's IP to identify each request? I ask because I have observed that during trading hours, when hundreds of users visit my page at the same time, it becomes slow and unresponsive. But when there are only one or two users, the page responds quickly. So is it workable to show the remote server's data with this kind of on-the-fly scraping, or do I have to use a cron job that makes a single request and saves the response to a database, so that when users visit my page it loads the data from my database? Which one is workable?

Re: How to scraping?

Posted: Tue Aug 10, 2010 8:20 am
by SidewinderX
My question is: how does the remote server handle these requests, and how does it tell simultaneous requests apart?
That really depends on what software the remote server is using. Apache and Nginx handle this quite differently. Generally, each request opens a socket, and that socket remains open until a FIN packet is received or a timeout period expires. Each open socket requires server resources. The "slowness" is due to a lack of resources.
So is it workable to show the remote server's data with this kind of on-the-fly scraping, or do I have to use a cron job that makes a single request and saves the response to a database, so that when users visit my page it loads the data from my database? Which one is workable?
Sure, it is possible if you have the resources, but if that is a concern, then it would probably be best to cache the results in a database.
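For illustration, the cron-plus-database approach might look roughly like the sketch below. It is written in Python for brevity, and the table name `quote_cache`, the function names, and the `fetch` callable (standing in for whatever code downloads the remote page) are all assumptions, not anything taken from the site in question. The point is that only the scheduled job contacts the remote server, while page views read the local copy.

```python
import sqlite3
import time


def init_cache(conn):
    # Single-row table: the CHECK constraint keeps exactly one cached page.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quote_cache ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "body TEXT NOT NULL, "
        "fetched_at REAL NOT NULL)"
    )


def refresh_cache(conn, fetch):
    """Run from the scheduled job: one upstream request, stored locally."""
    body = fetch()  # hypothetical callable that downloads the remote page
    conn.execute(
        "INSERT OR REPLACE INTO quote_cache (id, body, fetched_at) "
        "VALUES (1, ?, ?)",
        (body, time.time()),
    )
    conn.commit()


def read_cache(conn):
    """Run on every page view: reads the local copy only."""
    row = conn.execute(
        "SELECT body, fetched_at FROM quote_cache WHERE id = 1"
    ).fetchone()
    return row  # None until the first refresh has run


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_cache(conn)
    refresh_cache(conn, lambda: "<table>placeholder quotes</table>")
    print(read_cache(conn)[0])
```

With this split, the number of requests sent to the remote server depends only on how often the job runs, not on how many visitors the page has.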

Re: How to scraping?

Posted: Tue Aug 10, 2010 11:17 am
by josh
Yes, cache the data; don't scrape their page on every page load. They ought to charge you per hit, then you'd figure it out fast.

Re: How to scraping?

Posted: Tue Aug 10, 2010 11:39 am
by infomamun
Thanks, SidewinderX and josh, for the quick replies. Is there any way to cache the data other than saving it to a database? What I am scraping is stock quotes, which update every second on the remote server (the stock exchange's website). If I cache them in a database, I cannot update the database every second, because that would eat up my monthly bandwidth quota. Also, if I set a cron job to update every second, it will keep updating even when no user is requesting the data. On the other hand, if I set it to a one- or two-minute interval, users will not get the latest data. So is it possible to use something like a caching proxy server? I don't know much about proxy servers or how to use them, but I understand that one sends a single request to the remote server, caches the page, and then delivers the result to all clients. Is it possible to run a proxy server on my shared web host? I don't have access to my php.ini. If you have any ideas for using a proxy script, I would appreciate it if you could describe them here in as much detail as you can.
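One way to cache without a database or a cron job is a plain file with a maximum age: the remote page is fetched only when a visitor asks for it and the saved copy is older than a few seconds. The sketch below is in Python for brevity; the function name `get_cached`, its parameters, and the `fetch` callable (standing in for the actual download code) are assumptions made for illustration.

```python
import os
import tempfile
import time


def get_cached(path, max_age, fetch):
    """Return the cached body, refetching only if the file is missing or stale."""
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path, encoding="utf-8") as f:
                return f.read()
    except OSError:
        pass  # no usable cache file yet; fall through and fetch

    body = fetch()  # hypothetical callable that downloads the remote page

    # Write to a temporary file and rename it into place, so that other
    # readers never see a half-written cache file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(body)
    os.replace(tmp_path, path)
    return body
```

With `max_age` set to, say, 5 seconds, the remote server would see at most roughly one request per 5 seconds while visitors are present, and none at all when nobody is viewing the page. Note that without a lock, a handful of requests arriving at the exact moment the file expires may each fetch once.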

Re: How to scraping?

Posted: Tue Aug 10, 2010 12:34 pm
by infomamun
Here I have attached a proxy server script that I downloaded from the web. Please tell me whether it can be used for what I am after: when several similar requests arrive, the proxy server should send only one request to the remote server and distribute the returned content to all the requesters (the clients' browsers).
Regards
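The attached script is not reproduced in the thread, so the following says nothing about what it actually does. It is only a minimal sketch, in Python, of the behaviour described: when several requests arrive at once, one of them fetches from the remote server and the others wait and reuse its result. The class name `SingleFlight` and the `fetch` callable are assumptions made for illustration.

```python
import threading


class SingleFlight:
    """Let concurrent callers share one in-progress fetch."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable that downloads the page
        self._lock = threading.Lock()
        self._inflight = None  # state of the fetch currently running, if any

    def get(self):
        with self._lock:
            call = self._inflight
            leader = call is None
            if leader:
                # First caller in: become the one that does the real fetch.
                call = {"done": threading.Event(), "body": None, "error": None}
                self._inflight = call

        if leader:
            try:
                call["body"] = self._fetch()
            except Exception as exc:
                call["error"] = exc
            finally:
                with self._lock:
                    self._inflight = None
                call["done"].set()  # wake everyone waiting on this fetch
        else:
            call["done"].wait()  # reuse the leader's result

        if call["error"] is not None:
            raise call["error"]
        return call["body"]
```

This only coalesces requests handled by threads of the same process. On a typical shared PHP host each request usually runs in its own process, so the same idea would need something shared between processes, such as a lock file combined with a cached copy on disk.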