Crawling an entire site
I have written a small script that crawls an entire site and extracts some data from it. It takes a long time to run. I've uploaded it to my web host, and I only need it to run once.
What I would like to know is: if the process takes a long time, will my site be banned by the web host? Is this illegal?
Thanks
1. So if I call sleep(1) after every file_get_contents(), the process will not use the processor for the next second. Am I safe this way? Will it no longer hurt the processes of the other sites on the shared server?
2. My data transfer limit is 1GB a month. Does the size of what file_get_contents() downloads count against it?
3. I use a lot of regular expressions, all inside a loop. Will this consume a lot of CPU?
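To make questions 1 and 2 concrete, here is a minimal sketch of a throttled fetch loop that also keeps a running total of the bytes it downloads. The function name crawl_politely() and its parameters are made up for illustration. Whether inbound traffic counts toward the monthly transfer limit depends on the host, so treat the byte total as an estimate only.

```php
<?php
// Hypothetical helper: fetch each URL with a pause in between and
// keep a running total of the bytes downloaded.
function crawl_politely(array $urls, callable $fetch, int $pauseSeconds = 1): array
{
    $pages = [];
    $bytes = 0;
    foreach ($urls as $url) {
        $html = $fetch($url);          // e.g. pass 'file_get_contents' as $fetch
        if ($html !== false) {
            $pages[$url] = $html;
            $bytes += strlen($html);   // body bytes only; HTTP headers are not counted
        }
        if ($pauseSeconds > 0) {
            sleep($pauseSeconds);      // the process is idle here and uses no CPU
        }
    }
    return ['pages' => $pages, 'bytes' => $bytes];
}
```

It could be called as crawl_politely($urls, 'file_get_contents'). The pause spreads the work out over time but does not reduce the total CPU spent on the regular expressions themselves.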
- Ambush Commander
What I have is a cheap web host: $17 a year for 10MB of space plus domain registration. I did not find any TOS. There's no customer support or anything. The hosting company is run by a single person; he's actually a reseller. The server is in the US. I think its name is mars, because that name shows up in the detailed headers of the emails. Judging by the server's clock, it is at GMT -5:00.
I ran a script that extracts some info from a site; it took about 380 seconds.
With a different filter it took 33 minutes.
Are run times like these allowed on other web hosts, like the ones you host your sites on?
I want to know what the majority allow and don't allow.
- feyd
Shared hosts mostly allow only a certain average CPU usage per account. This prevents anyone from taking up too much processor time and making every other site on the server respond slowly or not at all. I believe most set a limit of 5%. If you use too much, they will typically disable your account until they can contact you.
If this needs to run often, you need to upgrade your hosting to either dedicated or colocated, since in most cases the host won't care how much processor you use then.
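One rough way to compare a script against a figure like that 5% is to measure how much CPU time it used relative to the wall-clock time it ran. The sketch below uses PHP's getrusage() and microtime(); the helper names cpu_seconds() and cpu_share() are made up for illustration. How a given host actually measures usage is not known here, so this only describes the one PHP process.

```php
<?php
// Total CPU seconds (user + system) used by this process so far.
function cpu_seconds(): float
{
    $u = getrusage();
    return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6
         + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;
}

// Hypothetical helper: percentage of wall-clock time spent on the CPU.
function cpu_share(float $cpuSeconds, float $wallSeconds): float
{
    return $wallSeconds > 0 ? 100.0 * $cpuSeconds / $wallSeconds : 0.0;
}

$cpuStart  = cpu_seconds();
$wallStart = microtime(true);

// ... the crawl loop would go here ...
usleep(200000); // stand-in for the real work

$share = cpu_share(cpu_seconds() - $cpuStart, microtime(true) - $wallStart);
printf("CPU share: %.1f%%\n", $share);
```

Time spent in sleep() or waiting on the network adds to the wall-clock time but not to the CPU time, which is why pausing between requests lowers the average.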
feyd wrote: shared hosts mostly allow just so much average usage of the CPU for the server. This just prevents someone from taking up too much time on the processor thus making all sites on the server to respond slow to non-existent.
Will calling sleep(1) at regular intervals help, say inside the loop? That way my process won't be taking up the whole CPU all at once.
feyd wrote: I believe most set a limit of 5%.
Uh oh, I called set_time_limit(0);
Inside the long loop I tried an echo "|"; after each iteration so that it looks like a progress bar. But the whole |||||... is output in one go after the loop finishes. I need the progress display so that I can tell the script is getting somewhere and is not stuck in an infinite loop.
{
...
echo "|";
}
Thanks
- John Cartwright
flush() alone is not working, but flush();ob_flush(); is.
But checking the site http://php.net/manual/en/function.ob-flush.php
it looks as if ob_flush(); won't parse the rest of the data in the buffer?
In a loop,
{
// Retrieve some details
ob_flush();
echo "|";
}
Is it possible that, in the middle of the forced output, some of the details won't be retrieved?
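For reference, a minimal sketch of the progress loop with the flush calls in the usual order: echo first, then ob_flush(), then flush(). The helper name show_progress() and the $items list are placeholders. As far as the manual describes it, ob_flush() only sends what is currently in the output buffer; it does not touch the script's variables, so the details retrieved in the loop should not be lost.

```php
<?php
// Hypothetical helper: print one progress mark and push it towards the browser.
function show_progress(string $mark = '|'): void
{
    echo $mark;
    if (ob_get_level() > 0) {
        ob_flush();   // hand the contents of PHP's output buffer to the next layer
    }
    flush();          // ask PHP to send its system write buffer to the web server
}

$items = ['page1', 'page2', 'page3']; // placeholder work list
foreach ($items as $item) {
    // Retrieve some details for $item here
    show_progress();
}
```

Calling ob_flush() before the echo, as in the loop above, means each mark only goes out on the next iteration. The web server or browser may still hold output back (for example when compression is enabled), so the marks are not guaranteed to appear one at a time.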