Help! Runaway Crawler!!
Posted: Sun Dec 02, 2007 11:37 am
I've been developing a simple web crawler today, but it went "off-piste" and started crawling the entire web (well, it got as far as Google before I pressed 'stop' in my browser).
The thing is I'm worried that the script's still running even though I did press stop - can anyone clarify this?
The setup is this: the script starts at a certain URL, gets the links out of that page, then follows them one by one, collecting more links, and so on. Each time it finds a new link it print()s it to the browser, so I sit there watching the links appear as the script runs. The core component of it is a recursive loop, so I'm worried that it'll never stop...
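For reference, here is a minimal sketch of how a crawler like the one described could be kept from running away. It is not the actual script: the post does not say which language that was written in, so Python is used purely for illustration, and the names `crawl`, `fetch`, `max_pages` and `max_depth` are all made up for this example. The idea is to replace the open-ended recursion with a queue, remember which URLs have already been seen, stay on the starting host, and stop after a fixed number of pages or link depth.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for all links in `html`, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_pages=50, max_depth=2):
    """Breadth-first crawl with three stop conditions (all assumptions):
    same host as start_url, at most max_pages pages, at most max_depth hops.

    `fetch` is a hypothetical callable taking a URL and returning the page
    HTML as a string, or None if the page could not be retrieved.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}                # URLs already queued, to avoid loops
    queue = deque([(start_url, 0)])   # (url, depth) pairs still to visit
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        if depth >= max_depth:
            continue                  # do not follow links any deeper
        for link in extract_links(url, html):
            # Skip other hosts (e.g. Google) and anything already queued.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

Passing `fetch` in as a parameter means the loop can be tried against a dictionary of fake pages (for example `crawl("http://example.test/", pages.get)`) before it is pointed at a real site. The limits of 50 pages and 2 hops are arbitrary defaults chosen for the sketch.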