Check remote page using PHP - my method is, well, slow
Posted: Tue Dec 04, 2007 12:49 pm
I am looking to check the page a file links to, and if it contains a <script> tag, flag it as bad. Currently I am doing this by sending a request to the remote page, downloading the source, then scanning the source for the script tag. This works just fine, but some of my pages have 50-70+ links to check, which is a LOT of requests to send to the remote server, a LOT of data to receive back on the server, and a LOT of processing to do. It takes around 30 seconds to 1 minute to load a page using this method.
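A minimal sketch of the fetch-and-scan check described above might look like the following. This is not the exact code in use: the helper names containsScriptTag and isBadLink are hypothetical, the 5-second timeout is an arbitrary choice, and it assumes PHP 7.1+ with allow_url_fopen enabled.

```php
<?php
// Hypothetical helper: true if the HTML contains an opening <script> tag.
function containsScriptTag(string $html): bool
{
    // Case-insensitive match for "<script" followed by whitespace, "/" or ">".
    return preg_match('/<script[\s\/>]/i', $html) === 1;
}

// Hypothetical helper: fetches a page and flags it as bad if it has a <script> tag.
// Returns null when the page could not be fetched at all.
function isBadLink(string $url): ?bool
{
    // Assumed timeout so one dead server cannot stall the whole page load.
    $context = stream_context_create(['http' => ['timeout' => 5]]);
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        return null;
    }
    return containsScriptTag($html);
}
```

Each call to isBadLink() is one full blocking request, so 50-70 links checked one after another add up quickly.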
Is there a different way to check the remote page source for a specific item without retrieving the page first and then processing it? I'm guessing not, but I am just curious. My other idea is to cache the page and only refresh it every 10 minutes or so, thus reducing the load on the server. However, that could produce inaccurate information if a script tag (which the remote page uses to alert the user that a link is dead) pops up before the cache refreshes.
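The caching idea could be sketched roughly as below. It is only an illustration under assumptions: the function name cachedCheck, the file-per-URL layout, and the JSON storage format are all made up for this example, and $compute stands in for whatever does the real remote check.

```php
<?php
// Hypothetical helper: returns the cached result for $key if it is younger
// than $ttlSeconds; otherwise calls $compute, stores the result, and returns it.
function cachedCheck(string $key, int $ttlSeconds, callable $compute, string $cacheDir)
{
    $file = $cacheDir . '/' . md5($key) . '.cache';

    // Make sure filemtime() sees the current state of the file.
    clearstatcache(true, $file);

    if (is_file($file) && (time() - filemtime($file)) < $ttlSeconds) {
        $raw = file_get_contents($file);
        if ($raw !== false) {
            return json_decode($raw, true);
        }
    }

    // Cache miss or stale entry: run the real check and store its result.
    $value = $compute();
    file_put_contents($file, json_encode($value), LOCK_EX);
    return $value;
}
```

With a TTL of 600 seconds this matches the "refresh every 10 minutes" idea. The trade-off is the one noted above: a link that goes dead right after a refresh can stay marked as good until the entry expires.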
Do you think it should be left as is, or should it be cached for XX minutes, or is there some other way to do this?
Thanks!
-Steve