Checking a URL to see if it's broken

Kev
Forum Newbie
Posts: 21
Joined: Tue Aug 25, 2009 9:11 pm

Checking a URL to see if it's broken

Post by Kev »

Hi guys,

I have written a nice little PHP jump script (link redirection script) and would like to build into the backend (admin area) the ability to check links to see if they actually work or are broken (i.e. do they return a 404 or a 200?).

Does anyone know how to use PHP to check whether a URL works? I've done some Google searching and can't quite find what I'm looking for. Nothing complicated, just something to show if there are broken links in my database.

Thanks for your help and guidance.
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Checking a URL to see if it's broken

Post by requinix »

You can use get_headers to see the response code from the server.
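For illustration, here is a minimal sketch of reading the response code with get_headers. The helper names last_status_code and url_status are made up for this example, and it assumes redirects are followed (the default for PHP's http wrapper), so the status line of the last response is the one that matters.

```php
<?php
// Hypothetical helper: pull the status code out of the header lines that
// get_headers() returns. When redirects are followed, the array contains one
// status line per hop, so the last one describes the final response.
function last_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: returns the final status code for $url, or null when
// the request failed outright (DNS failure, refused connection, and so on).
function url_status(string $url): ?int
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return null;
    }
    return last_status_code($headers);
}
```

A link could then be treated as broken when url_status() returns null or a code of 400 or higher, though where exactly to draw that line is a judgment call.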

And tip: don't scan your entire database at once. Scan each URL as it's needed, at most once per (some interval). When your script tries to redirect, scan the URL only if it's been at least that long since the last scan. If it's not good anymore, do something; otherwise update the database (with the current time as the "last scan") and redirect.
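The scan-on-demand idea above could look roughly like this. Everything here is an assumption for the sake of the sketch: the one-day interval, the shape of the $link row ('url' and 'last_scan' keys), and the $check callback that reports whether a URL works. Writing the row back to the database and sending the redirect are left to the caller.

```php
<?php
// Hypothetical interval: rescan a link at most once a day.
const SCAN_INTERVAL = 86400;

// True when the link has never been scanned or the last scan is too old.
function needs_scan(?int $lastScan, int $now, int $interval = SCAN_INTERVAL): bool
{
    return $lastScan === null || ($now - $lastScan) >= $interval;
}

// $link is a hypothetical database row: ['url' => string, 'last_scan' => int|null].
// $check is any callable taking a URL and returning true when the URL works.
// Returns [action, link], where action is 'redirect' or 'broken'.
function process_link(array $link, int $now, callable $check): array
{
    if (!needs_scan($link['last_scan'], $now)) {
        return ['redirect', $link];      // scanned recently, trust it
    }
    if (!$check($link['url'])) {
        return ['broken', $link];        // caller decides what "do something" means
    }
    $link['last_scan'] = $now;           // caller saves this back to the database
    return ['redirect', $link];
}
```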
Kev
Forum Newbie
Posts: 21
Joined: Tue Aug 25, 2009 9:11 pm

Re: Checking a URL to see if it's broken

Post by Kev »

Thanks for the tip tasairis... is there any harm in using get_headers or cURL to check for broken links? In other words, will the other servers think I'm trying to hack, etc. and ban me?
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Checking a URL to see if it's broken

Post by requinix »

Assuming get_headers works in the way that makes most sense, all it does is ask the server for some basic information about the page. Doesn't even return the page content.

As long as you don't do this often there shouldn't be any problems.
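One detail worth knowing: as far as the PHP manual describes it, get_headers sends a GET request by default and only hands back the headers. To ask the server for headers alone, a HEAD request can be set through a stream context. The sketch below assumes PHP 7.1 or later (where get_headers accepts a context as its third argument); the helper names and the 5-second timeout are made up for this example.

```php
<?php
// Hypothetical helper: build a stream context that makes the http wrapper
// send HEAD instead of GET and give up after $timeout seconds.
function head_context(float $timeout = 5.0)
{
    return stream_context_create([
        'http' => [
            'method'  => 'HEAD',
            'timeout' => $timeout,
        ],
    ]);
}

// Hypothetical helper: fetch only the response headers for $url.
// Returns the array of header lines, or false when the request failed.
function head_headers(string $url)
{
    return @get_headers($url, false, head_context());
}
```

Some servers answer HEAD differently from GET (or reject it), so falling back to a plain get_headers call when HEAD gives an unexpected result may be worth considering.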