Alternative to fopen() to see if a remote file exists



ConfuzzledDuck
Forum Newbie
Posts: 4
Joined: Thu Aug 25, 2005 4:21 pm
Location: Lancaster, UK


Post by ConfuzzledDuck »

Hello :)

I'm currently working on a script for a theatre company which has cast lists on their pages about each of their productions. They are connected with an acting agency and want the cast lists to link to each member's profile on the agency website. Each profile is on its own page and named logically so I can work out, given the person's first name and surname, what the address of the page on the agency website should be. This would be simple if all members of the company had profile pages with that agency. Some do not.

Therefore, I need a way to test whether a member has a profile on the agency's website (hosted elsewhere), and to link to their profile if they do and not if they don't. I know I can use fopen() and test its return value, but with some of the larger cast lists it's taking a while to generate the pages. Ideally I need a way to just check that the page exists on the other server without actually downloading the whole thing each time.

I know that file_exists() supports some URL wrappers in PHP5, but that's not much help in this situation (the server is running PHP4). I wondered if anyone else had any bright ideas on ways it could be done that might be a little quicker :)

Cheers,
Jonathon
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

cURL can do it quite easily:

Code:

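$ch = curl_init('http://www.google.com');
curl_setopt($ch, CURLOPT_NOBODY, 1);         // send a HEAD request, so no body is downloaded
curl_setopt($ch, CURLOPT_HEADER, 1);         // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the result as a string instead of printing it
echo curl_exec($ch);
curl_close($ch);                             // release the handle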
Output:

Code:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: :snip:; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Server: GWS/2.1
Content-Length: 0
Date: Thu, 25 Aug 2005 21:29:59 GMT
If cURL isn't available, you could use fsockopen() to do similar header requests.
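For the fsockopen() route, a minimal sketch might look like the following. It assumes a plain HTTP server on port 80, sends one HTTP/1.0 HEAD request per connection, and treats only a 200 status as "exists". The function names `status_code_from_line()` and `remote_page_exists()` are hypothetical, and redirects (3xx) are simply reported as not found.

```php
<?php
// Sketch: check a remote page with a raw HEAD request over fsockopen(),
// for hosts where cURL is unavailable. Host and path are placeholders.

// Pull the numeric status code out of a status line such as
// "HTTP/1.1 404 Not Found". Returns 0 if the line is not a status line.
function status_code_from_line($line)
{
    if (preg_match('#^HTTP/\d+\.\d+\s+(\d{3})\b#', $line, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Returns true if the server answers the HEAD request with 200, false
// otherwise (including when the connection cannot be opened).
function remote_page_exists($host, $path, $port = 80, $timeout = 5)
{
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if (!$fp) {
        return false;
    }
    // HEAD asks for the headers only; with HTTP/1.0 the server closes the
    // connection after replying, so this is one request per connection.
    fwrite($fp, "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n");
    $statusLine = fgets($fp, 1024); // only the first line is needed
    fclose($fp);
    return $statusLine !== false && status_code_from_line($statusLine) === 200;
}
```

For example, `remote_page_exists('www.example.com', '/profiles/jane-smith.html')` (a made-up host and path) would return true only if that page answers with 200. Using HTTP/1.0 keeps this simple: only the status line is read, and the socket is closed straight afterwards.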
ConfuzzledDuck
Forum Newbie
Posts: 4
Joined: Thu Aug 25, 2005 4:21 pm
Location: Lancaster, UK

Post by ConfuzzledDuck »

Thanks for that :)

I'd thought about cURL, but it's not available where the site will be running. I'd also just thought of doing it through a socket connection while talking to someone about it on MSN. I'll give that a shot and see if it speeds things up a bit.

Cheers,
Jonathon
ConfuzzledDuck
Forum Newbie
Posts: 4
Joined: Thu Aug 25, 2005 4:21 pm
Location: Lancaster, UK

Post by ConfuzzledDuck »

OK, just to follow up, in case it's any help to anyone else and for completeness, here's what I've done:

It opens the socket connection to the server and, in a loop, does a HEAD request for the page I want to check and has a look at the first line of the response to see if it's a 200. If it is then it links to it, if not, it doesn't. Then with the same connection it requests the next page I want to check and so on. Once I've done all the checking it closes the connection.
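The single-connection approach described above can be sketched roughly as follows. This is an illustration under assumptions, not Jonathon's actual code: it uses HTTP/1.1 (where connections stay open by default), relies on the fact that a response to HEAD carries no body so the blank line ends each response, and reports null for the remaining paths if the server closes the connection, which many servers do after a fixed number of requests. `read_head_status()` and `check_paths()` are hypothetical names.

```php
<?php
// Read one response's header block (up to the blank line) from an open
// stream and return its status code, or 0 if the stream ended first.
// A response to HEAD never carries a body, so the blank line ends it.
function read_head_status($fp)
{
    $status = 0;
    $first = true;
    while (($line = fgets($fp, 4096)) !== false) {
        if ($first) {
            if (preg_match('#^HTTP/\d+\.\d+\s+(\d{3})\b#', $line, $m)) {
                $status = (int) $m[1];
            }
            $first = false;
        } elseif (rtrim($line, "\r\n") === '') {
            return $status; // end of this response's headers
        }
    }
    return 0; // connection closed before the headers were complete
}

// Check several paths on one host over a single connection. Returns an
// array mapping each path to true/false, or null if it could not be checked.
function check_paths($host, $paths, $port = 80, $timeout = 5)
{
    $results = array();
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout);
    foreach ($paths as $path) {
        if (!$fp) {
            $results[$path] = null;
            continue;
        }
        // HTTP/1.1 keeps the connection open by default between requests.
        fwrite($fp, "HEAD $path HTTP/1.1\r\nHost: $host\r\n\r\n");
        $status = read_head_status($fp);
        if ($status === 0) {
            // The server hung up; stop using this connection.
            fclose($fp);
            $fp = false;
            $results[$path] = null;
            continue;
        }
        $results[$path] = ($status === 200);
    }
    if ($fp) {
        fclose($fp);
    }
    return $results;
}
```

Calling `check_paths('www.example.com', array('/a.html', '/b.html'))` (placeholder host and paths) returns an array keyed by path. A sturdier version might reopen the socket after a disconnect instead of giving up on the remaining paths.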

I've yet to see whether it's considerably quicker than the previous method; I'll test it tomorrow.

Thanks for your help :)
Jonathon
Last edited by ConfuzzledDuck on Thu Aug 25, 2005 8:22 pm, edited 1 time in total.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Instead of writing that yourself, you could also use the SimpleTest browser or Snoopy (both SourceForge projects).
ConfuzzledDuck
Forum Newbie
Posts: 4
Joined: Thu Aug 25, 2005 4:21 pm
Location: Lancaster, UK

Post by ConfuzzledDuck »

Thanks for the suggestion :)

I've just had a flick through Snoopy and it looks like it would be a bit of overkill for what I need. Plus, I don't think it would give the speed improvement I'm after, because it still pulls the whole page each time.

Also I'm trying to use just one persistent connection for all the HEAD calls (sorry, I put GET in the above post, I've edited it now!) so as not to keep opening and closing connections with the remote server. I don't think snoopy can do that (please correct me if I'm wrong).

Cheers,
Jonathon
pedrotuga
Forum Contributor
Posts: 249
Joined: Tue Dec 13, 2005 11:08 pm

Post by pedrotuga »

Sorry to post on this old thread.

I am using cURL as feyd described. It works: I get either a 200 or a 404.
But... I tried Googling for documentation on HTTP headers and didn't find any simple, easy-to-understand documentation.

OK, I've got the response header. Now, to check if the file exists, should I just try to match "200"? Is that safe? Is the first line always in that format? I guess that's the way HTTP works, but I would like to know this for sure. For example, if I set my server to send "some weird string" on the first line of the header, would the client still be able to get the file? Or does it only proceed with the download if it gets a 200?
And by the way, is a simple "200" string match a solution with any issues, or is it OK?
OK, these are dummy questions. We all use a browser every day; pity we don't know how it works in depth. At least I don't.
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Since the data returned in the above code is solely the headers, a regular expression should be sufficient to capture the value.

Code:

#^HTTP/\d+\.\d+\s+2\d{2}(?!\d)#m
may do the trick.
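In PHP, feyd's pattern could be wrapped like this (the function name is hypothetical). The sample input in the usage below shows why a bare "200" search is fragile: a 404 response whose headers happen to contain "200" elsewhere, such as in Content-Length, would be a false positive, whereas the anchored pattern only matches a line that starts with the HTTP version and a 2xx code.

```php
<?php
// Returns true if the raw header text contains a 2xx status line.
// The pattern is the one suggested above; the m modifier lets ^ match
// at the start of every line, not just the start of the string.
function headers_indicate_success($headers)
{
    return preg_match('#^HTTP/\d+\.\d+\s+2\d{2}(?!\d)#m', $headers) === 1;
}

$ok      = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$missing = "HTTP/1.1 404 Not Found\r\nContent-Length: 200\r\n\r\n";

var_dump(headers_indicate_success($ok));      // bool(true)
var_dump(headers_indicate_success($missing)); // bool(false)
var_dump(strpos($missing, '200') !== false);  // bool(true): naive match misfires
```

Two caveats: the pattern accepts any 2xx code, not only 200; and when curl negotiates HTTP/2 it reports the status line as `HTTP/2 200`, which `\d+\.\d+` will not match, so the minor version would need to be optional (`\d+(?:\.\d+)?`) there. Where cURL is available, `curl_getinfo($ch, CURLINFO_HTTP_CODE)` after `curl_exec()` returns the status code directly and avoids parsing altogether.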
pedrotuga
Forum Contributor
Posts: 249
Joined: Tue Dec 13, 2005 11:08 pm

Post by pedrotuga »

Working, thanks.

By the way, can I request the file size as well? How?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

If the server sends it, the Content-Length header will contain the size in bytes. If it doesn't, you'll have to download the body and perform a strlen() on it.
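A sketch of pulling Content-Length out of the header text returned by the HEAD request. The function name is hypothetical; it takes the first Content-Length line it finds (header names are case-insensitive, hence the i modifier) and returns null when there is none, which is the case where the strlen() fallback applies.

```php
<?php
// Extract the Content-Length value (in bytes) from a raw header block.
// Returns null if the server did not send the header.
function content_length_from_headers($headers)
{
    if (preg_match('#^Content-Length:\s*(\d+)\s*$#mi', $headers, $m)) {
        return (int) $m[1];
    }
    return null;
}

$h = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5120\r\n\r\n";
var_dump(content_length_from_headers($h)); // int(5120)
```

With cURL, `curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD)` also reports this value after the request, returning -1 when the size is not known.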