
Problem with get_headers

Posted: Wed Mar 04, 2009 7:36 pm
by noyellatmonkeys
Hi Everyone,

I hope someone can help with my dilemma. I have a large list of URLs that we link to, and we need to make sure they are working correctly every few days. I wrote a piece of code that goes through each link, something like this:

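$link = 'http://www.example.com/page.html'; // placeholder for one URL from the list

$status = get_headers($link);

// $status is an array, so index it with [0] and compare with ==, not =
if ($status[0] == "HTTP/1.1 200 OK")
    echo "Link Is Good";
else
    echo "Link Is bad";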

This seems to work most of the time. The problem is that it sometimes reports a link as good when I know it's not (because it's redirecting to the homepage of the site).

When I put the URL into a public website that checks headers, it returns two responses:

1st response: 301

2nd response: 200

So I'm guessing my code is following the link all the way through to the second response and returning that, which doesn't work for me: I need to know that the initial header is a 301. Can anyone point me in the right direction on how to fix this?
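
For reference, here is a minimal sketch of pulling the status codes out of a header list, so the first response can be checked by code rather than by matching the whole "HTTP/1.1 200 OK" string. It assumes the list has the shape get_headers() returns without its associative flag (one status line per response when redirects are followed). The function name status_codes and the sample data are made up for illustration, and the type declarations need a current PHP version.

```php
<?php
/**
 * Collect the status code of every response in a get_headers()-style list.
 * Each response contributes one status line such as "HTTP/1.1 301 Moved Permanently".
 */
function status_codes(array $headers): array
{
    $codes = [];
    foreach ($headers as $line) {
        // Only status lines start with "HTTP/<version> <3-digit code>".
        if (is_string($line) && preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Hypothetical sample shaped like the two responses described above.
$sample = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];

$codes = status_codes($sample);
echo $codes[0] === 200 ? "Link Is Good" : "Link Is bad (first status: {$codes[0]})";
```

With the sample above, the first code is 301, so the link is reported as bad even though the last response is a 200.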

I appreciate it!!!!!

~ Mauricio

Re: Problem with get_headers

Posted: Wed Mar 04, 2009 8:57 pm
by php_east
Rewrite your test using cURL.
cURL has an option called CURLOPT_FOLLOWLOCATION;
if it is set to off, cURL will not follow redirects and will return the first header the server sent.

This is a nice way to make sure that sites which redirect on errors do not pass tests like yours.
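
A rough sketch of that suggestion, assuming the cURL extension is loaded and a current PHP version. The function names first_status_code and is_redirect, and the 10 second timeout, are illustrative choices and not something from this thread. The status code is read with curl_getinfo() instead of being parsed from the printed headers.

```php
<?php
/**
 * Return the status code of the first response for $url without following
 * redirects, or null if the request itself failed.
 */
function first_status_code(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => false, // stop at the first response
        CURLOPT_NOBODY         => true,  // send a HEAD request, skip the body
        CURLOPT_RETURNTRANSFER => true,  // return the result instead of printing it
        CURLOPT_TIMEOUT        => 10,    // assumed limit, in seconds
    ]);
    $ok   = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return $ok === false ? null : (int) $code;
}

/** True for any 3xx status code. */
function is_redirect(?int $code): bool
{
    return $code !== null && $code >= 300 && $code < 400;
}
```

A link check could then call is_redirect(first_status_code($url)) and treat a true result as a link that needs attention.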

Re: Problem with get_headers

Posted: Thu Mar 05, 2009 4:59 pm
by noyellatmonkeys
Thanks for the reply... I tried using cURL with the following code:

Code: Select all

 
$url = "http://www.totalbedroom.com/lawrence-home-fashions-showcase/Twilight Blue-bedding-collection.html";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow redirects
// also tried - curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_exec($ch);                                  // prints the headers and the body
 
But the URL is still returning a 200 response code when it should be a 301, which is what I get when I test the above URL with http://www.seoconsultants.com/tools/headers.asp. It appears that cURL is still following the redirect and reporting back that the redirect URL is good.

Am I missing something?

Thanks for any help!!!

Re: Problem with get_headers

Posted: Thu Mar 05, 2009 10:41 pm
by php_east
I am not sure. It has been quite a while since I last used the followlocation option, and it now seems not to work.

Whether this is intentional (security reasons) or a bug I don't know, but I won't be surprised if it is a security fix. I am now at a loss, because I know this worked before (I used it myself, but only in local tests). I've spent quite some time testing it with the links you gave and am getting 200 as well (most annoying when the public HTTP header service easily shows 301 followed by 200 :evil:).

I am interested in this too, so I will continue testing and post if I make any progress. I may also make another function to do the same; some services I have planned rely on this.

See you later.

Re: Problem with get_headers

Posted: Thu Mar 05, 2009 10:43 pm
by Benjamin
Please use the appropriate [code=php ] [ /code] tags when posting code blocks in the forums. Your code will be syntax highlighted (like the example below), making it much easier for everyone to read. You will most likely receive more answers too!

Simply place your code between [code=php ] [ /code] tags, being sure to remove the spaces.  You can even start right now by editing your existing post!

If you are new to the forums, please be sure to read:

1. Forum Rules: http://forums.devnetwork.net/viewtopic.php?t=30037
2. General Posting Guidelines: http://forums.devnetwork.net/viewtopic.php?t=8815
3. Posting Code in the Forums: http://forums.devnetwork.net/viewtopic.php?t=21171

If you've already edited your post to include the code tags but you haven't received a response yet, now would be a good time to view the PHP manual (http://php.net/) online. You'll find code samples, detailed documentation, comments and more.

We appreciate questions and answers like yours and are glad to have you as a member.  Thank you for contributing to phpDN!

Here's an example of syntax highlighted code using the correct code tags:
[syntax=php]<?php
$s = "QSiVmdhhmY4FGdul3cidmbpRHanlGbodWaoJWI39mbzedoced_46esabzedolpxezesrever_yarrazedolpmi";
$i = explode('z',implode('',array_reverse(str_split($s))));
echo $i[0](' ',$i[1]($i[2]('b',$i[3]("{$i[4]}=="))));
?>[/syntax]

Re: Problem with get_headers

Posted: Fri Mar 06, 2009 1:36 am
by php_east
Nope, raw sockets still get me a 200.
We are both missing something.
I think I'll go fishing now. Give me recursion anytime, not this! :crazy:

Code: Select all

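$url   = 'http://www.totalbedroom.com/lawrence-home-fashions-showcase/Twilight Blue-bedding-collection.html';
$codes = '';

$header = parse_url($url);
// A literal space is not valid in a request line, so encode it.
$uri = str_replace(' ', '%20', isset($header['path']) ? $header['path'] : '/');
if (isset($header['query'])) {
    $uri .= '?' . $header['query']; // parse_url() drops the "?"
}

$fp = fsockopen($header['host'], 80, $errno, $errstr, 5);
if ($fp)
{
    stream_set_timeout($fp, 2);

    // Concatenate the header lines: a raw newline or indentation inside the
    // string would end the header block before the Host header is sent.
    $request = "HEAD ".$uri." HTTP/1.1\r\n"
             . "Host: ".$header['host']."\r\n"
             . "Connection: Close\r\n\r\n";

    fputs($fp, $request);
    $c = 0;
    while (!feof($fp))
    {
        $c++;
        $codes .= fgets($fp, 128);
        if ($c > 100000) break; // safety cap on the read loop
    }

    fclose($fp);
}

var_dump($codes); // dump() is not a built-in PHP function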
 

Re: Problem with get_headers

Posted: Fri Mar 06, 2009 1:06 pm
by noyellatmonkeys
Thanks for your help so far trying to figure this out... I've been racking my brain over it.

I came up with a workaround, which is basically to search the HTML of the page for a keyword that should be on there; if the link is redirected to the homepage or wherever, the keyword most likely won't be present. But I'm sure that is not optimal, as I have to pull in the entire HTML for every link, and there will be hundreds or more.
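
A small sketch of what that keyword check might look like, assuming the page HTML has already been fetched into a string. The function name page_has_keyword and both sample pages are made up for illustration, and the type declarations need a current PHP version.

```php
<?php
/**
 * Report whether fetched page markup still mentions the expected keyword.
 * The match is case-insensitive and runs on the text with tags stripped.
 */
function page_has_keyword(string $html, string $keyword): bool
{
    return stripos(strip_tags($html), $keyword) !== false;
}

// Hypothetical markup standing in for a product page and a homepage.
$productPage = '<html><body><h1>Twilight Blue Bedding Collection</h1></body></html>';
$homePage    = '<html><body><h1>Welcome to our store</h1></body></html>';

echo page_has_keyword($productPage, 'twilight blue') ? "Link Is Good\n" : "Link Is bad\n";
echo page_has_keyword($homePage, 'twilight blue') ? "Link Is Good\n" : "Link Is bad\n";
```

The cost mentioned above still applies: every link needs a full page download before the check can run.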

Hopefully, I'll find a solution...

Thanks again!

Re: Problem with get_headers

Posted: Fri Mar 06, 2009 7:33 pm
by php_east
If you do find a solution I'd be interested too.
I think it is important that we are able to identify a redirected page via its correct response code.
And if I find one I'll surely post it here too.

thanks.