checking if files exist

Posted: Sat Nov 13, 2004 1:37 pm
by mistwist
I run a link site. People submit their pages (always just a single page at a time) for me to post on my website, but before I post them I want to check for broken images on that page. I have no way of knowing the names of the images, though they will always be jpg files. I know it's possible to do, but I am having no luck figuring out how to do it.

any help?

Posted: Sat Nov 13, 2004 3:06 pm
by Weirdan
Get the page into a variable using curl, then use preg_match_all to get all the <img> tags, and then run curl for each image found to check that every one of them produces an HTTP 200 OK response.
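
A minimal sketch of the preg_match_all step described above, assuming the page source is already in a string. The function name find_img_srcs and the sample markup are made up for illustration, and a regular expression is only an approximation of real HTML parsing (unquoted src values are not handled).

```php
<?php
// Hypothetical helper: returns the src value of every <img> tag in $html.
function find_img_srcs($html)
{
    // Match <img ... src="..."> or src='...', case-insensitively.
    // Group 1 captures the quote character, group 2 the URL itself.
    $pattern = '/<img\b[^>]*?\bsrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// Sample markup (an assumption, not taken from a real page).
$html = '<p><IMG SRC="a/One.jpg" alt="x"><img class=\'c\' src=\'/two.jpg\'></p>';
print_r(find_img_srcs($html));
```

Each URL returned this way still has to be requested separately to see whether the image actually exists.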

Posted: Sat Nov 13, 2004 4:08 pm
by mistwist
nope that won't work.. server does not have curl mod loaded :(

got me all excited...lol

Posted: Sat Nov 13, 2004 6:41 pm
by rehfeld

Code: Select all

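// $url is the address to check; needs allow_url_fopen enabled
$fp = @fopen($url, 'r');
if ($fp) {
    fclose($fp); // only close the handle if the request succeeded
}

// PHP fills this array with the response headers of the request
print_r($http_response_header);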

Posted: Sat Nov 13, 2004 7:07 pm
by mistwist
rehfeld,
Doesn't that just return the header info of the url itself? type, server, 404, 200 etc

I want to check for broken images on the supplied url...

Posted: Sat Nov 13, 2004 10:20 pm
by rehfeld
mistwist wrote:
rehfeld,
Doesn't that just return the header info of the url itself? type, server, 404, 200 etc

I want to check for broken images on the supplied url...

well you said you didn't have curl, so this is another way to check the headers

just check for 200 OK on the image urls
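
A rough sketch of what "check for 200 OK" could look like, assuming the headers arrive as an array of lines the way $http_response_header provides them. The helper name last_status_code and the sample headers are invented for illustration.

```php
<?php
// Hypothetical helper: pulls the final HTTP status code out of a header
// array such as $http_response_header. Returns false if none is found.
function last_status_code($headers)
{
    $code = false;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 200 OK". After a redirect there
        // can be several of them, so keep the last one seen.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage sketch (needs allow_url_fopen; $image_url is a placeholder):
// $fp = @fopen($image_url, 'r');
// if ($fp) { fclose($fp); }
// $ok = isset($http_response_header)
//     && last_status_code($http_response_header) === 200;

// Made-up headers standing in for a real response.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: /new.jpg',
    'HTTP/1.1 200 OK',
    'Content-Type: image/jpeg',
);
var_dump(last_status_code($sample));
```

Taking the last status line rather than the first means a redirect that ends at a working image still counts as OK.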

Posted: Sun Nov 14, 2004 5:07 pm
by mistwist
ok i see where you're going... But unfortunately it won't work. This is part of a form, and the form does not require the image names to be entered, so it has no way of knowing the image names..

Posted: Sun Nov 14, 2004 10:15 pm
by rehfeld
well you have to fetch the page, then get all the image tags, and then check each image

you still need to resolve relative urls into absolute urls so you can check them, which isn't very hard. you'll also need to check the server response, but this should get you going

Code: Select all

<?php


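// Returns every <img ...> tag found in $document as an array of strings.
function extract_img_tags($document)
{
    // Search case-insensitively with stripos() rather than lowercasing the
    // whole document, so the case of the image urls is preserved.
    $img_tags = array();
    $pointer = 0;

    while (false !== ($open_pos = stripos($document, '<img', $pointer))) {
        $close_pos = strpos($document, '>', $open_pos);
        if (false === $close_pos) {
            break; // unterminated tag, stop scanning
        }
        $close_pos++;
        $img_tags[] = substr($document, $open_pos, $close_pos - $open_pos);
        $pointer    = $close_pos;
    }

    return $img_tags;
}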

// Returns the value of the src attribute of an <img> tag, or false if none.
function extract_img_url($tag)
{
    $url = false;

    // Try a double-quoted src first, then a single-quoted one. stripos()
    // matches case-insensitively without changing the case of the url.
    foreach (array('"', "'") as $quote) {
        if (false !== ($start_pos = stripos($tag, 'src=' . $quote))) {
            $url_begin = $start_pos + 5;
            $url_end   = strpos($tag, $quote, $url_begin);
            if (false !== $url_end) {
                $url = substr($tag, $url_begin, $url_end - $url_begin);
            }
            break;
        }
    }

    return $url;
}

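$page = 'http://cnn.com';

// file_get_contents() returns false on failure, so check before parsing
$document = @file_get_contents($page);
if (false === $document) {
    die("Could not fetch $page");
}

$img_tags = extract_img_tags($document);

$img_urls = array();
foreach ($img_tags as $tag) {
    $url = extract_img_url($tag);
    if (false !== $url) { // skip tags without a quoted src attribute
        $img_urls[] = $url;
    }
}

print_r($img_urls);

?>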
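
The post above leaves resolving relative urls as the remaining step. Here is a minimal sketch of one way to do it, assuming the base url is well formed. The name resolve_url and the example.com addresses are made up, and "../" or "./" segments are not collapsed.

```php
<?php
// Hypothetical helper: turns a relative image url into an absolute one,
// given the url of the page it was found on. Simplified on purpose.
function resolve_url($base, $relative)
{
    // Already absolute (starts with a scheme such as http://)?
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $relative)) {
        return $relative;
    }

    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }

    // Protocol-relative: //host/path
    if (substr($relative, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $relative;
    }
    // Root-relative: /path
    if (substr($relative, 0, 1) === '/') {
        return $root . $relative;
    }

    // Document-relative: append to the directory part of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $relative;
}

echo resolve_url('http://example.com/gallery/page.html', 'img/a.jpg'), "\n";
echo resolve_url('http://example.com/gallery/page.html', '/b.jpg'), "\n";
```

Each resolved url can then be requested on its own and its response headers checked for a 200 status.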