PHP: Using remote proxies - Unreliability issues

krt
Forum Newbie
Posts: 1
Joined: Thu May 04, 2006 7:25 am

Post by krt »

** Background Info **

I have a function that fetches content from a page using a random remote proxy. The proxy list is updated daily, so there should be no connectivity issues. If a proxy fails, $contents is false and the calling page decides how many times it should retry fetching that page with a different proxy. That is all fine.

** The problem **

The function fails too often. The calling script reports several "Too many proxies tried" errors, yet a fraction of the pages are fetched successfully. Can you suggest a possible cause for this?

Any help will be greatly appreciated and hopefully repaid :)

** The PHP code **

Code: Select all

<?php

function get_new_proxy()
{
    // all you need to know is this function
    // gets a proxy in the format array([URL], [port])
}

// Fetch page, returns content and headers
function fetch($host, $url)
{
    global $dir;
    global $retries;
    static $current_proxy_fetches;
   
    // Attempt to connect to the proxy server to retrieve the remote page
    if (!@$current_proxy_fetches || $current_proxy_fetches++ > 10) {
        $current_proxy_fetches = 0;
        if (!ereg("-noproxy-?", $modifiers))
            list($proxy_address, $proxy_port) = get_new_proxy();
        if (!$socket = @fsockopen($proxy_address, $proxy_port, $errno, $errstring, 20)) {
            $filename = "{$dir['data']}/proxy_blacklist.txt";
            $fp = fopen($filename, 'a+');
            fwrite($fp, date("d/m/y H:i") . " $proxy_address:$proxy_port")
                or log_error("Could not write to file '$filename'");
            fclose($fp);
            $retries++;
            if ($retries < 3) {
                list($_proxy_address, $_proxy_port) = get_new_proxy();
                $contents = fetch($_domain, $_path, $_proxy_address, $_proxy_port);
                return $contents;
            }else{
                $retries = 0;
                return false;
            }
        }
        $current_proxy_fetches++;
    }
   
    // If socket connection successful, reset retries counter
    $retries = 0;

    // HTTP commands
    $headers  = "GET $url HTTP/1.1\r\n";
    $headers .= "Host: $host\r\n";
    $headers .= "Connection: Close\r\n";
    $headers .= "\r\n";
   
    // Init. $contents var
    $contents = "";
   
     // Get the contents
    if ($socket) {
        fwrite($socket, $headers);
       
        while (!feof($socket)){
            $contents .= fgets($socket, 128);
        }
       
        fclose($socket);
    }
   
    /* Contents contains both the html headers and the html of the page. */
    return $contents;
}

?>
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

hrm, fetching a remote page through a proxy? Seems kinda suspicious.

anyhoo, turn error reporting on (if it isn't already)

Code: Select all

ini_set('display_errors','On');
error_reporting(E_ALL);
and remove the @'s from your script.

look for some errors.
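
A minimal sketch of what that could look like around the connection step, assuming the failures happen in fsockopen(). The helper names format_proxy_error and open_proxy_socket are made up for illustration. fsockopen() is called without @, so PHP's own warning shows up alongside the logged error number and message.

```php
<?php
ini_set('display_errors', 'On');
error_reporting(E_ALL);

// Hypothetical helper: formats one log line for a failed proxy connection.
function format_proxy_error(string $address, int $port, int $errno, string $errstr): string
{
    return sprintf(
        "%s %s:%d failed: [%d] %s\n",
        date('d/m/y H:i'),
        $address,
        $port,
        $errno,
        $errstr
    );
}

// Hypothetical helper: connects without @ so the warning is not suppressed,
// and logs the reason fsockopen() reported when the connection fails.
function open_proxy_socket(string $address, int $port, int $timeout = 20)
{
    $socket = fsockopen($address, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        error_log(format_proxy_error($address, $port, $errno, $errstr));
    }
    return $socket; // a stream resource on success, false on failure
}
```

With the reason logged per proxy, it should be easier to tell timeouts and refused connections apart from problems elsewhere in the script.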
Roja
Tutorials Group
Posts: 2692
Joined: Sun Jan 04, 2004 10:30 pm

Post by Roja »

As scottayy mentions, for anything automated that fails, run it manually, watch the output, and you'll have your answers.

Turn on error reporting, run it manually a few dozen times, and you'll know where the failures are.

Most likely it's due to networking issues, or the (un)reliability of public proxies.
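
Proxy quality aside, two things in the posted function may also be worth ruling out. $socket is only opened inside the block that picks a new proxy, so on calls where that block is skipped the function appears to return an empty string without fetching anything. The retry also calls fetch() with $_domain and $_path, which are never set. The sketch below is one possible restructuring, not the original poster's code: it opens a connection on every call and retries in a loop. $pick_proxy and $open_socket are hypothetical callables standing in for get_new_proxy() and fsockopen(), and the absolute URI in the request line assumes $url is a path and the proxy is a plain HTTP forward proxy.

```php
<?php
// Builds the request to send to the proxy.
function build_proxy_request(string $host, string $url): string
{
    // A forward proxy expects the absolute URI in the request line.
    $target = preg_match('#^https?://#i', $url) ? $url : "http://$host$url";

    // HTTP/1.0 is used here so the server does not reply with a chunked
    // body, which a plain read-until-EOF loop would not decode.
    return "GET $target HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Connection: close\r\n"
         . "\r\n";
}

// Tries up to $max_tries proxies; returns headers plus body, or false.
function fetch_via_proxy(
    string $host,
    string $url,
    callable $pick_proxy,
    callable $open_socket,
    int $max_tries = 3
) {
    for ($try = 0; $try < $max_tries; $try++) {
        list($address, $port) = $pick_proxy();

        $socket = $open_socket($address, $port);
        if (!$socket) {
            continue; // this proxy failed, pick another one
        }

        fwrite($socket, build_proxy_request($host, $url));
        $contents = stream_get_contents($socket); // read until the proxy closes
        fclose($socket);

        return $contents;
    }

    return false; // every proxy tried was unreachable
}
```

It could be called as fetch_via_proxy($host, $url, 'get_new_proxy', function ($a, $p) { return fsockopen($a, $p, $errno, $errstr, 20); }); so the calling page still decides what to do when false comes back.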
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

scottayy wrote: hrm, fetching a remote page through a proxy? Seems kinda suspicious.

Not really... depending upon how your network is configured, the only gateway to the internet may be via a proxy ;)