
curl and Proxies - problem with follow links

Posted: Thu Jul 09, 2009 6:30 am
by DaQuark
Hi there,

I want to use curl with its proxy feature. I've implemented the script below successfully, but there is one issue: test.php itself is fetched through the proxy, but the links, iframes, etc. on it are not. When I click a link on test.php, the request doesn't go through the proxy. An iframe I'm importing (from another server) doesn't use the proxy either.

How can I change the script so that EVERYTHING goes through the proxy, no matter whether it's a link, an iframe, or anything else?

Thanks a lot for your help!

Code: Select all

 
<?php
    include("connect_database.php");

    // select a random proxy from the database
    $result = mysql_query("SELECT server_ip, server_port FROM proxies ORDER BY RAND() LIMIT 1");
    $proxy_list = mysql_fetch_assoc($result);

    $proxy = $proxy_list['server_ip'] . ":" . $proxy_list['server_port'];

    $ch = curl_init("http://www.xypage.com/test.php");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)");
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    $data = curl_exec($ch);

    echo $data;
?>
 

Re: curl and Proxies - problem with follow links

Posted: Thu Jul 09, 2009 8:12 am
by Eric!
I haven't done this through a proxy, so this is just a guess. Try adding a

Code: Select all

curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
before setting the proxy.

Re: curl and Proxies - problem with follow links

Posted: Thu Jul 09, 2009 11:27 am
by DaQuark
OK, I've tried it. I also tried putting the

Code: Select all

 curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, true);
in different places (lines), but it doesn't seem to work. The test.php page doesn't get displayed anymore: the graphics are gone, the text is gone, nothing gets displayed at all. I also tried using 1 instead of true.

When I remove the line, everything works like before, but I still have the issue with the links and iframes. Any other ideas?

Re: curl and Proxies - problem with follow links

Posted: Thu Jul 09, 2009 12:35 pm
by Eric!
I don't know why the images were dropped.

When I get stuck on things like this, I try an HTTP proxy tool that can capture the traffic, like WebScarab. With the tool capturing your browser's traffic, browse the page the way you want curl to fetch it. Then set that captured data aside, run the PHP script, and capture its HTML data too. Compare the two sessions to get an idea of where curl is going off track.

In the end it might be a case where you'll have to parse the iframe URL and curl it separately through the proxy... I don't know.
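A rough sketch of that idea, assuming the `$data` and `$proxy` variables from the original script; `extract_iframe_src()` and `fetch_via_proxy()` are made-up helper names, and the regex only handles a simple double-quoted src:

```php
<?php
// Pull the first iframe src out of a chunk of HTML (naive: double quotes only).
function extract_iframe_src($html)
{
    if (preg_match('/<iframe\b[^>]*\bsrc="([^"]+)"/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Fetch a URL through the given proxy, using the same options as the main script.
function fetch_via_proxy($url, $proxy)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Usage, given $data and $proxy from the main script:
// $src = extract_iframe_src($data);
// if ($src !== null) {
//     $iframe_html = fetch_via_proxy($src, $proxy);
// }
```

This only covers one iframe and one attribute style, so treat it as a starting point, not a robust parser.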

Re: curl and Proxies - problem with follow links

Posted: Thu Jul 09, 2009 4:28 pm
by DaQuark
The problem is that not just the images were dropped - everything got dropped. So I don't see any content from test.php at all. Sorry for being misleading.

Has anyone done this before? Could it be the proxy I'm using? I'll try another proxy right now, but I don't think that will solve the problem...

Re: curl and Proxies - problem with follow links

Posted: Thu Jul 09, 2009 5:56 pm
by DaQuark
Now I'm trying Snoopy, but it doesn't seem to work either. When I open the test.php page, I'm on the proxy's IP. But as soon as I click a link, I'm not using the proxy's IP anymore. Is that a general problem - is what I want even possible? Or am I doing something really stupid/wrong?

All I would like to do is tunnel a page behind the cover of a proxy, and it should all happen in PHP.

Snoopy test:

Code: Select all

<?php
    include("connect_database.php");
    include("Snoopy.class.php");

    // select a random proxy from the database
    $result = mysql_query("SELECT server_ip, server_port FROM proxies ORDER BY RAND() LIMIT 1");
    $proxy_list = mysql_fetch_assoc($result);

    $snoopy = new Snoopy;

    $snoopy->proxy_host = $proxy_list['server_ip'];
    $snoopy->proxy_port = $proxy_list['server_port'];

    $snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)";
    $snoopy->referer = "http://www.google.com/";

    if ($snoopy->fetch("http://www.xypage.com/test.php")) {
        echo $snoopy->results;
    } else {
        echo "error fetching document: " . $snoopy->error . "\n";
    }
?>

Re: curl and Proxies - problem with follow links

Posted: Fri Jul 10, 2009 10:50 am
by DaQuark
Hm, strange: I can't get it to work. I'm almost at the point of using a proxy script like PHProxy, but then I would have to massively change that script and remove the form and things like that.

Has anyone had this problem before and found a good solution for all my issues?

P.S.: the problem is definitely CURLOPT_HTTPPROXYTUNNEL: when I set it to true, the entire page stays blank.

Re: curl and Proxies - problem with follow links

Posted: Fri Jul 10, 2009 3:23 pm
by Eric!
DaQuark wrote:But as soon as I click a link, I'm not using the proxy's IP anymore.
Be patient with me... I'm still trying to understand your problem. I thought the problem was that your HTML came back empty for iframe URLs when using a proxy because curl or Snoopy wasn't following the links, but it sounds like that isn't the case, because you're actually having your browser follow the links, not your script. Is that right?

If you're redisplaying the fetched HTML in a browser and then clicking a link, then I think this is how it's supposed to work, because the links are shown to the browser exactly as they are written on the page. If your browser isn't routed through the proxy, it will just go directly to whatever link you tell it to.

Before you display the fetched HTML, you have to parse the link URLs and rewrite them so they are passed back to your script, like:

Code: Select all

<a href="http://mydomain.com/myscript.php?target=original.url.on.the.page.com">blah blah blah </a>
This will redirect your browser to your script with the URL you want to fetch, and then you can fetch the clicked link through the proxy just like you did the first page.

This might be a crude hack to solve the problem; perhaps there is a cleaner way, but I don't know it. It will only work for a href tags - other link types (redirects, JavaScript) might be more complicated to modify and ensure that they are routed through the proxy.
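A minimal sketch of that rewriting step, assuming the fetched page is in `$data`; the function name `rewrite_links()`, the script name `myscript.php`, and the `target` parameter are all placeholders, not a standard API:

```php
<?php
// Rewrite every <a href="..."> in fetched HTML so that clicking a link
// routes the browser back through our own proxy script.
function rewrite_links($html, $proxy_script)
{
    return preg_replace_callback(
        '/(<a\b[^>]*\bhref=")([^"]+)(")/i',
        function ($m) use ($proxy_script) {
            // Hand the original URL back to our script as a query parameter.
            return $m[1] . $proxy_script . '?target=' . urlencode($m[2]) . $m[3];
        },
        $html
    );
}

// Usage in the main script, before echoing:
// echo rewrite_links($data, "http://mydomain.com/myscript.php");
//
// On the receiving side, myscript.php would read $_GET['target'] and fetch
// it through the proxy with the same curl options as the first page.
```

The regex only covers double-quoted href attributes, so it's a starting point; iframes, form actions, and JavaScript-generated URLs would each need their own handling.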