
Problems reading site content with curl

Posted: Mon Oct 02, 2006 8:31 pm
by visonardo
I wrote a function to read a site's content:

Code: Select all

function GetHTML($d,$method,$vars,$ref='')
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL,$d);
	curl_setopt($ch, CURLOPT_REFERER, $ref);
    curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch, CURLOPT_MAXREDIRS,3);
	curl_setopt($ch,CURLOPT_VERBOSE,0);   // will report to me (if set to zero) every error curl finds
	curl_setopt($ch,CURLOPT_FOLLOWLOCATION,0);
	if ($method == 'POST')
    {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
    }
    $buffer = curl_exec ($ch);
    curl_close ($ch);
    unset($ch);
    return $buffer;
}

But the problem is that when I try to open this page, which asks me for the URL whose content it should fetch, Firefox shows me this error message :s

[Image: screenshot of the Firefox error message]



The complete code of my page is as follows:

Code: Select all

<?php

include("proxy.class.php");

$ws= new ws777;


if(isset($_POST['viso_url']) && !isset($_GET))
{
    global $vars,$method;
	$vars=$method=false;
	foreach($_POST as $key => $arr)
        if($key!='viso_url')
        {
            global $vandera,$vars;
            $vandera=TRUE;
            $vars.=$key.'='.$_POST[$key].'&';
        }
        else
        {
            global $dominio;
            $dominio=$_POST[$key];
        }
    if($vandera===TRUE)
        $method='POST';
}
else
{
    $method=$vars=false;
    if(isset($_GET))
    {
        global $link,$vars;
		$vars='';
        foreach($_GET as $indice => $base)
		{
			if(substr($indice,0,4)=='miro' && is_numeric(str_replace('miro','',$indice)))
			{
				global $tomar;
				$tomar=str_replace('miro','',$indice);
			}
			else
			{
				global $vars;
				$vars.=$indice.'='.$_GET[$indice].'&';
			}
		}
		//$codigo=intval($_GET['code']);
        mysql_query('DELETE FROM pagina WHERE pag_time<'.(time()-600),$link);
        if(mysql_affected_rows()==-1)
		{
            header("Location: proyecto.php");
        	exit;
		}
		global $tomar;
		$sql='SELECT * FROM pagina WHERE pag_id='.$tomar;      // CONTINUE HERE
        if(!($query=mysql_query($sql,$link)))
        {
			header("Location: proyecto.php");
        	exit;
		}
		else
        {
            global $ref,$dominio;
            $row=mysql_fetch_assoc($query);
            $ref=trim($row['pag_viene']);
            $dominio=trim($row['pag_dire']);
			if($vars!='')
			{
				global $dominio;
				if(strpos('?',$dominio)!==false)
				{
					$dominio=str_replace('&amp;','&',$dominio);
					$alg=(substr($dominio,-1,1)=='&')?'':'&';
					$dominio.=$alg.$vars;
				}
				else
					$dominio.='&'.$vars;
			}
					
					 
        }
    }    
    else
    {    
		echo '<div align="center">
  <form id="form1" name="form1" method="post" action="proyecto.php">
    <input name="viso_url" type="text" id="viso_url" value="http://" />
      <input type="submit" name="Submit" value="aver?" />
  </form>
  </div>';
  		exit;
	}
}


function GetHTML($d,$method,$vars,$ref='')
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL,$d);
	curl_setopt($ch, CURLOPT_REFERER, $ref);
    curl_setopt ($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch, CURLOPT_MAXREDIRS,3);
	curl_setopt($ch,CURLOPT_VERBOSE,0);   // will report to me (if set to zero) every error curl finds
	curl_setopt($ch,CURLOPT_FOLLOWLOCATION,0);
	if ($method == 'POST')
    {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
    }
    $buffer = curl_exec ($ch);
    curl_close ($ch);
    unset($ch);
    return $buffer;
}


$ht=GetHTML($dominio,$method,$vars);

echo $ws->raiz($ht,$dominio);
?>
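The page above glues query strings together by hand ($vars.=$key.'='.$value.'&'), which leaves values unencoded and makes the '?' versus '&' decision easy to get wrong. A minimal sketch of an alternative, assuming the parameters are available as an array; append_query() is a hypothetical helper name, not something from the original script:

```php
<?php
// Hypothetical helper: append parameters to a URL with the right separator.
function append_query($url, array $params)
{
    if (count($params) === 0) {
        return $url;
    }
    // http_build_query() URL-encodes keys and values and joins them with '&'.
    $query = http_build_query($params);
    // Note the argument order: strpos(haystack, needle).
    if (strpos($url, '?') === false) {
        return $url . '?' . $query;
    }
    // The URL already has a query string; avoid doubling the separator.
    $last = substr($url, -1);
    $separator = ($last === '?' || $last === '&') ? '' : '&';
    return $url . $separator . $query;
}

echo append_query('http://domine.some/page', array('a' => '1', 'q' => 'x y')), "\n";
echo append_query('http://domine.some/page?id=3', array('a' => '1')), "\n";
```

For a POST request, the same http_build_query() result can be passed as the CURLOPT_POSTFIELDS string.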


Another thing: when I give an address such as "http://domine.some" and the site normally lives on the www subdomain, a browser gets redirected to the www domain. But when I read the site content with the script shown above, the server returns a message telling me that the page has moved <here> (with "here" being a link to the www domain). I would like the script to follow the redirect directly and read the content at the new address.
And lastly, some sites detect that I am a script. I am not very experienced with curl, and this is almost the first time I am using it in depth. I read the curl page in the PHP manual, but at best it lists the constants without explaining what they mean :(


What is wrong with this code, and what is missing to perfectly simulate a user with a browser? :roll:
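On the "simulate a browser" part, a rough sketch of the request options a browser-like client usually sets: Accept headers, compressed responses, and cookies kept between requests. The helper names browser_like_headers() and browser_like_options(), the header values, and the cookie file path are assumptions for illustration. No set of options guarantees that a site will not recognize a script.

```php
<?php
// Hypothetical helper: headers a typical browser sends (values are illustrative).
function browser_like_headers($language = 'en-US,en;q=0.8')
{
    return array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: ' . $language,
    );
}

// Hypothetical helper: curl options for a browser-like request.
function browser_like_options($url, $cookieFile, $referer = '')
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_REFERER        => $referer,
        // User agent string taken from the GetHTML() function in this thread.
        CURLOPT_USERAGENT      => 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)',
        CURLOPT_HTTPHEADER     => browser_like_headers(),
        CURLOPT_ENCODING       => '',          // offer every compression curl supports, decode automatically
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is closed
    );
}

// Usage (assumed path): apply all options to a handle in one call.
// $ch = curl_init();
// curl_setopt_array($ch, browser_like_options('http://domine.some', '/tmp/cookies.txt'));
// $html = curl_exec($ch);
```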

Posted: Mon Oct 02, 2006 10:27 pm
by akimm
do a

Code: Select all

<?php
phpinfo();
?>
To make sure you do indeed have curl installed.
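A narrower check than reading through the whole phpinfo() page, as a small sketch; curl_is_available() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true when the curl extension is loaded and usable.
function curl_is_available()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array that includes the libcurl version string.
    $info = curl_version();
    echo 'curl is available, libcurl version ' . $info['version'] . "\n";
} else {
    echo "curl extension is not loaded\n";
}
```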

Posted: Tue Oct 03, 2006 4:48 am
by visonardo
akimm wrote:do a

Code: Select all

<?php
phpinfo();
?>
To make sure you do indeed have curl installed.

Yes, it is installed. It was working perfectly, but I don't know why it started showing me this error message. Well, actually, I had been adding some curl options like:

Code: Select all

curl_setopt($ch, CURLOPT_MAXREDIRS,3);
curl_setopt($ch,CURLOPT_VERBOSE,0);
which, as I understood them, were meant to fix this problem, so that I can read domains such as http://domine.some and follow redirects too. How can I fix that?
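For the redirect question, a minimal sketch, assuming the goal is to land on the final page after a "moved" response. CURLOPT_MAXREDIRS only limits redirects; it has an effect only when CURLOPT_FOLLOWLOCATION is switched on, and the function in this thread sets that option to 0. CURLOPT_VERBOSE is unrelated to error reporting; curl_error() returns the failure message. The names curl_redirect_options() and fetch_following_redirects() are hypothetical.

```php
<?php
// Hypothetical helper: options that make curl follow Location headers.
function curl_redirect_options($url, $maxRedirects = 3)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,          // follow redirect responses
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many redirects
    );
}

// Hypothetical helper: fetch a URL and report where the request ended up.
function fetch_following_redirects($url, $maxRedirects = 3)
{
    $ch = curl_init();
    curl_setopt_array($ch, curl_redirect_options($url, $maxRedirects));
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes transport failures such as DNS errors or too many redirects.
        return array('ok' => false, 'error' => curl_error($ch));
    }
    // The handle is released when $ch goes out of scope at the end of the function.
    return array(
        'ok'        => true,
        'body'      => $body,
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    );
}

// Usage:
// $result = fetch_following_redirects('http://domine.some');
// echo $result['ok'] ? $result['final_url'] : $result['error'];
```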