Problems reading site content with curl
Posted: Mon Oct 02, 2006 8:31 pm
I wrote a function to read a site's content:
Code:
function GetHTML($d,$method,$vars,$ref='')
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $d);
    curl_setopt($ch, CURLOPT_REFERER, $ref);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
    curl_setopt($ch, CURLOPT_VERBOSE, 0); // will tell me (when set to zero) about every error curl finds
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    if ($method == 'POST')
    {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
    }
    $buffer = curl_exec($ch);
    curl_close($ch);
    unset($ch);
    return $buffer;
}
But the problem is that when I try to open the page that asks me for the URL to fetch, Firefox shows me this error message :s
The complete code of my page is as follows:
Code:
<?php
include("proxy.class.php");
$ws = new ws777;
if (isset($_POST['viso_url']) && !isset($_GET))
{
    global $vars, $method;
    $vars = $method = false;
    foreach ($_POST as $key => $arr)
        if ($key != 'viso_url')
        {
            global $vandera, $vars;
            $vandera = TRUE;
            $vars .= $key.'='.$_POST[$key].'&';
        }
        else
        {
            global $dominio;
            $dominio = $_POST[$key];
        }
    if ($vandera === TRUE)
        $method = 'POST';
}
else
{
    $method = $vars = false;
    if (isset($_GET))
    {
        global $link, $vars;
        $vars = '';
        foreach ($_GET as $indice => $base)
        {
            if (substr($indice,0,4) == 'miro' && is_numeric(str_replace('miro','',$indice)))
            {
                global $tomar;
                $tomar = str_replace('miro','',$indice);
            }
            else
            {
                global $vars;
                $vars .= $indice.'='.$_GET[$indice].'&';
            }
        }
        //$codigo=intval($_GET['code']);
        mysql_query('DELETE FROM pagina WHERE pag_time<'.(time()-600), $link);
        if (mysql_affected_rows() == -1)
        {
            header("Location: proyecto.php");
            exit;
        }
        global $tomar;
        $sql = 'SELECT * FROM pagina WHERE pag_id='.$tomar; // TODO: continue here
        if (!($query = mysql_query($sql, $link)))
        {
            header("Location: proyecto.php");
            exit;
        }
        else
        {
            global $ref, $dominio;
            $row = mysql_fetch_assoc($query);
            $ref = trim($row['pag_viene']);
            $dominio = trim($row['pag_dire']);
            if ($vars != '')
            {
                global $dominio;
                if (strpos($dominio, '?') !== false)
                {
                    $dominio = str_replace('&amp;', '&', $dominio);
                    $alg = (substr($dominio,-1,1) == '&') ? '' : '&';
                    $dominio .= $alg.$vars;
                }
                else
                    $dominio .= '?'.$vars;
            }
        }
    }
    else
    {
        echo '<div align="center">
<form id="form1" name="form1" method="post" action="proyecto.php">
<input name="viso_url" type="text" id="viso_url" value="http://" />
<input type="submit" name="Submit" value="aver?" />
</form>
</div>';
        exit;
    }
}
function GetHTML($d,$method,$vars,$ref='')
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $d);
    curl_setopt($ch, CURLOPT_REFERER, $ref);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
    curl_setopt($ch, CURLOPT_VERBOSE, 0); // will tell me (when set to zero) about every error curl finds
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
    if ($method == 'POST')
    {
        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
    }
    $buffer = curl_exec($ch);
    curl_close($ch);
    unset($ch);
    return $buffer;
}
$ht = GetHTML($dominio, $method, $vars);
echo $ws->raiz($ht, $dominio);
?>
Another thing: when I give an address such as "http://domine.some" and the site normally lives on the www subdomain, it redirects to the www domain. But when I read the site content with the script shown above, the server returns a message telling me that the page has moved <here> (with "here" being a link to the www domain). I would like the script to follow the redirect automatically and read the content at the new address.
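For the redirect behaviour described above, here is a minimal sketch of what might help, assuming the PHP curl extension is loaded. On older PHP versions `CURLOPT_FOLLOWLOCATION` is refused when `safe_mode` or `open_basedir` is active, so that is assumed to be off as well. The helper names `redirect_options` and `fetch_following_redirects` are made up for illustration and are not part of the script above. The idea is to let curl follow the `Location:` header itself instead of returning the "moved here" page.

```php
<?php
// Hypothetical helper: builds options that make curl follow redirects
// on its own, up to $maxRedirects hops.
function redirect_options($url, $maxRedirects = 3)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,          // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,          // follow "Location:" headers
        CURLOPT_MAXREDIRS      => $maxRedirects, // give up after this many hops
    );
}

// Hypothetical wrapper: fetches $url and reports where curl ended up.
function fetch_following_redirects($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, redirect_options($url));
    $body     = curl_exec($ch);                              // false on failure
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);   // last URL requested
    $error    = curl_error($ch);                             // '' when no error
    return array('body' => $body, 'final_url' => $finalUrl, 'error' => $error);
}
```

In the function from the post, the equivalent change would probably be setting `CURLOPT_FOLLOWLOCATION` to 1 instead of 0. `CURLOPT_MAXREDIRS` only has an effect once following is switched on.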
And lastly, some sites detect that I am a script. I don't have much practice with curl, and this is almost the first time I am using it in depth. I read the curl pages in the PHP manual, but at best they list the constants without explaining what they mean.
What is wrong with this code, and what is missing to simulate a user with a browser perfectly?
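As a rough sketch of the "look like a browser" part: a real browser usually sends `Accept` and `Accept-Language` headers, accepts compressed responses, and keeps cookies between requests. The helper name `browser_like_options`, the header values, and the cookie file path in the usage note are assumptions chosen for illustration. No set of options guarantees that a site cannot tell a script from a browser.

```php
<?php
// Hypothetical helper: curl options that mimic some of what a browser sends.
function browser_like_options($url, $userAgent, $cookieFile, $referer = '')
{
    $opts = array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
        CURLOPT_USERAGENT      => $userAgent,  // User-Agent header
        CURLOPT_ENCODING       => '',          // offer every encoding curl supports, decode automatically
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies when the handle is closed
        CURLOPT_HTTPHEADER     => array(       // illustrative values, not copied from any browser
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-us,en;q=0.5',
        ),
    );
    if ($referer !== '') {
        $opts[CURLOPT_REFERER] = $referer;     // only send a Referer when one is known
    }
    return $opts;
}
```

It could be applied with `curl_setopt_array($ch, browser_like_options($dominio, $agent, '/tmp/cookies.txt', $ref));`, where `$agent` holds the User-Agent string and the cookie path is a placeholder for any file the script can write to.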