
Why 400 Error?

Posted: Thu Jun 15, 2017 7:30 pm
by UniqueIdeaMan
Folks,

Why do I keep getting this error every time I enter a URL in the URL field of the following web proxy:

**"The specified URL could not be returned due to a status code of 400."**

Code:


<?php
	error_reporting(0);

	session_start();

        //Settings Instructions: https://darkpolitics.wordpress.com/2009/12/29/create-your-own-web-proxy-server/

	// turn debug messages on when debugging your proxy
	//$DEBUG = true;
	$DEBUG = false;

	// set this to the location of the web proxy page if you know where it's going to be; otherwise this function will work it out.
	// for performance you should hardcode this to your web proxy location
	//$PROXYURL = "http://www.mysite.com/myproxy.php";
	$PROXYURL = get_current_location(); // works out current scripts location

	// urls from the original search will be in $_POST but future links we proxify will be in $_GET
	$url = isset($_REQUEST["url"]) ? $_REQUEST["url"] : "";
	$useragent = isset($_POST["useragent"]) ? $_POST["useragent"] : ""; // will only be a POST from the search form

	ShowDebug("useragent posted from search form = $useragent");
		
	// set the user-agent we will surf with. We only set on initial search and then use a session to pass this var to any
	// other content passed through the proxy. Make sure you have session cookies enabled for your proxy page!
	if(!empty($useragent)){
		if($useragent=="us"){
			$surf_useragent  = $_SERVER["HTTP_USER_AGENT"]; // use current agent
		}else if($useragent=="ie"){
			$surf_useragent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";		// use IE 7
		}else{ // must be ff as we only have 2 choices!! Add as required
			$surf_useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)"; // use FF3
		}
		// set a session for future calls through the proxy
		$_SESSION["surf_useragent"] = $surf_useragent;
	}else{
		$surf_useragent = isset($_SESSION["surf_useragent"]) ? $_SESSION["surf_useragent"] : "";
	}

	ShowDebug("surf with agent = $surf_useragent");

	$err = false;
	$msg = "";
	$content = "";
	$subpathurl ="";
	$pathurl = "";
	$siteurl = "";

	// this list contains the domains this proxy will allow; in your own proxy you can obviously remove it!!
	$whitelist = "technicallypolitical.com,strictly-software.com,infowars.com,prisonplanet.com,hashemian.com";
	$cansearch = false;

	ShowDebug("url = $url");
	ShowDebug("useragent = $useragent");
	ShowDebug("PROXYURL = $PROXYURL");

	if(!empty($url)){
				
		ShowDebug("url = $url");

		// make sure it's valid, with a protocol at the start
		if($url == "http://"){
			$err = true;
			$msg = "Please specify a full URL to access e.g http://www.darkpolitricks.com";
		}else if(!preg_match("/https?:\/\//",$url)){
			$err = true;
			$msg = "Please specify the protocol within the URL e.g http://";

			ShowDebug("error = $msg");
		}else{
			
			ShowDebug("get content from remote url $url");

			if(!empty($whitelist)){
				// check whether url is allowed
				$allowed = explode(",",$whitelist);
				$count = count($allowed); 
				$lowurl = strtolower($url);

				ShowDebug("check whether $lowurl is in whitelist of $whitelist");

				foreach($allowed as $val){
					ShowDebug("check whether ".$val." is in $url");

					if( strripos($lowurl, $val) !== false){
						ShowDebug("This url $url is on whitelist matching $val");
						$cansearch = true;
						break;
					}
					
				}
			}else{
				$cansearch = true;
			}

			if(!$cansearch){
				$err = true;
				$msg = "The url is not allowed to be accessed from this web proxy server.";
			}else{
		
				// crawl item e.g URL, script, CSS, image
				$html = mycrawler_single($url,$surf_useragent);

				$content = $html["html"];
				$status = $html["status"];
				$headers = $html["header"];
				$content_type = $html["content_type"];
				$connect_error = $html["message"];

				ShowDebug("connect error = $connect_error");
				ShowDebug("status = $status");

				// a status code of 200 means the request succeeded; anything else means we have an issue
				if($status!="200"){
					
					// 404 = Page not found
					if($status=="404"){
						$err = true;
						$msg = "The specified URL could not be located.";
					}else if(!empty($connect_error)){
						$err = true;
						$msg = $connect_error;

						ShowDebug("CONNECT ERROR = $connect_error; msg = $msg");
					}else{
						$err = true;
						$msg = "The specified URL could not be returned due to a status code of $status.";
					}

				}else{

					// need to replace all links in our returned content with links to the proxy so that future clicks are proxified
					$urlinfo = parse_url($url);

					// get root url to extend any relative links e.g. http://www.mysite.com
					$siteurl = $urlinfo["scheme"]."://".$urlinfo["host"];
					if(!empty($urlinfo["path"])){
						$pathurl = $siteurl.$urlinfo["path"]; 

						// make sure file is removed in case we need current sub directory
						$pospath = strripos($pathurl, "/");
						
						if($pospath!==false){

							ShowDebug( "take up to / as pos $pospath in $pathurl<br />");

							$subpathurl = substr($pathurl,0,$pospath);
						}else{
							$subpathurl = $pathurl."/";
						}
					}else{
						$pathurl = $siteurl;
						$subpathurl = $pathurl."/";
					}

					ShowDebug("SiteURL = $siteurl path = $pathurl");

					// for text related content we scan for links so that we can change them all to go through our proxy
					// for images and other non textual content we have no need to change the links
					if(preg_match("/(text|html|xml|xhtml|css|javascript)/i", $content_type )){
					//if(preg_match("/(text|html|xml|xhtml)/", $content_type )){
						
						ShowDebug("parse links");

						// make sure all links are rerouted through proxy
						$content = reformat_links($content,$siteurl,$subpathurl);

					}

					// As all links/src values from the page we visit need to pass through the proxy as well we need to ensure
					// to output the correct header for file. For example a PNG image needs to have the correct header e.g image/png

					ShowDebug("output content-type: $content_type");

					if(!empty($content_type)){
						header( $content_type );
					}

					ShowDebug("output content = $content");

					// output content to screen
					echo $content;
				}
			}
		}
	}else{
		// default url to http://
		$url = "http://";
	}

	// Will return the current location of the script running. If the proxy page is moved around a lot then this
	// will work out where it is but for performance set the value at the top in $PROXYURL
	function get_current_location(){

		$url = "";

		if( $_SERVER["SERVER_PORT"]== 443){
			$protocol = "https://";
		}else{
			$protocol = "http://";
		}

		$url = $protocol . $_SERVER["SERVER_NAME"] . $_SERVER["SCRIPT_NAME"];

		return $url;
	}


	// retrieve link destinations and modify them so that when they are clicked the content is passed through the proxy
	// as well. I look for src/href tags. Currently this does not handle URLs defined like so href="../"
	function reformat_links($content,$siteurl,$subpathurl){ 
		// need to make all URLs go through our proxy! use ISAPI rewriting to make it nicer this is just a guide
		global $PROXYURL;

		$relurl = $PROXYURL . "?url=" .$siteurl; // for urls like url="/sub/page.htm"
		$cururl = $PROXYURL . "?url=" .$subpathurl; // for urls like url="page.htm"
		$absurl = $PROXYURL . "?url=";  // for urls like url="http://www.mysite.com/page.htm"

		ShowDebug("reformat rel urls = $relurl");
		ShowDebug("reformat cur urls = $cururl");
		ShowDebug("reformat abs urls = $absurl");

		$newcontent = $content;

		// get all links and reformat
		// as we don't want to rewrite the same links multiple times (which can happen) I use placeholders first and then,
		// once every possible location has been marked, I insert the link to the proxy
	
		// look for absolute urls e.g url="http://www.mysite.com/blah.asp"
		$newcontent = preg_replace("/((?:href|src)=['\"])(http.*?)(['\"])/i","$1##ABSURL##$2$3",$newcontent);

		// get links starting with / e.g url="/sub/page.htm"
		$newcontent = preg_replace("/((?:href|src)=['\"])(\/.*?)(['\"])/i","$1##RELURL##$2$3",$newcontent);

		// get links starting like url="page.htm"
		$newcontent = preg_replace("/((?:href|src)=['\"])([^#h\/][^#t][^t][^p].*?)(['\"])/i","$1##CURURL##$2$3",$newcontent);
		
		// now replace placeholders 
		$newcontent = str_replace("##RELURL##",$relurl,$newcontent);	
		
		$newcontent = str_replace("##CURURL##",$cururl,$newcontent);	
		
		$newcontent = str_replace("##ABSURL##",$absurl,$newcontent);				

		ShowDebug("return content");

		return $newcontent; 
	} 

	
	// code to load remote content such as HTML files, CSS, Images etc
	// To follow more than 3 redirects (e.g ISAPI rewrites then change $maxredirs=XX)
	function mycrawler_single($url, $useragent="",$timeout=10, $maxredirs=3) 
	{
		ShowDebug( "IN mycrawler_single Get URL content from $url $useragent maxredirs = $maxredirs");
		
		$urlinfo = parse_url($url);
					 
		if (empty($urlinfo["scheme"])) {$urlinfo = parse_url("http://".$url);}                                                                  
		if (empty($urlinfo["path"])) {$urlinfo["path"]="/";}
				  
		if (empty($urlinfo["port"]))
		{
				switch($urlinfo["scheme"])
				{
					case "http":
						$urlinfo["port"] = 80;
						break;  
					case "https":
						$urlinfo["port"] = 443;
						break;                
				}
		}

		// if no agent is supplied use default agent
		if (empty($useragent)) $useragent = $_SERVER["HTTP_USER_AGENT"];

		ShowDebug("useragent to use = $useragent");

		if (isset($urlinfo["query"]))
		{
			$request = "GET ".$urlinfo["path"]."?".$urlinfo["query"]." ";
		} else {   
			$request = "GET ".$urlinfo["path"]." ";
		}
		
		// form request
		$request .= "HTTP/1.0\r\n";
		$request .= "Host: ".$urlinfo["host"]."\r\n";
		$request .= "User-Agent: ".$useragent."\r\n";
		$request .= "Connection: close\r\n\r\n";
		
		ShowDebug( "request = ".$request);

		ShowDebug( "open ".$urlinfo["host"].":".$urlinfo["port"]);

		$fp = @fsockopen($urlinfo["host"], $urlinfo["port"], $errno, $errstr, $timeout);

		if (!$fp)
		{
			ShowDebug( "ERROR! (".$errno.")".$errstr);

			$urlinfo["header"] = "";
			$urlinfo["html"] = "Error: $errno $errstr"; 
			$urlinfo["status"] = 400; // bad request
			$urlinfo["content_type"] = "";
			$urlinfo["message"] = "The request could not be made. $errno $errstr";

			return $urlinfo;  
		}
		else
		{   
			ShowDebug($request);

			fwrite($fp, $request);
			
			while (!feof($fp)) 
			{
				if(isset($data)){
					$data .= fgets($fp, 4096);  					
				}else{
					$data = fgets($fp, 4096);
					ShowDebug( "take status code from 9,4 in data = ".$data);
					
					// status code should be here! if not, it's a bad request
					$code = trim(substr($data,9,4));					
					ShowDebug( "Status Code = ".$code);					
				}
			}
			
			ShowDebug( "Status Code = ".$code);	

			// if no status code default to 400 = Bad Request
			if(empty($code) || !is_numeric($code)){

				$code = 400;

				ShowDebug("default to bad request 400");
			}

			ShowDebug("status code = $code - response = $data");

			fclose($fp);   
						
			$tmp = explode("\r\n\r\n", $data, 2);
			
			// We will return an array with these parts header, html, status code and content-type
			$urlinfo["header"] = $tmp[0];
			$urlinfo["html"] = isset($tmp[1]) ? $tmp[1] : ""; 
			$urlinfo["status"] = $code;
			$urlinfo["content_type"] = get_content_type($tmp[0]);
			$urlinfo["message"] = "";

			ShowDebug( "The Status Code = ".$urlinfo["status"]." from header: ".$urlinfo["header"]);
			
			// handle redirects
			ShowDebug( "do we need to redirect? pos of location in header = ". stripos($urlinfo["header"], "location:"). " maxredirs = $maxredirs");

			if ((stripos($urlinfo["header"], "location:")) && ($maxredirs > 0))
			{
				ShowDebug( "found location in header and we CAN REDIRECT");
				
				preg_match("/\r\nlocation:(.*)/i", $urlinfo["header"], $match);

				if ($match)
				{    
					$redirect = trim($match[1]);
					
					ShowDebug( "Redirecting to ".$redirect);
					ShowDebug( "maxredirs is currently $maxredirs");

					$maxredirs--;                         
					
					ShowDebug( "maxredirs after count down is now $maxredirs");

					ShowDebug( "DO A REDIRECT TO $redirect");

					return mycrawler_single($redirect, $useragent, $timeout, $maxredirs);
				}
			}       

			ShowDebug( "RETURN FROM mycrawler_single");

			// return array of header/html
			return $urlinfo;          
		}        
	}
	
	// will check headers for the content-type. We need this so that images are displayed correctly
	function get_content_type($headers){
		$content_type = "";

		if(!empty($headers)){
			$headerarray = explode("\r\n", $headers);
			foreach($headerarray as $head){
				
				ShowDebug( "header item = ".$head);

				if(preg_match("/Content-Type: .+$/i",$head)){
					$content_type = $head;
					break;
				}				
			}
		}

		ShowDebug("return $content_type");

		return $content_type;
	}

	// Debug function if you want to show debug e.g for testing your proxy then turn $DEBUG = True at top of page
	// for performance all ShowDebug statements should be removed in production to reduce unnecessary function calls
	function ShowDebug($msg){
		global $DEBUG;
		if(!$DEBUG) return;
		if(!empty($msg)){
			echo htmlentities($msg)."<br />";
		}
	}

if(empty($url) || $url=="http://" || $err){
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<head>
<title>Dark Politricks Web Proxy Example</title>
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
<meta name="keywords" content="DarkPolitricks, WebProxy, Proxy, Proxies, Proxi, Proxied, Forwarded-For" /> 
<meta name="description" content="An example of a web proxy, how you can make your own web proxy to bypass basic filtering" /> 
<!-- Put all these in an external stylesheet -->
<style>
	body{background:lightblue;}
	p{font-weight:bold;}
	.error{color:red;}
	.msg{color:green;}
	#main{margin:auto;width:600px;}
	#search{margin:auto;width:600px;}
	label{font-weight:bold;font-face:Tahoma,Arial;}
	#url{width:300px;}
	#searchflds{border:1px solid black;}
	dt{float:left;}
	dd{float:left;}
	#domainlist{font-style:italic;color:navy;}
	#searchbutton{text-align:right;}
	#agent{clear:both;}
	.agent{margin-top:10px;}
	#ie{margin-left:-12px;}
</style>
</head>
<body>

	<div id="main">
		<h1>Example of a WebProxy</h1>

		<?php
		if(!empty($msg)){
			if($err){
				echo "<p class='error'>$msg</p>";
			}else{
				echo "<p class='msg'>$msg</p>";
			}
		}
		?>

		<p>This is an example page and can only be used to access the following domains:</p>
		<p id="domainlist">technicallypolitical.com, strictly-software.com, infowars.com, prisonplanet.com</p>

		<p>Please read the related article at <a href="http://www.darkpolitricks.com/2009/12/create-your-own-web-proxy-server" title="Create your own web proxy">www.darkpolitricks.com</a> to get more information as well as a link to download the code so that you can create your own web proxy.</p>

		<div id="search">
			<form id="searchanon" name="searchanon" method="POST">
				<fieldset id="searchflds">
					<dl>
						<dt><label for="where">Where To</label></dt>
						<dd><input type="text" id="url" name="url" value="<?php echo $url ?>" maxlength="100" /></dd>
					</dl>
					<dl id="agent">
						<dt class="agent"><label for="useragent">User-Agent</label></dt>
						<dd class="agent"><input type="radio" name="useragent" id="ie" value="ie" <?php if($useragent=="ie"){ echo 'checked="true"'; } ?> /><label for="ie" title="Use IE 7 user-agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)">IE 7</label>
							<input type="radio" name="useragent" id="ff" value="ff" <?php if($useragent=="ff"){ echo 'checked="true"'; } ?> /><label for="ff" title="Use FireFox 3 user-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 (.NET CLR 3.5.30729)">FireFox 3</label>
							<input type="radio" name="useragent" id="us" value="us" <?php if($useragent=="us"){ echo 'checked="true"'; } ?> /><label for="us" title="Keep existing agent: <?php echo $_SERVER["HTTP_USER_AGENT"] ?>">Keep Existing User-Agent</label>
							</dd>
					</dl>
				</fieldset>
				<p id="searchbutton"><input type="submit" value="Go There" id="submitsearch" name="submitsearch" /></p>
			</form>
		</div>
	</div>
</body>
</html>
<?php
}
?>
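For anyone reading along, the placeholder trick used by `reformat_links()` above can be illustrated in isolation. This is a reduced sketch, not the full function; the proxy and site URLs below are made-up values:

```php
<?php
// Sketch of the two-pass rewrite: first mark each href/src value with a
// placeholder, then swap every placeholder for the proxified prefix in one
// pass, so no link gets rewritten twice.
$PROXYURL = "http://www.mysite.com/myproxy.php"; // hypothetical proxy location
$siteurl  = "http://www.example.com";            // hypothetical site root

$content = '<a href="http://www.example.com/page.htm">link</a> <img src="/pics/a.png" />';

// pass 1: mark absolute urls, then root-relative urls
$content = preg_replace("/((?:href|src)=['\"])(http.*?)(['\"])/i", "$1##ABSURL##$2$3", $content);
$content = preg_replace("/((?:href|src)=['\"])(\/.*?)(['\"])/i", "$1##RELURL##$2$3", $content);

// pass 2: replace placeholders with links through the proxy
$content = str_replace("##ABSURL##", $PROXYURL . "?url=", $content);
$content = str_replace("##RELURL##", $PROXYURL . "?url=" . $siteurl, $content);

echo $content;
```

Because the relative-link pattern requires the value to start with `/`, it cannot re-match a value that has already been marked with `##ABSURL##`, which is the point of marking before replacing.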

And how do I remove the restriction so that any website can be viewed, rather than only the ones in:

Code:


$whitelist = "technicallypolitical.com,strictly-software.com,infowars.com,prisonplanet.com,hashemian.com";

I removed the above-mentioned URLs from $whitelist and it worked: I was able to view Google. But then the 400 error started appearing.
How would you change the code, and where?
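Reading the code above, emptying $whitelist looks like the intended way to lift the restriction, because the else branch of the `!empty($whitelist)` check sets `$cansearch = true`. A minimal sketch of that logic, using the same variable names as the script and a made-up URL:

```php
<?php
// An empty $whitelist makes the !empty($whitelist) check fail, so the else
// branch runs and every URL is allowed through the proxy.
$whitelist = "";
$cansearch = false;

$url = "http://www.google.com"; // example URL for illustration

if (!empty($whitelist)) {
    // check whether the URL contains one of the allowed domains
    foreach (explode(",", $whitelist) as $domain) {
        if (strripos(strtolower($url), $domain) !== false) {
            $cansearch = true;
            break;
        }
    }
} else {
    $cansearch = true; // no whitelist means no restriction
}

var_dump($cansearch); // bool(true)
```

So the 400 you saw after that change was not the whitelist check firing (that produces a different message); it comes from the request itself failing, which is a separate problem.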

Re: Why 400 Error?

Posted: Fri Jun 16, 2017 7:07 pm
by Christopher
400 means there is an error in the URL. What is the URL giving the error?
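Note that in this script the 400 is often generated locally, not by the remote server: `mycrawler_single()` sets status 400 itself whenever `fsockopen()` fails or no numeric status code comes back. One likely cause worth checking (an assumption, not confirmed in this thread) is HTTPS URLs: the script calls `fsockopen()` with the bare hostname even when the scheme is https, so it speaks plaintext HTTP to port 443, gets no valid status line, and falls into its "default to bad request 400" branch. Google redirects to HTTPS, which would match the symptom. A sketch of the transport choice `fsockopen()` would need under that diagnosis:

```php
<?php
// Sketch, not the thread's confirmed fix: fsockopen() only negotiates TLS
// when the host is prefixed with the ssl:// transport wrapper. Without it,
// a TLS server rejects the plaintext request and the proxy defaults to 400.
function socket_target($scheme, $host) {
    // return the host string and port that fsockopen() should use
    if ($scheme === "https") {
        return array("ssl://" . $host, 443);
    }
    return array($host, 80);
}

list($target, $port) = socket_target("https", "www.example.com");
echo "$target:$port\n"; // ssl://www.example.com:443

// Inside mycrawler_single() the call would then look like:
// $fp = @fsockopen($target, $port, $errno, $errstr, $timeout);
```

Turning on `$DEBUG = true;` at the top of the script will show the `$errno`/`$errstr` from the failed connection, which should confirm or rule this out.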