speeding up fopen?

Posted: Wed Apr 12, 2006 3:45 am
by malcolmboston
As many of you will know, I'm currently building a link checker that goes to a URL, checks whether it's valid, then gets the source code and checks whether a link back to my client's site exists in it.

It's all working quite nicely now; however, I've done some timing and on average these 3 actions take between 0.9 and 1.2s.

First of all I used classes. I thought I would try it using only functions, but the difference in speed is pretty much non-existent.

Now, the problem I am having is that my client has a huge live database that I'm getting these URLs from, and yesterday it took 3 hours just to complete 77% of the script. He ain't happy, and I've tried to explain that PHP is not exactly the best language to be doing this in, but still...

He pointed me to a script here that does exactly what mine does but produces 10 results a tad faster than mine. Obviously using pagination is the key, but he's trying to get away from that. Simple maths says it would take my script roughly 10s to do the same.

I've gone through every optimisation in the book and I don't seem to be getting it done any faster.

Any ideas?

My source code is:

Code:

<?php
error_reporting(E_ALL);
set_time_limit(0);

// this class sometimes (rarely) verifies a URL as invalid when it isnt
// i would always manually check this

class getURLInformation
{
	var $URL;
	var $backLink;
	
	function getURLInformation ($URL, $backLink) {
		// check to see if the URL is a valid one
		$this->URL = $URL;
		$this->backLink = $backLink;
		$this->validURL = $this->checkValidURL();
		// do a conditional get source
		if ($this->validURL === TRUE) {
			// get the source code
			$this->checkBackLink ();
		} else {
			// keep this a boolean, consistent with checkBackLink()
			$this->hasBackLink = FALSE;
		}
	}
	
	function checkValidURL () {
		$this->handle = @fopen($this->URL, "r");
		if (!$this->handle) {
			return FALSE;
		}
		// read the whole page into memory
		$this->sourceCode = '';
		while (!feof($this->handle)) {
			$this->sourceCode .= fread($this->handle, 8192);
		}
		// release the connection once the page has been read
		fclose($this->handle);
		return TRUE;
	}
	
	function checkBackLink () {
		// strpos returns 0 for a match at the very start, so compare against FALSE explicitly
		if (strpos($this->sourceCode, $this->backLink) !== FALSE) {
			$this->hasBackLink = TRUE;
		} else {
			$this->hasBackLink = FALSE;
		}
	}
	
}

function getPercentComplete ($total, $current) {
	$percent = ($current / $total) * 100;
	$percent = number_format($percent, 1);
	return $percent;
}

function getMicrotime () {
	list($usec, $sec) = explode(" ", microtime());
	return ((float)$usec + (float)$sec);
}
?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>

<script language="Javascript">
function titleUpdate ($message) {
	document.write($message);
}
</script>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Link Checker Initialising</title>
</head>

<style>
td.num {
	vertical-align: middle;
	text-align: center;
	font-weight: bold;
	font-family: "Lucida Sans", verdana;
	font-size: 18px;
}

td.content {
	text-align: left;
	font-family: verdana, arial;
	font-size: 13px;
	color: #fff;
}

</style>

<body>

<?

mysql_connect("localhost", "xxx", 'xxx') or die(mysql_error());
mysql_select_db("discwo_links") or die (mysql_error());
// firstly get the backlink
$query = "SELECT * FROM rl_config WHERE name ='backlink'";
$result = mysql_query($query) or die (mysql_error());
while ($array = mysql_fetch_array($result, MYSQL_ASSOC)) {
	$backLink = $array['value'];
}

$query = "SELECT * FROM rl_links WHERE id <= 30 ORDER BY id ASC";
$result = mysql_query($query) or die (mysql_error());
$totalRecords = mysql_num_rows($result);
?>
<table width="1000" border="0" cellspacing="3" cellpadding="3">
<?
while ($array = mysql_fetch_array($result))
{
	$start = getMicrotime ();
	// pad the output with whitespace so the browser starts rendering rows before the page has finished
	echo '                                                                                                                                                                                                                                                         ';
	if (!isset($i)) {
		$i = 1;
	}
	ob_start ();
	$std = new getURLInformation($array['url'], $backLink);
	$rec = new getURLInformation($array['reciprocal'], $backLink);
	// get colors to display table background
	if ($std->validURL === TRUE) {
		$tdURLBackground = "green";
	} else {
		$tdURLBackground = "red";
	}
	if ($rec->validURL === TRUE) {
		$tdRecBackground = "green";
	} else {
		$tdRecBackground = "red";
	}
	if ($rec->hasBackLink === TRUE) {
		$tdBackLinkBackground = 'green';
		$tdBackLinkComment = 'Yes';
	} else {
		$tdBackLinkBackground = 'red';
		$tdBackLinkComment = 'No';
	}
	?>
	<tr>
    	<td class="num" width="74" rowspan="2"><?=$i;?></td>
    	<td class="content" width="926" style="background-color: <?=$tdURLBackground;?>"><strong>URL:</strong> <?=$array['url'];?></td>
    </tr>
    <tr>
    	<td class="content" style="background-color: <?=$tdRecBackground;?>"><strong>REC:</strong> <?=$array['reciprocal'];?></td>
    </tr>
    <tr>
    	<td>&nbsp;</td>
    	<td class="content" style="background-color: <?=$tdBackLinkBackground;?>"><strong>Has Backlink:</strong> <?=$tdBackLinkComment;?> (<?=$rec->backLink;?>)</td>
    </tr>
    <tr id="curr<?=$i;?>">
    <?
    $end = getMicrotime ();
	$timeTook = number_format(($end - $start), 2);
	?>
    	<td colspan="2">Took <?=$timeTook;?> Seconds</td>
    </tr>
	<?
	unset($std);
	unset($rec);
	// get current percent
	$percentComplete = getPercentComplete ($totalRecords, $i);
	?>
	<script>
		document.title='Link Checker : <?=$percentComplete;?>% Complete';
	</script>
	<?
	
	ob_end_flush();
	flush(); // ask the web server to send this row now rather than when the script ends
	$i++;
}
	?>
</table>

<h2>Finished Analysis</h2>
</body>
</html>
The class holds all the functions for checking what I need; the rest is just niceties.

Posted: Wed Apr 12, 2006 4:21 am
by s.dot
I would guess that you can only go as fast as the web server you're visiting can generate the page. Some pages do take around a second to load (and some a lot longer).

I'd say what you have is pretty good.

Sorry I couldn't be of more help.

Posted: Wed Apr 12, 2006 4:23 am
by malcolmboston
That's no problem. Like I said, I've run every optimisation in the book and tried alternative functions; all are either slower or pretty much the same...

Anyone else?

Posted: Wed Apr 12, 2006 6:48 am
by Buddha443556
Did you try cURL? I read somewhere in the manual that it's supposed to be faster than fopen.
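
For what it's worth, here is a minimal sketch of what the cURL route might look like. Whether it is actually faster than fopen for this job would need measuring; what it does give you is explicit connect and total timeouts, so one dead host can't stall the whole run. The function name `fetchWithCurl` and the timeout values are assumptions for illustration, not anything from the original script.

```php
<?php
// Hypothetical helper: fetch one page with cURL, giving up quickly on slow hosts.
// Returns the page source as a string, or false if the request failed.
function fetchWithCurl($url, $connectTimeout = 3, $totalTimeout = 8)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);            // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);            // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 3);                    // but not forever
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed for the whole transfer

    $body = curl_exec($ch);
    if ($body === false) {
        return false; // connection refused, timeout, DNS failure, ...
    }

    // Treat HTTP error statuses (404, 500, ...) as an invalid URL as well.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($status >= 200 && $status < 400) ? $body : false;
}
```

In the class above this would stand in for the fopen/fread loop in checkValidURL(), with a `false` return meaning the URL is invalid.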

Posted: Wed Apr 12, 2006 11:28 am
by ed209
Are you sure it's fopen that's taking the time? I think I read somewhere that file_get_contents is quicker, but don't quote me on it. Maybe the strpos function takes longer than some sort of regex?
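
A rough sketch of the file_get_contents variant, in case it helps with measuring the two steps separately. The helper names `fetchSource` and `containsBackLink` and the 5 second timeout are made up for this example. The `timeout` entry is a standard option of the `http` stream context; the strict `!== false` comparison matters because strpos returns 0 when the match is at the very start of the string.

```php
<?php
// Hypothetical helper: fetch a page in one call, with a read timeout.
// The 'http' context options only apply to http:// and https:// URLs.
function fetchSource($url, $timeout = 5)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout), // seconds
    ));
    // Returns the page source, or false on failure.
    return @file_get_contents($url, false, $context);
}

// Hypothetical helper: does the fetched source mention the backlink?
function containsBackLink($source, $backLink)
{
    // strpos() gives 0 for a match at position 0, which is falsy,
    // so the result has to be compared against false explicitly.
    return $source !== false && strpos($source, $backLink) !== false;
}
```

Timing `fetchSource()` and `containsBackLink()` separately with getMicrotime() would show which of the two the time is really going on.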

Posted: Wed Apr 12, 2006 3:33 pm
by Sema
malcolmboston wrote: That's no problem. Like I said, I've run every optimisation in the book and tried alternative functions; all are either slower or pretty much the same...

Anyone else?

I would say, like ed209, that it’s the fopen (contacting the remote server) that is the problem here... It takes time to contact a remote server, and that’s that. It can’t be done faster.

I have done something like that, where I was forced to make a number of HTTP requests to remote sites. The only “trick” I have found is making a number of fsockopen calls to scripts (async calls), so that several scripts can make calls to the remote servers at the same time. The problem would then be to sync these scripts so they don’t call the same servers.
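
A different way to get the same overlap, sketched here as an alternative to the multi-script approach and not as the method described above: cURL's multi interface runs several transfers at once inside a single script. Since one script owns the whole batch, nothing has to be synced between scripts. The function name `fetchManyWithCurl` and the timeout values are assumptions.

```php
<?php
// Hypothetical helper: fetch every URL in $urls in parallel.
// Returns an array with the same keys; each value is the page source or false.
function fetchManyWithCurl(array $urls, $timeout = 8)
{
    $results = array();
    $handles = array();
    $mh = curl_multi_init();

    foreach ($urls as $id => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
        $results[$id] = false; // stays false unless the transfer succeeds
    }

    // Drive all transfers until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running && curl_multi_select($mh, 1.0) === -1) {
            usleep(10000); // select could not wait; avoid a busy loop
        }
    } while ($running && ($status === CURLM_OK || $status === CURLM_CALL_MULTI_PERFORM));

    // Keep the content of every transfer that finished without a cURL error.
    while ($info = curl_multi_info_read($mh)) {
        if ($info['result'] === CURLE_OK) {
            $id = array_search($info['handle'], $handles, true);
            $results[$id] = curl_multi_getcontent($info['handle']);
        }
    }

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
    }
    return $results;
}
```

With batches of, say, 10 URLs the wall-clock time per batch should be closer to the slowest page than to the sum of all of them, though the right batch size is something to experiment with.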

Posted: Wed Apr 12, 2006 4:29 pm
by timvw
Your code probably checks each site/page sequentially.

With PHP streams you can use stream_select and verify multiple pages at once. Another optimisation would be to save the actual page where the link was found and, for following searches, always try that page (or those pages) first.
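
To make the stream_select idea concrete, here is a rough sketch under some loud assumptions: plain http:// URLs only, HTTP/1.0 requests so the server closes the connection when it is done, and no handling of redirects, HTTPS or chunked encoding. `fetchAll` is a made-up name. Note that the hostname lookup inside stream_socket_client still blocks; only the connect, send and read steps overlap.

```php
<?php
// Hypothetical helper: fetch several pages at once with non-blocking sockets.
// Returns id => raw response (headers + body, possibly partial on timeout) or false.
function fetchAll(array $urls, $timeout = 10)
{
    $sockets = array();   // id => open stream
    $pending = array();   // id => request bytes not yet sent
    $responses = array(); // id => raw response, or false on failure

    foreach ($urls as $id => $url) {
        $responses[$id] = false;
        $parts = parse_url($url);
        if ($parts === false || empty($parts['host'])) {
            continue; // not a usable URL
        }
        $port = isset($parts['port']) ? $parts['port'] : 80;
        $path = isset($parts['path']) ? $parts['path'] : '/';
        if (isset($parts['query'])) {
            $path .= '?' . $parts['query'];
        }
        // Start connecting without waiting for the handshake to finish.
        $s = @stream_socket_client(
            "tcp://{$parts['host']}:{$port}", $errno, $errstr, $timeout,
            STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT
        );
        if (!$s) {
            continue;
        }
        stream_set_blocking($s, false);
        $sockets[$id] = $s;
        $pending[$id] = "GET {$path} HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n";
        $responses[$id] = '';
    }

    $deadline = microtime(true) + $timeout;
    while ($sockets && microtime(true) < $deadline) {
        $read = array();
        $write = array();
        $except = null;
        foreach ($sockets as $id => $s) {
            if (isset($pending[$id])) {
                $write[] = $s; // request not fully sent yet
            } else {
                $read[] = $s;  // waiting for the response
            }
        }
        if (@stream_select($read, $write, $except, 1) === false) {
            break;
        }
        foreach ($write as $s) {
            $id = array_search($s, $sockets, true);
            $sent = @fwrite($s, $pending[$id]);
            if (!$sent) { // writable but nothing could be sent: connect failed
                fclose($s);
                unset($sockets[$id], $pending[$id]);
                $responses[$id] = false;
            } elseif ($sent < strlen($pending[$id])) {
                $pending[$id] = substr($pending[$id], $sent);
            } else {
                unset($pending[$id]);
            }
        }
        foreach ($read as $s) {
            $id = array_search($s, $sockets, true);
            $chunk = fread($s, 8192);
            if ($chunk !== false && $chunk !== '') {
                $responses[$id] .= $chunk;
            } elseif (feof($s)) { // server closed the connection: response complete
                fclose($s);
                unset($sockets[$id]);
            }
        }
    }
    foreach ($sockets as $s) {
        fclose($s); // still open means the deadline was hit
    }
    return $responses;
}
```

Because the raw response includes the headers, a strpos check for the backlink over the returned string still works without splitting headers from body.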