speeding up fopen?


malcolmboston
DevNet Resident
Posts: 1826
Joined: Tue Nov 18, 2003 1:09 pm
Location: Middlesbrough, UK

speeding up fopen?

Post by malcolmboston »

As many of you will know, I'm currently building a link checker that goes to a URL, checks whether it's valid, then fetches the source code and checks whether a link back to my client's site exists in that source.

It's all working quite nicely now. However, I've added some timing functions, and on average these 3 actions take between 0.9 and 1.2 s.

First of all I used classes. I then tried using only functions, but the reported difference in speed is pretty much non-existent.

Now, the problem I'm having is that my client has a huge live database that I'm getting these URLs from, and yesterday it took 3 hours just to complete 77% of the script. He isn't happy, and I've tried to explain that PHP is not exactly the best language to be doing this in, but still...

He pointed me to a script here that does exactly what mine does but produces 10 results a tad faster than mine. Obviously pagination is the key, but he's trying to get away from that. Simple arithmetic says my script would take roughly 10 s to do the same.

I've gone through every optimisation in the book and I don't seem to be getting it done any faster.

Any ideas?

My source code is:

Code: Select all

<?php
error_reporting(E_ALL);
set_time_limit(0);

// this class sometimes (rarely) verifies a URL as invalid when it isnt
// i would always manually check this

class getURLInformation
{
	var $URL;
	var $backLink;
	
	function getURLInformation ($URL, $backLink) {
		// check to see if the URL is a valid one
		$this->URL = $URL;
		$this->backLink = $backLink;
		$this->validURL = $this->checkValidURL();
		// do a conditional get source
		if ($this->validURL === TRUE) {
			// get the source code
			$this->checkBackLink ();
		} else {
			// use a boolean, matching what checkBackLink() sets
			$this->hasBackLink = FALSE;
		}
	}
	
	function checkValidURL () {
		$this->handle = @fopen($this->URL, "r");
		if (!$this->handle) {
			return FALSE;
		} else {
			$this->sourceCode = '';
			while (!feof($this->handle)) {
				$this->sourceCode .= fread($this->handle, 8192);
			}
			// release the connection once the page has been read
			fclose($this->handle);
		}
		return TRUE;
	}
	
	function checkBackLink () {
		// strpos() returns 0 for a match at the very start, so compare strictly against FALSE
		if (strpos($this->sourceCode, $this->backLink) !== FALSE) {
			$this->hasBackLink = TRUE;
		} else {
			$this->hasBackLink = FALSE;
		}
	}
	
}

function getPercentComplete ($total, $current) {
	$percent = ($current / $total) * 100;
	$percent = number_format($percent, 1);
	return $percent;
}

function getMicrotime () {
	list($usec, $sec) = explode(" ", microtime());
	return ((float)$usec + (float)$sec);
}
?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>

<script language="Javascript">
function titleUpdate ($message) {
	document.write($message);
}
</script>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Link Checker Initialising</title>
</head>

<style>
td.num {
	vertical-align: middle;
	text-align: center;
	font-weight: bold;
	font-family: "Lucida Sans", verdana;
	font-size: 18px;
}

td.content {
	text-align: left;
	font-family: verdana, arial;
	font-size: 13px;
	color: #fff;
}

</style>

<body>

<?

mysql_connect("localhost", "xxx", 'xxx') or die(mysql_error());
mysql_select_db("discwo_links") or die (mysql_error());
// firstly get the backlink
$query = "SELECT * FROM rl_config WHERE name ='backlink'";
$result = mysql_query($query) or die (mysql_error());
while ($array = mysql_fetch_array($result, MYSQL_ASSOC)) {
	$backLink = $array['value'];
}

$query = "SELECT * FROM rl_links WHERE id <= 30 ORDER BY id ASC";
$result = mysql_query($query) or die (mysql_error());
$totalRecords = mysql_num_rows($result);
?>
<table width="1000" border="0" cellspacing="3" cellpadding="3">
<?
while ($array = mysql_fetch_array($result))
{
	$start = getMicrotime ();
	// send padding so the browser starts rendering before the page is complete
	echo '                                                                                                                                                                                                                                                         ';
	if (!isset($i)) {
		$i = 1;
	}
	ob_start ();
	$std = new getURLInformation($array['url'], $backLink);
	$rec = new getURLInformation($array['reciprocal'], $backLink);
	// get colors to display table background
	if ($std->validURL === TRUE) {
		$tdURLBackground = "green";
	} else {
		$tdURLBackground = "red";
	}
	if ($rec->validURL === TRUE) {
		$tdRecBackground = "green";
	} else {
		$tdRecBackground = "red";
	}
	if ($rec->hasBackLink === TRUE) {
		$tdBackLinkBackground = 'green';
		$tdBackLinkComment = 'Yes';
	} else {
		$tdBackLinkBackground = 'red';
		$tdBackLinkComment = 'No';
	}
	?>
	<tr>
    	<td class="num" width="74" rowspan="2"><?=$i;?></td>
    	<td class="content" width="926" style="background-color: <?=$tdURLBackground;?>"><strong>URL:</strong> <?=$array['url'];?></td>
    </tr>
    <tr>
    	<td class="content" style="background-color: <?=$tdRecBackground;?>"><strong>REC:</strong> <?=$array['reciprocal'];?></td>
    </tr>
    <tr>
    	<td>&nbsp;</td>
    	<td class="content" style="background-color: <?=$tdBackLinkBackground;?>"><strong>Has Backlink:</strong> <?=$tdBackLinkComment;?> (<?=$rec->backLink;?>)</td>
    </tr>
    <tr id="curr<?=$i;?>">
    <?
    $end = getMicrotime ();
	$timeTook = number_format(($end - $start), 2);
	?>
    	<td colspan="2">Took <?=$timeTook;?> Seconds</td>
    </tr>
	<?
	unset($std);
	unset($rec);
	// get current percent
	$percentComplete = getPercentComplete ($totalRecords, $i);
	?>
	<script>
		document.title='Link Checker : <?=$percentComplete;?>% Complete';
	</script>
	<?
	
	ob_end_flush();
	// ob_end_flush() only empties PHP's buffer; flush() asks the web server to send it on
	flush();
	$i++;
}
	?>
</table>

<h2>Finished Analysis</h2>
</body>
</html>
The class holds all the functions for checking what I need; the rest is just for niceties.
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

I would guess that as fast as you can go is as fast as the webserver you're visiting can generate the page. Some pages do take around a second to load (and some lots longer).

I'd say what you have is pretty good.

Sorry I couldn't be of useful help.
malcolmboston
DevNet Resident
Posts: 1826
Joined: Tue Nov 18, 2003 1:09 pm
Location: Middlesbrough, UK

Post by malcolmboston »

That's no problem. Like I said, I've run every optimisation in the book and tried alternative functions; all are either slower or pretty much the same...

Anyone else?
Buddha443556
Forum Regular
Posts: 873
Joined: Fri Mar 19, 2004 1:51 pm

Post by Buddha443556 »

Did you try cURL? I read somewhere in the manual that it's supposed to be faster than fopen.
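
For reference, a cURL version of the fetch might look roughly like the sketch below. It assumes the cURL extension is installed; `fetchWithCurl()` and the timeout values are made up for illustration. Whether it is actually faster than fopen would need measuring. What it does give you is explicit timeouts, so one slow or dead site can't hold up the whole run.

```php
<?php
/**
 * Fetch a page with cURL, bounded by a connect timeout and a total timeout.
 * fetchWithCurl() is a hypothetical helper, not part of the original script.
 * Returns the body as a string, or false on any transfer error.
 */
function fetchWithCurl($url, $connectTimeout = 3, $totalTimeout = 10)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);            // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);            // follow redirects...
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);                    // ...but not forever
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $connectTimeout); // seconds allowed for connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $totalTimeout);          // seconds allowed for the whole transfer

    return curl_exec($ch);
}
```

In the class above, `checkValidURL()` could then treat a `false` return as an invalid URL and store anything else in `$this->sourceCode`.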
ed209
Forum Contributor
Posts: 153
Joined: Thu May 12, 2005 5:06 am
Location: UK

Post by ed209 »

Are you sure it's fopen that's taking the time? I think I read somewhere that file_get_contents was quicker, but don't quote me on it. Maybe the strpos function takes longer than some sort of regex?
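
One way to find out is to time the fetch and the search separately. The sketch below is only that, a sketch: `fetchSource()`, `hasBackLink()` and `profileCheck()` are hypothetical names, and the 5 second timeout is an arbitrary choice. It also compares the `strpos()` result strictly against `false`, since a match at position 0 would otherwise be treated as "not found".

```php
<?php
// Hypothetical helpers; only the built-in PHP functions they call are real.

/** Fetch a URL (or local path) in one call, giving up after $timeout seconds. */
function fetchSource($url, $timeout = 5)
{
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeout), // read timeout for http:// URLs
    ));
    return @file_get_contents($url, false, $context); // string, or false on failure
}

/** Strict check: strpos() returns 0 (falsy) for a match at the very start. */
function hasBackLink($sourceCode, $backLink)
{
    return strpos($sourceCode, $backLink) !== false;
}

/** Time the fetch and the search separately to see which one dominates. */
function profileCheck($url, $backLink)
{
    $t0 = microtime(true);
    $source = fetchSource($url);
    $t1 = microtime(true);
    $found = ($source !== false) && hasBackLink($source, $backLink);
    $t2 = microtime(true);

    return array(
        'valid'         => $source !== false,
        'found'         => $found,
        'fetchSeconds'  => $t1 - $t0,
        'searchSeconds' => $t2 - $t1,
    );
}
```

Running `profileCheck()` over a handful of the real URLs and comparing `fetchSeconds` with `searchSeconds` should show whether the string search is worth worrying about at all.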
Sema
Forum Commoner
Posts: 34
Joined: Fri Sep 03, 2004 12:43 pm
Location: Aalborg, Denmark

Post by Sema »

malcolmboston wrote: That's no problem. Like I said, I've run every optimisation in the book and tried alternative functions; all are either slower or pretty much the same...

anyone else?
Like ed209, I would say that it's the fopen (contacting the remote server) that is the problem here... It takes time to contact a remote server, and that's that. It can't be done faster.

I have done something like that, where I was forced to make a number of HTTP requests to remote sites. The only "trick" I have found is making a number of fsockopen calls to scripts (async calls), so that several scripts can call the remote servers at the same time. The problem then is to sync these scripts so they don't call the same servers.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

Your code probably checks each site/page sequentially.

With PHP streams you can use stream_select and verify multiple pages at once. Another optimisation would be to save the actual page where the link was found, and for subsequent searches always try that page (or those pages) first.
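
A rough sketch of the stream_select idea, under a few assumptions: `openHttpRequest()` and `readStreamsConcurrently()` are hypothetical helpers, the timeouts are arbitrary, and https URLs need the OpenSSL extension. The connect step is still blocking and happens one URL at a time; what overlaps is the wait for each server to generate and send its page, which is usually the slow part.

```php
<?php
// Sketch only: both function names are made up for this example.

/**
 * Open a socket to the URL's host and send a minimal HTTP/1.0 GET.
 * Returns the open stream, or false if the URL or the connection is bad.
 */
function openHttpRequest($url, $connectTimeout = 5)
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return false;
    }
    $secure = isset($parts['scheme']) && strtolower($parts['scheme']) === 'https';
    $port   = isset($parts['port']) ? $parts['port'] : ($secure ? 443 : 80);
    $path   = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    $target = ($secure ? 'ssl://' : '') . $parts['host'];
    $stream = @fsockopen($target, $port, $errno, $errstr, $connectTimeout);
    if ($stream === false) {
        return false;
    }
    // HTTP/1.0 plus "Connection: close" means the server closes the socket at
    // the end of the body, so end-of-file marks the end of the response.
    fwrite($stream, "GET $path HTTP/1.0\r\nHost: {$parts['host']}\r\nConnection: close\r\n\r\n");
    return $stream;
}

/**
 * Read every stream to end-of-file, servicing whichever one has data ready.
 * Returns the collected data (headers and body) under the input keys.
 */
function readStreamsConcurrently(array $streams, $timeoutSeconds = 30)
{
    $buffers = array();
    foreach ($streams as $key => $stream) {
        stream_set_blocking($stream, false);
        $buffers[$key] = '';
    }

    $deadline = microtime(true) + $timeoutSeconds;
    while (count($streams) > 0 && microtime(true) < $deadline) {
        $read   = array_values($streams); // stream_select() rewrites this array
        $write  = null;
        $except = null;
        if (stream_select($read, $write, $except, 1) === false) {
            break;
        }
        foreach ($read as $stream) {
            $key   = array_search($stream, $streams, true);
            $chunk = fread($stream, 8192);
            if ($chunk !== false) {
                $buffers[$key] .= $chunk;
            }
            if ($chunk === false || feof($stream)) {
                fclose($stream);
                unset($streams[$key]);
            }
        }
    }
    foreach ($streams as $stream) {
        fclose($stream); // still open here means the deadline was hit
    }
    return $buffers;
}
```

A batch could be built by calling `openHttpRequest()` for each URL, dropping the `false` entries with `array_filter()`, and passing the rest to `readStreamsConcurrently()`. Each result includes the response headers, so the status line would still need checking before a URL is counted as valid.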