What is the optimize way to search for a sentence in the 100

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

genux33
Forum Newbie
Posts: 18
Joined: Sun Apr 10, 2005 8:22 am

Post by genux33 »

What is the optimal way to search for a sentence in a 100 MB file with the least CPU and memory usage?

I have a 100 MB log file and I need to find all the sentences that contain the word "Subject".
foobar
Forum Regular
Posts: 613
Joined: Wed Sep 28, 2005 10:08 am

Post by foobar »

You're pretty much screwed if you want to use PHP.
Also, this is probably the wrong forum, and should be in PHP - Code or General Discussion.

Just use your favorite text editor. Chances are, it's gonna be a lot faster than PHP.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

my favorite would be: 'grep "word" logfile'
Chris Corbyn
Breakbeat Nuttzer
Posts: 13098
Joined: Wed Mar 24, 2004 7:57 am
Location: Melbourne, Australia

Post by Chris Corbyn »

timvw wrote:my favorite would be: 'grep "word" logfile'
Definitely grep.... but finding "sentences" won't be a work of art. grep finds lines.... you'd need some fancy regex work to pull out the full sentence.
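A rough sketch of that regex work in PHP, for a line that grep (or any line-based search) has already matched. It assumes a sentence ends with '.', '!' or '?' followed by whitespace, which will be wrong for abbreviations and for most log formats; `sentences_containing` is a made-up name, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the sentences in $line that contain $word.
// Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
function sentences_containing(string $line, string $word): array
{
    // Split after sentence-ending punctuation; the lookbehind keeps the
    // punctuation attached to the sentence it closes.
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($line), -1, PREG_SPLIT_NO_EMPTY);

    $matches = [];
    foreach ($sentences as $sentence) {
        if (strpos($sentence, $word) !== false) {
            $matches[] = $sentence;
        }
    }
    return $matches;
}
```

A sentence that wraps across several lines of the file would still be missed, since each line is examined on its own.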
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: What is the optimize way to search for a sentence in the

Post by Christopher »

genux33 wrote:with the lessest usage of CPU and memory?.
Are you doing it once or repeatedly? If once then grep. If repeatedly then you should index the file using a text search system.

See http://www.searchtools.com/tools/tools-opensource.html.

I know that Zend are porting Lucene to PHP for their forthcoming Zend Framework. I hear that a preview release will be out in February.
(#10850)
quocbao
Forum Commoner
Posts: 59
Joined: Sat Feb 04, 2006 2:03 am
Location: HCM,Vietnam

Post by quocbao »

If you must use PHP to search for a string in a 100 MB file, what should you do?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

load it line-by-line, throwing away old lines.
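A minimal sketch of the line-by-line approach, assuming the log has ordinary newline-terminated lines. `find_lines` is a made-up name, and the type hints assume PHP 7 or later. Only one line is held in memory at a time, apart from the matches that are collected.

```php
<?php
// Hypothetical helper: returns the lines of $filename that contain $needle.
// Each line is read with fgets() and dropped before the next one is read,
// so memory use depends on the longest line and the number of matches,
// not on the size of the file.
function find_lines(string $filename, string $needle): array
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return []; // file could not be opened
    }

    $found = [];
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $found[] = rtrim($line, "\r\n"); // strip the line ending only
        }
    }

    fclose($handle);
    return $found;
}
```

If the file has a very large number of matching lines, echoing each match inside the loop instead of collecting them in `$found` would keep memory flat.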
hawleyjr
BeerMod
Posts: 2170
Joined: Tue Jan 13, 2004 4:58 pm
Location: Jax FL & Spokane WA USA

Post by hawleyjr »

UltraEdit has a "Copy All Lines" that contain search criteria...
quocbao
Forum Commoner
Posts: 59
Joined: Sat Feb 04, 2006 2:03 am
Location: HCM,Vietnam

Post by quocbao »

feyd wrote:load it line-by-line, throwing away old lines.
The same idea :)

Code:

class FileSearcher
{
	/**
	 * Seek position
	 *
	 * @var int
	 */
	var $seek = 0;
	
	/**
	 * File pointer
	 * 
	 * @var resource
	 */
	var $pointer = null;
	
	/**
	 * Search string
	 *
	 * @var string
	 */
	var $search = "";
	/**
	 * Buffer
	 * 
	 * @var string
	 */
	var $buffer = "";
	
	/**
	 * Offset of the last match, or -1 if the last search found nothing
	 * 
	 * @var int
	 */
	var $found = -1;
	
	/**
	 * Read next buffer
	 *
	 * @return string
	 */
	function read()
	{
		$buffer = "";
		if ($this->pointer && !feof($this->pointer))
		{	
			fseek($this->pointer,$this->seek);
			
			//read at least 10 kb, and always more than the search string:
			//the old buffer is discarded and re-read from $this->seek
			$readlength = max(10240, 2 * strlen($this->search));
			
			$buffer = fread($this->pointer,$readlength);
		}
		return $buffer;
	}
	
	/**
	 * Search for next string
	 *
	 * @return bool
	 */
	function search()
	{
		if (!$this->pointer) //where do you want me to find ?
		{
			return false;
		}
		if ($this->search == "") //what do you want me to find ?
		{
			return false;
		}
		
		$buffer =& $this->buffer; //back reference
		$seek =& $this->seek;
		
		if (strlen($buffer) < strlen($this->search)) //update buffer
		{
			$buffer = $this->read();
			
			if (strlen($buffer) < strlen($this->search)) //no more to find 
			{
				return false;
			}
		}
		
		//search in buffer
		$pos = strpos($buffer,$this->search);
		
		if ($pos !== false) //got it 
		{
			$seek += $pos + 1;
			
			$buffer = substr($buffer,$pos + 1);
			$this->found = $seek-1;
		}
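		else //nope 
		{
			$seek += strlen($buffer) - strlen($this->search) + 1;
			//keep only the tail that could start a match spanning two reads;
			//for a one-character search the tail is empty (substr($buffer, 0)
			//would keep the whole buffer and loop forever)
			$keep = strlen($this->search) - 1;
			$buffer = ($keep > 0) ? substr($buffer, -$keep) : "";
			$this->found = -1;
		}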
		return true;
	}
	
	/**
	 * Open file for searching
	 *
	 * @return bool
	 */
	function open($filename)
	{
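		$this->close();
		//open file and reset search state
		$this->pointer = fopen($filename,"rb");
		$this->seek = 0;
		$this->buffer = ""; //drop leftovers from a previously searched file
		$this->found = -1;
		return ($this->pointer !== false); //fopen() returns false on failure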
	}
	
	/**
	 * Close file and stop searching
	 *
	 */
	function close()
	{
		if ($this->pointer) fclose($this->pointer);
		
		$this->pointer = null;
	}
	
}
Here is an example:

Code:

$f = new FileSearcher();

$time = microtime_float();

$f->open('diendan.sql'); //10mb 
$f->search = "quocbao";

$total = 0;

while ($f->search())
{	
	if ($f->found != -1)
	{		
		echo "Found : " . $f->found . "<BR>\n";
		$total++;
	}
	flush();
}

echo "Total $total " . (microtime_float() - $time) . "<br>";

$f->close();

function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}
I have tested my class: it searched for a string in a 10 MB file in just 0.5 seconds (you may be able to improve on that by changing the value of $readlength).

Maybe you can also help me to test this class ^__^
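One way to test a chunked search is to compare it against a plain in-memory search of the same data, using a chunk size small enough that matches are forced to straddle read boundaries. The sketch below is a standalone illustration of that check, not the FileSearcher class itself; `offsets_in_memory` and `offsets_chunked` are made-up names, the needle is assumed to be non-empty, and the type hints assume PHP 7 or later.

```php
<?php
// Reference: every offset of $needle in $haystack, overlapping matches included.
// Assumes $needle is not empty.
function offsets_in_memory(string $haystack, string $needle): array
{
    $offsets = [];
    $pos = 0;
    while (($pos = strpos($haystack, $needle, $pos)) !== false) {
        $offsets[] = $pos;
        $pos++; // continue one byte later so overlapping matches are found
    }
    return $offsets;
}

// Chunked search: reads $chunk bytes at a time and carries over the last
// strlen($needle) - 1 bytes, so a match split across two reads is still seen.
function offsets_chunked(string $filename, string $needle, int $chunk = 10240): array
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return [];
    }

    $overlap = strlen($needle) - 1;
    $offsets = [];
    $buffer  = '';
    $base    = 0; // file offset of the first byte in $buffer

    while (!feof($handle)) {
        $data = fread($handle, $chunk);
        if ($data === false || $data === '') {
            break;
        }
        $buffer .= $data;

        foreach (offsets_in_memory($buffer, $needle) as $pos) {
            $offsets[] = $base + $pos;
        }

        // The carried-over tail is shorter than the needle, so it cannot
        // contain a complete match and nothing is reported twice.
        $keep   = min($overlap, strlen($buffer));
        $base  += strlen($buffer) - $keep;
        $buffer = ($keep > 0) ? substr($buffer, -$keep) : '';
    }

    fclose($handle);
    return $offsets;
}
```

Running both functions on the same file with a tiny chunk size (for example 7 bytes) and checking that the two offset lists are identical exercises the boundary handling far more often than a 10 kb chunk would.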