What is the optimal way to search for a sentence in the 100

Posted: Mon Jan 30, 2006 5:40 am
by genux33
What is the optimal way to search for a sentence in a 100 MB file with the least CPU and memory usage?

I have a 100 MB log file and I need to find all the sentences that contain the word "Subject".

Posted: Mon Jan 30, 2006 5:44 am
by foobar
You're pretty much screwed if you want to use PHP.
Also, this is probably the wrong forum, and should be in PHP - Code or General Discussion.

Just use your favorite text editor. Chances are, it's gonna be a lot faster than PHP.

Posted: Mon Jan 30, 2006 6:06 am
by timvw
my favorite would be: 'grep "word" logfile'

Posted: Mon Jan 30, 2006 10:04 am
by Chris Corbyn
timvw wrote: my favorite would be: 'grep "word" logfile'
Definitely grep... but finding "sentences" won't be straightforward. grep finds lines, so you'd need some fancy regex work to pull out the full sentence.
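
For illustration, here is a rough sketch of the "regex work" in PHP, assuming a sentence ends at '.', '!' or '?' followed by whitespace. That is a simplification (abbreviations and sentences spanning several lines will break it), and sentencesContaining is a hypothetical helper name, not something from this thread.

```php
<?php
// Hypothetical helper: returns the sentences in $text that contain $word.
// Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
function sentencesContaining($text, $word)
{
    // Split on the whitespace that follows sentence-ending punctuation,
    // so each piece keeps its own punctuation mark.
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $matches = array();
    foreach ($sentences as $sentence) {
        if (strpos($sentence, $word) !== false) {
            $matches[] = trim($sentence);
        }
    }
    return $matches;
}

// Usage on one matching line, e.g. a line that grep has already selected.
$line = 'Mail queued. Subject: hello world. Delivery ok!';
print_r(sentencesContaining($line, 'Subject'));
```

Running grep first and applying this only to the matching lines keeps the expensive regex work off the bulk of the file.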

Posted: Tue Jan 31, 2006 6:44 pm
by Christopher
genux33 wrote: with the lessest usage of CPU and memory?
Are you doing it once or repeatedly? If once then grep. If repeatedly then you should index the file using a text search system.

See http://www.searchtools.com/tools/tools-opensource.html.

I know that Zend are porting Lucene to PHP for their forthcoming Zend Framework. I hear that a preview release will be out in February.
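
To make the "index it" idea concrete, here is a minimal sketch, assuming a plain in-memory index that maps each word to the byte offsets of the lines containing it. It is not how Lucene or the tools linked above work internally, and for a 100 MB file a real system would keep the index on disk. buildLineIndex and lookupLines are hypothetical names.

```php
<?php
// Hypothetical helper: scans the file once and maps each lowercased word
// to the byte offsets of the lines in which it appears.
function buildLineIndex($path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $index = array();
    while (true) {
        $offset = ftell($handle);   // byte offset where the next line starts
        $line = fgets($handle);
        if ($line === false) {
            break;                  // end of file
        }
        $words = preg_split('/\W+/', strtolower($line), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            $index[$word][] = $offset;
        }
    }
    fclose($handle);
    return $index;
}

// Hypothetical helper: fetches the lines for $word by seeking straight to
// the recorded offsets instead of rescanning the whole file.
function lookupLines($path, $index, $word)
{
    $word = strtolower($word);
    if (!isset($index[$word])) {
        return array();
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $lines = array();
    foreach ($index[$word] as $offset) {
        fseek($handle, $offset);
        $lines[] = rtrim(fgets($handle), "\r\n");
    }
    fclose($handle);
    return $lines;
}

// Usage on a small temporary file standing in for the real log.
$tmp = tempnam(sys_get_temp_dir(), 'idx');
file_put_contents($tmp, "From: a\nSubject: one\nBody text\nsubject two\n");
$index = buildLineIndex($tmp);
print_r(lookupLines($tmp, $index, 'Subject'));
unlink($tmp);
```

The scan costs about as much as one grep, so this only pays off when the same file is searched many times.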

Posted: Mon Feb 06, 2006 8:41 am
by quocbao
If you must use PHP to search for a string in a 100 MB file, what should you do?

Posted: Mon Feb 06, 2006 9:41 am
by feyd
load it line-by-line, throwing away old lines.
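
A minimal sketch of that approach, assuming the log is line-oriented and no single line is huge. grepFile is a hypothetical helper name. It collects the matches in an array for convenience; to keep memory as low as possible you would echo each match instead.

```php
<?php
// Hypothetical helper: returns the lines of $path that contain $needle.
function grepFile($path, $needle)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $matches = array();
    // fgets() returns one line per call, so only the current line is held
    // in memory; the previous one is thrown away on each iteration.
    while (($line = fgets($handle)) !== false) {
        if (strpos($line, $needle) !== false) {
            $matches[] = rtrim($line, "\r\n");
        }
    }
    fclose($handle);
    return $matches;
}

// Usage on a small temporary file standing in for the real log.
$tmp = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($tmp, "From: a\nSubject: one\nBody\nSubject: two\n");
print_r(grepFile($tmp, 'Subject'));
unlink($tmp);
```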

Posted: Mon Feb 06, 2006 9:48 am
by hawleyjr
UltraEdit has a "Copy All Lines" that contain search criteria...

Posted: Mon Feb 06, 2006 7:45 pm
by quocbao
feyd wrote: load it line-by-line, throwing away old lines.
The same idea :)

Code:

class FileSearcher
{
	/**
	 * Seek position
	 *
	 * @var int
	 */
	var $seek = 0;
	
	/**
	 * File pointer
	 * 
	 * @var resource
	 */
	var $pointer = null;
	
	/**
	 * Search string
	 *
	 * @var string
	 */
	var $search = "";
	/**
	 * Buffer
	 * 
	 * @var string
	 */
	var $buffer = "";
	
	/**
	 * Offset of the last match, or -1 if the last search() call found nothing
	 * 
	 * @var int
	 */
	var $found = -1;
	
	/**
	 * Read next buffer
	 *
	 * @return string
	 */
	function read()
	{
		$buffer = "";
		if ($this->pointer && !feof($this->pointer))
		{	
			fseek($this->pointer,$this->seek);
			
			$readlength = 10240; //10 kb
			
			//the buffer is replaced on every read, so a single read must be able to hold the search string
			if ($readlength < strlen($this->search))
			{
				$readlength = strlen($this->search) * 2;
			}
			
			$buffer = fread($this->pointer,$readlength);
		}
		return $buffer;
	}
	
	/**
	 * Search for next string
	 *
	 * @return bool
	 */
	function search()
	{
		if (!$this->pointer) //where do you want me to find ?
		{
			return false;
		}
		if ($this->search == "") //what do you want me to find ?
		{
			return false;
		}
		
		$buffer =& $this->buffer; //back reference
		$seek =& $this->seek;
		
		if (strlen($buffer) < strlen($this->search)) //update buffer
		{
			$buffer = $this->read();
			
			if (strlen($buffer) < strlen($this->search)) //no more to find 
			{
				return false;
			}
		}
		
		//search in buffer
		$pos = strpos($buffer,$this->search);
		
		if ($pos !== false) //got it 
		{
			$seek += $pos + 1;
			
			$buffer = substr($buffer,$pos + 1);
			$this->found = $seek-1;
		}
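		else //nope 
		{
			//keep only the last strlen(search)-1 bytes, in case a match straddles two reads
			//(the old substr($buffer,-strlen($this->search) + 1) kept the whole buffer for a 1-char search and looped forever)
			$keep = strlen($this->search) - 1;
			$seek += strlen($buffer) - $keep;
			$buffer = ($keep > 0) ? substr($buffer,-$keep) : "";
			$this->found = -1;
		}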
		return true;
	}
	
	/**
	 * Open file for searching
	 *
	 * @return bool
	 */
	function open($filename)
	{
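		$this->close();
		//open file and reset search state, including any buffer left over from a previous file
		$this->pointer = fopen($filename,"rb");
		$this->seek = 0;
		$this->buffer = "";
		$this->found = -1;
		return ($this->pointer !== false); //fopen returns false on failure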
	}
	
	/**
	 * Close file and stop searching
	 *
	 */
	function close()
	{
		if ($this->pointer) fclose($this->pointer);
		
		$this->pointer = null;
	}
	
}
Here is an example:

Code:

$f = new FileSearcher();

$time = microtime_float();

$f->open('diendan.sql'); //10mb 
$f->search = "quocbao";

$total = 0;

while ($f->search())
{	
	if ($f->found != -1)
	{		
		echo "Found : " . $f->found . "<BR>\n";
		$total++;
	}
	flush();
}

echo "Total $total " . (microtime_float() - $time) . "<br>";

$f->close();

function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}
I have tested my class: it searched for a string in a 10 MB file in just 0.5 seconds (you can tune this by changing the value of $readlength).

Maybe you can also help me to test this class ^__^
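
One way to test a chunked searcher is to compare its results with a brute-force search over the same data held in memory, using very small chunk sizes so that matches are forced to straddle read boundaries. The sketch below does that for a stand-alone function rather than the FileSearcher class itself. findOffsetsChunked and findOffsetsInMemory are hypothetical names, and the test data is made up.

```php
<?php
// Hypothetical stand-alone chunked search: returns every byte offset at which
// $needle starts in the file, reading $chunk bytes at a time.
function findOffsetsChunked($path, $needle, $chunk = 10240)
{
    $n = strlen($needle);
    $offsets = array();
    if ($n === 0) {
        return $offsets;
    }
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $buffer = '';
    $base = 0; // file offset of the first byte in $buffer
    while (!feof($handle)) {
        $data = fread($handle, $chunk);
        if ($data === false || $data === '') {
            break;
        }
        $buffer .= $data;
        $pos = 0;
        while (($pos = strpos($buffer, $needle, $pos)) !== false) {
            $offsets[] = $base + $pos;
            $pos++; // step by one so overlapping matches are reported too
        }
        // Keep the last n-1 bytes: too short to hold a full match (so nothing
        // is reported twice) but enough to complete one that spans two reads.
        $keep = min($n - 1, strlen($buffer));
        $base += strlen($buffer) - $keep;
        $buffer = ($keep > 0) ? substr($buffer, -$keep) : '';
    }
    fclose($handle);
    return $offsets;
}

// Brute-force reference: same result, but needs the whole data in memory.
function findOffsetsInMemory($haystack, $needle)
{
    $offsets = array();
    $pos = 0;
    while (($pos = strpos($haystack, $needle, $pos)) !== false) {
        $offsets[] = $pos;
        $pos++;
    }
    return $offsets;
}

// Cross-check several needles and chunk sizes on a small temporary file.
$data = str_repeat('abquocbaoquocbaoxx', 50) . 'aaaa';
$tmp = tempnam(sys_get_temp_dir(), 'fs');
file_put_contents($tmp, $data);
foreach (array('quocbao', 'aa', 'x') as $needle) {
    $expected = findOffsetsInMemory($data, $needle);
    foreach (array(1, 7, 64, 10240) as $chunk) {
        $ok = (findOffsetsChunked($tmp, $needle, $chunk) === $expected);
        echo "$needle / chunk $chunk: " . ($ok ? 'ok' : 'MISMATCH') . "\n";
    }
}
unlink($tmp);
```

The single-character needle and the chunk size of 1 are deliberate edge cases, since off-by-one errors in the overlap handling tend to show up there first.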