What is the optimize way to search for a sentence in the 100
What is the optimal way to search for a sentence in a 100 MB file with the least CPU and memory usage?
I had a 100 MB log file and I needed to find all the sentences that contain the word "Subject".
- Chris Corbyn
- Christopher
Re: What is the optimize way to search for a sentence in the
genux33 wrote: with the lessest usage of CPU and memory?
Are you doing it once or repeatedly? If once, then use grep. If repeatedly, then you should index the file using a text search system.
See http://www.searchtools.com/tools/tools-opensource.html.
I know that Zend are porting Lucene to PHP for their forthcoming Zend Framework. I hear that a preview release will be out in February.
feyd wrote: load it line-by-line, throwing away old lines.
The same idea:
Code:
class FileSearcher
{
    /**
     * File offset of the first byte held in $buffer
     *
     * @var int
     */
    var $seek = 0;

    /**
     * File pointer
     *
     * @var resource
     */
    var $pointer = null;

    /**
     * Search string
     *
     * @var string
     */
    var $search = "";

    /**
     * Buffer
     *
     * @var string
     */
    var $buffer = "";

    /**
     * File offset of the last match, or -1 if the last call found nothing
     *
     * @var int
     */
    var $found = -1;

    /**
     * Read the next buffer, starting at the current seek position
     *
     * @return string
     */
    function read()
    {
        $buffer = "";
        if ($this->pointer && !feof($this->pointer))
        {
            fseek($this->pointer, $this->seek);
            //10 kb plus the search length, so a read can always hold a full match
            $readlength = 10240 + strlen($this->search);
            $buffer = fread($this->pointer, $readlength);
        }
        return $buffer;
    }

    /**
     * Search for the next occurrence
     *
     * Returns false at end of file; otherwise true, with $found set to
     * the match offset or to -1 if this step found nothing.
     *
     * @return bool
     */
    function search()
    {
        if (!$this->pointer) //where do you want me to find ?
        {
            return false;
        }
        if ($this->search === "") //what do you want me to find ?
        {
            return false;
        }
        $buffer =& $this->buffer; //back reference
        $seek =& $this->seek;
        $length = strlen($this->search);
        if (strlen($buffer) < $length) //update buffer
        {
            $buffer = $this->read();
            if (strlen($buffer) < $length) //no more to find
            {
                return false;
            }
        }
        //search in buffer
        $pos = strpos($buffer, $this->search);
        if ($pos !== false) //got it
        {
            //step one byte past the match start so overlapping matches are found
            $seek += $pos + 1;
            $buffer = (string) substr($buffer, $pos + 1);
            $this->found = $seek - 1;
        }
        else //nope
        {
            //keep the last ($length - 1) bytes: a match may straddle two reads
            $keep = $length - 1;
            $seek += strlen($buffer) - $keep;
            //substr($buffer, 0) would return the whole buffer and loop forever
            $buffer = ($keep > 0) ? substr($buffer, -$keep) : "";
            $this->found = -1;
        }
        return true;
    }

    /**
     * Open file for searching
     *
     * @return bool
     */
    function open($filename)
    {
        $this->close();
        //open file and reset seek, buffer and last match
        $this->pointer = fopen($filename, "rb");
        $this->seek = 0;
        $this->buffer = "";
        $this->found = -1;
        return ($this->pointer != null);
    }

    /**
     * Close file and stop searching
     *
     */
    function close()
    {
        if ($this->pointer) fclose($this->pointer);
        $this->pointer = null;
    }
}

Code:
$f = new FileSearcher();
$time = microtime_float();
$f->open('diendan.sql'); //10 MB
$f->search = "quocbao";
$total = 0;
while ($f->search())
{
    if ($f->found != -1)
    {
        echo "Found : " . $f->found . "<BR>\n";
        $total++;
    }
    flush();
}
echo "Total $total " . (microtime_float() - $time) . "<br>";
$f->close();

function microtime_float()
{
    list($usec, $sec) = explode(" ", microtime());
    return ((float)$usec + (float)$sec);
}

Maybe you can also help me to test this class ^__^
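
For the original question (every line of a log that contains "Subject"), feyd's line-by-line suggestion quoted earlier in the thread can be sketched roughly as follows. This is only a minimal sketch, not code from any poster: the function name find_matching_lines() is hypothetical, and it assumes each "sentence" sits on its own line and that the list of matches is small enough to keep in memory.

```php
<?php
// Hypothetical helper (not from the thread): return every line of $path
// that contains $needle, keyed by its 1-based line number.
function find_matching_lines($path, $needle)
{
    $matches = array();
    $handle = fopen($path, "rb");
    if ($handle === false) {
        return $matches; // file could not be opened
    }
    $lineNumber = 0;
    // fgets() returns one line per call, so only the current line
    // (plus the matches collected so far) is held in memory.
    while (($line = fgets($handle)) !== false) {
        $lineNumber++;
        if (strpos($line, $needle) !== false) {
            $matches[$lineNumber] = rtrim($line, "\r\n");
        }
    }
    fclose($handle);
    return $matches;
}
```

Calling find_matching_lines('mail.log', 'Subject') (the file name is an assumption) would return the matching lines. If even the matches are too many to hold, echoing each line inside the loop instead of collecting it keeps memory use flat.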