What is the optimize way to search for a sentence in the 100
Posted: Mon Jan 30, 2006 5:40 am
by genux33
What is the most efficient way to search for a sentence in a 100mb file with the least CPU and memory usage?
I have a 100mb log file and I need to find all the sentences that contain the word "Subject".
Posted: Mon Jan 30, 2006 5:44 am
by foobar
You're pretty much screwed if you want to use PHP.
Also, this is probably the wrong forum, and should be in PHP - Code or General Discussion.
Just use your favorite text editor. Chances are, it's gonna be a lot faster than PHP.
Posted: Mon Jan 30, 2006 6:06 am
by timvw
my favorite would be: 'grep "word" logfile'
Posted: Mon Jan 30, 2006 10:04 am
by Chris Corbyn
timvw wrote:my favorite would be: 'grep "word" logfile'
Definitely grep, but finding "sentences" won't be straightforward. grep finds lines, so you'd need some fancy regex work to pull out the full sentence.
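To give a rough idea of that regex work, here is a minimal sketch in PHP. It assumes the text is already in memory (for example a line that grep returned) and that a sentence ends at ".", "!" or "?" followed by whitespace. The function name sentences_containing() is made up for illustration, and sentences that span several lines are not handled.

```php
<?php
// Hypothetical helper: return the sentences in $text that contain $word.
// Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
function sentences_containing($text, $word)
{
    // Split after sentence-ending punctuation; the lookbehind keeps the
    // punctuation attached to the sentence it closes.
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $hits = array();
    foreach ($sentences as $sentence) {
        // Plain substring test, case-sensitive
        if (strpos($sentence, $word) !== false) {
            $hits[] = trim($sentence);
        }
    }
    return $hits;
}
```

Abbreviations such as "e.g." will be split in the wrong place by this rule, so treat it as a starting point rather than a real sentence tokenizer.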
Posted: Tue Jan 31, 2006 6:44 pm
by Christopher
genux33 wrote:with the least usage of CPU and memory?
Are you doing it once or repeatedly? If once then grep. If repeatedly then you should index the file using a text search system.
See http://www.searchtools.com/tools/tools-opensource.html.
I know that Zend are porting Lucene to PHP for their forthcoming Zend Framework. I hear that a preview release will be out in February.
Posted: Mon Feb 06, 2006 8:41 am
by quocbao
If you must use PHP to search for a string in a 100mb file, what should you do?
Posted: Mon Feb 06, 2006 9:41 am
by feyd
load it line-by-line, throwing away old lines.
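A minimal sketch of that idea, assuming the search term never spans a line break. The function name find_lines_containing() is made up for illustration. Only one line of the file is held in memory at a time.

```php
<?php
// Hypothetical helper: scan a file line by line and collect the lines
// that contain $needle. Each line is discarded once it has been checked.
function find_lines_containing($path, $needle)
{
    $matches = array();
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return $matches; // could not open the file
    }
    while (($line = fgets($handle)) !== false) {
        // strpos() is a plain substring test, no regex needed here
        if (strpos($line, $needle) !== false) {
            $matches[] = rtrim($line, "\r\n");
        }
    }
    fclose($handle);
    return $matches;
}
```

If you expect a very large number of hits, echo each matching line inside the loop instead of collecting them in an array, otherwise the result itself can eat the memory you were trying to save.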
Posted: Mon Feb 06, 2006 9:48 am
by hawleyjr
UltraEdit has a "Copy All Lines" that contain search criteria...
Posted: Mon Feb 06, 2006 7:45 pm
by quocbao
feyd wrote:load it line-by-line, throwing away old lines.
Same idea, but reading fixed-size chunks instead of lines:
Code: Select all
class FileSearcher
{
    /**
     * Absolute file offset of the first byte held in $buffer
     *
     * @var int
     */
    public $seek = 0;

    /**
     * File pointer
     *
     * @var resource|null
     */
    public $pointer = null;

    /**
     * Search string
     *
     * @var string
     */
    public $search = "";

    /**
     * Buffer
     *
     * @var string
     */
    public $buffer = "";

    /**
     * Offset of the last match, or -1 if the last call found nothing
     *
     * @var int
     */
    public $found = -1;

    /**
     * Read the next chunk, starting at the current seek position
     *
     * @return string
     */
    public function read()
    {
        $buffer = "";
        if ($this->pointer && !feof($this->pointer))
        {
            fseek($this->pointer, $this->seek);
            //10 kb plus the length of the search string, so every chunk
            //can hold a full match and each read still makes progress
            $readlength = 10240 + strlen($this->search);
            $buffer = fread($this->pointer, $readlength);
        }
        return $buffer;
    }

    /**
     * Search for the next occurrence. Returns true while there is data
     * left to scan; check $found to see whether this call hit a match.
     *
     * @return bool
     */
    public function search()
    {
        if (!$this->pointer) //where do you want me to find ?
        {
            return false;
        }
        if ($this->search == "") //what do you want me to find ?
        {
            return false;
        }
        $buffer =& $this->buffer; //work on the properties by reference
        $seek =& $this->seek;
        $length = strlen($this->search);
        if (strlen($buffer) < $length) //update buffer
        {
            $buffer = $this->read();
            if (strlen($buffer) < $length) //no more to find
            {
                return false;
            }
        }
        //search in buffer
        $pos = strpos($buffer, $this->search);
        if ($pos !== false) //got it
        {
            $this->found = $seek + $pos;
            //resume one byte past the match start, so overlapping matches count too
            $seek += $pos + 1;
            $buffer = substr($buffer, $pos + 1);
        }
        else //nope
        {
            //keep the last $length - 1 bytes: a match may straddle two chunks
            $keep = $length - 1;
            $seek += strlen($buffer) - $keep;
            //substr($buffer, 0) would return the whole buffer and loop forever
            $buffer = ($keep > 0) ? substr($buffer, -$keep) : "";
            $this->found = -1;
        }
        return true;
    }

    /**
     * Open file for searching
     *
     * @return bool
     */
    public function open($filename)
    {
        $this->close();
        //open file and reset the search state
        $this->pointer = fopen($filename, "rb");
        $this->seek = 0;
        $this->buffer = "";
        $this->found = -1;
        return ($this->pointer !== false);
    }

    /**
     * Close file and stop searching
     *
     */
    public function close()
    {
        if ($this->pointer) fclose($this->pointer);
        $this->pointer = null;
    }
}
Here is an example:
Code: Select all
$f = new FileSearcher();
$time = microtime_float();
$f->open('diendan.sql'); //10mb
$f->search = "quocbao";
$total = 0;
while ($f->search())
{
if ($f->found != -1)
{
echo "Found : " . $f->found . "<BR>\n";
$total++;
}
flush();
}
echo "Total $total " . (microtime_float() - $time) . "<br>";
$f->close();
function microtime_float()
{
list($usec, $sec) = explode(" ", microtime());
return ((float)$usec + (float)$sec);
}
I have tested my class: it searched for a string in a 10mb file in just 0.5 sec (you can probably improve on that by changing the value of $readlength).
Maybe you can also help me to test this class ^__^