
string speedup question

Posted: Tue Jan 15, 2008 4:21 am
by Kak
Hi everybody!

I have a problem that I have found a solution for, but I'd like to know why PHP is behaving this way.


I have a file format with this layout:
{Key,dataLen}data{Key2,dataLen2}data2 ....
where dataLen can be any size, but for this project it's always between 20 and 30 bytes.

I wrote a function to parse it: it reads the data from $processInfo, extracts one {key,datalen}data record, and removes that record from the $processInfo string.

Code:

 
function getKeyOldAndSlow ()
{
    global $processInfo;
 
    // Every record must start with '{'.
    if ($processInfo[0]!='{') return FALSE;
    $rv=array();
 
    // Locate the ',' and '}' that delimit the {key,datalen} header.
    $index1=strpos ($processInfo,',');
    $index2=strpos ($processInfo,'}');
 
    $rv['key']=substr ($processInfo,1,$index1-1);
    $rv['length']=substr ($processInfo,$index1+1,$index2-$index1-1);
    $rv['value']=substr ($processInfo,$index2+1,$rv['length']);
 
    // Drop the consumed record; this builds a new string on every call.
    $processInfo=substr ($processInfo,$index2+1+$rv['length']);
    return $rv;
}
 
 
The function was slow as hell for a 2MB string, and I thought it was due to the last substr, which creates a new string every time I need to access a key. So I rewrote the function so that I didn't have to recreate that string:

Code:

 
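function getKey ()
{
    global $processInfo;
    global $PIindex;
 
    // Stop at the end of the string or when no record starts at the current offset.
    if (!isset($processInfo[$PIindex]) || $processInfo[$PIindex]!='{') return FALSE;
    $rv=array();
 
    // Search from the current offset instead of re-slicing the string.
    $index1=strpos ($processInfo,',',$PIindex);
    $index2=strpos ($processInfo,'}',$PIindex);
 
    $rv['key']=substr ($processInfo,$PIindex+1,$index1-$PIindex-1);
    $len=$rv['length']=(int)substr ($processInfo,$index1+1,$index2-$index1-1);
    $rv['value']=substr ($processInfo,$index2+1,$len);
 
    // Advance the offset past this record's data.
    $PIindex=$index2+1+$len;
    return $rv;
}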
 
 
But it's still very slow. To my surprise, when I reduce the amount of data in $processInfo to 1/4, it becomes very fast (more than 10x faster, in fact). So this is my solution: I've built a system that keeps $processInfo small and from time to time pulls more data from the real $processInfo into that string. However, I'd like to know why it behaves this way. Imho the time substr takes should not depend on how big the string I'm using is :?:
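
The workaround described above could look roughly like the sketch below. It is not Kak's actual code: the function name parseInWindows(), the window size, and the sample data are all made up for illustration, and it assumes every record fits inside one window.

```php
<?php
// Hypothetical sketch of the windowing workaround; all names are illustrative.
function parseInWindows($full, $windowSize)
{
    $records = array();
    $consumed = 0;              // bytes of $full that are already parsed
    $total = strlen($full);

    while ($consumed < $total) {
        // Copy a small window out of the big string once per refill.
        $window = substr($full, $consumed, $windowSize);
        $pos = 0;

        while (isset($window[$pos]) && $window[$pos] === '{') {
            $comma = strpos($window, ',', $pos);
            $brace = strpos($window, '}', $pos);
            if ($comma === false || $brace === false) {
                break; // header cut off by the window edge
            }
            $length = (int) substr($window, $comma + 1, $brace - $comma - 1);
            if ($brace + 1 + $length > strlen($window)) {
                break; // data cut off by the window edge
            }
            $records[] = array(
                'key'   => substr($window, $pos + 1, $comma - $pos - 1),
                'value' => substr($window, $brace + 1, $length),
            );
            $pos = $brace + 1 + $length;
        }

        if ($pos === 0) {
            break; // no progress: malformed input or window too small
        }
        $consumed += $pos; // the next window starts at the first unparsed record
    }
    return $records;
}

$full = '{a,2}xx{bb,3}yyy{c,1}z';
$records = parseInWindows($full, 10);
```

A record that straddles the window edge is simply left for the next refill, which is why the window has to be at least as large as the biggest record.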

As a C programmer, I can't imagine why it's working this way :banghead: :banghead:

TIA
Kak

Re: string speedup question

Posted: Tue Jan 15, 2008 4:52 am
by VladSun
I would suggest using a regexp instead of your own parsing code:

Code:

 
// Sample input with numeric lengths, since the pattern below expects digits after the comma.
$s = "{Key,5}adata{Key2,5}bdata{Key3,5}cdata";
 
$items = preg_match_all('/(\{(\w*?),(\d*?)\}(\w*)?)/', $s, $matches);
 
print_r($matches);
 
Hopefully it's faster, although it uses more memory.
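
Note that the pattern above captures the data with \w*, so it ignores the declared length and stops at the first non-word byte. A possible variation, sketched here under the assumption that keys contain no ',' or '}', uses the regexp only for the {key,length} header and then takes exactly that many bytes. The function name parseAllRecords() and the sample input are hypothetical.

```php
<?php
// Illustrative helper (hypothetical name): parse every record in $buffer,
// using a regexp only for the "{key,length}" header.
function parseAllRecords($buffer)
{
    $records = array();
    $offset = 0;
    // \G anchors the match at $offset, so the buffer itself is never re-sliced.
    while (preg_match('/\G\{([^,}]*),(\d+)\}/', $buffer, $m, 0, $offset)) {
        $length = (int) $m[2];
        $dataStart = $offset + strlen($m[0]);
        $records[] = array(
            'key'    => $m[1],
            'length' => $length,
            'value'  => substr($buffer, $dataStart, $length),
        );
        $offset = $dataStart + $length; // jump over the data, whatever bytes it holds
    }
    return $records;
}

// The data may contain '{', ',' or '}' because it is skipped by length.
$records = parseAllRecords('{a,3}x}y{b,4}{,,}');
```

Because the data is skipped by length rather than matched, it does not have to consist of word characters.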

PS: Also - don't use global - pass the variables by reference instead:

Code:

 
function func(&$var)
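
A minimal sketch of that suggestion applied to the getKey() function from the first post. The name parseRecord() and the sample data are made up for illustration, and the thread does not establish whether this change alone affects the timing.

```php
<?php
// Hypothetical rewrite of getKey(): the buffer and the offset are parameters
// passed by reference instead of globals.
function parseRecord(&$buffer, &$offset)
{
    // Stop at the end of the buffer or when no record header starts here.
    if (!isset($buffer[$offset]) || $buffer[$offset] !== '{') {
        return false;
    }

    $comma = strpos($buffer, ',', $offset);
    $brace = strpos($buffer, '}', $offset);
    if ($comma === false || $brace === false || $comma > $brace) {
        return false; // malformed header
    }

    $length = (int) substr($buffer, $comma + 1, $brace - $comma - 1);
    $record = array(
        'key'    => substr($buffer, $offset + 1, $comma - $offset - 1),
        'length' => $length,
        'value'  => substr($buffer, $brace + 1, $length),
    );

    $offset = $brace + 1 + $length; // advance past this record's data
    return $record;
}

$data = '{name,5}Alice{city,6}London';
$pos = 0;
$records = array();
while (($rec = parseRecord($data, $pos)) !== false) {
    $records[] = $rec;
}
```

$offset has to be a reference because the function updates it; the caller keeps one offset variable per buffer it is parsing.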
 
PPS: I hate that preg_match_all would return the whole matched string in the first array - if I wanted to have it, I would place () surrounding the whole pattern :evil:

Re: string speedup question

Posted: Tue Jan 15, 2008 9:08 am
by Kieran Huggins
Sounds like the bastard son of JSON and bencoding

Of the two I'd recommend using JSON - and there's a built-in parser for it: json_(en|de)code()
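
For illustration, a round trip through those two functions might look like this. The sample keys and values are made up, and the sketch assumes the values are valid UTF-8 text, since json_encode() does not accept arbitrary binary data.

```php
<?php
// Made-up sample data standing in for the key/data pairs of the custom format.
$records = array('Key' => 'adata', 'Key2' => 'bdata');

$encoded = json_encode($records);
// Passing true as the second argument returns associative arrays instead of objects.
$decoded = json_decode($encoded, true);
```

No length fields are needed, because JSON delimits each string with quotes and escapes special characters inside it.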

Re: string speedup question

Posted: Wed Jan 16, 2008 4:02 am
by Kak
Yes, but I was more interested in knowing why PHP is acting this way than in having an alternative. I really can't understand why it's working that way.

I'll try the preg_ function though, thanks :)

Re: string speedup question

Posted: Wed Jan 16, 2008 4:05 am
by Kak
Kieran Huggins wrote: Sounds like the bastard son of JSON and bencoding

Of the two I'd recommend using JSON - and there's a built-in parser for it: json_(en|de)code()

Wow, I didn't know about JSON. It really looks very similar (I've been using this format for eons, since I was coding for MS-DOS and needed a flexible file format to store data :) )

Re: string speedup question

Posted: Wed Jan 16, 2008 7:20 am
by VladSun
Kak wrote: Yes, but I was more interested in knowing why PHP is acting this way than in having an alternative. I really can't understand why it's working that way.

I'll try the preg_ function though, thanks :)

If you are really interested, look at its source code in php-5.X.X/Zend/zend_operators.h

Code:

static inline char *
zend_memnstr(char *haystack, char *needle, int needle_len, char *end)