Page 1 of 1

PHP Arrays and the Log(n) problem

Posted: Thu Apr 19, 2012 6:58 am
by malcolmboston
Hi Guys,

its been a while since i posted here, buy would like a fresh perspective on the problem im facing.

Now as many of us know PHP Array implementation if not as efficient as other languages which is why im thinking i may need to go a different route for this particular problem of my PHP App.

I have the following dataset

[text]Goal: To find the lowest value in the array[/text]

Code: Select all

// this is the sample value of the serialized array for each date (which the initial foreach iterates over
// these elements are also wrapped in an date array eg
// array (
// [2012-06-05] => array // array below
// [2012-06-04] => array // and again etc
Array
(
    [80s Casuals] => Array
        (
            [p] => 10
            [u] =>Johnny
        )

    [Alife] => Array
        (
            [p] => 48
            [u] => Adam
        )

    [Alife Clothing] => Array
        (
            [p] => 5
            [u] =>Aaron
        )

    [Alife Sweaters] => Array
        (
            [p] => 8
            [u] => Jamie
        )

    [Alife T Shirts] => Array
        (
            [p] => 6
            [u] => Alex
        )
// etc etc
Basically the code below iterates over 266 dates (growing by 1 daily) with an inner foreach loop of 127 array elements (sampled above), which very very rarely changes. So on a daily basis the array grows.

I loop over each of the 127 array elements checking for the lowest ['p'] valuee and then assign it to the best position, this is the only way i can see, in my mind, i can use to evaluate the best position. This is a design oversight on my part from the projects inception as i hold the values in a serialized array so its incredibly limiting as to what i can do with this in terms of pure SQL queries

Code: Select all

// my current parsing code
// built on codeigniter but this is irrelevant
$start = microtime (true);
$member_id = 12;
$keyword = 'Lacoste';
$best_position = 101;
$this->db->order_by('date', 'asc');
$query = $this->db->get_where('rankings', array('member_id' => $member_id));
if ($query->num_rows() > 0) {
	$s = microtime (true);
	foreach ($query->result() as $row) {
		$rankings[$row->date] = $row->rankings;
		
	}
	foreach ($rankings as $date => $serialized) {
		$rank_array = unserialize($serialized);
		if (isset($rank_array[$keyword]['p']) && $rank_array[$keyword]['p'] != '0') {
			// if looped position is less than best position
			if ($rank_array[$keyword]['p'] <= $best_position) {
				$array = array(
					'date' => $date,
					'position' => $rank_array[$keyword]['p']
				);
				$best_position = $rank_array[$keyword]['p'];
			}
			// break;
		}
		unset($rank_array);
	}
	if (isset($array)) {
		printr ($array);
		echo 'Took '.round((microtime(true) - $start), 5).'s<br />';
		echo 'Expected '.round(((microtime(true) - $start) * 137), 5).'s';
		exit;
	} else {
		echo 'array not set';
		exit;
	}
} else {
	echo 'no rankings';
	exit;
}
Using some benchmarks here are my current execution times on such a dataset

[text]
For entire date > element loop: 27.1s
Per 127 inner elements: 0.19s
[/text]

at the moment i do this in real time and obvfiously performance on this particular dataset produces a hige delay in loading the page, i need this to be much much faster,0.2s seems awfully slow for some simple comparisons, theres the min / max functions that i may be able to do something with but i dont think they will help increase performance dramatically here, especially not in the long term, i think PHP array implementation make this not a very good language for operating on the way ive stored the data.

the way i see it ive got a few options

[*]Rewrite this function using some unknown super optimised standard PHP code
[*]Rewrite this as a PHP module which, apart from having to learn some simplish c#(?) would be incredibly fast
[*]Write a seperate routine (scheduled by cron) that gets me this data and stores it as an INT in the database for easy and fast polling

As mentioned i believe its mainly caused by the fact i serialize the data so i cant do any fancy SQL queries to stop my having to loop, this is a design mistake in hindsight but something im stuck with for the time being as changing it is a very big task.

Im basically just looking for some thoughts from fellow developers on what they have / would do in situations like this.

Kind regards, Mal

Re: PHP Arrays and the Log(n) problem

Posted: Thu Apr 19, 2012 9:31 am
by x_mutatis_mutandis_x
Unfortunately for you, you have serialized the array already, so you are forced to unserialize it before even you attempt to sort. So if you try it in PHP/JAVA/C/perl or any other language (basically in-memory) you need to construct it from it's serialized state.

You can use a cache/session, instead of processing everytime:

if $minPs not in session/cache or user requested re-calculate min p {

foreach ($ranking as $date => $serialized) {

$unserialized = unserlize($serialized);
$min = end($unserialized);

foreach ($unserialized as $value) {
if ($value < $min) {
$min = $value;
}
}

$minPs[$date] = (int) $min;

}

store minPs in cache/session

}


$minP = min($minPs);

Re: PHP Arrays and the Log(n) problem

Posted: Thu Apr 19, 2012 9:43 am
by malcolmboston
In the last few hours this is exactly what i have done.

basically built a secondary scheduled task that runs that does this analysis, stores it in a super simple table for fast retrieval, had a quick look into the extension route and i think the time it would take me to learn how to do that would be too costly, i would just imagine C would of been able to iterate over that array 'infinitely' faster than PHP, like i said 0.2s on itterating a 127 item array with purely a comparison seems slow as hell. This way is messy but as we have both said, due to serialisation there is little option, the amount of data i hold i MUST serialise anyway, ill just have to do this secondary method from now on.

Re: PHP Arrays and the Log(n) problem

Posted: Thu Apr 19, 2012 10:00 am
by x_mutatis_mutandis_x
malcolmboston wrote:In the last few hours this is exactly what i have done.

basically built a secondary scheduled task that runs that does this analysis, stores it in a super simple table for fast retrieval, had a quick look into the extension route and i think the time it would take me to learn how to do that would be too costly, i would just imagine C would of been able to iterate over that array 'infinitely' faster than PHP, like i said 0.2s on itterating a 127 item array with purely a comparison seems slow as hell. This way is messy but as we have both said, due to serialisation there is little option, the amount of data i hold i MUST serialise anyway, ill just have to do this secondary method from now on.
Sorry I'm kinda new to using forums, edited my post instead of a new reply. My bad..

But you are right, serialization is a killer in your case. You can never serialize/encrypt something if you plan on indexing any of it's contents in the future. I would recommend looking into the insertion process if you have access to it, and calculate min value for the date at the time of insertion and store it in another column, so you can quickly sort using SQL. If not, you can store in the session and ask user if he is ok with waiting if he wants to refresh it.

Re: PHP Arrays and the Log(n) problem

Posted: Thu Apr 19, 2012 10:29 am
by malcolmboston
absolutely mate, this is exactly what i am planning on doing, unfortunately this particular section is the core part of a SaaS business i own and therefore i cannot just apply the change straight away without extensive testing, so this temporary fallback method is in place until i get to refactor that entire procedure in light of this new issue, which only arose as time went on ( hence log(n) )etc, so ive done this for the time being which has basically completely negated the 30s load time as i no longer use that method in real time, so mission accomplished.... for now.