
Designing code to handle a big array

Posted: Wed Jan 28, 2004 11:56 am
by ilovetoast
I need to be able to handle 2d arrays (matrices) like this:

Code:

$matrix = array (array (1, 2, 3),
                 array (4, 5, 6),
                 array (7, 8, 9));
The numbers above are just random, but they will always be floats or ints in the actual program. The example is a 3x3 matrix; I need to be able to handle up to 400x400 or so. The matrices will always be square (rows = cols).

I was testing some code for speed and came across an interesting behavior that I am unfamiliar with. Here is the code to make the matrix:

Code:

// make a big matrix to test
$matrix = array_fill(0, 400, array_fill(0, 400, 1));
This works fine. If I run a multi-dimensional array print function on it, it correctly prints out a 400x400 matrix with every element having a value of 1.
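Since the failure described below shows up as a short or ragged array, a shape check can be quicker than printing all 160,000 elements. A minimal sketch; `matrix_shape()` and `is_square_of_size()` are hypothetical helpers, not the print function mentioned above:

```php
<?php
// Hypothetical helper: returns the row count and the length of each row,
// so a result like "347 rows, last row empty" is easy to spot.
function matrix_shape($matrix)
{
    $lengths = array();
    foreach ($matrix as $row) {
        $lengths[] = is_array($row) ? count($row) : 0;
    }
    return array(count($matrix), $lengths);
}

// True only if there are exactly $n rows and every row has exactly $n elements.
function is_square_of_size($matrix, $n)
{
    list($rows, $lengths) = matrix_shape($matrix);
    if ($rows !== $n) {
        return false;
    }
    foreach ($lengths as $len) {
        if ($len !== $n) {
            return false;
        }
    }
    return true;
}

$matrix = array_fill(0, 400, array_fill(0, 400, 1));
var_dump(is_square_of_size($matrix, 400)); // bool(true)
```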

Problem is...

I need to be able to transpose the matrix, i.e.:

Code:

$matrix = array (array (1, 2, 3),
                 array (4, 5, 6),
                 array (7, 8, 9));
would need to become:

Code:

$matrix = array (array (1, 4, 7),
                 array (2, 5, 8),
                 array (3, 6, 9));
So I wrote this code:

Code:

$sizeof_matrix = sizeof($matrix);   // sizeof() is an alias of count()
$transposed_matrix = array();
// Row $i of $matrix becomes column $i of $transposed_matrix.
for ($i = 0; $i < $sizeof_matrix; $i++) {
	for ($j = 0; $j < $sizeof_matrix; $j++) {
		$transposed_matrix[$j][$i] = $matrix[$i][$j];
	}
}
This runs without errors and to my inspection has no logical flaws.

But... when I went to use it, the script would just die. No errors, just a browser message saying it couldn't get any data for the page. I traced the problem back to this transposition.

If I run the multi-dimensional array print function on $transposed_matrix, I get strange results: a 347x400 array where the 347th row has no elements. I tried it on a different server and got similar but different results: a 342x400 array where the 342nd row had 258 elements.

I'm a bit confused as to what's happening. Any thoughts on what's going on here?

peace

mystery toast

Posted: Wed Jan 28, 2004 11:58 am
by ilovetoast
Oh, and this whole thing works perfectly for every x by x matrix where x is: 25, 50, 75, 125, 128, 256, 300.

Posted: Wed Jan 28, 2004 12:02 pm
by Weirdan
Check whether the script runs out of memory. There is a default limit of 8 MB per script instance, if memory serves.
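One rough way to test this, assuming a PHP build where `memory_get_usage()` is available (always on PHP 5.2.1 and later; on older versions only when compiled with `--enable-memory-limit`), is to compare the matrix's footprint with `ini_get('memory_limit')`. Figures differ a lot between PHP versions, so treat the output as a ballpark:

```php
<?php
// Rough measurement of what a 400x400 int matrix costs in this PHP build.
$before = memory_get_usage();
$matrix = array_fill(0, 400, array_fill(0, 400, 1));
// array_fill() stores the same inner array 400 times (shared until written),
// so touch every row to force real copies and measure the full cost.
for ($i = 0; $i < 400; $i++) {
    $matrix[$i][0] = 0;
}
$used_bytes = memory_get_usage() - $before;

echo "memory_limit: ", ini_get('memory_limit'), "\n";  // "-1" means unlimited
printf("matrix uses roughly %.1f MB\n", $used_bytes / (1024 * 1024));
```

A matrix built element by element (like the transposed one) never shares rows, so it is closer to the "touched" figure than to the fresh array_fill() result.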

Posted: Wed Jan 28, 2004 12:38 pm
by BDKR
I don't have much time to mess with it right now, but I do have a few questions.

1) Why not run this from the command line?

2) Are you using Windows? If you are using Linux, then it should be pretty easy to track a process and how much memory it's taking.

3) Are you setting the time limit? Try this ->

Code:

set_time_limit(0);
A value of 0 means there is no time limit.

Another thing you may want to try is inserting a line like

Code:

sleep(30);


after generating the first matrix. During that 30-second window you could run a command or two to see how much memory the process is using. The program 'top' on Linux works perfectly for that, and the Windows Task Manager in 2K and XP will tell you too.

What are you doing that needs a matrix (or matrices) of this size and also has to run through a browser?

Cheers,
BDKR

Posted: Wed Jan 28, 2004 12:42 pm
by BDKR
Weirdan wrote: Check whether the script runs out of memory. There is a default limit of 8 MB per script instance, if memory serves.
I'm not sure, but might this not apply to the CLI? I have a script for parsing the output of MySQL 3.23.xx binlog files that I have seen grow to over 95 MB during execution. 99% of that was a huge array produced by calling file() on an exceptionally large text file.

Cheers,
BDKR

Posted: Wed Jan 28, 2004 2:23 pm
by ilovetoast
Some more info:

I'm testing on my Mac laptop and on a Linux server. Both are running the same Apache and PHP setup.

Apache's error.log registers two lines when I hit the script:

Code:

*** malloc[598]: error for object 0x2f05f0: Incorrect checksum for freed object - object was probably modified after being freed; break at szone_error
[Wed Jan 28 13:30:53 2004] [notice] child pid 598 exit signal Bus error (10)
Nothing shows up in the Linux box's error.log, although it crashes too (at row 326 now).

Mac httpd.crash.log:

Code:

**********

Date/Time:  2004-01-28 13:46:09 -0600
OS Version: 10.2.8 (Build 6R73)
Host:       XXXXXXXXXXXX

Command:    httpd
PID:        660

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_PROTECTION_FAILURE (0x0002) at 0x00000009

Thread 0 Crashed:
 #0   0x90004204 in free_list_remove_ptr
 #1   0x90003ed4 in szone_free
 #2   0x02687ec8 in _efree
 #3   0x02694984 in safe_free_zval_ptr
 #4   0x02692d90 in _zval_ptr_dtor
 #5   0x026a9d3c in zend_hash_destroy
 #6   0x026a0170 in _zval_dtor
 #7   0x02692d84 in _zval_ptr_dtor
 #8   0x026a9d3c in zend_hash_destroy
 #9   0x02692848 in shutdown_executor
 #10  0x026a1adc in zend_deactivate
 #11  0x02657d6c in php_request_shutdown
 #12  0x026be718 in apache_php_module_main
 #13  0x026bf9f4 in send_php
 #14  0x026bfa74 in send_parsed_php
 #15  0x0000d0c8 in ap_invoke_handler
 #16  0x00016e04 in process_request_internal
 #17  0x00016e94 in ap_process_request
 #18  0x00006688 in child_main
 #19  0x000068fc in make_child
 #20  0x00006c2c in perform_idle_server_maintenance
 #21  0x00007190 in standalone_main
 #22  0x00007828 in main
 #23  0x000026e0 in _start
 #24  0x00002560 in start

PPC Thread State:
  srr0: 0x90004204 srr1: 0x0000f030                vrsave: 0x00000000
   xer: 0x00000000   lr: 0x900041d0  ctr: 0x90000ee0   mq: 0x00000000
    r0: 0x00156e9a   r1: 0xbfffbef0   r2: 0x0408040c   r3: 0x00000000
    r4: 0x00000000   r5: 0x00000001   r6: 0x80808080   r7: 0x00000002
    r8: 0x6f720000   r9: 0x00000000  r10: 0x00f52010  r11: 0xa00047b0
   r12: 0x90000ee0  r13: 0x00000000  r14: 0x00000000  r15: 0x00000000
   r16: 0x00000000  r17: 0x00000000  r18: 0x00000000  r19: 0x00000000
   r20: 0x00000000  r21: 0x00000000  r22: 0xa0003d10  r23: 0x00000030
   r24: 0x00004003  r25: 0x00064010  r26: 0x00000003  r27: 0x00000002
   r28: 0x002ee700  r29: 0x00000001  r30: 0x0022adb0  r31: 0x9000416c
I watched the process with top on Linux and ProcessViewer on the Mac. Memory usage goes up, but not much: the percentage indicates it is hitting 15 MB or so on both machines. The CPU usage, however, goes to 90%+ (give or take, with 1-second monitoring) on both.

Answers to BDKR questions:

1) That's where this may end up, but I thought PHP could handle the transposition, since it only runs a handful of times per day (6-12, to be exact) on tables that large. I didn't expect an error, so I figured I'd give it a try.

2) No Windows. See Linux/Mac memory notes above.

3) I'll try that and watch the memory under that setup.

I can get around this in the place where I put the data into the matrix. I can just build a second one at that point in the transposed form. But I was trying to be too smart for my own good by doing the transposition the way I did.
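A sketch of that workaround, filling both orientations while the data is loaded. The `$games` list of `array(row, col, value)` triples and the name `build_with_transpose()` are assumptions for illustration; the real program's data source isn't shown in the thread:

```php
<?php
// Hypothetical sketch: write each value into the matrix and its transpose
// in the same pass, so no separate transposition loop is needed.
function build_with_transpose($n, $games)
{
    // Prefill both n x n arrays with zeros.
    $matrix     = array_fill(0, $n, array_fill(0, $n, 0));
    $transposed = array_fill(0, $n, array_fill(0, $n, 0));
    foreach ($games as $g) {
        list($i, $j, $value) = $g;
        $matrix[$i][$j]     = $value;
        $transposed[$j][$i] = $value;
    }
    return array($matrix, $transposed);
}

list($m, $t) = build_with_transpose(3, array(array(0, 1, 2), array(2, 0, 5)));
// $m[0][1] and $t[1][0] are both 2; $m[2][0] and $t[0][2] are both 5
```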

What is this for? I'm using the code from the Bradley-Terry Model project I just finished (the math degree thread). I want to use the model to rate NCAA basketball for the tournament in March (and NCAA football next fall) to help feed my gambling habit. There are 300 or so colleges with teams, which is how the matrix got so big.

Everything else works perfectly in PHP except that silly matrix transposition. That's why I was trying to solve it rather than just moving to the command line. Now I'm just curious why the thing breaks.

peace

Posted: Wed Jan 28, 2004 2:33 pm
by ilovetoast
set_time_limit(0); had no effect.

sleep(30) gave me longer to watch ProcessViewer and top, but the numbers just rose to 80-95% CPU on both and the equivalent of 6-10 MB of memory.

If I make a second matrix alongside the first (in transposed form), there are no problems. The script runs fine at all points, including several segments that loop through every row and column in the matrix.

Something about this assignment:

$transposed_matrix[$j][$i] = $matrix[$i][$j];

in the middle of the loop makes it go haywire...
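For comparison, PHP can transpose without writing the nested assignment at all: `array_map()` with a `null` callback zips its array arguments together, so passing every row as a separate argument yields the columns. This is a sketch for current PHP versions (on PHP 5.6+ `array_map(null, ...$matrix)` is equivalent), `transpose()` is an illustrative name, and whether it sidesteps the crash described in this thread is untested:

```php
<?php
// Transpose via array_map(null, row0, row1, ...): element k of the result
// collects element k of every row, i.e. column k of the input.
function transpose($matrix)
{
    if (count($matrix) < 2) {
        // With a single array argument, array_map(null, $row) returns $row
        // unchanged, so build the one-element columns by hand instead.
        $out = array();
        foreach ($matrix as $row) {
            foreach ($row as $k => $v) {
                $out[$k] = array($v);
            }
        }
        return $out;
    }
    // Prepend the null callback and pass each row as its own argument.
    return call_user_func_array('array_map', array_merge(array(null), $matrix));
}

$t = transpose(array(array(1, 2, 3), array(4, 5, 6), array(7, 8, 9)));
// $t is array(array(1, 4, 7), array(2, 5, 8), array(3, 6, 9))
```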

Posted: Wed Jan 28, 2004 8:18 pm
by timvw
If you have n x n matrices, you can also use a single array of length n * n and access each element as row * n + column. I think this way you avoid some of the offset calculations PHP otherwise has to do.
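A minimal sketch of that layout in PHP, assuming row-major order (element `(row, col)` at index `row * n + col`); for a 400x400 matrix the flat array has 160,000 elements. `get_elem()` and `transpose_flat()` are illustrative names:

```php
<?php
// A 3x3 matrix stored row-major in one flat array.
$n = 3;
$flat = array(1, 2, 3,
              4, 5, 6,
              7, 8, 9);

// Read element (row, col) from the flat array.
function get_elem($flat, $n, $row, $col)
{
    return $flat[$row * $n + $col];
}

// Transpose: the value at (row, col) moves to (col, row).
function transpose_flat($flat, $n)
{
    $out = array_fill(0, $n * $n, 0);
    for ($row = 0; $row < $n; $row++) {
        for ($col = 0; $col < $n; $col++) {
            $out[$col * $n + $row] = $flat[$row * $n + $col];
        }
    }
    return $out;
}

$t = transpose_flat($flat, $n);
// $t is array(1, 4, 7, 2, 5, 8, 3, 6, 9)
echo get_elem($t, $n, 0, 1), "\n"; // 4
```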

Posted: Wed Jan 28, 2004 8:52 pm
by ilovetoast
Interesting approach, timvw. I'll give that a try and see what happens.

peace

Posted: Wed Jan 28, 2004 11:25 pm
by ilovetoast
Ding ding ding. We have a winner.

Able to solidly handle a 160,000-element 1D array. No memory meltdowns.

Thanks so much, timvw. Back to the code: now I have to adjust the Gauss-Seidel method to handle a 1D array instead of a 2D matrix. Can do.
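For reference, Gauss-Seidel adapts to the flat layout by replacing `A[i][j]` with `$a[$i * $n + $j]`. The sketch below is a generic textbook Gauss-Seidel solver for `A x = b`, not the code from the Bradley-Terry project; it assumes `A` is diagonally dominant so the iteration converges, and `gauss_seidel_flat()`, `$max_iter` and `$tol` are illustrative names:

```php
<?php
// Gauss-Seidel for A x = b, with the n x n matrix A stored flat, row-major.
function gauss_seidel_flat($a, $b, $n, $max_iter = 100, $tol = 1e-10)
{
    $x = array_fill(0, $n, 0.0);
    for ($iter = 0; $iter < $max_iter; $iter++) {
        $max_change = 0.0;
        for ($i = 0; $i < $n; $i++) {
            $sum = $b[$i];
            $row = $i * $n;                 // start of row i in the flat array
            for ($j = 0; $j < $n; $j++) {
                if ($j !== $i) {
                    // x[j] for j < i is already updated in this sweep:
                    // that reuse is what distinguishes Gauss-Seidel from Jacobi.
                    $sum -= $a[$row + $j] * $x[$j];
                }
            }
            $new = $sum / $a[$row + $i];    // divide by the diagonal element
            $max_change = max($max_change, abs($new - $x[$i]));
            $x[$i] = $new;
        }
        if ($max_change < $tol) {
            break;                          // converged
        }
    }
    return $x;
}

// 4x + y = 9 and x + 3y = 5 have the solution x = 2, y = 1.
$x = gauss_seidel_flat(array(4.0, 1.0, 1.0, 3.0), array(9.0, 5.0), 2);
printf("%.6f %.6f\n", $x[0], $x[1]); // 2.000000 1.000000
```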

peace

sugar on toast

Posted: Sat Jan 31, 2004 12:36 pm
by Weirdan
Oh, seems that I need to look into the PHP code to see how it handles multidimensional arrays. Thanks, ilovetoast, for bringing this issue to my attention.

PS: I've had to use MD arrays a lot lately...

Posted: Tue Feb 03, 2004 7:39 pm
by Stoker
not sure if it is relevant, but in the past, using PHP 4.1.2 (Debian/Woody), I have a couple of times experienced oddities with multidimensional arrays when assigning values directly to a part of an array that was unassigned...

this would be fine, of course:
$a = array(); $a['idx'] = 'Sierra';

whereas code similar to this caused me an incident or two, and I try to avoid it (it may have been more than two levels deep):
$a = array(); $a['a']['b'] = 'Nevada';

the way I avoid it is to add another step or two, assigning the arrays explicitly:
$a = array(); $a['a'] = array(); $a['a']['b'] = 'Pale Ale';

not sure if it is a true problem or just something I experienced and misunderstood, nor whether it applies to this situation...

In this case I would have tried to prefill the target array with zeroes or something like that, or, if the nesting were dynamic, test the parent with isset() or is_array()...
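Prefilling might look like the sketch below: `array_fill()` creates every target row before the loop, so the assignment only overwrites existing elements, and `set_nested()` (a hypothetical helper) applies the `isset()`/`is_array()` check for dynamic nesting. Whether this avoids the crash seen on the older PHP builds in this thread is untested:

```php
<?php
// Transpose into a target array whose rows all exist before the loop runs.
function transpose_prefilled($matrix)
{
    $n = count($matrix);
    $transposed = array_fill(0, $n, array_fill(0, $n, 0));
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $transposed[$j][$i] = $matrix[$i][$j];
        }
    }
    return $transposed;
}

// Hypothetical helper: make sure the parent is an array before writing
// into it, instead of relying on PHP to create it implicitly.
function set_nested(&$a, $k1, $k2, $value)
{
    if (!isset($a[$k1]) || !is_array($a[$k1])) {
        $a[$k1] = array();
    }
    $a[$k1][$k2] = $value;
}

$t = transpose_prefilled(array(array(1, 2), array(3, 4)));
// $t is array(array(1, 3), array(2, 4))
```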

Posted: Tue Feb 03, 2004 7:58 pm
by ilovetoast
I haven't tried it with the values unassigned; instead, I hard-coded several sample 160,000-element arrays for the tests.

Unfortunately, the array functions appear to have some problems inside PHP itself. That's the bad news.

Worse news is that PHP just isn't designed to handle scientific-level math. It can handle small to very small data sets. For example, it can barely handle 75-100 pairwise comparisons, but it easily handles very small data sets of fewer than 50 elements.

What that means is that it could be used for simple statistical analysis, but not for extensive multivariate statistical analysis. Unfortunately for me, and any others interested, we have to look elsewhere.

As an aside, I suppose I could contribute a new array module to PHP along the lines of a numpy port. But given the time involved, that isn't going to happen.

So instead, I wrote a tiny little Python app using the numpy extensions and had Python stick that info into a db. PHP is able to call that app and the results are excellent. For those interested, Python handled 100-150 iterations (with a summing calculation each iteration) through single and multidimensional arrays of 200,000-250,000 elements in about 25-30 seconds on average. All's well that ends well.

peace

toast and milk