
Executing huge queries (60,000 rows+)

Posted: Mon Aug 03, 2009 2:20 pm
by drewh
Hey guys, I have a problem. I'm porting a J2EE web application for my company to PHP. In several parts of the script I have to query over 60,000 rows. Is there any way I can speed up this query? It must be the way I am coding it, but it is really slow (obviously), while the Java application is a little faster. The way I have to do it is really tricky: each query builds upon itself, going from days, to an hour summary, to a per-hour summary. Here is an example of how it works:

Code:

function DisplayReport($currentLine, $currentProcess, $dateStart, $dateEnd) {
    
    $dsn = 'mysql://'.MYSQL_USERNAME.':'.MYSQL_PASSWORD.'@localhost/***';
    $mdb2 =& MDB2::factory($dsn);
    
    if (PEAR::isError($mdb2)) {
        echo ($mdb2->getMessage().' - '.$mdb2->getUserinfo());
    }
    logToFile("info.log", "Start Day Summary");
    
    $daySummary = DaySummaryPallet($currentLine, 0, $dateStart, $dateEnd);
    $numDays = count($daySummary);
    
    for ($counterDay = 0; $counterDay < $numDays; $counterDay ++) {
        
        echo "<table border = '1'>";
        echo "<tr><td>";
        $dateHour = $daySummary[$counterDay]['year']."-".$daySummary[$counterDay]['month']."-".$daySummary[$counterDay]['day']; //start date
        echo "Date: ".$dateHour."<br/>";
        echo "---------------";
        echo "<table><tr><td>";
        
        logToFile("info.log", "Start Hour Summary");
        $hourSummary = HourSummaryPallet($currentLine, 0, $dateHour, $dateHour);
        $numHours = count($hourSummary);
        
        for ($counterHour = 0; $counterHour < $numHours; $counterHour++) {
            
            echo "Hour: ".$hourSummary[$counterHour]['hour']."<br/>";
            //echo "Average ounces ".$hourSummary[$counterHour]['avg_ounces'];
            
            $datePerHour = $dateHour." ".$hourSummary[$counterHour]['hour'];
            
            logToFile("info.log", "Start Per Hour Summary");
            
            echo "<table border = '1'>";
            echo "<tr><td>Time</td><td>Ounces</td><td>Revolutions</td><td>Pallet Count</td>
              <td>Roll Change</td><td>Breaks</td></tr>";
            echo "</table>";
            $perHourSummary = PerHourSummaryPallet($currentLine, 0, $datePerHour, $datePerHour);
            $numPerHours = count($perHourSummary);
            logToFile("info.log", "Number of hours:" .$numPerHours);
            for ($counterPerHour = 0; $counterPerHour < $numPerHours; $counterPerHour ++) {
                
                echo $perHourSummary[$counterPerHour]['ounces'];
                echo "<br/>";
            }
            
        }
        echo "</table>";
        echo "</td></tr></table>";
    }
    $mdb2->disconnect();
}
 
function DaySummaryPallet( $currentLine, $currentProcess, $dateStart, $dateEnd) {
    $dsn = 'mysql://'.MYSQL_USERNAME.':'.MYSQL_PASSWORD.'@localhost/***';
    $mdb2 =& MDB2::factory($dsn);
    
    if (PEAR::isError($mdb2)) {
        echo ($mdb2->getMessage().' - '.$mdb2->getUserinfo());
    }
    $query = ("SELECT year(time_stamp) year, month(time_stamp) month, dayofmonth(time_stamp) day,
            truncate(avg(ounces),2) avg_ounces, truncate(sum(ounces)/16.0,2) lbs, count(pkg_load_id) total_pallets
            FROM pkg_load WHERE pkg_line_id = '$currentLine[$currentProcess]' AND time_stamp
            BETWEEN '".$dateStart." 00:00:00' AND '".$dateEnd." 23:59:59'
            GROUP BY year(time_stamp), dayofyear(time_stamp)
            ORDER BY time_stamp DESC;");
    
    $result = $mdb2->query($query);
    logToFile("info.log", $query);   
    
    if (PEAR::isError($result)) {
        echo ($result->getMessage().' - '.$result->getUserinfo());
        exit();
    }
    
    $resultArray = $result->fetchAll(MDB2_FETCHMODE_ASSOC);
    
    $result->free();
    $mdb2->disconnect();
    return $resultArray;
}
 
 
 
What I'm asking is whether there are any suggestions or tricks to make this faster. Thanks, guys.

Re: Executing huge queries (60,000 rows+)

Posted: Mon Aug 03, 2009 2:49 pm
by VladSun
60'000+ rows is not *huge* database ;) I think 1'000'000 is considered a "big" database, still not "huge".

Back to your problem. Could you please extract the SQL query (only SQL, no PHP) that executes slowly and paste its SQL code? Also, please post the database schema you are using.

PS: Please use [ sql ] and [ /sql ] BBCode tags (without spaces in them) to highlight your SQL code.

Re: Executing huge queries (60,000 rows+)

Posted: Mon Aug 03, 2009 3:12 pm
by drewh
Hey. Yeah, the database is way bigger than 60,000 rows. That's just one query I have to execute and display on a page. I am using InnoDB.

[sql]Start Day Summary
SELECT year(time_stamp) year, month(time_stamp) month, dayofmonth(time_stamp) day,
    TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs, count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '3075'
    AND time_stamp BETWEEN '2009-06-03 00:00:00' AND '2009-08-03 23:59:59'
GROUP BY year(time_stamp), dayofyear(time_stamp)
ORDER BY time_stamp DESC;

Start Hour Summary
SELECT hour(time_stamp) hour, TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
    count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '3075'
    AND time_stamp BETWEEN '2009-6-24 00:00:00' AND '2009-6-24 23:59:59'
GROUP BY hour(time_stamp)
ORDER BY time_stamp DESC;

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3075'
    AND time_stamp BETWEEN '2009-6-24 10:00:00' AND '2009-6-24 10:59:59'
ORDER BY time_stamp;
Number of hours:4
Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3075'
    AND time_stamp BETWEEN '2009-6-24 6:00:00' AND '2009-6-24 6:59:59'
ORDER BY time_stamp;[/sql]

This sort of shows how it works. It usually hangs on that one and only displays one day.

[sql]Start Day Summary
SELECT year(time_stamp) year, month(time_stamp) month, dayofmonth(time_stamp) day,
    TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs, count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '16859'
    AND time_stamp BETWEEN '2009-06-03 00:00:00' AND '2009-08-03 23:59:59'
GROUP BY year(time_stamp), dayofyear(time_stamp)
ORDER BY time_stamp DESC;

Start Hour Summary
SELECT hour(time_stamp) hour, TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
    count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '16859'
    AND time_stamp BETWEEN '2009-6-24 00:00:00' AND '2009-6-24 23:59:59'
GROUP BY hour(time_stamp)
ORDER BY time_stamp DESC;

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '16859'
    AND time_stamp BETWEEN '2009-6-24 7:00:00' AND '2009-6-24 7:59:59'
ORDER BY time_stamp;
Number of hours:7
Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '16859'
    AND time_stamp BETWEEN '2009-6-24 6:00:00' AND '2009-6-24 6:59:59'
ORDER BY time_stamp;
Number of hours:1
Start Hour Summary
SELECT hour(time_stamp) hour, TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
    count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '16859'
    AND time_stamp BETWEEN '2009-6-23 00:00:00' AND '2009-6-23 23:59:59'
GROUP BY hour(time_stamp)
ORDER BY time_stamp DESC;

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '16859'
    AND time_stamp BETWEEN '2009-6-23 15:00:00' AND '2009-6-23 15:59:59'
ORDER BY time_stamp;
Number of hours:5[/sql]

These work fine, though. But the second one is way faster than the first, obviously, considering one is reading 60,000 rows and the other just 2,000.

Re: Executing huge queries (60,000 rows+)

Posted: Mon Aug 03, 2009 3:21 pm
by VladSun
0.
VladSun wrote:Also, please, post the database schema you are using.
;)

1. I see a lot of queries ... which one is the problematic one?

Re: Executing huge queries (60,000 rows+)

Posted: Mon Aug 03, 2009 3:41 pm
by drewh
Well, they all are. I'm doing all these queries and displaying the results on the next page. I just didn't know if there was general stuff I could do, the way I coded it, to make it faster. It takes a good 1 or 2 minutes to get to the next page. That's pretty long, considering these individual queries take 1 to 2 seconds to complete.

Re: Executing huge queries (60,000 rows+)

Posted: Mon Aug 03, 2009 6:08 pm
by Eran
As Vlad requested, please post the schema of the tables involved in the queries. Also, run EXPLAIN on each query and post the results here.

Re: Executing huge queries (60,000 rows+)

Posted: Tue Aug 04, 2009 9:12 am
by drewh
Hey guys, I fixed the problem. I increased MySQL's memory by a lot. It's like after each row is done nothing is getting freed, so it's almost taking an unlimited amount of memory to keep storing the rows. I do call $result->free(), as you can see in my code. Not sure what is happening. Thanks.

Re: Executing huge queries (60,000 rows+)

Posted: Tue Aug 04, 2009 9:20 pm
by sysout
Have you tried Oracle?
PHP with Oracle can make your queries faster...

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 2:49 am
by onion2k
Cache things. For example, you're grouping by year(time_stamp) and dayofyear(time_stamp). Add a couple of columns and put those values into them when the record is inserted - then the database won't have to calculate those values during the query. That'll save you 120,000 function calls during your query right away.

Similarly, cache the "hour" value, then you won't need to do the BETWEEN bit of the WHERE clause... you'll be able to do "WHERE `hour` IN (11,12,13,14,15)"... that will be considerably quicker.

Lastly, if you're not already, add an index on the columns you're searching on. That will make the biggest difference.
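
[Editor's note] A sketch of what that caching and indexing advice might look like in MySQL. The column names (rec_year, day_of_year, hour_of_day) and index name are made up for illustration; they do not appear anywhere in this thread:

[sql]-- Cache the derived date parts at insert time, so SELECTs don't
-- recompute year()/dayofyear()/hour() for every one of the 60,000 rows.
ALTER TABLE pkg_load
    ADD COLUMN rec_year SMALLINT NOT NULL DEFAULT 0,
    ADD COLUMN day_of_year SMALLINT NOT NULL DEFAULT 0,
    ADD COLUMN hour_of_day TINYINT NOT NULL DEFAULT 0;

-- Backfill the existing rows once.
UPDATE pkg_load
SET rec_year = YEAR(time_stamp),
    day_of_year = DAYOFYEAR(time_stamp),
    hour_of_day = HOUR(time_stamp);

-- Index the cached columns so the grouping can use plain columns.
CREATE INDEX idx_line_day ON pkg_load (pkg_line_id, rec_year, day_of_year);

-- The day summary can then group without any per-row function calls:
SELECT rec_year, day_of_year, TRUNCATE(AVG(ounces),2) avg_ounces
FROM pkg_load
WHERE pkg_line_id = '3075'
    AND time_stamp BETWEEN '2009-06-03 00:00:00' AND '2009-08-03 23:59:59'
GROUP BY rec_year, day_of_year;[/sql]

The trade-off is a little extra storage and insert-time work in exchange for much cheaper aggregation.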

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 2:50 am
by onion2k
sysout wrote:Have you tried Oracle?
PHP with Oracle can make your queries faster...
Hardly a sensible option considering the cost.

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 8:09 am
by drewh
onion2k wrote:Cache things. For example, you're grouping by year(time_stamp) and dayofyear(time_stamp). Add a couple of columns and put those values into them when the record is inserted - then the database won't have to calculate those values during the query. That'll save you 120,000 function calls during your query right away.

Similarly, cache the "hour" value, then you won't need to do the BETWEEN bit of the WHERE clause... you'll be able to do "WHERE `hour` IN (11,12,13,14,15)"... that will be considerably quicker.

Lastly, if you're not already, add an index on the columns you're searching on. That will make the biggest difference.
Hey, thank you a lot. This is the kind of stuff I was looking for. I still have the issue with the tables not completing, though.

Example: I need data between June 1st and June 30th. I query that and display it in tables on a page. It never completes; it stops at June 16th or so. It used to stop earlier, e.g. June 6th, but I keep increasing the memory MySQL uses. It's almost at a gig now. It gets close to completing, but I don't want to go any higher. If I query between June 1st and the 6th or so, it finishes fine. It usually stops right in the middle of doing the bigger stuff, like it's running out of memory to complete. In fact, I know it is. I don't understand. I'm freeing the results every iteration, but it's still like it keeps stacking the results in memory over and over. Is there something I can do?

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 8:10 am
by Eran
Your query and indexes are probably under-optimized. If you could please post the structure of your tables and the EXPLAIN results for the query, we could try to help you.

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 8:18 am
by drewh
Hey, here is the table structure, followed by the EXPLAIN output:
[sql]'pkg_load_id',          'bigint(20)', 'NO',  'PRI', '',  'auto_increment'
'time_stamp',           'datetime',   'NO',  '',    '',  ''
'pkg_line_id',          'int(11)',    'NO',  'MUL', '',  ''
'wrap_spec_id',         'int(11)',    'YES', 'MUL', '',  ''
'data_file_id',         'int(11)',    'YES', 'MUL', '',  ''
'feet',                 'double',     'NO',  '',    '0', ''
'ounces',               'double',     'YES', '',    '',  ''
'pallet_count',         'int(11)',    'YES', '',    '',  ''
'revolutions',          'double',     'YES', '',    '',  ''
'roll_total_feet',      'double',     'YES', '',    '',  ''
'average',              'double',     'YES', '',    '',  ''
'breaks_pallet',        'int(11)',    'YES', '',    '',  ''
'breaks_at_clamp',      'int(11)',    'YES', '',    '',  ''
'breaks_carriage_up',   'int(11)',    'YES', '',    '',  ''
'breaks_top',           'int(11)',    'YES', '',    '',  ''
'breaks_carriage_down', 'int(11)',    'YES', '',    '',  ''
'breaks_bottom',        'int(11)',    'YES', '',    '',  ''
'breaks_end',           'int(11)',    'YES', '',    '',  ''
'roll_change',          'tinyint(1)', 'YES', '',    '',  ''
'flag_note',            'tinyint(1)', 'YES', '',    '',  ''[/sql]
[sql] 1, 'SIMPLE', 'pkg_load', 'ref', 'FK_pkg_load_pkg_line_id', 'FK_pkg_load_pkg_line_id', '4', 'const', 45710, 'Using where; Using temporary; Using filesort' [/sql]
[sql] 1, 'SIMPLE', 'pkg_load', 'ref', 'FK_pkg_load_pkg_line_id', 'FK_pkg_load_pkg_line_id', '4', 'const', 45710, 'Using where; Using temporary; Using filesort' [/sql]

[sql] 1, 'SIMPLE', 'pkg_load', 'ref', 'FK_pkg_load_pkg_line_id', 'FK_pkg_load_pkg_line_id', '4', 'const', 45710, 'Using where; Using filesort' [/sql]

You have to remember that the last two queries are run a bunch of times. The last query has to be run for every hour, changing the timestamp. Example:
[sql]SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
    AND time_stamp BETWEEN '2009-6-24 10:00:00' AND '2009-6-24 10:59:59'
ORDER BY time_stamp;

SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
    AND time_stamp BETWEEN '2009-6-24 11:00:00' AND '2009-6-24 11:59:59'
ORDER BY time_stamp;

SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
    AND time_stamp BETWEEN '2009-6-24 12:00:00' AND '2009-6-24 12:59:59'
ORDER BY time_stamp;[/sql]

But this runs for up to 25 hours. The day level is the same, except it runs for however many days are in the range: 30 days, 40 days, or 2 days.
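
[Editor's note] Those per-hour round trips could be collapsed into one query per day, splitting the rows by hour while looping over the result set. A sketch along those lines (not from the thread):

[sql]-- One query fetches the whole day; the hr column tells the loop
-- when to start a new per-hour section of the report.
SELECT hour(time_stamp) hr, pallet_count, revolutions, ounces,
       time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
    AND time_stamp BETWEEN '2009-6-24 00:00:00' AND '2009-6-24 23:59:59'
ORDER BY time_stamp;[/sql]

That is one round trip instead of up to 25, at the cost of a slightly more involved display loop.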

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 8:26 am
by onion2k
drewh wrote:I'm freeing the results every iteration.
You could use an unbuffered query, mysql_unbuffered_query(). That helps with memory issues, but it means you can't do things like check the number of rows returned.

Also, using flush() to clear the output buffer (and possibly ob_flush() if you're using PHP's buffer) will free up some of Apache's memory too.

Mind you, are you sure it's running out of memory rather than running out of time? If the script times out, that would have a similar effect. Bump up PHP's timeout limit in php.ini, or use set_time_limit(0). But if you do that, remember that the script could run forever, so you should code in some sort of escape mechanism.

Re: Executing huge queries (60,000 rows+)

Posted: Wed Aug 05, 2009 8:28 am
by onion2k
It's slow because of "Using temporary; Using filesort". Both of those are nasty.
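
[Editor's note] A composite index matching both the WHERE clause and the ORDER BY can eliminate the filesort. A sketch, with an assumed index name (idx_line_time) that is not from the thread:

[sql]-- With (pkg_line_id, time_stamp), MySQL can resolve the constant
-- pkg_line_id filter, the time_stamp range, and ORDER BY time_stamp
-- straight from the index, so no filesort is needed.
CREATE INDEX idx_line_time ON pkg_load (pkg_line_id, time_stamp);[/sql]

The "Using temporary" caused by grouping on year(time_stamp)/dayofyear(time_stamp) only goes away if those derived values are cached in real, indexable columns, as onion2k suggested earlier in the thread.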