Executing huge queries (60,000+ rows)

Questions about MySQL, PostgreSQL, and most other databases, as well as using them with PHP, can be asked here.

Moderator: General Moderators

drewh
Forum Newbie
Posts: 12
Joined: Mon Aug 03, 2009 2:05 pm

Executing huge queries (60,000+ rows)

Post by drewh »

Hey guys, I have a problem. I'm porting a J2EE web application for my company to PHP. I have several parts of the script where I have to query over 60,000 rows. Is there any way I can speed up this query? It must be the way I am coding it, but it is really slow (obviously), while the Java application is a little faster. The way I have to do it is really tricky: each query builds upon itself, so it goes from day summary, to hour summary, to per-hour summary. Here is an example of how it works:

Code: Select all

function DisplayReport($currentLine, $currentProcess, $dateStart, $dateEnd) {
    
    $dsn = 'mysql://'.MYSQL_USERNAME.':'.MYSQL_PASSWORD.'@localhost/***'; // note the ':' between user and password
    $mdb2 =& MDB2::factory($dsn);
    
    if (PEAR::isError($mdb2)) {
        echo ($mdb2->getMessage().' - '.$mdb2->getUserinfo());
    }
    logToFile("info.log", "Start Day Summary");
    
    $daySummary = DaySummaryPallet($currentLine, 0, $dateStart, $dateEnd);
    $numDays = count($daySummary);
    
    for ($counterDay = 0; $counterDay < $numDays; $counterDay ++) {
        
        echo "<table border = '1'>";
        echo "<tr><td>";
        $dateHour = $daySummary[$counterDay]['year']."-".$daySummary[$counterDay]['month']."-".$daySummary[$counterDay]['day']; //start date
        echo "Date: ".$dateHour."<br/>";
        echo "---------------";
        echo "<table><tr><td>";
        
        logToFile("info.log", "Start Hour Summary");
        $hourSummary = HourSummaryPallet($currentLine, 0, $dateHour, $dateHour);
        $numHours = count($hourSummary);
        
        for ($counterHour = 0; $counterHour < $numHours; $counterHour++) {
            
            echo "Hour: ".$hourSummary[$counterHour]['hour']."<br/>";
            //echo "Average ounces ".$hourSummary[$counterHour]['avg_ounces'];
            
            $datePerHour = $dateHour." ".$hourSummary[$counterHour]['hour'];
            
            logToFile("info.log", "Start Per Hour Summary");
            
            echo "<table border = '1'>";
            echo "<tr><td>Time</td><td>Ounces</td><td>Revolutions</td><td>Pallet Count</td>
              <td>Roll Change</td><td>Breaks</td></tr>";
            $perHourSummary = PerHourSummaryPallet($currentLine, 0, $datePerHour, $datePerHour);
            $numPerHours = count($perHourSummary);
            logToFile("info.log", "Number of hours:".$numPerHours);
            for ($counterPerHour = 0; $counterPerHour < $numPerHours; $counterPerHour++) {
                // emit each reading as a row of the per-hour table
                echo "<tr><td>".$perHourSummary[$counterPerHour]['ounces']."</td></tr>";
            }
            echo "</table>"; // close the per-hour table after its rows, not before them
            
        }
        echo "</td></tr></table>"; // close the hour cell/table opened above
        echo "</td></tr></table>"; // close the day cell/table
    }
    }
    $mdb2->disconnect();
}
 
function DaySummaryPallet( $currentLine, $currentProcess, $dateStart, $dateEnd) {
    $dsn = 'mysql://'.MYSQL_USERNAME.':'.MYSQL_PASSWORD.'@localhost/***'; // note the ':' between user and password
    $mdb2 =& MDB2::factory($dsn);
    
    if (PEAR::isError($mdb2)) {
        echo ($mdb2->getMessage().' - '.$mdb2->getUserinfo());
    }
    $query = ("SELECT year(time_stamp) year, month(time_stamp) month, dayofmonth(time_stamp) day,
            truncate(avg(ounces),2) avg_ounces, truncate(sum(ounces)/16.0,2) lbs, count(pkg_load_id) total_pallets
            FROM pkg_load WHERE pkg_line_id = '$currentLine[$currentProcess]' AND time_stamp
            BETWEEN '".$dateStart." 00:00:00' AND '".$dateEnd." 23:59:59'
            GROUP BY year(time_stamp), dayofyear(time_stamp)
            ORDER BY time_stamp DESC;");
    
    $result = $mdb2->query($query);
    logToFile("info.log", $query);   
    
    if (PEAR::isError($result)) {
        echo ($result->getMessage().' - '.$result->getUserinfo());
        exit();
    }
    
    $resultArray = $result->fetchAll(MDB2_FETCHMODE_ASSOC);
    
    $result->free();
    $mdb2->disconnect();
    return $resultArray;
}
 
 
 
What I'm asking is whether there are any suggestions or tricks to make this faster. Thanks, guys.
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Executing huge queries (60,000+ rows)

Post by VladSun »

60,000+ rows is not a *huge* database ;) I think 1,000,000 rows is considered a "big" database, and still not "huge".

Back to your problem. Could you please extract the SQL query (only SQL, no PHP) that executes slowly and paste its SQL code? Also, please post the database schema you are using.

PS: Please use [ sql ] and [ /sql ] BBCode tags (without spaces in them) to highlight your SQL code.
There are 10 types of people in this world, those who understand binary and those who don't
drewh
Forum Newbie
Posts: 12
Joined: Mon Aug 03, 2009 2:05 pm

Re: Executing huge queries (60,000+ rows)

Post by drewh »

Hey. Yeah, the database is way bigger than 60,000 rows; that's just one query I have to execute and display on a page. I am using InnoDB.

[sql]
Start Day Summary
SELECT year(time_stamp) year, month(time_stamp) month, dayofmonth(time_stamp) day,
       TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
       count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '3075'
  AND time_stamp BETWEEN '2009-06-03 00:00:00' AND '2009-08-03 23:59:59'
GROUP BY year(time_stamp), dayofyear(time_stamp)
ORDER BY time_stamp DESC;

Start Hour Summary
SELECT hour(time_stamp) hour, TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
       count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '3075'
  AND time_stamp BETWEEN '2009-6-24 00:00:00' AND '2009-6-24 23:59:59'
GROUP BY hour(time_stamp)
ORDER BY time_stamp DESC;

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3075'
  AND time_stamp BETWEEN '2009-6-24 10:00:00' AND '2009-6-24 10:59:59'
ORDER BY time_stamp;
Number of hours:4

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3075'
  AND time_stamp BETWEEN '2009-6-24 6:00:00' AND '2009-6-24 6:59:59'
ORDER BY time_stamp;
[/sql]

This sort of shows how it works; it usually hangs on that one and only displays one day.

[sql]
Start Day Summary
SELECT year(time_stamp) year, month(time_stamp) month, dayofmonth(time_stamp) day,
       TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
       count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '16859'
  AND time_stamp BETWEEN '2009-06-03 00:00:00' AND '2009-08-03 23:59:59'
GROUP BY year(time_stamp), dayofyear(time_stamp)
ORDER BY time_stamp DESC;

Start Hour Summary
SELECT hour(time_stamp) hour, TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
       count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '16859'
  AND time_stamp BETWEEN '2009-6-24 00:00:00' AND '2009-6-24 23:59:59'
GROUP BY hour(time_stamp)
ORDER BY time_stamp DESC;

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '16859'
  AND time_stamp BETWEEN '2009-6-24 7:00:00' AND '2009-6-24 7:59:59'
ORDER BY time_stamp;
Number of hours:7

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '16859'
  AND time_stamp BETWEEN '2009-6-24 6:00:00' AND '2009-6-24 6:59:59'
ORDER BY time_stamp;
Number of hours:1

Start Hour Summary
SELECT hour(time_stamp) hour, TRUNCATE(avg(ounces),2) avg_ounces, TRUNCATE(sum(ounces)/16.0,2) lbs,
       count(pkg_load_id) total_pallets
FROM pkg_load
WHERE pkg_line_id = '16859'
  AND time_stamp BETWEEN '2009-6-23 00:00:00' AND '2009-6-23 23:59:59'
GROUP BY hour(time_stamp)
ORDER BY time_stamp DESC;

Start Per Hour Summary
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '16859'
  AND time_stamp BETWEEN '2009-6-23 15:00:00' AND '2009-6-23 15:59:59'
ORDER BY time_stamp;
Number of hours:5
[/sql]

These work fine, though. The second one is way faster than the first, obviously, considering one is working through 60,000 rows and the other just 2,000.
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Executing huge queries (60,000+ rows)

Post by VladSun »

0.
VladSun wrote:Also, please, post the database schema you are using.
;)

1. I see a lot of queries ... which one is the problematic one?
There are 10 types of people in this world, those who understand binary and those who don't
drewh
Forum Newbie
Posts: 12
Joined: Mon Aug 03, 2009 2:05 pm

Re: Executing huge queries (60,000+ rows)

Post by drewh »

Well, they all are. I'm doing all these queries and displaying the results on the next page. I just didn't know if there was anything general I could do, the way I coded it, to make it faster. It takes a good 1 or 2 minutes to get to the next page. That's pretty long, considering the individual queries take 1 to 2 seconds to complete.
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Executing huge queries (60,000+ rows)

Post by Eran »

As Vlad requested, please post the schema of the tables involved in the queries. Also, run EXPLAIN on each and post the results here.
drewh
Forum Newbie
Posts: 12
Joined: Mon Aug 03, 2009 2:05 pm

Re: Executing huge queries (60,000+ rows)

Post by drewh »

Hey guys, I fixed the problem. I increased MySQL's memory by a lot. It's like after each row is done, nothing is getting freed, so it's taking an almost unlimited amount of memory to keep storing the rows. I put $result->free() in, as you see in my code. Not sure what is happening. Thanks.
sysout
Forum Newbie
Posts: 2
Joined: Tue Aug 04, 2009 9:06 pm

Re: Executing huge queries (60,000+ rows)

Post by sysout »

Have you tried Oracle?
PHP with Oracle can make your queries faster.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Executing huge queries (60,000+ rows)

Post by onion2k »

Cache things. For example, you're grouping by year(time_stamp) and dayofyear(time_stamp). Add a couple of columns and put those values into them when the record is inserted - then the database won't have to calculate those values during the query. That'll save you 120,000 function calls during your query right away.

Similarly, cache the "hour" value, then you won't need to do the BETWEEN bit of the WHERE clause... you'll be able to do "WHERE `hour` IN (11,12,13,14,15)"... that will be considerably quicker.

Lastly, if you're not already, add an index on the columns you're searching on. That will make the biggest difference.
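
A rough sketch of the caching idea, against the pkg_load table from the queries above; the rpt_year/rpt_doy/rpt_hour column names are made up for illustration:

[sql]
-- Illustrative cached columns; fill them from time_stamp at INSERT time.
ALTER TABLE pkg_load
    ADD COLUMN rpt_year SMALLINT NOT NULL DEFAULT 0,
    ADD COLUMN rpt_doy  SMALLINT NOT NULL DEFAULT 0,
    ADD COLUMN rpt_hour TINYINT  NOT NULL DEFAULT 0;

-- One-off backfill for the rows that already exist.
UPDATE pkg_load
SET rpt_year = YEAR(time_stamp),
    rpt_doy  = DAYOFYEAR(time_stamp),
    rpt_hour = HOUR(time_stamp);

-- The day summary can then group on plain (indexable) columns
-- instead of per-row function calls:
SELECT rpt_year, rpt_doy, TRUNCATE(AVG(ounces),2) avg_ounces
FROM pkg_load
WHERE pkg_line_id = '3075'
GROUP BY rpt_year, rpt_doy;
[/sql]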
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Executing huge queries (60,000+ rows)

Post by onion2k »

sysout wrote:Have you tried Oracle?
PHP with Oracle can make your queries faster.
Hardly a sensible option considering the cost.
drewh
Forum Newbie
Posts: 12
Joined: Mon Aug 03, 2009 2:05 pm

Re: Executing huge queries (60,000+ rows)

Post by drewh »

onion2k wrote:Cache things. For example, you're grouping by year(time_stamp) and dayofyear(time_stamp). Add a couple of columns and put those values into them when the record is inserted - then the database won't have to calculate those values during the query. That'll save you 120,000 function calls during your query right away.

Similarly, cache the "hour" value, then you won't need to do the BETWEEN bit of the WHERE clause... you'll be able to do "WHERE `hour` IN (11,12,13,14,15)"... that will be considerably quicker.

Lastly, if you're not already, add an index on the columns you're searching on. That will make the biggest difference.
Hey, thank you a lot. This is the kind of stuff I was looking for. I still have the issue of the tables not completing, though.

Example: I need data between June 1st and June 30th. I query that and display it in tables on a page. It never completes; it stops at June 16th or so. It used to stop earlier, e.g. June 6th, but I keep increasing the memory MySQL uses. It's almost at a gig now. It gets close to completing, but I don't want to go any higher. If I query between June 1st and 6th or so, it finishes fine. It usually stops right in the middle of doing the bigger stuff, like it's running out of memory to complete. In fact, I know it is. I don't understand. I'm freeing the results every iteration, but it's still like it keeps stacking the results over and over in memory. Is there something I can do?
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Executing huge queries (60,000+ rows)

Post by Eran »

Your query and indexes are probably under-optimized. If you could please post the structure of your tables and the EXPLAIN results for the query, we could try to help you.
drewh
Forum Newbie
Posts: 12
Joined: Mon Aug 03, 2009 2:05 pm

Re: Executing huge queries (60,000+ rows)

Post by drewh »

Hey, here is the table structure and the EXPLAIN output:
[sql]
'pkg_load_id', 'bigint(20)', 'NO', 'PRI', '', 'auto_increment'
'time_stamp', 'datetime', 'NO', '', '', ''
'pkg_line_id', 'int(11)', 'NO', 'MUL', '', ''
'wrap_spec_id', 'int(11)', 'YES', 'MUL', '', ''
'data_file_id', 'int(11)', 'YES', 'MUL', '', ''
'feet', 'double', 'NO', '', '0', ''
'ounces', 'double', 'YES', '', '', ''
'pallet_count', 'int(11)', 'YES', '', '', ''
'revolutions', 'double', 'YES', '', '', ''
'roll_total_feet', 'double', 'YES', '', '', ''
'average', 'double', 'YES', '', '', ''
'breaks_pallet', 'int(11)', 'YES', '', '', ''
'breaks_at_clamp', 'int(11)', 'YES', '', '', ''
'breaks_carriage_up', 'int(11)', 'YES', '', '', ''
'breaks_top', 'int(11)', 'YES', '', '', ''
'breaks_carriage_down', 'int(11)', 'YES', '', '', ''
'breaks_bottom', 'int(11)', 'YES', '', '', ''
'breaks_end', 'int(11)', 'YES', '', '', ''
'roll_change', 'tinyint(1)', 'YES', '', '', ''
'flag_note', 'tinyint(1)', 'YES', '', '', ''
[/sql]
[sql] 1, 'SIMPLE', 'pkg_load', 'ref', 'FK_pkg_load_pkg_line_id', 'FK_pkg_load_pkg_line_id', '4', 'const', 45710, 'Using where; Using temporary; Using filesort' [/sql]
[sql] 1, 'SIMPLE', 'pkg_load', 'ref', 'FK_pkg_load_pkg_line_id', 'FK_pkg_load_pkg_line_id', '4', 'const', 45710, 'Using where; Using temporary; Using filesort' [/sql]

[sql] 1, 'SIMPLE', 'pkg_load', 'ref', 'FK_pkg_load_pkg_line_id', 'FK_pkg_load_pkg_line_id', '4', 'const', 45710, 'Using where; Using filesort' [/sql]

You have to remember that the last two queries are being run a bunch of times. The last query has to be run for every hour, changing the timestamp. Example:
[sql]
SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
  AND time_stamp BETWEEN '2009-6-24 10:00:00' AND '2009-6-24 10:59:59'
ORDER BY time_stamp;

SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
  AND time_stamp BETWEEN '2009-6-24 11:00:00' AND '2009-6-24 11:59:59'
ORDER BY time_stamp;

SELECT pallet_count, revolutions, ounces, time_stamp, breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
  AND time_stamp BETWEEN '2009-6-24 12:00:00' AND '2009-6-24 12:59:59'
ORDER BY time_stamp;
[/sql]

But this runs for up to 25 hours. Days work the same way, except it runs for, say, 30 days, or 40 days, or 2 days.
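
The repeated pattern above can usually be collapsed into a single ordered scan per day, with the PHP loop splitting rows into hour groups as it iterates; a sketch using the same query shape:

[sql]
-- One query per day instead of one per hour; rows arrive in time order,
-- so consecutive rows with the same HOUR(time_stamp) form one hour group.
SELECT HOUR(time_stamp) hr, pallet_count, revolutions, ounces, time_stamp,
       breaks_pallet, roll_change
FROM pkg_load
WHERE pkg_line_id = '3079'
  AND time_stamp BETWEEN '2009-6-24 00:00:00' AND '2009-6-24 23:59:59'
ORDER BY time_stamp;
[/sql]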
Last edited by drewh on Wed Aug 05, 2009 8:34 am, edited 1 time in total.
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Executing huge queries (60,000+ rows)

Post by onion2k »

drewh wrote:I'm freeing the results every iteration.
You could use an unbuffered query... mysql_unbuffered_query() ... that helps with memory issues but it means you can't do things like checking the number of results that have been returned.

Also, using flush() to clear the output buffer (and possibly ob_flush() if you're using PHP's buffer) will free up some of Apache's memory too.

Mind you, are you sure it's running out of memory rather than running out of time? If the script times out, that would have a similar effect. Bump up PHP's timeout limit in php.ini, or use set_time_limit(0). But if you do that, remember that the script could run forever, so you should code in some sort of escape mechanism.
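
A minimal sketch of the unbuffered approach, using the old mysql extension rather than the MDB2 wrapper from the thread (the connection details and database name are placeholders):

Code: Select all

$link = mysql_connect('localhost', MYSQL_USERNAME, MYSQL_PASSWORD);
mysql_select_db('reports', $link); // placeholder database name
$result = mysql_unbuffered_query(
    "SELECT ounces, time_stamp FROM pkg_load WHERE pkg_line_id = '3075'", $link);
while ($row = mysql_fetch_assoc($result)) { // rows stream from the server
    echo $row['ounces']."<br/>";
    flush();                                // push output as it is produced
}
// Note: no mysql_num_rows() here; the count is unknown until all rows are read.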
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Executing huge queries (60,000+ rows)

Post by onion2k »

It's slow because of "Using temporary; Using filesort". Both of those are nasty.
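
A sketch of one way to attack the filesort, given the EXPLAIN output above (the index name is illustrative): an index whose order matches the WHERE equality column followed by the ORDER BY column lets MySQL read rows already sorted.

[sql]
-- Composite index: equality on pkg_line_id first, then time_stamp for
-- range filtering and ordering. Re-run EXPLAIN afterwards; the per-hour
-- query should no longer show "Using filesort".
ALTER TABLE pkg_load ADD INDEX idx_line_time (pkg_line_id, time_stamp);
[/sql]

The "Using temporary" on the GROUP BY queries is harder to avoid while grouping on derived values like dayofyear(time_stamp); precomputed columns, as suggested earlier in the thread, address that part.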