Page 1 of 2

Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 11:34 am
by jaoudestudios
~pickle | Please use [ code=html ], [ code=php ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.


Wrote this the other day, anyone suggestion any improvements? or thoughts?
If it appears a bit odd, the query is created from php depending on certain search values. This query creation is one of the larger and most complex ones

Code: Select all

 
SELECT 
SQL_CALC_FOUND_ROWS 
requirement.require_id, requirement.title, requirement.min_ft, requirement.max_ft, requirement.min_m, requirement.max_m, 
requirement.min_a, requirement.max_a, requirement.min_h, requirement.max_h, requirement.pdf, requirement.date, requirement.description, 
CONCAT(requirement.min_ft, '-', requirement.max_ft, ' sqft, ', requirement.min_m, '-', requirement.max_m, ' meters, ', requirement.min_a, '-', requirement.max_a, ' acres, ', requirement.min_h, '-', requirement.max_h, ' hectares') AS sizeRange 
, retailer.account_type, retailer.name AS rName, retailer.retailer_id, retailer.logo 
, data_planning_type.planning_id, data_planning_type.class, data_planning_type.type, 
GROUP_CONCAT(DISTINCT data_planning_type.class SEPARATOR ', ') AS planType 
, GROUP_CONCAT(DISTINCT require_region_inc.region_id SEPARATOR ', ') AS location 
, (CASE WHEN ((require_region_exc.region_id = '24517' AND require_region_exc.depth = '3') 
OR (require_region_exc.region_id = data_regions_uk.id AND data_regions_uk.county = 'Surrey' AND require_region_exc.depth = '2') 
OR (require_region_exc.region_id = data_regions_uk.id AND data_regions_uk.region = 'South East' AND require_region_exc.depth = '1') 
) THEN -1 ELSE 0 END) AS notLocation 
, users.firstname, users.lastname, users.company 
FROM requirement 
JOIN retailer ON requirement.retailer_id = retailer.retailer_id 
LEFT JOIN require_planning ON requirement.require_id = require_planning.require_id 
LEFT JOIN data_planning_type ON require_planning.planning_id = data_planning_type.planning_id 
JOIN require_region_inc ON requirement.require_id = require_region_inc.require_id 
LEFT JOIN require_region_exc ON requirement.require_id = require_region_exc.require_id 
JOIN data_regions_uk ON require_region_inc.region_id = data_regions_uk.id 
 JOIN user_requirement ON requirement.require_id = user_requirement.require_id 
 JOIN users ON user_requirement.user_id = users.user_id 
WHERE   (data_planning_type.planning_id = '1') 
AND (require_region_inc.region_id = '24517' AND require_region_inc.depth = 3) 
AND  (requirement.description LIKE '%a%' OR requirement.title LIKE '%a%') 
AND  (requirement.min_ft < 1000 AND requirement.max_ft > 1000) 
AND  (users.firstname LIKE '%a%' OR users.lastname LIKE '%a%') 
AND  (users.company LIKE '%a%') 
AND  (retailer.name LIKE '%a%') 
GROUP BY
requirement.require_id 
HAVING notLocation = 0 
ORDER BY retailer.account_type ASC, requirement.date DESC
 
Thanks :)


~pickle | Please use [ code=html ], [ code=php ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 1:09 pm
by VladSun
First to notice:
Do not use HAVING for items that should be in the WHERE clause. For example, do not write the following:

SELECT col_name FROM tbl_name HAVING col_name > 0;
In fact, you use notLocation only in HAVING (and it should be WHERE) clause, so you dont need to have it in the SELECT clause (because it will be always = 0 in the result rows).

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 1:14 pm
by Eran
The Having is probably for SQL_CALC_ROWS, so it will return the right number. This is two problems - collecting more rows than needed, and filtering later when there is no chance for optimization.

Separating the row count into a separate query is the first thing I'd do, and then move the having to a where clause. Otherwise I'd like to see the EXPLAIN results of this query, probably some chances for index optimizations.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 1:23 pm
by VladSun
pytrin wrote:The Having is probably for SQL_CALC_ROWS, so it will return the right number.
Because the CASE() in this query uses no aggregate functions there will be no difference in the results with both clauses - HAVING and WHERE.
But it's more apropriate to be put in the WHERE clause because some rows will be filtered out before the expensive GROUP BY is applied - so query will be faster.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 1:28 pm
by VladSun
Another thing to mention:

[sql](CASE WHEN         (            (require_region_exc.region_id = '24517' AND require_region_exc.depth = '3')             OR             (require_region_exc.region_id = data_regions_uk.id AND data_regions_uk.county = 'Surrey' AND require_region_exc.depth = '2')             OR             (require_region_exc.region_id = data_regions_uk.id AND data_regions_uk.region = 'South East' AND require_region_exc.depth = '1')         )         THEN             -1         ELSE             0     END) AS notLocation, .....HAVING     notLocation = 0 [/sql]

and
[sql]JOIN     data_regions_uk ON require_region_inc.region_id = data_regions_uk.id [/sql]

The "require_region_exc.region_id = data_regions_uk.id" condition is used too many time - I think it can be optimized. The las INNER JOIN filters all rows which don't conform this condition.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 3:40 pm
by jaoudestudios
Thanks people for the feedback.

What is the difference between WHERE and HAVING, I thought I had to use HAVING for notLocation as it is calculated in the query and not predetermined?

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 3:55 pm
by Eran
From the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/select.html
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
The only advantage for having, is that it can use aggregate function such as MAX()
As Vladsun pointed out, you aren't using those - hence you can move the condition to the WHERE clause, where it can filter out results before grouping thereby improving performance.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 3:56 pm
by VladSun
HAVING is a "WHERE" clause applied on already "GROUPed BY" rows.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 3:57 pm
by jaoudestudios
Brilliant, thanks :)

I do some bench tests and post it back.

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 3:58 pm
by VladSun
pytrin wrote:The Having is probably for SQL_CALC_ROWS, so it will return the right number.
Just to clear that - as far as I know SQL_CALC_ROWS is suitable for use only if there is a LIMIT clause (which is not the case)

Re: Beast of a MySQL query, any suggestions?

Posted: Thu Jan 08, 2009 4:04 pm
by Eran
Yes, you are right - I was talking about limiting the total rows so that the desired number would be returned (count of rows matching conditions).

Also regarding sql_calc_found_rows:
http://www.mysqlperformanceblog.com/200 ... ound_rows/

Re: Beast of a MySQL query, any suggestions?

Posted: Fri Jan 09, 2009 2:24 am
by jaoudestudios
VladSun wrote:Just to clear that - as far as I know SQL_CALC_ROWS is suitable for use only if there is a LIMIT clause (which is not the case)
Yes you are right. The limit is not shown (not sure why - I must not have pasted it), but I am using a LIMIT for pagination.

Re: Beast of a MySQL query, any suggestions?

Posted: Fri Jan 09, 2009 2:28 am
by jaoudestudios
pytrin wrote:Also regarding sql_calc_found_rows:
http://www.mysqlperformanceblog.com/200 ... ound_rows/
Ouch, I had a quick read. I will go through it again after breakfast and run some bench tests of my own - I was not aware of this, I assumed it would be faster.

Thanks :wink:

Re: Beast of a MySQL query, any suggestions?

Posted: Fri Jan 09, 2009 5:30 am
by jaoudestudios
If I use explain on the query (including a limit of 1 :: LIMIT 1)....

Code: Select all

 
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------------+------+---------------------------------+
| id | select_type | table              | type   | possible_keys | key     | key_len | ref                                       | rows | Extra                           |
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------------+------+---------------------------------+
|  1 | SIMPLE      | data_planning_type | const  | PRIMARY       | PRIMARY | 4       | const                                     |    1 | Using temporary; Using filesort | 
|  1 | SIMPLE      | data_regions_uk    | const  | PRIMARY       | PRIMARY | 4       | const                                     |    1 |                                 | 
|  1 | SIMPLE      | require_planning   | ALL    | NULL          | NULL    | NULL    | NULL                                      |  976 | Using where                     | 
|  1 | SIMPLE      | requirement        | eq_ref | PRIMARY       | PRIMARY | 4       | pr_relational.require_planning.require_id |    1 | Using where                     | 
|  1 | SIMPLE      | require_region_exc | ALL    | NULL          | NULL    | NULL    | NULL                                      |   60 |                                 | 
|  1 | SIMPLE      | retailer           | eq_ref | PRIMARY       | PRIMARY | 4       | pr_relational.requirement.retailer_id     |    1 | Using where                     | 
|  1 | SIMPLE      | user_requirement   | ALL    | NULL          | NULL    | NULL    | NULL                                      | 1731 | Using where                     | 
|  1 | SIMPLE      | users              | eq_ref | PRIMARY       | PRIMARY | 4       | pr_relational.user_requirement.user_id    |    1 | Using where                     | 
|  1 | SIMPLE      | require_region_inc | ALL    | NULL          | NULL    | NULL    | NULL                                      | 5392 | Using where                     | 
+----+-------------+--------------------+--------+---------------+---------+---------+-------------------------------------------+------+---------------------------------+
 
To run the query with no modifications suggested it takes 0.07-0.08secs (development machine). This query & results is probably cached, how can I clear the cache to do a real test?

Removing SQL_CALC_FOUND_ROWS did not make a difference on time.

Because notLocation is not a column but a calculated variable from the CASE in the SELECT, it will not allow me to put in the WHERE. I did originally think that this calculated value at runtime had to go into the HAVING clause not WHERE. Or am I doing something wrong?

Re: Beast of a MySQL query, any suggestions?

Posted: Fri Jan 09, 2009 5:39 am
by VladSun
Remove the CASE and use it inner condition in the WHERE clause.
Also, you will see that some of the conditions are repeated or exclude each other, so optimize them :)