database optimization question

Questions about MySQL, PostgreSQL, and most other databases, as well as using them with PHP, can be asked here.

Moderator: General Moderators

Post Reply
bg
Forum Contributor
Posts: 157
Joined: Fri Sep 12, 2003 11:01 am

database optimization question

Post by bg »

I'm designing an application that will query the database many times in short bursts. I want it to be able to perform many queries in as short a time as possible, say 10,000 plus at a time. I always use EXPLAIN on my queries to find ways to improve them. According to the MySQL documentation, if speed is your issue, you do not want any queries that show "Using temporary; Using filesort" in the Extra column of an EXPLAIN result. These often appear when you use the GROUP BY clause, something I saw frequently in my code when finding out how many child records there were of a parent, related by the parent's primary key to an index on the child's similarly named field.

For example, consider this query:

Code: Select all

SELECT count(child_table.parent_id) 
FROM parent_table, child_table 
WHERE child_table.parent_id = parent_table.parent_id
GROUP BY parent_table.parent_id
An EXPLAIN of this query will give you this:

id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE child_table index parent_id parent_id 4 NULL 2 Using index; Using temporary; Using filesort
1 SIMPLE parent_table eq_ref PRIMARY PRIMARY 4 database.child_table.parent_id 1 Using index

I got the "Using temporary; Using filesort", which I don't want, as this query will be run quite often. My solution was to create a field "child_count" on the parent table, which is updated whenever a child with the same parent_id is created, moved, or deleted. Of course this was all done in PHP, and while the code is pretty good, the fact that this updating is done outside MySQL (rather than simply queried with count() like above) cuts down on referential integrity.
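Roughly, the denormalization looks like this (the exact schema and the PHP-side wiring are just an illustration of my setup, not a definitive implementation):

```sql
-- Assumed extra column on the parent table:
ALTER TABLE parent_table ADD COLUMN child_count INT NOT NULL DEFAULT 0;

-- Issued from PHP alongside each child INSERT:
UPDATE parent_table SET child_count = child_count + 1 WHERE parent_id = ?;

-- Issued from PHP alongside each child DELETE:
UPDATE parent_table SET child_count = child_count - 1 WHERE parent_id = ?;

-- The original GROUP BY query then becomes a plain indexed read,
-- with no temporary table and no filesort:
SELECT parent_id, child_count FROM parent_table;
```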

Now here are my questions:

Am I getting too picky about optimization?

Am I going about optimizing this query the right way?


Thanks for the help.
User avatar
JAM
DevNet Resident
Posts: 2101
Joined: Fri Aug 08, 2003 6:53 pm
Location: Sweden
Contact:

Post by JAM »

Too picky? No one knows what hardware the database sits on, whether the box is dedicated or not, or whether 10,000 queries take 0.002 ms or 20 seconds.
Only you can decide that. ;)

Be sure to see what OPTIMIZE and ANALYZE give you, as well as playing around with how indexes work. Nothing wrong with wanting speed, but the database is usually faster than one might think.
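For example (table names taken from the query above; run these from the mysql client):

```sql
-- Rebuild table storage and defragment:
OPTIMIZE TABLE child_table;

-- Refresh key distribution statistics so the optimizer picks better plans:
ANALYZE TABLE child_table, parent_table;

-- See what indexes exist and how selective they are (Cardinality column):
SHOW INDEX FROM child_table;
```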

Tried benchmarking it using different methods yet?
User avatar
nielsene
DevNet Resident
Posts: 1834
Joined: Fri Aug 16, 2002 8:57 am
Location: Watertown, MA

Post by nielsene »

How often are those tables changed (UPDATE, DELETE, INSERT) versus how often does this query run? By handling "triggers" manually in PHP you've imposed a rather large performance penalty if the updates are even a moderate fraction of the selects.

Have you benchmarked the application? It's normally not worth fine-tuning one particular query until you know it's a bottleneck. Plus that query is simple enough that, even run often, I wouldn't expect it to affect performance. Normally two "types" of queries are important to look at:

N+1: Any place where you query once for the number of rows and then again for each row should be looked at to see if you can combine it all into a single query using a join. This is normally even more important if the "N" queries already involve a few joins.

"Large joins": Typically five or more tables involved in a join, joins with subselects, or embedded functions are "expensive" and tough to plan. If these queries run even slightly frequently, they can have an immediate impact on performance as they start to hog resources (CPU and memory). However, large joins are normally still better than N+1, so don't split them back up; rather, play with EXPLAIN here to make them better.
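To sketch the N+1 point with the tables from this thread (names assumed from the earlier example):

```sql
-- N+1 pattern: one query to list parents, then one count per parent --
--   SELECT parent_id FROM parent_table;                        -- 1 query
--   SELECT COUNT(*) FROM child_table WHERE parent_id = ?;      -- N queries

-- Collapsed into a single query with a join:
SELECT p.parent_id, COUNT(c.parent_id) AS child_count
FROM parent_table p
LEFT JOIN child_table c ON c.parent_id = p.parent_id
GROUP BY p.parent_id;
```

One round trip instead of N+1, and the LEFT JOIN keeps parents with zero children in the result.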
bg
Forum Contributor
Posts: 157
Joined: Fri Sep 12, 2003 11:01 am

Post by bg »

nielsene wrote:How often are those tables changed (UPDATE, DELETE, INSERT) versus how often does this query run? By handling "triggers" manually in PHP you've imposed a rather large performance penalty if the updates are even a moderate fraction of the selects.

Have you benchmarked the application? It's normally not worth fine-tuning one particular query until you know it's a bottleneck. Plus that query is simple enough that, even run often, I wouldn't expect it to affect performance. Normally two "types" of queries are important to look at:

N+1: Any place where you query once for the number of rows and then again for each row should be looked at to see if you can combine it all into a single query using a join. This is normally even more important if the "N" queries already involve a few joins.

"Large joins": Typically five or more tables involved in a join, joins with subselects, or embedded functions are "expensive" and tough to plan. If these queries run even slightly frequently, they can have an immediate impact on performance as they start to hog resources (CPU and memory). However, large joins are normally still better than N+1, so don't split them back up; rather, play with EXPLAIN here to make them better.
I did end up learning triggers and used them to implement the "child_count" field optimization. It works great, really simplifies the PHP code, and improves referential integrity. No more "Using temporary; Using filesort" either when running queries on these tables. The tables are modified only through an administration interface, where anywhere from 3 to (eventually) a few hundred users are granted access.

The entire project is a proof of concept that has turned out better than I'd expected. Essentially it is a dynamically generated static website that only updates pages when they require it. Some may consider it an over-glorified caching system, but my entire idea was based on having a pseudo-dynamic website that did not require any parsing (for traffic-based revenue), as the administration system could run on a different machine entirely. Anyway... sorry for the tangent.
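For anyone curious, the triggers I ended up with look roughly like this (a sketch only; it assumes MySQL 5.0+ and the table/column names from my earlier post):

```sql
DELIMITER //

-- Bump the counter when a child row appears:
CREATE TRIGGER child_after_insert AFTER INSERT ON child_table
FOR EACH ROW
  UPDATE parent_table
  SET child_count = child_count + 1
  WHERE parent_id = NEW.parent_id;
//

-- Drop the counter when a child row goes away:
CREATE TRIGGER child_after_delete AFTER DELETE ON child_table
FOR EACH ROW
  UPDATE parent_table
  SET child_count = child_count - 1
  WHERE parent_id = OLD.parent_id;
//

DELIMITER ;
```

With those in place the PHP code no longer has to remember to maintain the count; the database does it on every insert and delete.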
Post Reply