Page 2 of 2

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 8:03 am
by Eran
This query selects the entire table, twice (second pass for the left join). In addition, it uses a dependent subquery, which is the least efficient type of subquery.

First I would recommend adding some kind of filtering criteria on the main select query (do you really need all the rows in the table at the same time?). You should read this article on optimizing subqueries and joins that might help you take care of that subquery - http://www.xaprb.com/blog/2006/04/30/ho ... -in-mysql/

If all fails, I would still suggest running separate queries

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 8:32 am
by jackpf
I read that article...but it seems a bit advanced. I'm not entirely sure how to apply it...

But why does it select all rows for the main query? I only want "WHERE `Type`='forum'" and that's about 8 rows. Is there something I can do to make it only select the rows that are actually forums? If I add an index on `Type` maybe?

There isn't much I can do about the join...since it has to select all rows to see how many posts there are.

I'm unfamiliar with "dependent subqueries" as well...(I hardly ever actually use subqueries). Is there some way I can make this a normal subquery?

Thanks,
Jack.


EDIT
I just tried adding an index on `Type` and `Forum`, and I get this instead:
tmp.JPG
tmp.JPG (19.81 KiB) Viewed 1002 times
Only 8 rows!! And 270 for the join!! Should I keep the index? Is there any performance lost in other places due to the index?

I've heard that indexing makes INSERTs slower...
Should I create the index just before I execute the query and delete it afterwards maybe?

Sorry for all the questions :P

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 8:51 am
by Eran
There isn't much I can do about the join...since it has to select all rows to see how many posts there are.
For this reason it's best to separate the tables to "threads" and "posts" with a one-to-many relationship. Also you don't want to select all the threads - what if there are hundreds? probably should add a limit to cap it off.

Dependent subqueries have to be calcuated for each row (that is actually the normal type of subquery. A derived subquery (a subquery in the FROM clause, which I've shown you previously) runs independently from the query (as a temporary table which is joined against the original query). Also, take a look at this document (from those guys) detailing the meanings of the output you see in the EXPLAIN results - http://www.percona.com/files//presentat ... tified.pdf

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 8:56 am
by Eran
(you have a nasty habit of adding info as I'm writing a reply ;) )
Only 8 rows!! And 270 for the join!! Should I keep the index? Is there any performance lost in other places due to the index?

I've heard that indexing makes INSERTs slower...
Should I create the index just before I execute the query and delete it afterwards maybe?
Indexing is the basics of good query performance. It's true that indexing add a small overhead for INSERT operations, however consider:
1. In most web applications, the ratio of read to write is favors reads heavily
2. INSERT operations using keys (such as INSERT ... ON DUPLICATE KEY) benefit from having indexes as much as SELECT queries
3. UPDATE queries benefit from them as well
4. Without indexing your database will not scale beyond a very small size.
As long as you index properly to optimize your queries and not blindly, the overhead for INSERT will not be significant.

So yes, keeping the index is a must. I still think you should separate the table and normalize the structure of your database.
Your next order of business would be to avoid the temporary table and filesort for ordering the table. It should hit an index but can't since you order on a function

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 9:42 am
by jackpf
pytrin wrote:(you have a nasty habit of adding info as I'm writing a reply ;) )
Lol sorry :P

Right, well I shall definitely keep the indexes then... I currently have an index on ID (obviously, since it's the primary key), `Thread` (the threadid - used a lot in joins), `Forum` (ditto), and `Type` (which reduces the forum query rows by like 2000...so that's a must). They're all the columns that are generally used to select stuff...so I think that's all I need.
Your next order of business would be to avoid the temporary table and filesort for ordering the table. It should hit an index but can't since you order on a function
I EXPLAIN-ed the query again, but without the ORDER clause, and it still says it's using a temp table and a filesort though...is there something else causing it to use them?

Thanks :)
Jack.

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 10:05 am
by Eran
GROUP BY could also cause it (since it is a type of ordering). It should group on an index as well

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 10:11 am
by jackpf
Hmm...well it's grouping by F.`ID` which is a primary key, so surely that's indexed?

Ahh I'm not too bothered anyway. It's pretty speedy now anyway...the forum index, which selects 8 boards, joining a few thousand posts, amongst all the other queries and code executed, takes about 0.02 seconds. Is there really much more optimisation to be done? I mean...that's pretty damn quick imo.

But yeah, thanks for all your help!!
If you ever uhh...need any help from me...for some reason...just ask!! :P

Cheers,
Jack.

Re: ON DUPLICATE KEY, but without a PK...?

Posted: Mon Aug 17, 2009 11:15 am
by Eran
takes about 0.02 seconds. Is there really much more optimisation to be done? I mean...that's pretty damn quick imo.
currently no, if you don't expect your forum to grow much more then that's fine. I would opt for normalization regardless, for maintainability and integrity sake.