Page 1 of 1

Optimizing small query on large dataset

Posted: Tue May 20, 2008 12:04 pm
by John Cartwright
Here we are again, dealing with potentially massive datasets. For testing purposes, I've kept my database size at just under 2 million records. Basically, all I want to do is grab the last 10 rows that have not been last processed. Currently my query is taking about 40 seconds and cannot figure out how to improve performance. :banghead:

First, the query.

Code: Select all

SELECT id, type, userkey, name
FROM test
WHERE type = 'foobar'
ORDER BY processed ASC
LIMIT 10
Second, the database structure.

Code: Select all

 
CREATE TABLE `test` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `userkey` varchar(100) NOT NULL,
  `name` varchar(255) NOT NULL,
  `type` varchar(100) NOT NULL,
  `entered` datetime NOT NULL,
  `processed` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `type` (`type`),
  KEY `userkey` (`userkey`)
) ENGINE=MyISAM AUTO_INCREMENT=1924053 DEFAULT CHARSET=latin1;
Third, the explain.

Code: Select all

 
id  select_type     TABLE   type    possible_keys   KEY     key_len     ref     rows    Extra
1   SIMPLE          test    range   type            type       102     NULL     1924050     USING WHERE; USING filesort
Now, after doing a bit of research, I can now see that the query has to scan the entire 1.9 million rows which is why it is taking so long.. but I cannot figure out why. I just want to mention all the types are currently 'foobar'. We are still playing with the database structure, so the type field will eventually become a foreign key, but I seriously doubt that is causing my problems.

Can anyone provide insight? Thanks :)

Re: Optimizing small query on large dataset

Posted: Tue May 20, 2008 12:11 pm
by John Cartwright
Okay we'll stupid me, I had thought the issue was of the ordering of the last processed, but when I remove the where clause it is very fast. So how can I improve my where clause?

Re: Optimizing small query on large dataset

Posted: Tue May 20, 2008 1:10 pm
by VladSun
I've read that:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather than sorting the whole table.
It would be interesting to see if you have only 10% records with type = 'foobar2', then will this query be faster?
[sql] SELECT id, type, userkey, nameFROM testWHERE type = 'foobar2'ORDER BY processed ASCLIMIT 10[/sql]

and if explain will still show "using filesort".

Edit: also as you mentioned, try to change the type of "type" to numerical, index it and see what happens.

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 1:44 pm
by Robert07
Has this issue been resolved? I would like to know how it ended up as I am facing a similar issue.
Thanks,
Robert

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 2:28 pm
by John Cartwright
After changing the type to a numerical index, the query took around 1 second. It's hard to remember if that's all I did however. Might I suggest you post your query, table structure, and an EXPLAIN like I originally did.

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 2:42 pm
by Christopher
Your big problem is that you are sorting on 'processed' but it is not indexed. So MySQL has to scan all 2 million records every time. It's doing a pretty good job with internal optimizations (and no help from you) though... ;)

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 2:54 pm
by John Cartwright
arborint wrote:Your big problem is that you are sorting on 'processed' but it is not indexed. So MySQL has to scan all 2 million records every time. It's doing a pretty good job with internal optimizations (and no help from you) though... ;)
Where were you last month :mrgreen:

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 3:12 pm
by Eran
What arborint said is correct. What should have been a warning sign for you is the 'filesort' part in the explain.
Here is a nice article on several optimizations methods for this - http://www.mysqlperformanceblog.com/200 ... imization/

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 3:29 pm
by Christopher
http://www.mysqlperformanceblog.com

Great site pytrin ... thanks for the link!

Re: Optimizing small query on large dataset

Posted: Tue Jun 17, 2008 11:23 pm
by Benjamin
arborint wrote:Your big problem is that you are sorting on 'processed' but it is not indexed. So MySQL has to scan all 2 million records every time. It's doing a pretty good job with internal optimizations (and no help from you) though... ;)
That's the first thing I noticed. arborint beat me to it though.