Optimizing small query on large dataset

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Optimizing small query on large dataset

Post by John Cartwright »

Here we are again, dealing with potentially massive datasets. For testing purposes, I've kept my database size at just under 2 million records. Basically, all I want to do is grab the last 10 rows that have not been last processed. Currently my query is taking about 40 seconds and cannot figure out how to improve performance. :banghead:

First, the query.

Code: Select all

SELECT id, type, userkey, name
FROM test
WHERE type = 'foobar'
ORDER BY processed ASC
LIMIT 10
Second, the database structure.

Code: Select all

 
CREATE TABLE `test` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `userkey` varchar(100) NOT NULL,
  `name` varchar(255) NOT NULL,
  `type` varchar(100) NOT NULL,
  `entered` datetime NOT NULL,
  `processed` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `type` (`type`),
  KEY `userkey` (`userkey`)
) ENGINE=MyISAM AUTO_INCREMENT=1924053 DEFAULT CHARSET=latin1;
Third, the explain.

Code: Select all

 
id  select_type     TABLE   type    possible_keys   KEY     key_len     ref     rows    Extra
1   SIMPLE          test    range   type            type       102     NULL     1924050     USING WHERE; USING filesort
Now, after doing a bit of research, I can now see that the query has to scan the entire 1.9 million rows which is why it is taking so long.. but I cannot figure out why. I just want to mention all the types are currently 'foobar'. We are still playing with the database structure, so the type field will eventually become a foreign key, but I seriously doubt that is causing my problems.

Can anyone provide insight? Thanks :)
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: Optimizing small query on large dataset

Post by John Cartwright »

Okay we'll stupid me, I had thought the issue was of the ordering of the last processed, but when I remove the where clause it is very fast. So how can I improve my where clause?
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Optimizing small query on large dataset

Post by VladSun »

I've read that:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count lines rather than sorting the whole table.
It would be interesting to see if you have only 10% records with type = 'foobar2', then will this query be faster?
[sql] SELECT id, type, userkey, nameFROM testWHERE type = 'foobar2'ORDER BY processed ASCLIMIT 10[/sql]

and if explain will still show "using filesort".

Edit: also as you mentioned, try to change the type of "type" to numerical, index it and see what happens.
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Robert07
Forum Contributor
Posts: 113
Joined: Tue Jun 17, 2008 1:41 pm

Re: Optimizing small query on large dataset

Post by Robert07 »

Has this issue been resolved? I would like to know how it ended up as I am facing a similar issue.
Thanks,
Robert
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: Optimizing small query on large dataset

Post by John Cartwright »

After changing the type to a numerical index, the query took around 1 second. It's hard to remember if that's all I did however. Might I suggest you post your query, table structure, and an EXPLAIN like I originally did.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Optimizing small query on large dataset

Post by Christopher »

Your big problem is that you are sorting on 'processed' but it is not indexed. So MySQL has to scan all 2 million records every time. It's doing a pretty good job with internal optimizations (and no help from you) though... ;)
(#10850)
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: Optimizing small query on large dataset

Post by John Cartwright »

arborint wrote:Your big problem is that you are sorting on 'processed' but it is not indexed. So MySQL has to scan all 2 million records every time. It's doing a pretty good job with internal optimizations (and no help from you) though... ;)
Where were you last month :mrgreen:
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: Optimizing small query on large dataset

Post by Eran »

What arborint said is correct. What should have been a warning sign for you is the 'filesort' part in the explain.
Here is a nice article on several optimizations methods for this - http://www.mysqlperformanceblog.com/200 ... imization/
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Optimizing small query on large dataset

Post by Christopher »

http://www.mysqlperformanceblog.com

Great site pytrin ... thanks for the link!
(#10850)
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Optimizing small query on large dataset

Post by Benjamin »

arborint wrote:Your big problem is that you are sorting on 'processed' but it is not indexed. So MySQL has to scan all 2 million records every time. It's doing a pretty good job with internal optimizations (and no help from you) though... ;)
That's the first thing I noticed. arborint beat me to it though.
Post Reply