MyISAM slow COUNT() on large dataset
Posted: Thu Jan 08, 2009 6:51 pm
by John Cartwright
As the title suggests, when counting rows in a table with a relatively large number of them (over 3 million), the queries are taking over 20 seconds. The query below worked quite well until we hit the 3 million row mark, and I am trying to determine what can be done to improve performance. I've never had a simple COUNT() perform this badly for me in the past, so I'm not exactly sure what the best next steps are.
I know that I should probably change the type field to an indexed numeric value, but I don't think that's the main problem here, since there are only 2-3 distinct values (this will be changed; there are just potentially more important things on my plate).
Code: Select all
SELECT COUNT(snlist.id) AS count, type
FROM snlist
GROUP BY type
Code: Select all
CREATE TABLE IF NOT EXISTS `snlist` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`userkey` varchar(100) NOT NULL,
`type` varchar(6) NOT NULL,
`created` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `userkey` (`userkey`),
KEY `type` (`type`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4594005 ;
An EXPLAIN doesn't seem to reveal anything out of the ordinary (at least to my understanding).
Code: Select all
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE snlist index NULL type 8 NULL 3929238 Using index
Any insight is much appreciated, as usual.
Thanks!
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 3:32 am
by jaoudestudios
If you remove the GROUP BY and the type column (in the select), it will perform much faster, since the total row count of the table is stored by the storage engine and won't need to be calculated.
Then add the type column back into the select and see how much longer that takes. Remove it, put the GROUP BY back in, and do the same. See which one is the most costly.
Post back results as I would be interested to know too. Then we can take it from there.
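To make the comparison concrete, the three variants might look like this (a sketch based on the original query; on MyISAM the bare COUNT(*) should come straight from the stored table row count):
Code: Select all
-- 1. Bare count: answered from MyISAM's stored row count, near-instant
SELECT COUNT(*) FROM snlist;
-- 2. Count plus the type column, still no GROUP BY
--    (MySQL's non-strict mode allows this; type comes from an arbitrary row)
SELECT COUNT(id), type FROM snlist;
-- 3. The original query, with the GROUP BY restored
SELECT COUNT(id) AS count, type FROM snlist GROUP BY type;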
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 5:55 am
by VladSun
There is a good manual page about optimizing GROUP BY (although IMHO it's not applicable to your query):
http://dev.mysql.com/doc/refman/5.1/en/ ... ation.html
The query below worked quite well until we hit the 3 million row mark, and am trying to determine what can be done to improve performance.
Sounds like you need to tune the MySQL service itself.
http://dev.mysql.com/tech-resources/pre ... index.html
Important MySQL startup options
...
max_heap_table_size Used with GROUP BY
sort_buffer Used with ORDER BY and GROUP BY
PS: I don't like using *char fields for sorting, grouping etc. Can you change the type of `type` to numeric?
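For what it's worth, a migration along those lines could be sketched as below (the type_id name and the value mapping are made up for illustration):
Code: Select all
-- Add a compact numeric column alongside the existing varchar
ALTER TABLE snlist ADD COLUMN type_id TINYINT UNSIGNED NOT NULL DEFAULT 0;
-- Backfill it from the existing values (substitute the real ones)
UPDATE snlist
SET type_id = CASE type WHEN 'typeA' THEN 1 WHEN 'typeB' THEN 2 ELSE 3 END;
-- Index the new column so GROUP BY type_id can use it
ALTER TABLE snlist ADD KEY type_id (type_id);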
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 10:36 am
by John Cartwright
Awesome replies!
@jaoudestudios: We ended up adding another million rows of data, so we are now at 4 million rows in this table. Performance went down quite significantly. Here are my latest benchmarks.
Simple count took around ~1 sec. phpMyAdmin isn't giving me benchmarks on this query.
Query with group by but no type field took 54.9286 sec
Query with type field and group by took 55.8039 sec
@VladSun: Thanks for the links!
I haven't begun my work shift yet, but when I do I will pull up my current configuration and the relevant settings, and hopefully we can work on tuning the database optimally.
VladSun wrote:PS: I don't like using *char fields for sorting, grouping etc. Can you change the type of `type` to numeric?
Like I said earlier, I'd rather not make that change unless it's absolutely necessary (since the codebase is already complete). However, I won't hesitate to do so if you think there are significant benefits in this situation. I thought it wouldn't really be an issue since the char values are only 3 characters long. I'm going to add another column to test how beneficial this is.
So there is hope at the end of the tunnel? Should I look at splitting the table, or at replication?
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 10:49 am
by jaoudestudios
Jcart wrote:@jaoudestudios: We ended up adding another million rows of data, so we are now at 4 million rows in this table. Performance went down quite significantly. Here are my latest benchmarks.
Simple count took around ~1 sec. phpMyAdmin isn't giving me benchmarks on this query.
Query with group by but no type field took 54.9286 sec
Query with type field and group by took 55.8039 sec
1 sec for a simple count is still quite a lot! Not sure what is going on there.
Not much of a difference when adding the field - as expected.
The group took a lot of work though - as expected again.
Out of curiosity, how many megs is the database?
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 10:58 am
by John Cartwright
The entire database is relatively small, only 575 MB.
The table in question is 490 MB.
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 11:12 am
by jaoudestudios
It's a bit big to send over, but if it doesn't contain sensitive data and you didn't mind, I'd like to try running some benchmarks against it myself.
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 11:20 am
by John Cartwright
Unfortunately, it's company data, so I would get in big trouble.
I can create a mock dataset for you to play with, though. Feel free to PM me and I'll give you my contact details.
Thanks!
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 11:22 am
by Eran
The problem with this query is that it's running an index scan without a genuinely useful index (at least that's what the EXPLAIN shows). If it ran a full table scan instead (which is effectively what it's doing in practice), it would complete much faster. Try two things:
1. Run ANALYZE TABLE to help MySQL make better decisions in its execution plan. Also run OPTIMIZE TABLE in case the table is fragmented.
2. Try forcing it not to use an index with IGNORE INDEX on the various indexes in the table (though this probably won't have much effect, since it isn't using a truly selective index).
I once had this happen on a table I populated very quickly with a ton of mock data. The index statistics weren't up to date, and the query ran an index scan instead of a full scan, with results similar to what you are experiencing.
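For reference, both are one-liners (note that on MyISAM they take a table lock while running):
Code: Select all
ANALYZE TABLE snlist;   -- refreshes the index key distribution statistics
OPTIMIZE TABLE snlist;  -- defragments the data file and sorts the indexes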
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 12:06 pm
by John Cartwright
pytrin wrote:The problem with this query is that it's running an index scan without a genuinely useful index (at least that's what the EXPLAIN shows). If it ran a full table scan instead (which is effectively what it's doing in practice), it would complete much faster.
Indeed the data was dumped in very quickly, so the issue I'm having may be similar. This is a troubling issue, because I don't really understand what's going on under the hood. Can you explain exactly what you mean by "it's running an index scan without a genuinely useful index", and how you determined this from the original EXPLAIN?
pytrin wrote:1. Run ANALYZE TABLE to help MySQL make better decisions in its execution plan. Also run OPTIMIZE TABLE in case the table is fragmented.
I had already run them both before, but I've run them again for the sake of it.
Code: Select all
Table Op Msg_type Msg_text
dbname.snlist analyze status OK
I've also run OPTIMIZE. Not sure whether the results matter.
Code: Select all
Table Op Msg_type Msg_text
dbname.snlist optimize status OK
pytrin wrote:2. Try forcing it not to use an index with IGNORE INDEX on the various indexes in the table (though this probably won't have much effect, since it isn't using a truly selective index).
Is this what you meant?
Code: Select all
SELECT COUNT(id), type
FROM `snlist` IGNORE INDEX (`type`)
GROUP BY type
An EXPLAIN reveals:
Code: Select all
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE snlist ALL NULL NULL NULL NULL 3929238 Using temporary; Using filesort
From my understanding, this creates a temporary table and sorts it internally, which is slow (I may be wrong). So why exactly would it be beneficial to ignore the keys?
As usual, much appreciated!
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 12:08 pm
by VladSun
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 12:25 pm
by John Cartwright
This is going to be a long afternoon.
What is the correct syntax for forcing an index for the GROUP BY clause? I've tried the statement below, but it gives a syntax error at GROUP BY(`type`):
Code: Select all
SELECT COUNT(*), type
FROM `snlist`
FORCE INDEX FOR GROUP BY(`type`)
GROUP BY type
The statement below gives slightly better performance; the queries now return in 15-20 seconds.
Code: Select all
SELECT COUNT(*), type
FROM `snlist` FORCE INDEX (`type`)
GROUP BY type
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 1:35 pm
by VladSun
Did you change the MySQL config files? Tuning them can have a much larger effect than optimizing the query in some cases, especially for a relatively large DB.
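As a rough illustration only, this is the sort of my.cnf fragment involved (the numbers are placeholders, not recommendations; the right values depend on your RAM and workload):
Code: Select all
[mysqld]
key_buffer_size     = 256M   # MyISAM index cache
max_heap_table_size = 64M    # cap for in-memory temp tables (GROUP BY)
tmp_table_size      = 64M    # implicit temp tables spill to disk above this
sort_buffer_size    = 4M     # per-connection buffer for ORDER BY / GROUP BY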
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 2:29 pm
by Eran
From my understanding, this creates a temporary table and sorts it internally, which is slow (I may be wrong). So why exactly would it be beneficial to ignore the keys?
Yes, you are correct that the sorting is relatively slow, but at least according to the EXPLAIN it is now running a full table scan (type: ALL). Did you check how fast the query runs this way?
Can you explain exactly what you mean by "it's running an index scan without a genuinely useful index"?
The type column in the EXPLAIN indicates an index scan; however, without a sufficiently selective index that is slower than doing a full table scan (the cut-off is around 50% of the table rows). Since in practice all of the rows are examined anyway, it is much slower.
Re: MyISAM slow COUNT() on large dataset
Posted: Fri Jan 09, 2009 7:14 pm
by John Cartwright
So I pulled up the values you recommended changing:
Code: Select all
max_heap_table_size = 16777216
sort_buffer_size = 2097144
and ended up adding a zero to each value. After doing so with max_heap_table_size the query began taking around 4 seconds, and with sort_buffer_size it's consistently in the 3-second range. Still not what I'd call efficient.
Can anyone recommend how to determine what the ideal setting would be? I don't want to screw around with things too much without knowing the side effects.
Am I just being unrealistic in wanting to calculate such a simple number on the fly? Would it be more advisable to simply use a summary table?
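If it helps to discuss concretely, one way a summary table could be sketched (the table and column names here are hypothetical): keep per-type counts in a tiny table and refresh it periodically, or update it alongside writes to snlist.
Code: Select all
CREATE TABLE snlist_counts (
  type VARCHAR(6) NOT NULL,
  cnt  INT UNSIGNED NOT NULL,
  PRIMARY KEY (type)
) ENGINE=MyISAM;

-- Refresh from cron (or after bulk loads); REPLACE overwrites stale rows
REPLACE INTO snlist_counts
SELECT type, COUNT(id) FROM snlist GROUP BY type;

-- Reads are then instant regardless of snlist's size
SELECT cnt, type FROM snlist_counts;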
Thanks
