MyISAM slow COUNT() on large dataset

Questions about MySQL, PostgreSQL, and most other databases, as well as using them with PHP, can be asked here.


John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

MyISAM slow COUNT() on large dataset

Post by John Cartwright »

As the title suggests, when counting a table with a relatively large number of rows (over 3 million), the queries are taking over 20 seconds. The query below worked quite well until we hit the 3 million row mark, and I am trying to determine what can be done to improve performance. I've never had a simple COUNT() perform so badly for me in the past, so I'm not exactly sure what the best next steps are.

I know that I should probably change the type field to a numeric type, but I don't think that is a large part of the problem, since there are only 2-3 distinct values (this will be changed; there are just potentially more important things on my plate).

Code: Select all

SELECT COUNT(snlist.id) AS count, type
FROM snlist
GROUP BY type

Code: Select all

 
CREATE TABLE IF NOT EXISTS `snlist` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `userkey` varchar(100) NOT NULL,
  `type` varchar(6) NOT NULL,
  `created` datetime NOT NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `userkey` (`userkey`),
  KEY `type` (`type`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=4594005 ;
An EXPLAIN seems to not reveal anything out of the ordinary (at least from my understanding).

Code: Select all

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  snlist   index   NULL    type    8   NULL    3929238     Using index
Any insight is much appreciated, as usual.

Thanks!
jaoudestudios
DevNet Resident
Posts: 1483
Joined: Wed Jun 18, 2008 8:32 am
Location: Surrey

Re: MyISAM slow COUNT() on large dataset

Post by jaoudestudios »

If you remove the GROUP BY and the type column (in the select) it will perform much faster, as MyISAM stores the exact row count in the table's metadata and it won't need to be calculated.

Then add the type column back into the select and see how much longer that takes. Remove it, put the GROUP BY back in, and do the same. See which one is the most costly.
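Concretely, the three variants I mean are these (using your table):

Code: Select all

-- MyISAM keeps an exact row count, so this should be near-instant
SELECT COUNT(*) FROM snlist;

-- add the column back, still no grouping
SELECT COUNT(*), type FROM snlist;

-- remove the column again, restore the grouping
SELECT COUNT(id) FROM snlist GROUP BY type;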

Post back results as I would be interested to know too. Then we can take it from there.
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

There is a good manual about optimizing GROUP BY (although it's not suitable for your query IMHO):
http://dev.mysql.com/doc/refman/5.1/en/ ... ation.html

John Cartwright wrote:The query below worked quite well until we hit the 3 million row mark, and I am trying to determine what can be done to improve performance.
Sounds like you need to tune the MySQL service itself.
http://dev.mysql.com/tech-resources/pre ... index.html
Important MySQL startup options
...
max_heap_table_size Used with GROUP BY
sort_buffer Used with ORDER BY and GROUP BY
PS: I don't like using *char fields for sorting, grouping etc. Can you change the type of `type` to numeric?
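Something along these lines, for example (the value mapping below is made up, adjust to your real values):

Code: Select all

-- add a 1-byte numeric column next to the varchar one
ALTER TABLE snlist
  ADD COLUMN type_id TINYINT UNSIGNED NOT NULL DEFAULT 0,
  ADD KEY type_id (type_id);

-- backfill it from the existing values (example mapping only)
UPDATE snlist
SET type_id = CASE type WHEN 'aaa' THEN 1 WHEN 'bbb' THEN 2 ELSE 3 END;

-- then group on the numeric column instead of the varchar(6)
SELECT COUNT(*), type_id FROM snlist GROUP BY type_id;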
There are 10 types of people in this world, those who understand binary and those who don't
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

Awesome replies!

@jaoudestudios: We ended up adding another million rows of data, so now we are at 4 million rows in this table. The performance went down quite significantly. Here are my latest benchmarks.

Simple count took around ~1 second. phpMyAdmin isn't giving me benchmarks on this query.

Query with no type field and group by took 54.9286 sec

Query with type field and group by took 55.8039 sec

@VladSun: Thanks for the links!

I haven't begun my work shift yet, but when I do I will pull up my current configuration and the relevant points, and hopefully we can work at tuning the database optimally.
VladSun wrote:PS: I don't like using *char fields for sorting, grouping etc. Can you change the type of `type` to numeric?
Like I said earlier, I only want to make that change if it's absolutely necessary (since the codebase is already complete). However, I won't hesitate to do so if you suggest there are significant benefits in this situation. I thought it wouldn't really be an issue since the char values are only 3 characters long. I'm going to add another column to test how beneficial this is.

So there is hope at the end of the tunnel? Should I look at splitting the table or replication?
Last edited by John Cartwright on Fri Jan 09, 2009 11:02 am, edited 1 time in total.
Reason: added Vlad quote
jaoudestudios
DevNet Resident
Posts: 1483
Joined: Wed Jun 18, 2008 8:32 am
Location: Surrey

Re: MyISAM slow COUNT() on large dataset

Post by jaoudestudios »

Jcart wrote:@jaoudestudios: We ended up adding another million rows of data, so now we are at 4 million rows in this table. The performance went down quite significantly. Here are my latest benchmarks.

Simple count took around ~1 second. phpMyAdmin isn't giving me benchmarks on this query.

Query with no type field and group by took 54.9286 sec

Query with type field and group by took 55.8039 sec
1 sec for a simple count is still quite a lot!?!? Not sure what is going on there.

Not much of a difference when adding the field - as expected.

The group took a lot of work though - as expected again.

Out of curiosity, how many megs is the database?
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

The entire database is relatively small, only 575MB.

The table in question is 490MB.
jaoudestudios
DevNet Resident
Posts: 1483
Joined: Wed Jun 18, 2008 8:32 am
Location: Surrey

Re: MyISAM slow COUNT() on large dataset

Post by jaoudestudios »

It's quite big to send over, but (obviously assuming it does not have sensitive data and you didn't mind) I wanted to try running some benchmarks against it myself.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

Unfortunately, it's company data, so I would get in big trouble :).

I can create a mock dataset for you to play with though. Feel free to PM me and I'll give you my contact details.

Thanks!
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: MyISAM slow COUNT() on large dataset

Post by Eran »

The problem with this query is that it's running an index select without an actual index! (at least that's what it shows in the explain) If it ran a full table scan instead (which is what it's doing in practice) it would complete much faster. Try two things:
1. Run ANALYZE TABLE to help MySQL make better decisions on its execution plan. Also run OPTIMIZE TABLE in case the table is fragmented.
2. Try to force it not to use the index with IGNORE INDEX on the various indexes in the table (though it probably won't have much effect, since it doesn't select a real index).

I once had this happen with a table I populated very quickly with a ton of mock data. The index statistics weren't up to date, and the query ran an index scan instead of a full scan, with results similar to what you are experiencing.
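For point 1, that's simply:

Code: Select all

ANALYZE TABLE snlist;   -- refresh the key distribution statistics
OPTIMIZE TABLE snlist;  -- defragment the data file and re-sort the indexes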
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

pytrin wrote:The problem with this query is that it's running an index select without an actual index! (at least that's what it shows in the explain) If it ran a full table scan instead (which is what it's doing in practice) it would complete much faster.
Indeed the data was dumped in very quickly, so the issue I'm having may be similar. This is a troubling issue, because I don't really understand what's going on under the hood. Can you explain exactly what you mean by "it's running an index select without an actual index", and how you determined this from the original EXPLAIN?
pytrin wrote:1. Run ANALYZE TABLE to help MySQL make better decisions on its execution plan. Also run OPTIMIZE TABLE in case the table is fragmented
I had actually run them both before; I've run them again for the sake of it.

Code: Select all

Table             Op        Msg_type    Msg_text
dbname.snlist   analyze     status      OK
I've also run optimize. Not sure if the results matter.

Code: Select all

Table           Op          Msg_type    Msg_text
dbname.snlist   optimize    status      OK
pytrin wrote:2. Try to force it not to use the index with IGNORE INDEX on the various indexes in the table (though it probably won't have much effect, since it doesn't select a real index)
Is this what you meant?

Code: Select all

SELECT COUNT(id) , type FROM `snlist`
IGNORE INDEX (type)
GROUP BY type
An explain reveals:

Code: Select all

 
id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra
1   SIMPLE  snlist  ALL     NULL    NULL    NULL    NULL    3929238     Using temporary; Using filesort
From my understanding this is creating a temporary table and internally applying a sort, which is slow. (I may be wrong.) So why exactly would it be beneficial to ignore the keys?

As usual, much appreciated!
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

Try FORCE INDEX(`type`) FOR GROUP BY
http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
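Something like this (note: if I remember right, the FOR {JOIN | ORDER BY | GROUP BY} part of the index-hint syntax only exists in MySQL 5.1+, so an older server will throw a syntax error):

Code: Select all

SELECT COUNT(*), type
FROM snlist FORCE INDEX FOR GROUP BY (`type`)
GROUP BY type;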
There are 10 types of people in this world, those who understand binary and those who don't
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

This is going to be a long afternoon :banghead: What is the correct syntax for using the forced index on the GROUP BY clause? I've tried the statement below, but it gives a syntax error at GROUP BY(`type`)

Code: Select all

SELECT COUNT(*), type 
FROM `snlist` 
FORCE INDEX FOR GROUP BY(`type`) 
GROUP BY type
The statement below gives a little bit better performance; the queries now return in 15-20 seconds.

Code: Select all

SELECT COUNT( * ) , type 
FROM `snlist`
FORCE INDEX ( `type` )
GROUP BY TYPE
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

Did you change the MySQL config files? Tuning the server may have a much larger effect than optimizing the query in some cases, especially for a relatively large DB.
There are 10 types of people in this world, those who understand binary and those who don't
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: MyISAM slow COUNT() on large dataset

Post by Eran »

John Cartwright wrote:From my understanding this is creating a temporary table and internally applying a sort, which is slow. (I may be wrong.) So why exactly would it be beneficial to ignore the keys?
Yes, you are correct that the sorting is relatively slow, but at least according to the EXPLAIN it is running a full table scan now (type: ALL). Did you try to see how fast the query runs with this?
John Cartwright wrote:Can you explain exactly what you mean by "it's running an index select without an actual index"
The select type in the EXPLAIN indicates an index scan; however, without a sufficiently selective index that is slower than doing a full table scan (the cut-off is around 50% of the table rows). Since in practice all rows are examined anyway, it ends up much slower.
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

So I got the values that you recommended changing, ..

Code: Select all

max_heap_table_size = 16777216
sort_buffer_size = 2097144
and ended up adding a 0 to each of their values. After doing so with max_heap_table_size the query began taking around 4 seconds, and with sort_buffer_size it's consistently in the 3-second zone. Still not exactly what I consider efficient.
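For reference, that corresponds to these lines in my.cnf (just the 10x bump described above, not tuned values):

Code: Select all

[mysqld]
max_heap_table_size = 167772160
sort_buffer_size    = 20971440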

Can anyone recommend how to determine what the ideal setting would be? I don't want to screw around with things too much without knowing the side effects.

Am I just being unrealistic in wanting to calculate such a simple number dynamically? Would it be more advisable to simply use a summary table?
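If I did go the summary-table route, I imagine it would look something like this (table and trigger names are made up; triggers need MySQL 5.0+):

Code: Select all

CREATE TABLE snlist_counts (
  `type` varchar(6) NOT NULL,
  `cnt`  int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`type`)
) ENGINE=MyISAM;

-- seed it once from the slow query
INSERT INTO snlist_counts (`type`, `cnt`)
SELECT `type`, COUNT(*) FROM snlist GROUP BY `type`;

-- keep it current on every insert/delete (run via the mysql client)
DELIMITER //
CREATE TRIGGER snlist_cnt_ins AFTER INSERT ON snlist FOR EACH ROW
  INSERT INTO snlist_counts (`type`, `cnt`) VALUES (NEW.`type`, 1)
  ON DUPLICATE KEY UPDATE `cnt` = `cnt` + 1;
//
CREATE TRIGGER snlist_cnt_del AFTER DELETE ON snlist FOR EACH ROW
  UPDATE snlist_counts SET `cnt` = `cnt` - 1 WHERE `type` = OLD.`type`;
//
DELIMITER ;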

Thanks :)