MyISAM slow COUNT() on large dataset

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: MyISAM slow COUNT() on large dataset

Post by Eran »

Am I just being unrealistic if I want to calculate such a simple number dynamically? Would it just be more advisable to simply use a summary table?
I think the COUNT is not the problem here, but fetching the type column for 3million rows. If it was just counting, it would finish much faster.
User avatar
jaoudestudios
DevNet Resident
Posts: 1483
Joined: Wed Jun 18, 2008 8:32 am
Location: Surrey

Re: MyISAM slow COUNT() on large dataset

Post by jaoudestudios »

Jcart wrote:

Code: Select all

max_heap_size 16777216
sort_buffer_size = 2097144
and ended up adding a 0 to each of their values. After doing so with max_heap_size the query began taking around 4 seconds, and with sort_buffer_size its consistently in the 3 seconds zone. Still not exactly what I consider efficient.
Wow so these two changes to the mysql server setup had a massive effect?!?!
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

Try increasing the key_buffer_size also:
http://www.databasejournal.com/features ... iables.htm

In the binary installation MYSQL packet there are some config samples:
my-huge.cnf my-large.cnf my-medium.cnf my-small.cnf

It could be a good idea to see how they differ from your config.
Also keep in mind that if you increase the memory used by MySQL too much, the OS can start swapping its used memory - that will be a very bad scenario ;)
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

Query with type field and group by 55.8039 sec
The statement below gives a little bit better performance. The queries returned are between 15-20 seconds.
After doing so with max_heap_size the query began taking around 4 seconds, and with sort_buffer_size its consistently in the 3 seconds zone. Still not exactly what I consider efficient.
It's 18.6 times faster :)
You should be happy, shouldn't you ;)
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

Jcart wrote:Can anyone recommend how to determine what the ideal setting would be? I don't want to screw around with things too much without knowing the side effects.
I've found these:

http://www.linuxweblog.com/tune-my.cnf
http://www.debianhelp.co.uk/mysqlperformance.htm
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: MyISAM slow COUNT() on large dataset

Post by John Cartwright »

Thanks for all the help guys. I realy appreciate itn

We ended up upgrading our server to a beast of a machine and the query now takes only 2 seconds (using the same mysql configuration). I did end up reducing the database back to 3 million rows, so that may have helped also.

I picked up a copy of High Performance Mysql 2nd Edition, and it did propose the usage of sharding in such situations. This seems like an optimal situation because I can categorize the shards by their type (and thus avoid having index issues on large tables alltogether).

Now my question is, what is a good size of a shard? Is there a general rule to judge such a thing, or is it entirely dependant on the servers power.

This does seem to be a bit overkill, but I would like to plan for the future as we can expect many more millions of rows to come. Does anything have any suggestions or insight into this?
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

What does "(using the same mysql configuration)" mean? Because I do believe the default values in mysql configs are too low nowadays.
Now my question is, what is a good size of a shard? Is there a general rule to judge such a thing, or is it entirely dependant on the servers power.
I couldn't understand that question. By using "shard" did you mean partitioning. E.g:
http://dev.mysql.com/doc/refman/5.1/en/ ... g-key.html

And what is a "size of shard"?
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
Eran
DevNet Master
Posts: 3549
Joined: Fri Jan 18, 2008 12:36 am
Location: Israel, ME

Re: MyISAM slow COUNT() on large dataset

Post by Eran »

No, sharding is placing separate parts of the database on separate servers. This is an effective technique for very large databases, but you lose a lot of flexibility as JOINs become very expensive. This is more for scaling out, but you are just looking to increase performance.
I think you should heed vlad's advice and start with partioning, since you are only concerned with a specific large table.

Those techniques by the way won't help with counting unfurtunately, since you still need to access the entire table... what is generally done in those situations when performance is unacceptable is to use approx. count techniques that don't actually count all the rows.
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: MyISAM slow COUNT() on large dataset

Post by VladSun »

pytrin wrote:No, sharding is placing separate parts of the database on separate servers.
Oh! It will be a overkill IMHO :)
There are 10 types of people in this world, those who understand binary and those who don't
Post Reply