I think the COUNT is not the problem here, but fetching the type column for 3 million rows. If it were just counting, it would finish much faster. Am I just being unrealistic in wanting to calculate such a simple number dynamically? Would it be more advisable to simply use a summary table?
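For what it's worth, the summary-table idea could be as simple as the sketch below. Table and column names (`items`, `item_type_counts`, `type`) are only illustrative, not the actual schema from this thread:

```sql
-- Hypothetical summary table: one row per type, maintained alongside the data.
CREATE TABLE item_type_counts (
    type VARCHAR(32)  NOT NULL PRIMARY KEY,
    cnt  INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE=MyISAM;

-- Seed it once from the big table (slow, but done only once):
INSERT INTO item_type_counts (type, cnt)
SELECT type, COUNT(*) FROM items GROUP BY type;

-- Keep it current on every insert into `items`:
INSERT INTO item_type_counts (type, cnt) VALUES ('foo', 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;

-- Reading the count is now a primary-key lookup instead of a 3-million-row scan:
SELECT cnt FROM item_type_counts WHERE type = 'foo';
```

The cost is that every write path has to remember to update the counter, but reads become effectively free.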
MyISAM slow COUNT() on large dataset
Re: MyISAM slow COUNT() on large dataset
- jaoudestudios
- DevNet Resident
- Posts: 1483
- Joined: Wed Jun 18, 2008 8:32 am
- Location: Surrey
Re: MyISAM slow COUNT() on large dataset
Jcart wrote: and ended up adding a 0 to each of their values. After doing so with max_heap_size the query began taking around 4 seconds, and with sort_buffer_size it's consistently in the 3-second zone. Still not exactly what I consider efficient.
Code:
max_heap_size = 16777216
sort_buffer_size = 2097144

Wow, so these two changes to the MySQL server setup had a massive effect?!
Re: MyISAM slow COUNT() on large dataset
Try increasing the key_buffer_size also:
http://www.databasejournal.com/features ... iables.htm
In the binary MySQL installation package there are some sample config files:
my-huge.cnf my-large.cnf my-medium.cnf my-small.cnf
It could be a good idea to see how they differ from your config.
Also keep in mind that if you increase the memory MySQL uses too much, the OS can start swapping, and that would be a very bad scenario.
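As a rough illustration only (these values are a generic starting point for a dedicated box with a few GB of RAM, not recommendations for your hardware), the relevant my.cnf section might look like this. Note the server variable is actually spelled max_heap_table_size:

```ini
# my.cnf -- illustrative values only; tune against your actual RAM.
[mysqld]
# MyISAM index cache; the single most important MyISAM setting.
key_buffer_size     = 256M
# Per-connection buffer used by ORDER BY / GROUP BY sorts.
sort_buffer_size    = 4M
# Ceiling for in-memory (HEAP) temporary tables.
max_heap_table_size = 64M
tmp_table_size      = 64M
```

Remember that sort_buffer_size is allocated per connection, so multiply by your expected connection count before raising it.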
There are 10 types of people in this world, those who understand binary and those who don't
Re: MyISAM slow COUNT() on large dataset
Jcart wrote: Query with type field and group by 55.8039 sec
The statement below gives a little bit better performance. The queries returned are between 15-20 seconds.
After doing so with max_heap_size the query began taking around 4 seconds, and with sort_buffer_size it's consistently in the 3-second zone. Still not exactly what I consider efficient.

From 55.8 seconds down to 3, that's 18.6 times faster. You should be happy, shouldn't you?
Re: MyISAM slow COUNT() on large dataset
Jcart wrote: Can anyone recommend how to determine what the ideal setting would be? I don't want to screw around with things too much without knowing the side effects.

I've found these:
http://www.linuxweblog.com/tune-my.cnf
http://www.debianhelp.co.uk/mysqlperformance.htm
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
- Contact:
Re: MyISAM slow COUNT() on large dataset
Thanks for all the help guys. I really appreciate it.
We ended up upgrading our server to a beast of a machine, and the query now takes only 2 seconds (using the same MySQL configuration). I did end up reducing the database back to 3 million rows, so that may have helped as well.
I picked up a copy of High Performance MySQL, 2nd Edition, and it proposes sharding in situations like this. That seems like a good fit because I can split the shards by type (and thus avoid index issues on large tables altogether).
Now my question is: what is a good size for a shard? Is there a general rule for judging that, or is it entirely dependent on the server's power?
This does seem like a bit of overkill, but I would like to plan for the future, as we can expect many more millions of rows to come. Does anyone have any suggestions or insight into this?
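If the table does get split by type (on one server or several), each per-type count becomes a small-table operation. A rough sketch, with made-up table names, assuming the big table is called `items`:

```sql
-- Hypothetical per-type tables with the same structure as `items`:
CREATE TABLE items_photo LIKE items;
CREATE TABLE items_video LIKE items;

-- Counting one type touches only that type's table; for MyISAM a
-- COUNT(*) with no WHERE clause is read from the table header, so it's instant:
SELECT COUNT(*) FROM items_photo;

-- A grand total across the pieces when needed:
SELECT (SELECT COUNT(*) FROM items_photo)
     + (SELECT COUNT(*) FROM items_video) AS total;
```

The trade-off is that queries spanning all types now have to be stitched together in the application or via UNION ALL.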
Re: MyISAM slow COUNT() on large dataset
What does "(using the same mysql configuration)" mean? I do believe the default values in MySQL configs are too low nowadays.

Jcart wrote: Now my question is, what is a good size of a shard? Is there a general rule to judge such a thing, or is it entirely dependent on the server's power?

I couldn't understand that question. And what is a "size of shard"? By using "shard", did you mean partitioning? E.g.:
http://dev.mysql.com/doc/refman/5.1/en/ ... g-key.html
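For reference, partitioning the table on the type column (available from MySQL 5.1) might look roughly like this. Column names and partition values are illustrative, and LIST partitioning assumes `type` is a small integer:

```sql
-- Illustrative only: LIST partitioning by a numeric type column (MySQL 5.1+).
CREATE TABLE items (
    id   INT UNSIGNED NOT NULL,
    type TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (id, type)   -- the partitioning column must be part of every unique key
) ENGINE=MyISAM
PARTITION BY LIST (type) (
    PARTITION p_photo VALUES IN (1),
    PARTITION p_video VALUES IN (2),
    PARTITION p_other VALUES IN (3, 4)
);

-- A count restricted to one type can then be pruned to a single partition:
SELECT COUNT(*) FROM items WHERE type = 1;
```

Unlike sharding, this stays on one server and one logical table, so existing queries keep working.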
Re: MyISAM slow COUNT() on large dataset
No, sharding is placing separate parts of the database on separate servers. This is an effective technique for very large databases, but you lose a lot of flexibility as JOINs become very expensive. This is more for scaling out, but you are just looking to increase performance.
I think you should heed vlad's advice and start with partitioning, since you are only concerned with one specific large table.
Those techniques won't help with counting, unfortunately, since you still need to access the entire table... what is generally done in those situations, when counting performance is unacceptable, is to use approximate-count techniques that don't actually count all the rows.
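For completeness, the usual approximate-count tricks in MySQL read the engine's or optimizer's row statistics instead of scanning (exact for MyISAM's whole-table count, estimates otherwise; `items` and `mydb` are placeholder names):

```sql
-- SHOW TABLE STATUS exposes the storage engine's row count
-- (exact for MyISAM, an estimate for InnoDB):
SHOW TABLE STATUS LIKE 'items';

-- EXPLAIN returns the optimizer's row estimate without executing the query:
EXPLAIN SELECT COUNT(*) FROM items WHERE type = 1;

-- information_schema carries the same engine statistic:
SELECT table_rows
FROM information_schema.tables
WHERE table_schema = 'mydb' AND table_name = 'items';
```

These are fine for "about 3 million rows" displays, but not for anything that must be exact.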
Re: MyISAM slow COUNT() on large dataset
pytrin wrote: No, sharding is placing separate parts of the database on separate servers.

Oh! That would be overkill, IMHO.