Page 2 of 2

Re: MyISAM slow COUNT() on large dataset

Posted: Fri Jan 09, 2009 7:21 pm
by Eran
Am I just being unrealistic if I want to calculate such a simple number dynamically? Would it just be more advisable to simply use a summary table?
I think the COUNT is not the problem here, but fetching the type column for 3million rows. If it was just counting, it would finish much faster.

Re: MyISAM slow COUNT() on large dataset

Posted: Sat Jan 10, 2009 4:12 am
by jaoudestudios
Jcart wrote:

Code: Select all

max_heap_size 16777216
sort_buffer_size = 2097144
and ended up adding a 0 to each of their values. After doing so with max_heap_size the query began taking around 4 seconds, and with sort_buffer_size its consistently in the 3 seconds zone. Still not exactly what I consider efficient.
Wow so these two changes to the mysql server setup had a massive effect?!?!

Re: MyISAM slow COUNT() on large dataset

Posted: Sat Jan 10, 2009 6:19 am
by VladSun
Try increasing the key_buffer_size also:
http://www.databasejournal.com/features ... iables.htm

In the binary installation MYSQL packet there are some config samples:
my-huge.cnf my-large.cnf my-medium.cnf my-small.cnf

It could be a good idea to see how they differ from your config.
Also keep in mind that if you increase the memory used by MySQL too much, the OS can start swapping its used memory - that will be a very bad scenario ;)

Re: MyISAM slow COUNT() on large dataset

Posted: Sat Jan 10, 2009 6:36 am
by VladSun
Query with type field and group by 55.8039 sec
The statement below gives a little bit better performance. The queries returned are between 15-20 seconds.
After doing so with max_heap_size the query began taking around 4 seconds, and with sort_buffer_size its consistently in the 3 seconds zone. Still not exactly what I consider efficient.
It's 18.6 times faster :)
You should be happy, shouldn't you ;)

Re: MyISAM slow COUNT() on large dataset

Posted: Sat Jan 10, 2009 8:40 am
by VladSun
Jcart wrote:Can anyone recommend how to determine what the ideal setting would be? I don't want to screw around with things too much without knowing the side effects.
I've found these:

http://www.linuxweblog.com/tune-my.cnf
http://www.debianhelp.co.uk/mysqlperformance.htm

Re: MyISAM slow COUNT() on large dataset

Posted: Tue Jan 13, 2009 5:46 pm
by John Cartwright
Thanks for all the help guys. I realy appreciate itn

We ended up upgrading our server to a beast of a machine and the query now takes only 2 seconds (using the same mysql configuration). I did end up reducing the database back to 3 million rows, so that may have helped also.

I picked up a copy of High Performance Mysql 2nd Edition, and it did propose the usage of sharding in such situations. This seems like an optimal situation because I can categorize the shards by their type (and thus avoid having index issues on large tables alltogether).

Now my question is, what is a good size of a shard? Is there a general rule to judge such a thing, or is it entirely dependant on the servers power.

This does seem to be a bit overkill, but I would like to plan for the future as we can expect many more millions of rows to come. Does anything have any suggestions or insight into this?

Re: MyISAM slow COUNT() on large dataset

Posted: Wed Jan 14, 2009 3:36 am
by VladSun
What does "(using the same mysql configuration)" mean? Because I do believe the default values in mysql configs are too low nowadays.
Now my question is, what is a good size of a shard? Is there a general rule to judge such a thing, or is it entirely dependant on the servers power.
I couldn't understand that question. By using "shard" did you mean partitioning. E.g:
http://dev.mysql.com/doc/refman/5.1/en/ ... g-key.html

And what is a "size of shard"?

Re: MyISAM slow COUNT() on large dataset

Posted: Wed Jan 14, 2009 3:39 am
by Eran
No, sharding is placing separate parts of the database on separate servers. This is an effective technique for very large databases, but you lose a lot of flexibility as JOINs become very expensive. This is more for scaling out, but you are just looking to increase performance.
I think you should heed vlad's advice and start with partioning, since you are only concerned with a specific large table.

Those techniques by the way won't help with counting unfurtunately, since you still need to access the entire table... what is generally done in those situations when performance is unacceptable is to use approx. count techniques that don't actually count all the rows.

Re: MyISAM slow COUNT() on large dataset

Posted: Wed Jan 14, 2009 3:45 am
by VladSun
pytrin wrote:No, sharding is placing separate parts of the database on separate servers.
Oh! It will be a overkill IMHO :)