Page 1 of 1
Maintaining Speed in large databases
Posted: Wed Mar 08, 2006 12:15 am
by Benjamin
I am developing a database that will store a large amount of information for hundreds of thousands, if not millions of users and I have several questions regarding speed.
1. Is there any speed difference between an unsigned int 1 and a char 1?
2. Is it better to separate the database into separate databases? For example there is 1 table with about 70 unsigned int 1 fields that will be used for search queries and it needs to be fast. Can I force MySQL to store this table in ram? Would it make any difference if this was in a separate database or not?
3. Do I have to index every field in that table to ensure that it is fast?
Thank you.
Re: Maintaining Speed in large databases
Posted: Wed Mar 08, 2006 1:18 am
by feyd
agtlewis wrote:1. Is there any speed difference between an unsigned int 1 and a char 1?
There can be, in my experience, but it can easily come down to build settings, the database's construction, operating system, and so forth. Complicated.
agtlewis wrote:2. Is it better to separate the database into separate databases? For example there is 1 table with about 70 unsigned int 1 fields that will be used for search queries and it needs to be fast. Can I force MySQL to store this table in ram? Would it make any difference if this was in a separate database or not?
I don't think separate databases would help. Study the tuning tips the MySQL group has put out on their pages. One storage engine is HEAP (in memory).
agtlewis wrote:3. Do I have to index every field in that table to ensure that it is fast?
No, this can actually slow things down. Again, study the tuning guides in the documentation.
The biggest thing I can recommend for a project this large, a separate, very powerful, database server. I could easily go as far as suggesting a cluster. Other suggestions would be running a minimal overhead OS i.e. *nix variants, quite possibly 64 bit processors, lots of RAM, and fast well tuned hard drives.
Posted: Wed Mar 08, 2006 1:52 am
by josh
Isn't mysql in the process of implementing partitions? I'm pretty sure its available in the latests beta code or what not.. one type was a range partition, users 1-100,000 are in one partition 100,001 - 200,000 in the next.. etc.. When you ask for record N it first checks which range it is in using a series of <= >= comparisons, once it finds out the partition it scans only those rows (greatly optimized speed). While the heap table persists in memory, its data has very little permanence (the table starts out empty, and if you restart mysql all data is lost). There is a way to have it load from a static table at startup but then you have to keep the static table up to date with writes that are happening in the HEAP. Basically the only use for it that I see is for storing sessions or other temporary data.
Posted: Wed Mar 08, 2006 8:33 pm
by Benjamin
Ok, I don't know if I am in the right place or not. I read parts of the manual detailing Optimization. I didn't pull too much useful information out of it.
http://dev.mysql.com/doc/refman/5.0/en/ ... ff=nopfpls
Try to avoid complex SELECT queries on MyISAM tables that are updated frequently, to avoid problems with table locking that occur due to contention between readers and writers.
I am going to be using complex SELECT queries on a table that is going to be updated pretty often. Should I use delayed inserts for that table? If so how long is the delay?
Should I use the RieserFS filesystem? Is there anything else I can read online that would help me? Is the MyISAM table type suitable for this?
Posted: Wed Mar 08, 2006 9:07 pm
by josh
agtlewis wrote:Should I use delayed inserts for that table? If so how long is the delay?
Delayed queries depend on the context, I use them for:
updating post counts on forums
updating the last login time
Updating the total number of views on an image in a photo-gallery
logging the page view for a database driven stats program
I don't use them for:
updating passwords
writing session data
etc..
Generally if the data being delayed past another page view by the same user is going to be a problem (for example writing session data that the user us logged in, but the user requesting another page before that session data is even finally written), then I would not use delayed.
Another option is low_priority and high_priority, these tell mysql where to place items in the queue, high_priority queries jump the queue, while low_priority queries are pushed back in the queue as new queries come in. You can combine delayed and low_priority where appropriate. Something like selecting content to build a page is a candidate for a high_priority, while something like updating the content in an admin panel is typical of a query deserving of low_priority.
The actual amount of delay based on all of this depends on the amount of traffic in mysql's queue. If you're dealing with 5-6 queries a second a delayed query will execute rather quickly usually, but then again if your server is being overloaded with queries your delayed query could hang around for a while before being executed.
Posted: Wed Mar 08, 2006 9:23 pm
by Benjamin
Ok thank you for explaining that. I figured it worked that way but I didn't know for sure.