Mysql Limitations

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Mysql Limitations

Post by John Cartwright »

I'm faced with the same situation as I did several months ago, where I am tracking views and clicks of banners and links for promotional campaigns.. all of which need very detailed tracking. Now, each link gets a tracking id which ties it into a much larger array of information (but that is not important here).

Last time I did this I ran very quickly into the default 4GB mysql limit and soon realized that mysql may not necessarily be my best tool here. Performance quickly became an issue, especially when performing queries that performed calculations on sometimes ~50 million rows. Obviously one cannot expect a 50 million row calculation to be pretty fast.. but I don't want to cripple the server at the same time. The queries involved joins about 2-3 other small tables, so the query itself isn't very big.

Lets assume I can use any hardware configuration.. realistically I have several servers at my disposal.

So my initial questions into this massive problem of mine..

1) What kind of hardware configurations are recommended for high volume database access
2) What database is best suited for extremely large volumes of data (hundreds of millions+)
3) What kind of impact does hundreds of inserts a second have on mysql?

Before someone goes about bashing me that this kind of data processing is not suitable for a web based language, seeking alternative languages is certainly an absolute last resort.
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Mysql Limitations

Post by onion2k »

MySQL should be fine, but you will need to optimise the way you insert and subsequently access the data.

First and foremost - it's a lot faster to do lots of inserts all at once on a MyISAM table. I would recommend creating a second HEAP table (HEAP is only held in memory) and have the web application insert into that table - it'll be very fast indeed. Then, every N minutes, run a script to flush all the data from the HEAP table into the MyISAM table with:

Code: Select all

INSERT INTO `table_myisam` (`column1`,`column2`,`column3`) VALUES SELECT `column1`, `column2`, `column3` FROM `table_heap`;
Then clear the HEAP table with:

Code: Select all

TRUNCATE `table_heap`;
If the server crashes you'll lose the content in the HEAP table which is an unfortunate side effect - but that could be alleviated by using replication to a second database server perhaps.

Secondly, I wouldn't actually ever access the data in a 'live' situation. Run reports nightly when it's quiet.
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Mysql Limitations

Post by Christopher »

I think one question I would have is whether you need all of those logged rows once you have done your calculations. In the past I have done what webstats programs do, which is to reduce all the raw data records in a date range into subtotals and then do the actual stats on those. So you would run your queries on the raw data, then export it as a backup and delete those rows.
(#10850)
User avatar
liljester
Forum Contributor
Posts: 400
Joined: Tue May 20, 2003 4:49 pm

Re: Mysql Limitations

Post by liljester »

have you looked into creating a mysql cluster?
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Mysql Limitations

Post by Benjamin »

You may want to look into running select queries off slave servers to reduce the load. Memcache may help depending on how fresh certain data needs to be. Finally it may be good just to track all the data and run a nightly cron job to generate all the reports.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: Mysql Limitations

Post by John Cartwright »

onion2k wrote:MySQL should be fine, but you will need to optimise the way you insert and subsequently access the data.

First and foremost - it's a lot faster to do lots of inserts all at once on a MyISAM table. I would recommend creating a second HEAP table (HEAP is only held in memory) and have the web application insert into that table - it'll be very fast indeed. Then, every N minutes, run a script to flush all the data from the HEAP table into the MyISAM table with:

Code: Select all

INSERT INTO `table_myisam` (`column1`,`column2`,`column3`) VALUES SELECT `column1`, `column2`, `column3` FROM `table_heap`;
Then clear the HEAP table with:

Code: Select all

TRUNCATE `table_heap`;
If the server crashes you'll lose the content in the HEAP table which is an unfortunate side effect - but that could be alleviated by using replication to a second database server perhaps.
Excellent idea!
Secondly, I wouldn't actually ever access the data in a 'live' situation. Run reports nightly when it's quiet.
The best solution I can think of is grouping records with the same tracking id into hour slots, then simply updating that record + 1. The reason I can't go for an all out reporting solution is because there needs to be a lot of transparency of the data as you want to view more details.

I.e. initially you will see a list of sites, and see all the stats grouped by site,
the second level is grouped by affiliate per each site,
the third level is campaigns being run by that affiliate,
the fourth level will be the complete data breakdown the campaign

My boss said the lowest we can go is hourly builds, however I am still interested in reducing the amount of data being stored furthur.
arborint wrote:I think one question I would have is whether you need all of those logged rows once you have done your calculations. In the past I have done what webstats programs do, which is to reduce all the raw data records in a date range into subtotals and then do the actual stats on those. So you would run your queries on the raw data, then export it as a backup and delete those rows.
Indeed I am wanting to move away from raw data.. the last system we had was about 120 million rows of data (which was eventually corrupted and lost :banghead:).
have you looked into creating a mysql cluster?
Certainly :)
You may want to look into running select queries off slave servers to reduce the load. Memcache may help depending on how fresh certain data needs to be. Finally it may be good just to track all the data and run a nightly cron job to generate all the reports.
Would it be advisable to use memcache with large data sets? I think our servers have 16gb of memory at their disposal.
User avatar
Benjamin
Site Administrator
Posts: 6935
Joined: Sun May 19, 2002 10:24 pm

Re: Mysql Limitations

Post by Benjamin »

I believe memcache is typically used to cache the results returned by resource intensive or slow queries to reduce the load on MySQL servers.
User avatar
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Mysql Limitations

Post by Weirdan »

onion2k wrote: First and foremost - it's a lot faster to do lots of inserts all at once on a MyISAM table. I would recommend creating a second HEAP table (HEAP is only held in memory) and have the web application insert into that table - it'll be very fast indeed. Then, every N minutes, run a script to flush all the data from the HEAP table into the MyISAM
Similar effect could be achieved with INSERT DELAYED - this will pass the query to dedicated mysql thread, which will execute it when it will have enough resources.
The best solution I can think of is grouping records with the same tracking id into hour slots, then simply updating that record + 1.
This is good idea too - I used it four years ago and was able to record about 70 hits per second on pretty mediocre hardware. Using a storage engine that supports row level locks (InnoDB) also might help - otherwise long-running selects (reports, data transformation) will block updates on the table.
User avatar
liljester
Forum Contributor
Posts: 400
Joined: Tue May 20, 2003 4:49 pm

Re: Mysql Limitations

Post by liljester »

Im very curious to see which path you decide to take, let us know how it turns out!
Post Reply