"It becomes inefficient to log every hit in your application. A "rollup" is a table that aggregates statistics for performance reasons. "
critique
Moderator: General Moderators
Also example "reporting application" using Zend: http://code.google.com/p/socks/source/b ... Code/StatsThe stats framework itself can be found in library/PhpStats
See HTML version of documentation in library/PhpStats/documentation
URL 'http://socks.googlecode.com/svn/trunk' doesn't exist
It is based on the ideas of a star schema, right now it uses EAV so it can support varying numbers of "segments" with one schema. More like Oracle's cube rollup... I guess? http://www.psoug.org/reference/rollup.htmlpytrin wrote:To me it sounds like you're talking more about data-warehousing techniques, such as a star schema where some of the aggregated data is kept in a denormalized form in additional dimension tables. What I'm interested it, is how does your library help with such data aside from suggesting a more performant schema
How does this avoid pruning? wouldn't you like to still remove the original data from the normalized tables to prevent them from becoming too large?I know people who run large sites who just prune their data over time. This is the problem I am directly addressing. No one should ever have to prune their data.
I have been following this discussion, but have not looked closely. In reading what you said above, it almost sounds like a query cache that generates calculated, historic data (i.e. data that will not change) and then uses it for future queries that require the same calculated data. I don't think that is what you are doing, but the idea of generating the data on-demand and storing it in an EAV is interesting. It is something that could perhaps be transparent to the Domain Layer -- sitting below it, but above the actual data access.josh wrote:I guess technically its not a rollup as the Mysql/Oracle describe it. My solution differs in that once we've "rolled up" the stats for the Month of Jan, and the user asks to see traffic for all the year at the "grain" of 'month', it would not have a need to look into the log table, or even the hours table, or even the days table. Once it calculates the traffic for January it need not calculate it again (unless the month of January is not over).
I have in the past moved older data to an archive table (with the same schema as the "current" table) and then added a JOIN to the SQL if they requested a date range before the start date of the "current" table. That made most reports pretty fast, but still allowed reports on older data.josh wrote:I know people who run large sites who just prune their data over time. This is the problem I am directly addressing. No one should ever have to prune their data.
This technique is called partitioning (or sharding), and is now supported natively in recent versions of MySQL. You can declare the column to split the table by, the ranges for each partition, and MySQL will select the partition for you so you don't need to manually join.I have in the past moved older data to an archive table (with the same schema as the "current" table) and then added a JOIN to the SQL if they requested a date range before the start date of the "current" table. That made most reports pretty fast, but still allowed reports on older data.
Yes, it is pruning in the same way your brain prunes memories. You might remember a child-hood memory but you don't remember every detail about it.pytrin wrote:How does this avoid pruning? wouldn't you like to still remove the original data from the normalized tables to prevent them from becoming too large?
I would call it "poor man's sharding" and not quite sharding or partitioning as I understand the term.pytrin wrote:This technique is called partitioning (or sharding), and is now supported natively in recent versions of MySQL. You can declare the column to split the table by, the ranges for each partition, and MySQL will select the partition for you so you don't need to manually join.