Page 1 of 1

mysql query for a collection of consecutive dates

Posted: Fri Dec 14, 2007 3:56 am
by mechamecha
Hello,
I'm looking for some ideas and suggestions to help solve my problem.

I am building a social network and would like to collect stats on each user. For example, every time a profile is viewed, I record into the db the day and a counter. As that user's profile gets more views for the same day, the counter will increment. However, if a user profile is never viewed in a single day, that day will never get recorded into the db. Therefore, this table will never contain an entry w/ a 0 counter.

I would like to display this data back to the user in a graph. However, when I query the table for a list of days and counts, the selection will be missing the days w/ 0 counts. My question is how can I fill these missing dates w/ 0 counts?

Here are a couple of solutions I've come up w/:

1. If a user never gets a profile view for a given day, then I could force an entry into the table for that day w/ a 0 count. THen my select query will never contain any gaps. The only draw back is the overhead in storing these rows w/ 0 counts...or not. Storage is pretty cheap these days.

2. I can compute the missing days in php. I would loop through the list of days and find where the gaps are and calculate the missing days and assign them 0 values. This approach requires converting dates to secs, perform the subtractions and then converting back to dates which can all get pretty messy quite easily...and tedious to implement.

3. I can create a 'calender' table. This is a table of all days and dates up to today. I fill it by running an agent that inserts the current day's date into into this table every day. Then when I query my stat table for the profile view count data, I perform a right join against this calender table. The resulting query should return a list of days w/o any gaps.

I like the 3rd solution the most [s]cuz[/s] because it appears to be the most elegant.

What do you guys think? Do you have any alternative solutions? Is there a select query that would not require these work arounds?

Thanks for taking the time to read this long post!
[url=http://forums.devnetwork.net/viewtopic.php?t=30037]Forum Rules[/url] Section 1.1 wrote:11. Please use proper, complete spelling when posting in the forums. AOL Speak, leet speak and other abbreviated wording can confuse those that are trying to help you (or those that you are trying to help). Please keep in mind that there are many people from many countries that use our forums to read, post and learn. They do not always speak English as well as some of us, nor do they know these aberrant abbreviations. Therefore, use as few abbreviations as possible, especially when using such simple words.

Some examples of what not to do are ne1, any1 (anyone); u (you); ur (your or you're); 2 (to too); prolly (probably); afaik (as far as I know); etc.

Posted: Fri Dec 14, 2007 11:12 am
by RobertGonzalez
You fill in the gaps programatically or you actually enter a 0 for dates without any hits.

Posted: Fri Dec 14, 2007 12:06 pm
by mechamecha
Everah,
Thanks for the reply.

Is there a reason you wouldn't advise me using the third solution? From my perspective, it appears to be more optimal and elegant of the other two. If I'm wrong, I would like to know why.

Programatically generating the dates can be a real pain because it would require various date time calculations(ie. taking into account how many days are in each month, leap years, etc..).

If I had to choose between the 1st and 2nd solution, I would rather store 0 hits and their dates into the db.

Thanks.

Posted: Fri Dec 14, 2007 12:57 pm
by RobertGonzalez
Why would you create a table and join on that when you could just add a single row/column of data and select that with the other data you are selecting? It would add almost no overhead, would keep your query from having to join and makes sense to be there since it is consistent with the data that is in there already.

Posted: Fri Dec 14, 2007 2:49 pm
by mechamecha
I guess what I really worry about is how the size of the table would impact the db's performance. You can imagine that a table that tracked stats for every user and every day they have been registered would be enormous. If you don't track rows w/ 0 hits, then the table could be smaller.

So is it much better to have a large table to query against than to perform a join on two tables? I am aware that joins can get expensive. Also, relying on two tables to represent data that can be in one table adds an unnecessary layer of complexity to the design. I'm just trying to grasp which approach is better in the long run.

Thanks for the help! I'm still new when it comes to schema design and planning.

Posted: Fri Dec 14, 2007 2:53 pm
by RobertGonzalez
Are you always going to be pulling every row for every user? Or are you going to be pulling a subset of data based on a criteria? Proper indexing and smart queries can alleviate a lot of database performance issues.

Posted: Fri Dec 14, 2007 3:48 pm
by mechamecha
no, i would be pulling subset of the data. I'd imagine you could grab a range of dates, say oct 3rd, 2007 to dec 15, 2007 and display the results in a graph. In this case, I would index the dates?

Posted: Fri Dec 14, 2007 4:02 pm
by RobertGonzalez
Index what is most often searched for. Since the table is going to contain rows of essentially a single integer value, and 0 is an integer value, it just makes sense to me to have a row that contains a 0.

Posted: Fri Dec 14, 2007 4:54 pm
by mechamecha
thanks everah. you've been really helpful!