Page 1 of 1

Updating data once per day

Posted: Mon Sep 24, 2007 9:20 am
by GeXus
I have a DB of about 10,000 products. I want to then display a random 50 of these products per day. So at midnight I will select rand 50 to display them. How would you suggest handling this so that it only selects a new random 50 in 24 hours? or at midnight.

Here are a couple options.

1. Select 50, store in memory (memcache) until 12:00 and then select a new 50 and store again, etc.

2. Create a new table that gets the selected 50 inserted into it, then select from that 50. Have a job that runs and updates that DB at 12:00.


I think #1 would be the best bet, but I'm just wondering what you guys would suggest.

Thanks!

Posted: Mon Sep 24, 2007 9:32 am
by feyd
Option 2 is often more portable.

Posted: Mon Sep 24, 2007 9:33 am
by GeXus
feyd wrote:Option 2 is often more portable.
More portable or scalable? Not sure I understand.

Posted: Mon Sep 24, 2007 9:35 am
by GeXus
Just want to add... This is for a relatively high-traffic site, approx. 5mill+ page views a day, so one of my concerns about using the DB, is obviously load and speed.

Posted: Mon Sep 24, 2007 10:02 am
by Zoxive
GeXus wrote:Just want to add... This is for a relatively high-traffic site, approx. 5mill+ page views a day, so one of my concerns about using the DB, is obviously load and speed.
I too would put it in a separate table, then you can even go back in history (Previous Days etc).
You could use a cron job to do so.

5 Million? May i ask what the site is?
If your having that much traffic, then this already shouldn't be an ordinary server.

Posted: Mon Sep 24, 2007 10:10 am
by GeXus
Zoxive wrote:
GeXus wrote:Just want to add... This is for a relatively high-traffic site, approx. 5mill+ page views a day, so one of my concerns about using the DB, is obviously load and speed.
I too would put it in a separate table, then you can even go back in history (Previous Days etc).
You could use a cron job to do so.

5 Million? May i ask what the site is?
If your having that much traffic, then this already shouldn't be an ordinary server.
Hmmm.... So you don't think having a separate table would be unnecessary load? I don't have much of a need for history... although I guess I could use the history to check on when pulling randoms to make sure that it doesn't pull too many of the same again... but not too likely...

I really can't say what site it is.... sorry ;-)

Posted: Mon Sep 24, 2007 10:13 am
by feyd
GeXus wrote:More portable or scalable? Not sure I understand.
Both actually.

Posted: Mon Sep 24, 2007 10:19 am
by GeXus
Ok.... so database is the way. Now, should I just use it as a lookup? Then I will have to join on the "products" table... which is fine... but again, just thinking about performance here...


As far as scalability, with memcache, it's really easy to just toss another cache server into the mix... easier than adding to a db cluster..

Posted: Mon Sep 24, 2007 1:51 pm
by califdon
Disclaimer: I have no experience with really high activity systems. That said, I too support creating a new table each day. If the cron job is not an easy solution, you could check the date of creation of the table every time and if it is different from the current date, create a new random table. That would add to the execution time of each query, but only slightly for 99.99% of the queries, with just one query a day having to wait to create the new table. Just a thought.

Posted: Mon Sep 24, 2007 8:12 pm
by feyd
Whoa, whoa, whoa.. new table each day? I don't think that's a good idea in the slightest of ways.

Posted: Fri Sep 28, 2007 6:08 pm
by califdon
feyd wrote:Whoa, whoa, whoa.. new table each day? I don't think that's a good idea in the slightest of ways.
Ordinarily I would agree with you, feyd, but this seems to me to be an unusual situation. He's talking about a fairly large table and a very high activity. Since he wants to choose just 50 rows to use for each calendar day, I think I'd rather have the millions of queries a day access 50 rows instead of 10,000 rows. On a given day, he knows that he does not want to access 9,950 of those rows.

Posted: Fri Sep 28, 2007 7:04 pm
by John Cartwright
Are we talking about truncating the temporary table and repopulating with random values or creating an entirely new table each day?