Algorithm design
I'm trying to design an algorithm but I don't know where to start. I'm using Google's Social Graph API, and I'm trying to have my website run an algorithm that chooses the best person's goal per day. I need to create a database so that when the user uploads the image, that image is already defined by keywords (a many-to-many relationship). I need to add a predefined table of keywords to mu database, and I need to be able to query the images from the database for a specific keyword. How would I go about starting to create this, and bringing it to scale where the algorithm picks a picture for the site every 24 hours?
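A minimal sketch of the schema being described: an images table, a predefined keywords table, and a join table for the many-to-many relationship, plus the keyword query. SQLite stands in for whatever database the site actually uses, and all table and column names here are assumptions, not from the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Images, a predefined keyword table, and a join table for the
# many-to-many relationship between them.
cur.executescript("""
CREATE TABLE images (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE keywords (
    id   INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE
);
CREATE TABLE image_keywords (
    image_id   INTEGER REFERENCES images(id),
    keyword_id INTEGER REFERENCES keywords(id),
    PRIMARY KEY (image_id, keyword_id)
);
""")

# Seed the predefined keywords and one tagged image.
cur.execute("INSERT INTO keywords (word) VALUES ('sunset'), ('beach')")
cur.execute("INSERT INTO images (path) VALUES ('uploads/001.jpg')")
cur.execute("""
    INSERT INTO image_keywords (image_id, keyword_id)
    SELECT 1, id FROM keywords WHERE word IN ('sunset', 'beach')
""")

# Query all images tagged with a specific keyword.
rows = cur.execute("""
    SELECT i.path
    FROM images i
    JOIN image_keywords ik ON ik.image_id = i.id
    JOIN keywords k        ON k.id = ik.keyword_id
    WHERE k.word = ?
""", ("sunset",)).fetchall()
```

The join table is what makes the relationship many-to-many: one image can carry several keywords, and one keyword can tag several images.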
- Christopher
- Site Administrator
- Posts: 13596
- Joined: Wed Aug 25, 2004 7:54 pm
- Location: New York, NY, US
Re: Algorithm design
Trying to figure out what you want:
1a. choose the best person's goal per day
1b. pick a pictures for the site every 24 hours
2. need to create a database that states when the user uploads the image that image is already defined by keywords (many to many relationship).
3. need to add a predefined table of keywords to mu database
4. need to be able to query the images from the database for a specfic keyword.
Questions:
#3 What is the "mu" database?
#2 What does this mean?
Re: Algorithm design
Christopher wrote: #3 What is the "mu" database?
I think he meant "my"; it's probably a typo.
Re: Algorithm design
kclark wrote: How would I go about starting to create this, and bringing it to scale where the algorithm picks a picture for the site every 24 hours?
Run a cron script every 24hrs and use ORDER BY RAND() at the end of your SQL query to randomize all rows, with LIMIT 1 to grab only the first row of the set. So "ORDER BY RAND() LIMIT 1" grabs 1 random row; do that every 24hrs. This is hardly an 'algorithm' in my opinion, although that is a topic for philosophy, not a programming-related discussion.
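The random-row pick can be sketched like this. SQLite (used here as a stand-in) spells MySQL's RAND() as RANDOM(), and the images table is assumed from the question, not the thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")
cur.executemany("INSERT INTO images (path) VALUES (?)",
                [(f"uploads/{n:03}.jpg",) for n in range(10)])

# Sort every row in a random order, then keep only the first one.
row = cur.execute(
    "SELECT id, path FROM images ORDER BY RANDOM() LIMIT 1"
).fetchone()
```

A crontab entry along the lines of `0 4 * * * php pick_image.php` would then run the script once a day; the script name is hypothetical.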
There's a trick to avoid cron scripts: store a timestamp along with the current image ID. Every time you query the image ID, if the timestamp is more than 24hrs old, a new image ID is picked during that page load.
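The cron-free variant can be sketched with a one-row table holding the current pick and when it was made; the `current_pick` table and function names here are illustrative assumptions:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")
cur.executemany("INSERT INTO images (path) VALUES (?)",
                [(f"uploads/{n:03}.jpg",) for n in range(5)])
# One-row table: the current image ID plus the time it was picked.
cur.execute("CREATE TABLE current_pick (image_id INTEGER, picked_at REAL)")

DAY = 24 * 60 * 60

def image_of_the_day():
    """Return the current image ID, re-picking if the stored one is stale."""
    pick = cur.execute(
        "SELECT image_id, picked_at FROM current_pick").fetchone()
    if pick is None or time.time() - pick[1] > DAY:
        new_id = cur.execute(
            "SELECT id FROM images ORDER BY RANDOM() LIMIT 1").fetchone()[0]
        cur.execute("DELETE FROM current_pick")
        cur.execute("INSERT INTO current_pick VALUES (?, ?)",
                    (new_id, time.time()))
        return new_id
    return pick[0]
```

Every page load calls `image_of_the_day()`; within the 24-hour window the stored ID is returned unchanged, so only the first request after expiry pays for the re-pick.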
Re: Algorithm design
josh wrote: Run a cron script every 24hrs and use ORDER BY RAND() at the end of your SQL query to randomize all rows.
Kind of off-topic, but this is a performance killer. If you have a large table, you need to scan the entire table to sort it (and you'll need a filesort; there is no way to use an index for ordering by RAND()).
A better way, if you have an idea of how many rows exist, is to use LIMIT 1 OFFSET $randomNumber. That way you only have to scan $randomNumber rows.
If there are not (a lot of) deleted rows, you can use a random number for a direct ID / range lookup. And you can get the max ID fast with a query: ORDER BY id DESC LIMIT 1.
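Both cheaper variants can be sketched as follows, again with SQLite as a stand-in and an assumed images table. Variant 1 skips a random number of rows instead of sorting them all; variant 2 relies on IDs having few gaps so a random ID can be found with an indexed range lookup:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, path TEXT)")
cur.executemany("INSERT INTO images (path) VALUES (?)",
                [(f"uploads/{n:03}.jpg",) for n in range(100)])

# Variant 1: skip a random number of rows rather than sorting them all.
total = cur.execute("SELECT COUNT(*) FROM images").fetchone()[0]
offset = random.randrange(total)
row = cur.execute("SELECT id, path FROM images LIMIT 1 OFFSET ?",
                  (offset,)).fetchone()

# Variant 2: if IDs have few gaps, pick a random ID and take the first
# row at or above it; the max ID itself is a fast indexed query.
max_id = cur.execute(
    "SELECT id FROM images ORDER BY id DESC LIMIT 1").fetchone()[0]
pick = cur.execute("SELECT id, path FROM images WHERE id >= ? LIMIT 1",
                   (random.randint(1, max_id),)).fetchone()
```

Note that variant 2's range lookup slightly favors IDs that follow gaps, which is usually an acceptable trade for the speed-up.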
Re: Algorithm design
Darhazer wrote: Kind of off-topic, but this is a performance killer. If you have a large table, you need to scan the entire table to sort it (and you'll need a filesort; there is no way to use an index for ordering by RAND()). A better way, if you have an idea of how many rows exist, is to use LIMIT 1 OFFSET $randomNumber. That way you only have to scan $randomNumber rows. If there are not (a lot of) deleted rows, you can use a random number for a direct ID / range lookup.
By the way, I'm using the same practice and am just curious: does this really hurt performance that much?
Re: Algorithm design
Performance is subjective. Yes, it's slower than other ways of grabbing a random row; whether that actually hurts depends on the table size and how often you run the query.
Once every 24hrs at 4am on a 1 million row table will probably hurt far less than doing it on every page load against the same table.