Page 1 of 1

Algorithm design

Posted: Tue Jan 04, 2011 4:09 pm
by kclark
I'm trying to design an algorithm but I don't know where to start. I'm utilizing Google's Social Graph API and I'm trying to have my website run an algorithm that chooses the best person's goal per day. I need to create a database that states when the user uploads the image that image is already defined by keywords (many to many relationship). I need to add a predefined table of keywords to mu database and I need to be able to query the images from the database for a specfic keyword. Hey would I go about starting to create this and to bring it to scale where the algorithm picks a pictures for the site every 24 hours?

Re: Algorithm design

Posted: Tue Jan 04, 2011 6:25 pm
by Christopher
Trying to figure out what you want:

1a. choose the best person's goal per day
1b. pick a pictures for the site every 24 hours

2. need to create a database that states when the user uploads the image that image is already defined by keywords (many to many relationship).

3. need to add a predefined table of keywords to mu database

4. need to be able to query the images from the database for a specfic keyword.

Questions:

#3 What is the "mu" database?

#2 What does this mean?

Re: Algorithm design

Posted: Tue Jan 04, 2011 11:49 pm
by s992
Christopher wrote: #3 What is the "mu" database?
I think he meant "my," it's a probably a typo.

Re: Algorithm design

Posted: Wed Jan 05, 2011 2:46 am
by josh
kclark wrote: Hey would I go about starting to create this and to bring it to scale where the algorithm picks a pictures for the site every 24 hours?
Run a cron script every 24hrs, use ORDER BY RAND() at the end of your sql query to randomize all rows. Use LIMIT 1 to only grab the first row of the set. So "ORDER BY RAND() LIMIT 1" grabs 1 random row. Do that every 24hrs. This is hardly an 'algorithm' though in my opinion, although that is a topic for philosophy, not a programming related discussion.

There's a trick to not use cron scripts. Store a timestamp along with the current image ID. Everytime you query the image ID, if the timestamp > 24hrs ago a new image ID is picked during that page load.

Re: Algorithm design

Posted: Wed Jan 05, 2011 2:53 am
by Darhazer
josh wrote: Run a cron script every 24hrs, use ORDER BY RAND() at the end of your sql query to randomize all rows.
Kind of offtopic, but this is a performance killer. If you have large table, you need to scan the entire table to sort it. (and you'll need a filesort - no way you can use index for ordering by rand)
A better way to do, if you have idea how many rows exists, is to use limit 1 offset $randomNumber.
In this way you will have only to scan $randomNumber rows.
if there are not (a lot of) deleted rows, you can use a random number for direct ID / range lookup. And you can get the max ID fast with a query order by ID DESC LIMIT 1 :)

Re: Algorithm design

Posted: Wed Jan 05, 2011 1:45 pm
by Technical
Darhazer wrote:
josh wrote: Run a cron script every 24hrs, use ORDER BY RAND() at the end of your sql query to randomize all rows.
Kind of offtopic, but this is a performance killer. If you have large table, you need to scan the entire table to sort it. (and you'll need a filesort - no way you can use index for ordering by rand)
A better way to do, if you have idea how many rows exists, is to use limit 1 offset $randomNumber.
In this way you will have only to scan $randomNumber rows.
if there are not (a lot of) deleted rows, you can use a random number for direct ID / range lookup. And you can get the max ID fast with a query order by ID DESC LIMIT 1 :)
By the way, I'm using same practice and just curios, does this really hurt performance so?

Re: Algorithm design

Posted: Wed Jan 05, 2011 8:19 pm
by josh
Performance is subjective. Yes its slower than other ways of grabbing a random row. Whether it hurts performance or not depends on whether it hurts performance or not.

Once every 24hrs at 4am on a 1 Million row table probably will not hurt performance as much as doing it every page load on a 1 Million row table.