Algorithm design

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
User avatar
kclark
Forum Newbie
Posts: 14
Joined: Tue Jan 04, 2011 3:58 pm

Algorithm design

Post by kclark »

I'm trying to design an algorithm but I don't know where to start. I'm utilizing Google's Social Graph API and I'm trying to have my website run an algorithm that chooses the best person's goal per day. I need to create a database that states when the user uploads the image that image is already defined by keywords (many to many relationship). I need to add a predefined table of keywords to mu database and I need to be able to query the images from the database for a specfic keyword. Hey would I go about starting to create this and to bring it to scale where the algorithm picks a pictures for the site every 24 hours?
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Algorithm design

Post by Christopher »

Trying to figure out what you want:

1a. choose the best person's goal per day
1b. pick a pictures for the site every 24 hours

2. need to create a database that states when the user uploads the image that image is already defined by keywords (many to many relationship).

3. need to add a predefined table of keywords to mu database

4. need to be able to query the images from the database for a specfic keyword.

Questions:

#3 What is the "mu" database?

#2 What does this mean?
(#10850)
s992
Forum Contributor
Posts: 124
Joined: Wed Oct 27, 2010 3:06 pm

Re: Algorithm design

Post by s992 »

Christopher wrote: #3 What is the "mu" database?
I think he meant "my," it's a probably a typo.
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Algorithm design

Post by josh »

kclark wrote: Hey would I go about starting to create this and to bring it to scale where the algorithm picks a pictures for the site every 24 hours?
Run a cron script every 24hrs, use ORDER BY RAND() at the end of your sql query to randomize all rows. Use LIMIT 1 to only grab the first row of the set. So "ORDER BY RAND() LIMIT 1" grabs 1 random row. Do that every 24hrs. This is hardly an 'algorithm' though in my opinion, although that is a topic for philosophy, not a programming related discussion.

There's a trick to not use cron scripts. Store a timestamp along with the current image ID. Everytime you query the image ID, if the timestamp > 24hrs ago a new image ID is picked during that page load.
User avatar
Darhazer
DevNet Resident
Posts: 1011
Joined: Thu May 14, 2009 3:00 pm
Location: HellCity, Bulgaria

Re: Algorithm design

Post by Darhazer »

josh wrote: Run a cron script every 24hrs, use ORDER BY RAND() at the end of your sql query to randomize all rows.
Kind of offtopic, but this is a performance killer. If you have large table, you need to scan the entire table to sort it. (and you'll need a filesort - no way you can use index for ordering by rand)
A better way to do, if you have idea how many rows exists, is to use limit 1 offset $randomNumber.
In this way you will have only to scan $randomNumber rows.
if there are not (a lot of) deleted rows, you can use a random number for direct ID / range lookup. And you can get the max ID fast with a query order by ID DESC LIMIT 1 :)
User avatar
Technical
Forum Commoner
Posts: 81
Joined: Thu Dec 02, 2010 5:30 am

Re: Algorithm design

Post by Technical »

Darhazer wrote:
josh wrote: Run a cron script every 24hrs, use ORDER BY RAND() at the end of your sql query to randomize all rows.
Kind of offtopic, but this is a performance killer. If you have large table, you need to scan the entire table to sort it. (and you'll need a filesort - no way you can use index for ordering by rand)
A better way to do, if you have idea how many rows exists, is to use limit 1 offset $randomNumber.
In this way you will have only to scan $randomNumber rows.
if there are not (a lot of) deleted rows, you can use a random number for direct ID / range lookup. And you can get the max ID fast with a query order by ID DESC LIMIT 1 :)
By the way, I'm using same practice and just curios, does this really hurt performance so?
josh
DevNet Master
Posts: 4872
Joined: Wed Feb 11, 2004 3:23 pm
Location: Palm beach, Florida

Re: Algorithm design

Post by josh »

Performance is subjective. Yes its slower than other ways of grabbing a random row. Whether it hurts performance or not depends on whether it hurts performance or not.

Once every 24hrs at 4am on a 1 Million row table probably will not hurt performance as much as doing it every page load on a 1 Million row table.
Post Reply