Assigning tags ("keywords") to items

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Assigning tags ("keywords") to items

Post by GeXus »

I want to add the ability of assigning user-inputed tags to items, such as Music Video is assigned the tags "music, video, rock, band"

How would you suggest storing this data? I think naturally you would just create another table called tags which would look like

item_id
tag_name


Then when you search you would do

Code: Select all

select items.item from items inner join tags on items.id = tags.item_id where tag_name.tags = '$tag' 
This would mean that you would have a very long but narrow table... Does that sound like the most efficient way?
User avatar
Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Post by Chalks »

I would make the table:

item_id
tag_name
tag_weight

then, when someone inputs a tag, if the tag already exists, increment the tag weight by 1. That way you can make a neato looking tag cloud... and you don't get duplicate tag entries.

edit: That is... now you won't have an insanely long table, and you still get to keep track of which tags are more popular than others.
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

Either way seems fine.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
GeXus
Forum Regular
Posts: 631
Joined: Sat Mar 11, 2006 8:59 am

Post by GeXus »

Chalks wrote:I would make the table:

item_id
tag_name
tag_weight

then, when someone inputs a tag, if the tag already exists, increment the tag weight by 1. That way you can make a neato looking tag cloud... and you don't get duplicate tag entries.

edit: That is... now you won't have an insanely long table, and you still get to keep track of which tags are more popular than others.
You would still have duplicates though? How would you associate the tag to an item_id if you incremented the value to a different item_id instead of adding a record... and for the tag cloud, couldn't you just group by a count of tags?
User avatar
Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Post by Chalks »

GeXus wrote:You would still have duplicates though? How would you associate the tag to an item_id if you incremented the value to a different item_id instead of adding a record... and for the tag cloud, couldn't you just group by a count of tags?
You would _not_ still have duplicates, and yes.

when inserting tag: check if tag already exists. If yes, then increment that tag weight by 1, if not then insert new tag into database.

that may cause one extra query for insertion, but you don't have to count through a table that could potentially have hundreds, thousands, or BILLIONS of records.









Ok, maybe not billions.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

That's a race condition. Be careful here.
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

If you're worried about tag weight, you could just insert the tags as you described in your original post, then to display the tag cloud

Code: Select all

SELECT count(*) AS `count`, `tag` FROM `tags` GROUP BY `tag` ORDER BY `count` DESC
Something to that effect. Probably not the correct query, but definitely doable.
Last edited by s.dot on Fri Dec 21, 2007 4:53 pm, edited 1 time in total.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
Chalks
Forum Contributor
Posts: 447
Joined: Thu Jul 12, 2007 7:55 am
Location: Indiana

Post by Chalks »

feyd wrote:That's a race condition. Be careful here.
What is a "race condition"?

... The longer I hang out in these forums, the more questions I come up with >.>
User avatar
califdon
Jack of Zircons
Posts: 4484
Joined: Thu Nov 09, 2006 8:30 pm
Location: California, USA

Post by califdon »

chalks wrote:that may cause one extra query for insertion, but you don't have to count through a table that could potentially have hundreds, thousands, or BILLIONS of records.


Ok, maybe not billions.
YEAH! If my mother told me once, she told me a billion times: "Don't exaggerate!"
Last edited by califdon on Fri Dec 21, 2007 4:53 pm, edited 2 times in total.
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Post by s.dot »

http://en.wikipedia.org/wiki/Race_condition#Computing

Basically two things are happening at very close (if not the same) times. Which one finishes first (and consequently last) would determine the output of the script, function, etc. It's not a desirable situation to be in as programmers (at least in php) like things to happen in a predictable linear fashion.
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Post by onion2k »

My site has tagging (although I call them labels). I did it with 3 tables... Article -> article_label_xref -> Label. It is a many to many relationship after all. A label is added to the Label table as a unique string with an autoincremented id, then any article it's used with gets a record in the cross reference table (using article_id, label_id). Counting them so you can build a cloud is pretty easy...

Code: Select all

SELECT `ooer_label`.`label`, 
COUNT(`ooer_article_label_xref`.`label_id`) AS total 
FROM `ooer_label` 
INNER JOIN `ooer_article_label_xref` ON `ooer_label`.`label_id` = `ooer_article_label_xref`.`label_id` 
LEFT JOIN `ooer_article` ON `ooer_article.`article_id` = `ooer_article_label_xref`.`article_id` 
WHERE 1 
AND `ooer_article`.`deleted` = 0 
GROUP BY `ooer_label`.`label` 
ORDER BY total DESC 
LIMIT 20
The LEFT JOIN and WHERE stuff isn't really necessary, it just makes sure tags that have been used on deleted articles aren't counted. If you were to remove the tags from the cross reference when something's deleted you could get rid of those bits.
User avatar
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada
Contact:

Post by Kieran Huggins »

A "has and belongs to many" relationship typically has a join table. Don't worry that it's super long, DB's are cool with that. Besides, the join table is only storing 2 integer columns, which is trivial.

I'd let MySQL do the heavy lifting for the tag cloud (with caching, of course)
User avatar
JellyFish
DevNet Resident
Posts: 1361
Joined: Tue Feb 14, 2006 7:18 pm
Location: San Diego, CA

Post by JellyFish »

I was going to do a tag system in the same table as what's being tagged. But how would I select all rows where the keywords/tag field has the searched-for-tag(s).

Table would look something like:
foo
bar
tags
tags would have a string like: "this,and,that,foo,bar".

But then how would I be able to select foo and bar where tags has $_GET['tags']?
User avatar
Kieran Huggins
DevNet Master
Posts: 3635
Joined: Wed Dec 06, 2006 4:14 pm
Location: Toronto, Canada
Contact:

Post by Kieran Huggins »

That smells of the wrong way to me, hence why it seems so difficult. A join table really is much simpler, and will perform much better for you.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Post by John Cartwright »

Like your other thread, normalization is the way to go. Your just going to make things harder for you in the end and more than likely require more effort going down this path.
Post Reply