Page 1 of 1

Assigning tags ("keywords") to items

Posted: Fri Dec 21, 2007 2:19 pm
by GeXus
I want to add the ability of assigning user-inputed tags to items, such as Music Video is assigned the tags "music, video, rock, band"

How would you suggest storing this data? I think naturally you would just create another table called tags which would look like

item_id
tag_name


Then when you search you would do

Code: Select all

select items.item from items inner join tags on items.id = tags.item_id where tag_name.tags = '$tag' 
This would mean that you would have a very long but narrow table... Does that sound like the most efficient way?

Posted: Fri Dec 21, 2007 2:36 pm
by Chalks
I would make the table:

item_id
tag_name
tag_weight

then, when someone inputs a tag, if the tag already exists, increment the tag weight by 1. That way you can make a neato looking tag cloud... and you don't get duplicate tag entries.

edit: That is... now you won't have an insanely long table, and you still get to keep track of which tags are more popular than others.

Posted: Fri Dec 21, 2007 2:44 pm
by s.dot
Either way seems fine.

Posted: Fri Dec 21, 2007 2:51 pm
by GeXus
Chalks wrote:I would make the table:

item_id
tag_name
tag_weight

then, when someone inputs a tag, if the tag already exists, increment the tag weight by 1. That way you can make a neato looking tag cloud... and you don't get duplicate tag entries.

edit: That is... now you won't have an insanely long table, and you still get to keep track of which tags are more popular than others.
You would still have duplicates though? How would you associate the tag to an item_id if you incremented the value to a different item_id instead of adding a record... and for the tag cloud, couldn't you just group by a count of tags?

Posted: Fri Dec 21, 2007 3:21 pm
by Chalks
GeXus wrote:You would still have duplicates though? How would you associate the tag to an item_id if you incremented the value to a different item_id instead of adding a record... and for the tag cloud, couldn't you just group by a count of tags?
You would _not_ still have duplicates, and yes.

when inserting tag: check if tag already exists. If yes, then increment that tag weight by 1, if not then insert new tag into database.

that may cause one extra query for insertion, but you don't have to count through a table that could potentially have hundreds, thousands, or BILLIONS of records.









Ok, maybe not billions.

Posted: Fri Dec 21, 2007 3:55 pm
by feyd
That's a race condition. Be careful here.

Posted: Fri Dec 21, 2007 4:05 pm
by s.dot
If you're worried about tag weight, you could just insert the tags as you described in your original post, then to display the tag cloud

Code: Select all

SELECT count(*) AS `count`, `tag` FROM `tags` GROUP BY `tag` ORDER BY `count` DESC
Something to that effect. Probably not the correct query, but definitely doable.

Posted: Fri Dec 21, 2007 4:46 pm
by Chalks
feyd wrote:That's a race condition. Be careful here.
What is a "race condition"?

... The longer I hang out in these forums, the more questions I come up with >.>

Posted: Fri Dec 21, 2007 4:52 pm
by califdon
chalks wrote:that may cause one extra query for insertion, but you don't have to count through a table that could potentially have hundreds, thousands, or BILLIONS of records.


Ok, maybe not billions.
YEAH! If my mother told me once, she told me a billion times: "Don't exaggerate!"

Posted: Fri Dec 21, 2007 4:52 pm
by s.dot
http://en.wikipedia.org/wiki/Race_condition#Computing

Basically two things are happening at very close (if not the same) times. Which one finishes first (and consequently last) would determine the output of the script, function, etc. It's not a desirable situation to be in as programmers (at least in php) like things to happen in a predictable linear fashion.

Posted: Sat Dec 22, 2007 4:36 am
by onion2k
My site has tagging (although I call them labels). I did it with 3 tables... Article -> article_label_xref -> Label. It is a many to many relationship after all. A label is added to the Label table as a unique string with an autoincremented id, then any article it's used with gets a record in the cross reference table (using article_id, label_id). Counting them so you can build a cloud is pretty easy...

Code: Select all

SELECT `ooer_label`.`label`, 
COUNT(`ooer_article_label_xref`.`label_id`) AS total 
FROM `ooer_label` 
INNER JOIN `ooer_article_label_xref` ON `ooer_label`.`label_id` = `ooer_article_label_xref`.`label_id` 
LEFT JOIN `ooer_article` ON `ooer_article.`article_id` = `ooer_article_label_xref`.`article_id` 
WHERE 1 
AND `ooer_article`.`deleted` = 0 
GROUP BY `ooer_label`.`label` 
ORDER BY total DESC 
LIMIT 20
The LEFT JOIN and WHERE stuff isn't really necessary, it just makes sure tags that have been used on deleted articles aren't counted. If you were to remove the tags from the cross reference when something's deleted you could get rid of those bits.

Posted: Sun Dec 23, 2007 10:25 pm
by Kieran Huggins
A "has and belongs to many" relationship typically has a join table. Don't worry that it's super long, DB's are cool with that. Besides, the join table is only storing 2 integer columns, which is trivial.

I'd let MySQL do the heavy lifting for the tag cloud (with caching, of course)

Posted: Tue Dec 25, 2007 10:54 am
by JellyFish
I was going to do a tag system in the same table as what's being tagged. But how would I select all rows where the keywords/tag field has the searched-for-tag(s).

Table would look something like:
foo
bar
tags
tags would have a string like: "this,and,that,foo,bar".

But then how would I be able to select foo and bar where tags has $_GET['tags']?

Posted: Tue Dec 25, 2007 9:42 pm
by Kieran Huggins
That smells of the wrong way to me, hence why it seems so difficult. A join table really is much simpler, and will perform much better for you.

Posted: Tue Dec 25, 2007 10:32 pm
by John Cartwright
Like your other thread, normalization is the way to go. Your just going to make things harder for you in the end and more than likely require more effort going down this path.