Many to Many Related Data - 'Most Similar To'

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Many to Many Related Data - 'Most Similar To'

Post by onion2k »

I have a basic many to many cross reference table linking two other tables. They're very straight forward:

Code: Select all

|---------------|
| image         |
|---------------|
| image_id      |
| title         |
| etc           |
|---------------|
 
|---------------|
| tag           |
|---------------|
| tag_id        |
| tag           |
| etc           |
|---------------|
 
|---------------|
| image_tag_xref|
|---------------|
| image_id      |
| tag_id        |
|---------------|
Getting all the images for any tag is simple, likewise getting all the tags for any image, and any combination of tags and images. I'm fine with all that.

However... I want to try and get images that are similar to an image. By similar I mean ones that share the same tags. So if an image was tagged with tags 1,2,3,4,5,6 the query would fetch all the images that have some subset of those tags, and order them by the number of tags that match. So an image that was tagged with 1,2,3,4,6 might come first, followed by one tagged with 2,3,4,5, followed by one tagged with just 4.

Every approach I've thought of so far doesn't seem to work. In my head it seems like something there should be an elegant solution to but I can't find it. Any ideas?
User avatar
andyhoneycutt
Forum Contributor
Posts: 468
Joined: Wed Aug 27, 2008 10:02 am
Location: Idaho Falls

Re: Many to Many Related Data - 'Most Similar To'

Post by andyhoneycutt »

onion2k- my whiteboard is covered in this logic at work right now. I spent a couple of hours on your problem and damn if I don't feel like I'm at least close to the opening logic to this solution.

Is it possible for you to use server-side scripting to help this logic along, or does it all need to be one big beautiful sql statement? =]

-Andy
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Many to Many Related Data - 'Most Similar To'

Post by onion2k »

Any solution is ok really, but it's something that feels like it should be possible in pure SQL so that's the approach I'm taking at the moment.
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Many to Many Related Data - 'Most Similar To'

Post by VladSun »

[sql]SELECT     image.id,    count(image.id) AS hitsFROM     imageINNER JOIN      tag_image ON tag_image.FK_image_id = image.idINNER JOIN     tag ON tag_image.FK_tag_id = tag.idWHERE     tag.id IN      (           SELECT                  tag_image.FK_tag_id            FROM                tag_image            WHERE                 tag_image.FK_image_id = @base_image_id     )GROUP BY     image.idORDER BY      hits DESC[/sql]

Hope that would help.
There are 10 types of people in this world, those who understand binary and those who don't
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Many to Many Related Data - 'Most Similar To'

Post by onion2k »

Ahh.. it seems so obvious when someone gives you the solution. Cheers mate.
User avatar
VladSun
DevNet Master
Posts: 4313
Joined: Wed Jun 27, 2007 9:44 am
Location: Sofia, Bulgaria

Re: Many to Many Related Data - 'Most Similar To'

Post by VladSun »

I should appologize for my stupidity :)
I believe that:
[sql]SELECT     image.id,    count(image.id) AS hitsFROM     imageINNER JOIN      tag_image ON           tag_image.FK_image_id = image.idINNER JOIN     tag_image AS base_tag_image ON           base_tag_image.FK_tag_id = tag_image.FK_tag_id           AND           base_tag_image.FK_image_id=@base_image_idORDER BY      hits DESC[/sql]
is far, far, far faster (about 4'000 times on 200'000+ many-to-many records ;) ).
There are 10 types of people in this world, those who understand binary and those who don't
Post Reply