Page 1 of 1

Many to Many Related Data - 'Most Similar To'

Posted: Wed Sep 03, 2008 4:48 pm
by onion2k
I have a basic many to many cross reference table linking two other tables. They're very straight forward:

Code: Select all

|---------------|
| image         |
|---------------|
| image_id      |
| title         |
| etc           |
|---------------|
 
|---------------|
| tag           |
|---------------|
| tag_id        |
| tag           |
| etc           |
|---------------|
 
|---------------|
| image_tag_xref|
|---------------|
| image_id      |
| tag_id        |
|---------------|
Getting all the images for any tag is simple, likewise getting all the tags for any image, and any combination of tags and images. I'm fine with all that.

However... I want to try and get images that are similar to an image. By similar I mean ones that share the same tags. So if an image was tagged with tags 1,2,3,4,5,6 the query would fetch all the images that have some subset of those tags, and order them by the number of tags that match. So an image that was tagged with 1,2,3,4,6 might come first, followed by one tagged with 2,3,4,5, followed by one tagged with just 4.

Every approach I've thought of so far doesn't seem to work. In my head it seems like something there should be an elegant solution to but I can't find it. Any ideas?

Re: Many to Many Related Data - 'Most Similar To'

Posted: Wed Sep 03, 2008 11:37 pm
by andyhoneycutt
onion2k- my whiteboard is covered in this logic at work right now. I spent a couple of hours on your problem and damn if I don't feel like I'm at least close to the opening logic to this solution.

Is it possible for you to use server-side scripting to help this logic along, or does it all need to be one big beautiful sql statement? =]

-Andy

Re: Many to Many Related Data - 'Most Similar To'

Posted: Thu Sep 04, 2008 2:56 am
by onion2k
Any solution is ok really, but it's something that feels like it should be possible in pure SQL so that's the approach I'm taking at the moment.

Re: Many to Many Related Data - 'Most Similar To'

Posted: Thu Sep 04, 2008 9:15 am
by VladSun
[sql]SELECT     image.id,    count(image.id) AS hitsFROM     imageINNER JOIN      tag_image ON tag_image.FK_image_id = image.idINNER JOIN     tag ON tag_image.FK_tag_id = tag.idWHERE     tag.id IN      (           SELECT                  tag_image.FK_tag_id            FROM                tag_image            WHERE                 tag_image.FK_image_id = @base_image_id     )GROUP BY     image.idORDER BY      hits DESC[/sql]

Hope that would help.

Re: Many to Many Related Data - 'Most Similar To'

Posted: Thu Sep 04, 2008 1:59 pm
by onion2k
Ahh.. it seems so obvious when someone gives you the solution. Cheers mate.

Re: Many to Many Related Data - 'Most Similar To'

Posted: Thu Sep 18, 2008 1:36 pm
by VladSun
I should appologize for my stupidity :)
I believe that:
[sql]SELECT     image.id,    count(image.id) AS hitsFROM     imageINNER JOIN      tag_image ON           tag_image.FK_image_id = image.idINNER JOIN     tag_image AS base_tag_image ON           base_tag_image.FK_tag_id = tag_image.FK_tag_id           AND           base_tag_image.FK_image_id=@base_image_idORDER BY      hits DESC[/sql]
is far, far, far faster (about 4'000 times on 200'000+ many-to-many records ;) ).