Hello.
I am doing some very basic research into creating a PHP feature that would make recommendations based on elements in a MySQL database. Basically, each element would have several characteristics, and the recommendation engine would take one element, compare its characteristics to those of all the other elements, and return a list of similar elements, kind of like what Pandora does with music or Amazon.com does with books.
I don't have a specific application yet--I'm just looking to learn how to create this kind of feature. I know the technology is out there, but I can't find a good tutorial. Can anyone direct me to a tutorial about creating this kind of thing, or to an open-source script that does what I'm looking for (if there is one)?
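To make the idea concrete, here is a minimal sketch of that comparison, written in Python for brevity (the same logic ports directly to PHP arrays). It assumes each element's characteristics are a plain set of tags and uses Jaccard similarity (shared tags divided by all tags) as the score. The `catalog` data and the function names are made up for illustration, and a real system might well weight characteristics differently.

```python
def jaccard(a, b):
    """Share of characteristics two elements have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, catalog, limit=3):
    """Rank every other element by similarity to `target`, best first."""
    scores = [
        (jaccard(catalog[target], traits), name)
        for name, traits in catalog.items()
        if name != target
    ]
    # Highest score first; ties broken by name so the order is stable.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scores[:limit] if score > 0]


# Hypothetical sample data standing in for rows loaded from the database.
catalog = {
    "album_a": {"rock", "guitar", "1970s"},
    "album_b": {"rock", "guitar", "1980s"},
    "album_c": {"jazz", "piano", "1970s"},
    "album_d": {"classical", "violin"},
}

print(recommend("album_a", catalog))
```

Elements that share nothing with the target get a score of zero and are left out of the list.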
Thanks
Tom
Recommendation Engine Question
-
Tomcat7194
- Forum Commoner
- Posts: 48
- Joined: Mon Jul 31, 2006 1:34 pm
Most of the time these are keyword indexes, and many are built by hand, because a database, no matter how smart it is, can only make a selection based on an index or a group of indexes. Take searching for CPUs: you would not show AMD CPUs when someone is looking at an Intel motherboard. So indexes with relationships will give you what you want.
If you don't want to start from scratch, look at some search engine, catalog, or auction scripts to get an idea of how to do what you want. I would start from scratch myself, because you will not find anything that gives good results and fits exactly what you need to do. I say that because most developers think that fulltext or boolean indexing alone is enough, and it is not. You need to create custom stop-word files for each index so you don't return bogus results, which is what happens if you rely on the default database options. This sort of thing has more to do with relevance grouping than with returning matching results for a keyword or a group of keywords.
Say someone is looking for headphones. Would you just show all the headphones you have? No. You would show the first page of headphones, but also return a list of relevance-grouping options so they can narrow the search (sub-categories: Manufacturer, New Items, Items with Rebates, Connector Type, Wired/Wireless Type (Bluetooth, RF, IR), Ear Type (circumaural, supra-aural, ear bud, canal, single ear)). The more options you give users, the more people will use your service, because they can find things much faster.
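As a rough illustration of the relevance-grouping idea, the sketch below counts how many matching items fall under each sub-category value, which is the number you would show next to each narrowing option. It is written in Python for brevity. The field names (`manufacturer`, `connection`, `ear_type`) and the sample rows are hypothetical, and in MySQL each facet would more likely be a `GROUP BY` query with `COUNT(*)` than a loop in application code.

```python
from collections import Counter, defaultdict


def facet_counts(items, facets):
    """Count how many matching items fall under each value of each facet."""
    counts = defaultdict(Counter)
    for item in items:
        for facet in facets:
            value = item.get(facet)
            if value is not None:
                counts[facet][value] += 1
    return {facet: dict(values) for facet, values in counts.items()}


# Hypothetical search results for "headphones".
results = [
    {"name": "H1", "manufacturer": "Acme", "connection": "wired", "ear_type": "ear bud"},
    {"name": "H2", "manufacturer": "Acme", "connection": "bluetooth", "ear_type": "circumaural"},
    {"name": "H3", "manufacturer": "Zenith", "connection": "bluetooth", "ear_type": "ear bud"},
]

print(facet_counts(results, ["manufacturer", "connection", "ear_type"]))
```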
printf
-
Tomcat7194
- Forum Commoner
- Posts: 48
- Joined: Mon Jul 31, 2006 1:34 pm
Here's the one conceptual thing that I can't quite figure out, and which is preventing me from just doing this from scratch: I understand that I would have items in a database with keywords, and some kind of script that would connect relevant keywords. However, what I don't quite get is when that connection should occur.
When I add a new element to the database, should I have a script that compares its keywords to the keywords of other elements and generates some kind of "relevance index"? If so, how should I store that data? In an array which shows the relevance between that element and all the others (that would be a lot of data)? Also, would I have to keep rerunning that script for each element in order to prevent the relevance data from remaining static?
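One way to picture this first option is sketched below in Python (it ports readily to PHP plus a MySQL table with two element-id columns and a score column). All names here are hypothetical. The sketch assumes a purely pairwise score (Jaccard overlap of keyword sets), so adding an element only requires scoring it against the existing ones; older pairs would need recomputing only if an element's own keywords change. Dropping pairs below a threshold is one possible way to keep the stored data small.

```python
def jaccard(a, b):
    """Shared keywords divided by all keywords across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class SimilarityStore:
    """Keeps keywords per element plus a table of pairwise scores."""

    def __init__(self, min_score=0.1):
        self.keywords = {}   # element id -> set of keywords
        self.scores = {}     # (smaller id, larger id) -> similarity
        self.min_score = min_score

    def add(self, element_id, keywords):
        """Score the new element against existing ones; older pairs are untouched."""
        new = set(keywords)
        for other_id, other in self.keywords.items():
            score = jaccard(new, other)
            if score >= self.min_score:  # skip weak pairs to save space
                self.scores[tuple(sorted((element_id, other_id)))] = score
        self.keywords[element_id] = new

    def similar_to(self, element_id):
        """Return stored neighbours, best first, without comparing keywords again."""
        found = []
        for (a, b), score in self.scores.items():
            if element_id in (a, b):
                found.append((score, b if a == element_id else a))
        return [other for score, other in sorted(found, key=lambda p: (-p[0], p[1]))]


store = SimilarityStore()
store.add(1, ["rock", "guitar", "1970s"])
store.add(2, ["rock", "guitar", "1980s"])
store.add(3, ["jazz", "piano", "1970s"])
print(store.similar_to(1))
```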
The other option would be to have a script run when a user did a search, and have that script load every element in the database, compare the user's keywords with the elements' keywords, and return the most relevant elements. That would solve the issue of making the system dynamic, but it would probably take forever, especially if there were thousands of entries in the database.
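For comparison, a search-time lookup does not have to load every element. The Python sketch below (hypothetical names and data) builds an inverted index from keyword to element ids, so a search only touches elements that share at least one keyword with the query. In MySQL the rough equivalent would be an indexed keyword table with one row per element and keyword; that table layout is an assumption, not something described in this thread.

```python
from collections import Counter, defaultdict


def build_index(elements):
    """Map each keyword to the set of element ids that carry it."""
    index = defaultdict(set)
    for element_id, keywords in elements.items():
        for keyword in keywords:
            index[keyword].add(element_id)
    return index


def search(index, query_keywords, limit=10):
    """Rank elements by how many of the query keywords they match."""
    hits = Counter()
    for keyword in set(query_keywords):
        # Only elements listed under this keyword are visited.
        for element_id in index.get(keyword, ()):
            hits[element_id] += 1
    ranked = sorted(hits.items(), key=lambda pair: (-pair[1], pair[0]))
    return [element_id for element_id, count in ranked[:limit]]


elements = {
    1: {"rock", "guitar", "1970s"},
    2: {"rock", "guitar", "1980s"},
    3: {"jazz", "piano", "1970s"},
    4: {"classical", "violin"},
}
index = build_index(elements)
print(search(index, ["rock", "1970s"]))
```

Element 4 shares no keyword with the query, so it is never examined at all.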
How are these kinds of things managed? Is it done server side (like my first example), and if so, how is the relevancy data stored and updated? Or is it done client side, and if so, why doesn't it take forever?
Thanks
Tom
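Actually, it's done quite fast by the server-side script. If you have MySQL 5 you can create views over the data you want to make searchable, which keeps the search queries simpler (a MySQL view is a stored query rather than a stored copy of the data, so any speed gain still depends on the indexes of the underlying tables). Another 'speed trap' is the LIKE keyword: it is quite slow compared with the alternatives, especially if you use it heavily together with AND, OR, etc.
There are a few tutorials on normalizing databases, foreign keys, and database design. Search and you will find them.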