search algorithm
Posted: Wed May 18, 2011 9:06 am
by rhecker
I need to add a feature to search the MySQL database behind a website. We do not want to use Google Search or anything like it.
The problem is dealing with all the variations of a search term. For instance, a search for 'hate' should not return 'whatever', but it should return 'hates' and cases where punctuation follows the term.
I'm thinking that someone must have written a class to deal with these variations. I have looked at a bunch of classes, tutorials and scripts on the web, but so far I have not found what I'm looking for. Any ideas?
Re: search algorithm
Posted: Wed May 18, 2011 10:36 am
by pickle
If you set up a MySQL FULLTEXT search and sort by relevance, matches to "hate" and "hates" will have a much higher relevance than "whatever".
You could also set up a Sphinx search server to handle all this for you. I've never used it, but it looks interesting.
http://sphinxsearch.com/
Re: search algorithm
Posted: Wed May 18, 2011 11:33 am
by rhecker
Yes, thanks. I've already started working with FULLTEXT and it is better in a number of ways. I looked a little at Lucene and Sphinx, but they seem like too much for what I am after, and the learning curve would be much steeper.
So I am experimenting with FULLTEXT and so far so good, although it seems to have some limitations. It's not clear to me if I can somehow use LIKE_ so that loves will come up in a search for love, that sort of thing.
Re: search algorithm
Posted: Wed May 18, 2011 12:52 pm
by califdon
There are all sorts of things you can do with LIKE, etc., but remember that natural language is complex. Do you want to return "gloves" when the search term is "love"? How about "beloved"? And then there's "loving", etc. As pickle said, if you order by relevance, then maybe filter for some minimum value of relevance, that may be the best you can do.
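The "order by relevance, then filter on a minimum value" idea could look roughly like the sketch below. The table `articles`, its columns `id`, `title` and `body`, the FULLTEXT index on (title, body), and the function name `search_by_relevance` are all assumptions made up for illustration; the threshold value would need tuning against real data.

```php
<?php
// Sketch only. Assumes a hypothetical table `articles` with a FULLTEXT
// index covering (title, body), and an already-connected PDO handle.
function search_by_relevance(PDO $pdo, string $terms, float $minScore = 0.0): array
{
    // MATCH ... AGAINST appears twice: once to expose the relevance score,
    // once to restrict the rows. Two distinct placeholders are used so the
    // statement also works with native (non-emulated) prepares.
    $sql = 'SELECT id, title,
                   MATCH(title, body) AGAINST(:q_score) AS score
            FROM articles
            WHERE MATCH(title, body) AGAINST(:q_filter)
            HAVING score >= :min_score
            ORDER BY score DESC';

    $stmt = $pdo->prepare($sql);
    $stmt->execute([
        ':q_score'   => $terms,
        ':q_filter'  => $terms,
        ':min_score' => $minScore,
    ]);

    // Each row is an associative array with keys id, title and score.
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Called as `search_by_relevance($pdo, 'love', 0.5)`, this would return only rows scoring at least 0.5, best matches first.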
Re: search algorithm
Posted: Wed May 18, 2011 6:33 pm
by rhecker
The problem with using FULLTEXT, unless I am missing something (I hope I am), is that there seems to be no way to change how the search terms are processed.
For example:
search: peace and love
FULLTEXT will search for "peace" and "love" separately and ignore the "and" (which is fine).
But what if I want to search for the combination of PEACE and LOVE? I don't see a way to specify that.
Re: search algorithm
Posted: Wed May 18, 2011 9:07 pm
by rhecker
If the search term has multiple words, then it seems like preg_replace can be used to send the right expression to the MySQL full-text query.
So if the search term is: "happy day" I would need it to become +happy +day
but
$term=preg_replace(" ", " +", $var);
does not produce this result.
Can someone tell me what would?
Re: search algorithm
Posted: Thu May 19, 2011 3:54 pm
by pickle
You can do that if you search in Boolean mode. Of course you will have to parse the search terms a bit, and you do lose automatic sorting by relevance, but that's not a big deal.
http://dev.mysql.com/doc/refman/5.1/en/ ... olean.html
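The "parse the search terms a bit" step might be sketched as follows. The preg_replace attempt earlier in the thread fails for two reasons: a PCRE pattern needs delimiters (for example '/ /' rather than " "), and replacing spaces never puts a + in front of the first word. The helper name `to_boolean_query` and its `$prefixMatch` flag are hypothetical, not part of PHP or MySQL. The result is meant to be passed to AGAINST(... IN BOOLEAN MODE).

```php
<?php
// Sketch only: to_boolean_query() is a made-up helper for illustration.
function to_boolean_query(string $input, bool $prefixMatch = false): string
{
    // Replace characters that act as operators in Boolean mode
    // (+ - < > ( ) ~ * " @) with spaces so user input cannot inject them.
    $clean = preg_replace('/[+\-<>()~*"@]+/', ' ', $input);

    // Split on runs of whitespace, dropping empty pieces.
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    $terms = [];
    foreach ($words as $word) {
        // A leading + makes the word required; a trailing * (optional here)
        // matches any word that starts with it.
        $terms[] = '+' . $word . ($prefixMatch ? '*' : '');
    }

    return implode(' ', $terms);
}

echo to_boolean_query('happy day'), "\n";   // +happy +day
echo to_boolean_query('love', true), "\n";  // +love*
```

With the trailing *, a search for love also matches loves, loved and loving, but not gloves or beloved, because the wildcard only extends the end of the word. To match an exact phrase instead, Boolean mode accepts the phrase wrapped in double quotes.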