Alternative search word (like Google)?

Posted: Tue Aug 03, 2004 3:46 pm
by visionmaster
Hello,

When I enter a search word in Google, e.g. "meschinenbau", fewer than 10 results are displayed, but "Did you mean: maschinenbau" is shown in addition.

How can that be realized? What concept is behind the idea?
Should I remember all search words the users have entered in a growing database table?

If the user enters "meschinenbau" and there are <= 10 hits, I would then search my search table for the most similar word (Soundex or Levenshtein). Of course my list can also contain search words which themselves have a low result rate, so how do I avoid suggesting those? My idea: when writing a search word into my table, also remember the number of hits for each word. Then not only the word similarity is relevant, but also the number of results found for that specific search word.
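A minimal sketch of that idea in Python, assuming the search table is available as a plain dict mapping each logged search word to its hit count. The names `suggest`, `search_log`, `max_distance` and `min_hits` are made up for illustration, and the thresholds are arbitrary guesses:

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def suggest(query, search_log, max_distance=2, min_hits=10):
    """Return the closest logged word with more than min_hits hits, or None.

    search_log is a hypothetical {word: hits} mapping standing in for the
    database table. Ties on distance go to the word with more hits.
    """
    candidates = []
    for word, hits in search_log.items():
        if word == query or hits <= min_hits:
            continue  # skip the query itself and other low-result words
        distance = levenshtein(query, word)
        if distance <= max_distance:
            candidates.append((distance, -hits, word))
    return min(candidates)[2] if candidates else None


search_log = {"maschinenbau": 1200, "meschinenbau": 4,
              "maschinenbauer": 300, "messebau": 80}
print(suggest("meschinenbau", search_log))
```

Comparing the query against every row does not scale to a large table, so in practice the candidates would probably need to be narrowed first, for example by Soundex code or word length.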

What do you think? How would you solve it? Any suggestions?

Thanks!

Posted: Tue Aug 03, 2004 11:44 pm
by kettle_drum
Well, I think Google has built up its knowledge of what people really meant to search for over a few years, but I'm sure you could do something similar by:

1) Store a list of words that people often mis-spell with the correct word next to them and then show the correct word.
2) Store the position of keys on a standard QWERTY keyboard in an array or something, to say that 't' could possibly also have been one of 'rfghy65', as the user may have just been one key off.
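Option 2 could be sketched roughly like this in Python. The layout is a simplified US QWERTY grid where each row is assumed to be shifted half a key to the right of the row above, so the neighbour sets are an approximation (for 't' it gives 'r', 'y', '5', '6', 'f', 'g'). `neighbours` and `one_key_off` are made-up names, and `known_words` stands in for whatever list of correct words you keep:

```python
ROWS = ["1234567890", "qwertyuiop", "asdfghjkl", "zxcvbnm"]


def neighbours(key):
    """Return the keys next to `key` on the simplified staggered grid."""
    for r, row in enumerate(ROWS):
        col = row.find(key)
        if col < 0:
            continue
        found = set()
        # (row index, candidate columns): same row, row above, row below
        for rr, cols in ((r, (col - 1, col + 1)),
                         (r - 1, (col, col + 1)),
                         (r + 1, (col - 1, col))):
            if 0 <= rr < len(ROWS):
                found.update(ROWS[rr][c] for c in cols
                             if 0 <= c < len(ROWS[rr]))
        return found
    return set()  # key is not on the grid


def one_key_off(word, known_words):
    """Known words reachable by replacing one letter with a neighbouring key."""
    matches = set()
    for i, ch in enumerate(word):
        for other in neighbours(ch):
            candidate = word[:i] + other + word[i + 1:]
            if candidate in known_words:
                matches.add(candidate)
    return sorted(matches)


print(one_key_off("tesr", {"test", "tear", "team"}))
```

This only catches single substituted keys. Note that it would not turn "meschinenbau" into "maschinenbau", since 'e' and 'a' are not adjacent on this grid, so it is a complement to a similarity measure rather than a replacement.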

Posted: Thu Aug 05, 2004 8:39 am
by JAM
Mysql wrote:
SOUNDEX(str)
Returns a soundex string from str. Two strings that sound almost the same should have identical soundex strings. A standard soundex string is four characters long, but the SOUNDEX() function returns an arbitrarily long string. You can use SUBSTRING() on the result to get a standard soundex string. All non-alphabetic characters are ignored in the given string. All international alphabetic characters outside the A-Z range are treated as vowels.

mysql> SELECT SOUNDEX('Hello');
-> 'H400'
mysql> SELECT SOUNDEX('Quadratically');
-> 'Q36324'

Note: This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and then duplicates, whereas the enhanced version discards duplicates first and then vowels.
expr1 SOUNDS LIKE expr2
This is the same as SOUNDEX(expr1) = SOUNDEX(expr2). It is available only in MySQL 4.1 or later.
This might be interesting. If you check for something that "sounds like" the search word against a list of keywords, as kettle_drum mentioned (from a word file, or straight from the database itself using the full-text search functions), you might be able to find a solution.
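If the matching has to happen outside the database, a "sounds like" lookup could be sketched in Python as below. Note the assumption: this is the common four-character Soundex, not a port of MySQL's SOUNDEX(), which returns arbitrarily long strings and handles vowels and duplicates in a different order, so the codes will not always agree. `sounds_like` and the word list are made up for illustration:

```python
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Four-character Soundex code; non-alphabetic characters are ignored."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and unknown letters get no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad or truncate to four characters


def sounds_like(query, words):
    """Words from `words` that share the query's Soundex code."""
    code = soundex(query)
    return [w for w in words if w != query and soundex(w) == code]


print(soundex("meschinenbau"), soundex("maschinenbau"))
print(sounds_like("meschinenbau", ["maschinenbau", "messebau", "google"]))
```

Four-character codes are coarse, so on a large word list many unrelated words will share a code. Ranking the matches by hit count or edit distance afterwards would probably still be needed.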