Alternative search word (like Google)?

visionmaster
Forum Contributor
Posts: 139
Joined: Wed Jul 14, 2004 4:06 am

Post by visionmaster »

Hello,

When I enter a search word in Google, e.g. "meschinenbau", fewer than 10 results are displayed, but "Did you mean: maschinenbau" is shown as well.

How can that be implemented? What is the concept behind it?
Do you store every search word the users have entered in a growing database table?

If the user enters "meschinenbau" and there are <= 10 hits, I would search my table of search words for the most similar word (using Soundex or Levenshtein distance). Of course, my list can also contain search words that themselves return few results, so how do I avoid suggesting those? My idea: when writing a search word into the table, also store the number of hits for it. Then not only the word similarity is relevant, but also the number of results found for that specific search word.
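To make the idea above concrete, here is a minimal sketch in Python (the same logic would port to PHP, which has a built-in levenshtein() function). It assumes the logged search words are available as a dictionary mapping each word to its hit count. The names `suggest`, `search_log`, `min_hits` and `max_distance` are made up for illustration, and the thresholds are guesses rather than tuned values.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(term, term_hits, search_log, min_hits=10, max_distance=2):
    """Return a "Did you mean" candidate for term, or None.

    term_hits  -- number of results the current search returned
    search_log -- hypothetical dict: logged search word -> hit count
    """
    if term_hits > min_hits:
        return None  # enough results, no suggestion needed
    best_key, best_word = None, None
    for word, hits in search_log.items():
        if word == term or hits <= min_hits:
            continue  # skip the term itself and other low-result words
        distance = levenshtein(term, word)
        if distance > max_distance:
            continue
        key = (distance, -hits)  # closest first, then most hits
        if best_key is None or key < best_key:
            best_key, best_word = key, word
    return best_word


search_log = {
    "maschinenbau": 5400,
    "maschinenbauer": 320,
    "maschienenbau": 4,
    "meschinenbau": 3,
}
print(suggest("meschinenbau", 3, search_log))
```

Scanning the whole table on every search will not scale to a large log, so in practice you would probably narrow the candidates first (for example by first letter or word length) before computing distances.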

What do you think? How would you solve it? Any suggestions?

Thanks!
kettle_drum
DevNet Resident
Posts: 1150
Joined: Sun Jul 20, 2003 9:25 pm
Location: West Yorkshire, England

Post by kettle_drum »

Well, I think Google has built up its knowledge of what people really meant to search for over a few years, but I'm sure you could do something similar by:

1) Storing a list of words that people often misspell, with the correct word next to each, and then showing the correct word.
2) Storing the position of the keys on a standard QWERTY keyboard in an array or similar, so that 't' could also have been any of 'rfghy65', as the user may have just been one key off.
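A rough sketch of both suggestions in Python, just to show the shape of it. The neighbour map is deliberately partial and hand-written, and `MISSPELLINGS`, `KNOWN_WORDS`, `one_key_off` and `did_you_mean` are hypothetical names, not part of any library.

```python
# 1) Hand-maintained table: common misspelling -> correct word (illustrative data).
MISSPELLINGS = {"meschinenbau": "maschinenbau"}

# 2) Partial QWERTY adjacency map: key -> keys directly around it.
QWERTY_NEIGHBOURS = {
    "a": "qwsz",
    "s": "awedxz",
    "e": "wsdr34",
    "t": "rfghy56",
    "n": "bhjm",
    "u": "yhji78",
}

# Words known to return plenty of results (illustrative data).
KNOWN_WORDS = {"maschinenbau", "maschine"}


def one_key_off(word, neighbours=QWERTY_NEIGHBOURS):
    """Return all variants of word with exactly one key replaced by a neighbour."""
    variants = set()
    for i, ch in enumerate(word):
        for replacement in neighbours.get(ch, ""):
            variants.add(word[:i] + replacement + word[i + 1:])
    return variants


def did_you_mean(word, known_words=KNOWN_WORDS, misspellings=MISSPELLINGS):
    """Return a suggested correction, or None if there is nothing to suggest."""
    if word in known_words:
        return None
    if word in misspellings:
        return misspellings[word]
    matches = sorted(one_key_off(word) & known_words)
    return matches[0] if matches else None


print(did_you_mean("meschinenbau"))  # found in the misspelling table
print(did_you_mean("maschinenbsu"))  # 's' typed instead of the adjacent 'a'
```

Note that the keyboard check only catches slips onto an adjacent key. "meschinenbau" is not one of those ('e' and 'a' are not neighbours), so it is only caught here because it is in the misspelling table.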
JAM
DevNet Resident
Posts: 2101
Joined: Fri Aug 08, 2003 6:53 pm
Location: Sweden

Post by JAM »

MySQL wrote:
SOUNDEX(str)
Returns a soundex string from str. Two strings that sound almost the same should have identical soundex strings. A standard soundex string is four characters long, but the SOUNDEX() function returns an arbitrarily long string. You can use SUBSTRING() on the result to get a standard soundex string. All non-alphabetic characters are ignored in the given string. All international alphabetic characters outside the A-Z range are treated as vowels.

mysql> SELECT SOUNDEX('Hello');
-> 'H400'
mysql> SELECT SOUNDEX('Quadratically');
-> 'Q36324'

Note: This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and then duplicates, whereas the enhanced version discards duplicates first and then vowels.
expr1 SOUNDS LIKE expr2
This is the same as SOUNDEX(expr1) = SOUNDEX(expr2). It is available only in MySQL 4.1 or later.

This might be interesting. If you check for something that "sounds like" the search word, using keywords as kettle_drum mentioned (from a word file, or straight from the database itself using the full-text search functions), you might be able to find a solution.
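If you would rather do the comparison in application code than in SQL, the "sounds like" check could look roughly like the sketch below. Be aware that this is the common four-character Soundex variant, not a reimplementation of MySQL's SOUNDEX() (which returns arbitrarily long strings and processes vowels and duplicates in a different order), so the codes will not always match what MySQL returns. The helper name `sounds_like` and the word list are made up for the example.

```python
# Letter groups of the standard Soundex code table.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word, length=4):
    """Four-character Soundex code; letters without a code act as vowels."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    last = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != last:
            result += code
        last = code  # a vowel resets this, so a repeated code counts again
    return (result + "0" * length)[:length]


def sounds_like(term, words):
    """Return the words whose Soundex code equals that of term."""
    target = soundex(term)
    return [w for w in words if w != term and soundex(w) == target]


print(soundex("Hello"))
print(sounds_like("meschinenbau", ["maschinenbau", "elektrotechnik"]))
```

Soundex is coarse, so it tends to return many unrelated words with the same code. It probably works best as a cheap first filter before a stricter similarity check.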