
spelling algorithms

Posted: Fri May 08, 2009 4:50 am
by valentin
I would like opinions about building a server-side spelling API to use within an application that I am trying to build. Sadly, aspell and pspell don't really meet my requirements, because I need it to work for all languages, not just the ones supported by aspell. I don't need word lists yet, since they can be built after the engine is working.
What I know:
- Most spelling engines use Double Metaphone or similar algorithms to recognize words that sound like other words and might be replacements for the word in question. Sadly, Double Metaphone, Soundex, Metaphone and so on are mostly designed for English words and for words imported into English from Spanish.
- Levenshtein distance would be the best method to recognize words that look like other words, or that are misspelled by hitting a neighbouring key or by swapping two letters (the classic 'teh' and 'alogrithm' are well known). Some of these are caught by Metaphone or Soundex, but in some European languages (for example Romanian, my own language) words in this category get marked as misspelled yet no suggestions are offered, because aspell's Metaphone check does not find anything similar to the intended word.
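
To make the 'teh' case concrete, here is a minimal Python sketch of an edit distance that also counts a swap of two adjacent letters as a single edit (the "optimal string alignment" variant of Damerau-Levenshtein). The function name `osa_distance` is made up for illustration. It compares Unicode code points, so it does not depend on any English-specific phonetic rules, but it assumes both strings are already in the same Unicode normalization form.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insert, delete, substitute and
    adjacent-transposition as one edit each (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # two adjacent letters swapped, e.g. "eh" vs "he"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

With this variant, `osa_distance("teh", "the")` and `osa_distance("alogrithm", "algorithm")` both come out as 1, whereas plain Levenshtein would charge 2 for each because it has no transposition step.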

What I don't know:
- A way to calculate the Levenshtein distance between the subject word and all the words in the database, or to filter that set of words down somehow, but not with Metaphone or Soundex, because the results were not satisfying when I tested those algorithms. It also has to be fast enough to make the engine usable in a server environment behind an API.
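
One possible way to avoid comparing against every word, without any phonetic step, is a BK-tree: a tree keyed on edit distance that uses the triangle inequality to skip whole branches during lookup. The following is only a rough Python sketch under that assumption, not a tuned implementation; `BKTree` and `levenshtein` are names made up for this example. It uses plain Levenshtein because the tree needs a true metric, and the transposition-aware "optimal string alignment" variant does not always satisfy the triangle inequality.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance, keeping only two rows in memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class BKTree:
    """Each node is (word, children); children maps a distance to a subtree."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word is already in the tree
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return sorted (distance, word) pairs within max_dist of word."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: only subtrees whose edge label lies in
            # [d - max_dist, d + max_dist] can contain a match.
            for k, child in children.items():
                if d - max_dist <= k <= d + max_dist:
                    stack.append(child)
        return sorted(results)
```

Usage would look like building the tree once from the word list (`tree = BKTree(levenshtein)`, then `tree.add(w)` for each word) and calling `tree.search("teh", 2)` per request. How much of the dictionary gets skipped depends on the word list and on `max_dist`, so it would need measuring on real data.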

If you guys have any other ideas, maybe we could brainstorm a bit on this subject in order to build this.

Thanks in advance for your help

Re: spelling algorithms

Posted: Sat May 09, 2009 6:33 pm
by david64
I would look for some GNU stuff. Have a look through the Debian packages. There is quite a lot of open source code in this area.

Re: spelling algorithms

Posted: Mon May 11, 2009 8:56 am
by wei
http://norvig.com/spell-correct.html

Using the Bayesian method, it should give, on average, at least about 65% accuracy for the first suggestion.
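
Roughly, the idea in that article is to generate every string one edit away from the input and pick the candidate seen most often in a training corpus. Below is a simplified Python sketch in that spirit, not a copy of the article's code: it only looks one edit away, and it derives the alphabet from the word counts themselves (an assumption made here so that it is not tied to a-z). The names `edits1` and `correct` are illustrative.

```python
from collections import Counter


def edits1(word, alphabet):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in alphabet]
    inserts = [left + c + right for left, right in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Return word if known, else the most frequent known word one edit away."""
    if word in counts:
        return word
    # Assumption: every character seen in the corpus is part of the alphabet.
    alphabet = set("".join(counts))
    candidates = [w for w in edits1(word, alphabet) if w in counts]
    if not candidates:
        return word  # nothing known nearby; leave the word unchanged
    return max(candidates, key=counts.__getitem__)


# Toy corpus; a real one would be a large body of text in the target language.
counts = Counter("the the the tea then algorithm".split())
print(correct("teh", counts))
print(correct("alogrithm", counts))
```

Because candidates are looked up in a hash table instead of being compared one by one, the cost per word depends on word length and alphabet size, not on dictionary size.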

Re: spelling algorithms

Posted: Mon May 11, 2009 10:25 am
by onion2k
valentin wrote: Sadly, aspell and pspell don't really meet my requirements, because I need it to work for all languages, not just the ones supported by aspell.
All languages?

och gråter så övergivet - Swedish
En ymmärrä - Finnish
Watashi wa tai-ryōri ga ii desu - Japanese (Or even 私はタイ料理がいいです。)
То запла́чет, как дитя́ - Russian
لا أتكلم العربية - Arabic

Good luck. :)

Re: spelling algorithms

Posted: Mon May 11, 2009 5:45 pm
by david64
Have you had a look at ICU? Some of it has been exposed through APIs in PHP 5 and 6. There's some really complex stuff in there.

Re: spelling algorithms

Posted: Tue May 12, 2009 7:41 am
by kaisellgren
onion2k wrote:En ymmärrä - Finnish
Neither do I.

You will have a tough time getting this job done... can I ask why you are doing this?