I would like opinions about building a server side spelling api to use within an application that I am trying to build. Sinfully aspell and pspell are not really within my requirements because I need it to work for all languages not just the ones supported by aspell. I don't need libraries with words since they might be built after the engine is working.
What I know:
- most spelling engines use double metaphone or similar algorithms to recognize the words that sound like other words that might be the replacement for the word in subject but sinfully the double metaphone, soundex, metaphone and so on and so forth are mostly for english words and words that come from spanish and imported into english.
- levenshtein would be best method to recognize words that look like other words or are misspelled by touching another letter on the keyboard or switching two letters (the classic 'teh' and 'alogrithm' are very well known). some of these words are found by metaphone or soundex but under certain circumstances in the european languages (for example romanian, my own language), some of the words that fall in this category are getting marked as misspelled yet no suggestions can be made by similarity with the actual word since they're not found on the metaphone checking of aspell
What I don't know:
- a way to calculate levenshtein distance between the subject word and all the database words... or filter this amount of words somehow but not by using the metaphone or soundex because while testing with these algorithms the results were not satisfying; and doing this fast enough to make this engine usable on a server environment for an api.
If you guys have any other idea or ... maybe we could brainstorm a bit on this subject in order to build this.
Thanks in advance for your help
spelling algorithms
Moderator: General Moderators
Re: spelling algorithms
I would look for some GNU stuff. Have a look through Debian package. There is quite a lot of open source code in this area.
Re: spelling algorithms
http://norvig.com/spell-correct.html
using the baysian method, it should give about on average at least 65% accuracy for 1st suggestion.
using the baysian method, it should give about on average at least 65% accuracy for 1st suggestion.
Re: spelling algorithms
All languages?valentin wrote:Sinfully aspell and pspell are not really within my requirements because I need it to work for all languages not just the ones supported by aspell.
och gråter så övergivet - Swedish
En ymmärrä - Finnish
Watashi wa tai-ryōri ga ii desu - Japanese (Or even 私はタイ料理がいいです。)
То запла́чет, как дитя́ - Russian
لا أتكلم العربية - Arabic
Good luck.
Re: spelling algorithms
Have you had a look at ICU? Some of that has been API-ed into PHP5-6. Some real complex stuff in there.
- kaisellgren
- DevNet Resident
- Posts: 1675
- Joined: Sat Jan 07, 2006 5:52 am
- Location: Lahti, Finland.
Re: spelling algorithms
Neither do I.onion2k wrote:En ymmärrä - Finnish
You will have tough time getting this job done... can I ask you why are you doing this?