
Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 7:55 am
by alex.barylski
I have a database of keywords (about 9K), many of which are semi-redundant, such as:

Consulting
Consultants
Consultation
 
Ideally, I'd like to crunch these down to the root word, 'consult'.

Does anyone know of an algorithm (preferably implemented in PHP) that converts keywords into root words, so that if someone enters 'computer consulting' it will match 'computers', 'computing', 'consultation', 'consultants' and so forth?
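A crude way to approximate the behaviour asked for here is to strip a fixed list of suffixes and compare keywords by whatever is left. The Python sketch below is only an illustration under that assumption. It is not Porter's algorithm, and the suffix table `SUFFIXES`, the minimum stem length `MIN_STEM` and the helpers `crude_root`, `group_by_root` and `match` are hypothetical. A fixed table like this will over-stem and under-stem plenty of English words.

```python
from collections import defaultdict

# Hypothetical suffix table, checked in order (longer suffixes before the
# shorter ones they end with). NOT the Porter algorithm.
SUFFIXES = ("ations", "ation", "ants", "ant", "ings", "ing", "ers", "er", "s")
MIN_STEM = 3  # hypothetical: never cut a word down to fewer than 3 letters


def crude_root(word: str) -> str:
    """Lower-case the word and strip the first matching suffix."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def group_by_root(keywords):
    """Bucket semi-redundant keywords under their crude root."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[crude_root(kw)].append(kw)
    return dict(groups)


def match(query, keywords):
    """Return keywords whose crude root equals the root of any query term."""
    roots = {crude_root(term) for term in query.split()}
    return [kw for kw in keywords if crude_root(kw) in roots]
```

With this table, `Consulting`, `Consultants` and `Consultation` all land in the `consult` bucket, and `match("computer consulting", ...)` keeps any keyword whose root is `comput` or `consult`.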

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 8:01 am
by arjan.top

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 8:13 am
by alex.barylski
I just found that wiki article... not exactly what I was hoping for :|

http://en.wikipedia.org/wiki/Stemming

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 8:18 am
by Eran
You can use functions like similar_text(), levenshtein() and soundex() to produce the results you want. Have a look at this article on fuzzy search in PHP:
http://porteightyeight.com/2008/03/07/f ... hp-part-1/
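The fuzzy-matching route rests on edit distance. PHP's levenshtein() returns the classic Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The Python sketch below mirrors only that function, not similar_text() or soundex(). `fuzzy_matches` and its `max_distance` threshold are hypothetical choices for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, keywords, max_distance=2):
    """Keywords within max_distance edits of term, closest first."""
    term = term.lower()
    scored = sorted((levenshtein(term, kw.lower()), kw) for kw in keywords)
    return [kw for dist, kw in scored if dist <= max_distance]
```

Edit distance knows nothing about word structure. "computing" is three edits away from "computer", so a threshold of 2 misses it, while unrelated short words can fall inside the threshold. This is why the thread ends up at stemming instead.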

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 9:13 am
by alex.barylski
pytrin: It turns out that what I needed was indeed a stemming function. I'm not doing any of the searching in PHP (it's all done in MySQL via one wicked query a co-worker implemented). I just needed a way to find the stem of a word, which the stemmer now does, and it works great. :)
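The split described here, with stemming in application code and searching in the database, could look roughly like this. Store each keyword next to its precomputed stem in an indexed column, stem the search terms the same way, and match with `IN`. This sketch assumes Python's built-in `sqlite3` in place of MySQL. The table layout and the names `naive_stem`, `build_index` and `search` are hypothetical. `naive_stem` only drops a plural "s" and stands in for a real stemmer such as Porter's.

```python
import sqlite3


def naive_stem(word: str) -> str:
    """Hypothetical placeholder stemmer: lower-case, drop a plural 's'.
    A real setup would call a Porter stemmer here instead."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and not w.endswith("ss") else w


def build_index(conn, keywords, stem=naive_stem):
    """Store every keyword with its stem; index the stem column."""
    conn.execute("CREATE TABLE keywords (keyword TEXT PRIMARY KEY, stem TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_keywords_stem ON keywords (stem)")
    conn.executemany(
        "INSERT INTO keywords (keyword, stem) VALUES (?, ?)",
        [(kw, stem(kw)) for kw in keywords],
    )


def search(conn, query, stem=naive_stem):
    """Stem each query term and return keywords sharing any of those stems."""
    stems = sorted({stem(term) for term in query.split()})
    if not stems:
        return []
    placeholders = ", ".join("?" for _ in stems)
    rows = conn.execute(
        f"SELECT keyword FROM keywords WHERE stem IN ({placeholders}) ORDER BY keyword",
        stems,
    ).fetchall()
    return [row[0] for row in rows]
```

The stem is computed once at insert time and indexed, so the query itself stays a plain equality lookup. Switching to a better stemmer only means recomputing the `stem` column with the same function used on the search terms.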

Cheers,
Alex

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 10:11 am
by Eran
Which method did you use to find the stem in the end?

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 10:29 am
by alex.barylski
An implementation I found on the web. I'm looking for the link but of course can't find it; if I remember correctly, it was linked from the Wikipedia entry for stemming.

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 10:46 am
by arjan.top
It's the second link in my post: implementations for all the major programming languages, some written by Martin Porter himself.

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 1:12 pm
by alex.barylski
That was it, yes. Porter Stemming class :)
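Porter's algorithm strips suffixes in a fixed sequence of steps. Each rule is guarded by conditions on the remaining stem, chiefly its "measure" m, the number of vowel-consonant sequences when the stem is written as [C](VC)^m[V]. The Python sketch below covers only Step 1a (plurals) and Step 1b (-eed, -ed, -ing plus their clean-up rules). It is not the Porter Stemming class mentioned above, and the later steps that remove endings such as -ation and -ant are left out. As a result it turns "Consulting" into "consult" but leaves "Consultants" as "consultant" and "Consultation" unchanged.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter: not a/e/i/o/u; 'y' is a consonant only at the start or after a vowel."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m in [C](VC)^m[V]: count vowel-to-consonant transitions."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(word: str) -> bool:
    return len(word) >= 2 and word[-1] == word[-2] and is_consonant(word, len(word) - 1)


def ends_cvc(word: str) -> bool:
    """*o condition: consonant-vowel-consonant, last letter not w, x or y."""
    n = len(word)
    return (n >= 3 and is_consonant(word, n - 3) and not is_consonant(word, n - 2)
            and is_consonant(word, n - 1) and word[-1] not in "wxy")


def step1a(word: str) -> str:
    """sses -> ss, ies -> i, ss -> ss, s -> ''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step1b(word: str) -> str:
    """(m>0) eed -> ee; (*v*) ed -> ''; (*v*) ing -> ''; then clean up the stem."""
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if not contains_vowel(stem):
                return word
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"          # conflat -> conflate
            if ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]           # hopp -> hop
            if measure(stem) == 1 and ends_cvc(stem):
                return stem + "e"          # fil -> file
            return stem
    return word


def porter_step1(word: str) -> str:
    return step1b(step1a(word.lower()))
```

Each rule only fires when its condition on the stem holds. That is what stops "sing" from becoming "s" or "feed" from becoming "fe", which a plain suffix table cannot do.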

Re: Singularize/depluralize/inflection

Posted: Fri Jul 17, 2009 2:36 pm
by Eran
Thanks, guys, looks interesting.

Re: Singularize/depluralize/inflection

Posted: Sat Jul 18, 2009 11:14 pm
by Benjamin
Moved to PHP - Theory and Design