Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 7:55 am
by alex.barylski
I have a database of keywords (about 9K) many of which are semi-redundant, such as:
Code:
Consulting
Consultants
Consultation
Ideally, I want to crunch these down to the root word, 'consult'.
Does anyone know of an algorithm (preferably implemented in PHP) that would convert keywords into root words? For example, if someone enters 'computer consulting', it should match against 'computers', 'computing', 'consultation', 'consultants' and so forth.
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 8:01 am
by arjan.top
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 8:13 am
by alex.barylski
I just found that wiki article...not exactly what I was hoping for
http://en.wikipedia.org/wiki/Stemming
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 8:18 am
by Eran
You can use functions like similar_text(), levenshtein() and soundex() to produce the results you want. Have a look at this article on fuzzy search in PHP:
http://porteightyeight.com/2008/03/07/f ... hp-part-1/
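To give a feel for what those three built-in functions report, here is a minimal sketch that compares a query word against the keywords from the first post. The helper name compare_keyword() and its return layout are made up for illustration; only similar_text(), levenshtein() and soundex() are PHP built-ins. How to turn these numbers into a match/no-match decision is left open, since any threshold would be a guess.

```php
<?php
// Hypothetical helper (not from the thread): gathers the three fuzzy-match
// measures for one query/keyword pair, compared case-insensitively.
function compare_keyword(string $query, string $keyword): array
{
    $a = strtolower($query);
    $b = strtolower($keyword);

    // similar_text() returns the count of matching characters and writes
    // the similarity percentage into its third argument by reference.
    similar_text($a, $b, $percent);

    return [
        'levenshtein' => levenshtein($a, $b), // edit distance, 0 = identical
        'similar'     => $percent,            // 0.0 to 100.0
        'soundex'     => soundex($b),         // letter plus three digits
    ];
}

foreach (['Consulting', 'Consultants', 'Consultation'] as $keyword) {
    $r = compare_keyword('consulting', $keyword);
    printf(
        "%-13s levenshtein=%d similar=%.1f%% soundex=%s\n",
        $keyword,
        $r['levenshtein'],
        $r['similar'],
        $r['soundex']
    );
}
```

All three keywords share one soundex key here, which shows both the appeal and the limit of this route: the functions measure how alike two strings look or sound, but they do not produce a root word.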
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 9:13 am
by alex.barylski
pytrin: It turns out what I needed was indeed a stemming function. I'm not doing any of the search in PHP (it's all done in MySQL via one wicked query a co-worker implemented). I needed a way to find the stem of a word, which it now does, and it works great.
Cheers,
Alex
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 10:11 am
by Eran
What did you use to find the stem in the end?
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 10:29 am
by alex.barylski
An implementation I found on the web...looking for link but of course cannot find it...if I remember correctly it was an article in the wiki entry for stemming.
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 10:46 am
by arjan.top
The second link in my post. It's implemented for all the major programming languages, and some of the implementations are by Martin Porter himself.
Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 1:12 pm
by alex.barylski
That was it, yes. Porter Stemming class
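
For readers who want to see the general idea of stemming without tracking down the class, here is a deliberately naive sketch. It is not the Porter algorithm and not the class mentioned above: it only strips the longest matching suffix from a short hard-coded list, and both the function name naive_stem() and the suffix list are assumptions chosen to fit the keywords in this thread. A real Porter stemmer applies several ordered rule phases and handles far more cases.

```php
<?php
// Hypothetical helper: longest-suffix stripping, NOT the Porter algorithm.
function naive_stem(string $word): string
{
    $word = strtolower($word);

    // Assumed suffix list, ordered longest first so "ations" is tried before "s".
    $suffixes = ['ations', 'ation', 'ants', 'ings', 'ing', 'ant', 'ers', 'er', 's'];

    foreach ($suffixes as $suffix) {
        $stemLength = strlen($word) - strlen($suffix);

        // Require at least three remaining letters so short words stay intact.
        if ($stemLength >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, $stemLength);
        }
    }

    return $word;
}

// Stem each word of a search phrase before handing the stems to the query.
$stems = array_map('naive_stem', explode(' ', 'computer consulting'));
print_r($stems);
```

With this list, 'Consulting', 'Consultants' and 'Consultation' all reduce to the same stem, as do 'computer', 'computers' and 'computing', so stemming both the stored keywords and the search phrase lets them be compared directly.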

Re: Singularize/depluralize/inflection
Posted: Fri Jul 17, 2009 2:36 pm
by Eran
thanks guys, looks interesting
Re: Singularize/depluralize/inflection
Posted: Sat Jul 18, 2009 11:14 pm
by Benjamin

Moved to PHP - Theory and Design