Re: Implementing keyword comparison scheme (reverse search)
Posted: Sat Jan 10, 2009 2:01 pm
by jason.carter
Also, to add to this:
For keyword search and indexing, Apache Lucene and Apache Solr are excellent open source tools.
Zend Framework also now has a Lucene implementation in PHP.
Re: Implementing keyword comparison scheme (reverse search)
Posted: Sun Jan 25, 2009 6:38 am
by josh
What you're going to want to do is identify commonly occurring "sub-patterns" / features and perform the checking on those aggregate features, only doing checks at the keyword level as a fallback.
Re: Implementing keyword comparison scheme (reverse search)
Posted: Sun Jan 25, 2009 7:49 am
by Eran
I have no idea what you just said

But I already finished that project, successfully. My initial approach worked very well, and I added some additional language-specific filters, which were easy to work with.
Re: Implementing keyword comparison scheme (reverse search)
Posted: Sun Jan 25, 2009 11:54 am
by josh
Cool. I meant that if your word database was particularly large (millions of records), you would basically create another table that kept track of "features". Then, instead of checking each word individually (which has a high asymptotic complexity), you'd search for patterns of words, which would arguably be more contextually accurate, too. But I'm all for keeping it simple, and it sounds like this would have been overkill.
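For anyone reading later, here is a rough Python sketch of that "features first, keywords as fallback" idea. It is only an illustration under my own assumptions, not what was actually built in this thread: the names `build_feature_index` and `match`, the in-memory dict standing in for a "features" table, and the whitespace tokenizer are all hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (assumption: no punctuation handling)."""
    return text.lower().split()


def build_feature_index(features):
    """Index multi-word features by their first word.

    `features` maps a hypothetical feature id to a non-empty phrase.
    """
    index = defaultdict(list)
    for feature_id, phrase in features.items():
        words = tuple(tokenize(phrase))
        index[words[0]].append((feature_id, words))
    return index


def match(text, feature_index, keywords):
    """Return (matched feature ids, matched fallback keywords).

    Word patterns are checked first; single keywords are only checked
    for words that no matched feature already covers.
    """
    words = tokenize(text)
    matched_features = set()
    covered = set()  # word positions consumed by a feature match
    for i, word in enumerate(words):
        for feature_id, pattern in feature_index.get(word, ()):
            if tuple(words[i:i + len(pattern)]) == pattern:
                matched_features.add(feature_id)
                covered.update(range(i, i + len(pattern)))
    matched_keywords = {
        w for i, w in enumerate(words) if i not in covered and w in keywords
    }
    return matched_features, matched_keywords
```

Usage would look something like `match("Cheap used car wanted", build_feature_index({"f1": "used car"}), {"car", "cheap"})`, where "used car" is picked up as a feature and only "cheap" falls through to the keyword-level check.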
Re: Implementing keyword comparison scheme (reverse search)
Posted: Sun Jan 25, 2009 11:57 am
by Eran
Ah, I see what you meant now. At some point in the future we'll probably implement some sort of advanced word filter based on statistics and common word combinations, but for now it works well.
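One simple way a filter "based on statistics and common combinations" could start is by counting adjacent word pairs across the stored texts. The sketch below is just that, a minimal example under assumed names (`bigram_counts`, `common_combinations`, and the `min_count` threshold are made up for illustration), not the filter described in this thread.

```python
from collections import Counter


def bigram_counts(documents):
    """Count adjacent word pairs across all documents."""
    counts = Counter()
    for doc in documents:
        words = doc.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def common_combinations(documents, min_count=2):
    """Return the word pairs that occur at least `min_count` times."""
    return {
        pair
        for pair, n in bigram_counts(documents).items()
        if n >= min_count
    }
```

Pairs that come out of `common_combinations` could then be treated as candidate "features" instead of being checked word by word.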
Re: Implementing keyword comparison scheme (reverse search)
Posted: Sun Jan 25, 2009 1:00 pm
by josh
Yep. Then word stemming. Then comes first-order logic and knowledge representation / reduction & induction, then there's combinatorial morphology, word & sentence heads, noun phrases, verb phrases, and noun and verb bars. Then comes grammar tracing.
E.g. "What did George put in the garage?" is "traced" syntactically by our brains to "[trace] George put what in the garage" (see, verbs require "arguments", and then humans come along and morph the rules to better suit our needs, which makes the algorithms much more complex). If you're interested, check out Noam Chomsky, often called the father of modern linguistics.
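To make the word stemming step mentioned above a bit more concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter algorithm or any library's stemmer; the suffix list and the `min_stem` length guard are assumptions chosen only for illustration.

```python
# Assumed suffix list, checked in this order; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word
```

With this, "searching", "searched" and "searches" all reduce to "search", so they can be compared as the same keyword. It will get plenty of English words wrong, which is why a proper stemmer is worth using in practice.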