Writing something to parse natural language requests.

onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Writing something to parse natural language requests.

Post by onion2k »

I'm writing a little script that returns data based on a specific request format. At the moment it receives a string of...

<username> I like <thing>
... and it returns a fact about <thing>.

I'd rather like to make it a bit 'looser'. For example, I want to accept strings like "<username> I love <thing>", "<username> I really like <thing>", "<username> I like <thing> loads!". I can't really do a strpos() for all the "<thing>" options because there are currently over 110,000 of them. I suppose I could sit here and think up every possible format and then write a regexp for each of them, but that's pretty nasty too.

Has anyone here written some sort of a parser for this sort of thing? How does one go about it?

EDIT: <thing> can be several words long. I think that makes a difference.
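
As a rough illustration of the "one regexp per format" idea, the three example formats above can be folded into a single pattern instead of one pattern each. This is only a sketch under assumptions: the function name extractThing() and the verb, intensifier and filler lists are made up for the example, and it only copes with a single short sentence.

```php
<?php
// Hypothetical helper: the word lists below are illustrative assumptions,
// not taken from the original script.
function extractThing(string $message): ?string
{
    $verbs        = 'like|love|adore|enjoy';
    $intensifiers = 'really|totally|absolutely|quite';
    $fillers      = 'loads|lots|a lot|so much|very much';

    // <username> I [intensifier ...] <verb> <thing> [filler] [punctuation]
    $pattern = '/^\S+\s+i\s+(?:(?:' . $intensifiers . ')\s+)*(?:' . $verbs . ')\s+(.+?)'
             . '(?:\s+(?:' . $fillers . '))?[\s.!?]*$/i';

    if (preg_match($pattern, $message, $m) !== 1) {
        return null; // not a recognised "I like ..." request
    }

    // The lazy capture keeps multi-word things such as "Dell computers" intact.
    return $m[1];
}
```

With these lists, extractThing('bob I really like Dell computers loads!') gives "Dell computers". Anything after a first sentence would be swallowed into the capture, so the result would still need checking against the real list of things.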
alex.barylski
DevNet Evangelist
Posts: 6267
Joined: Tue Dec 21, 2004 5:00 pm
Location: Winnipeg

Re: Writing something to parse natural language requests.

Post by alex.barylski »

All I can really tell you is this is extremely complex stuff. :D

I assume you have Googled: http://www.google.ca/search?hl=en&rlz=1 ... sing&meta=

"Natural Language Processing" is the key phrase of interest here.

It's not easy, and I don't know of any existing libraries that can help with it. Unlike programming languages like PHP, I think natural languages have a much more flexible grammar, virtually limitless, which makes processing them very difficult.

English, unlike PHP, evolved over time, whereas PHP's grammar was planned or already understood from the get-go.

I suppose one could compile a database of all the caveats, etc. of English, but I doubt that is realistic, so I believe the field of NLP is basically about taking educated, calculated guesses. If this field were mastered, Google would be answering your questions perfectly.

My suggestion would be to consider using something simpler, like maybe soundex, or download an English thesaurus to perform lookups on similar words. You can usually exclude any words less than 3 characters.

Split the sentence up into words, drop those less than 3 characters, iterate the array and soundex or compare to some dictionary source until you find something interesting.
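
The steps just described might look roughly like this in PHP. It is a minimal sketch, assuming the "dictionary source" is a plain array of words; the function name findInterestingWords() and the sample data are hypothetical. It uses PHP's built-in soundex() to compare words by sound.

```php
<?php
// Hypothetical helper: $dictionary stands in for whatever word source is used.
function findInterestingWords(string $sentence, array $dictionary): array
{
    // Index the dictionary by soundex key so each lookup is a hash access.
    $bySound = [];
    foreach ($dictionary as $entry) {
        $bySound[soundex($entry)][] = $entry;
    }

    // Split the sentence into words on anything that is not a letter.
    $words = preg_split('/[^a-z]+/i', $sentence, -1, PREG_SPLIT_NO_EMPTY);

    $matches = [];
    foreach ($words as $word) {
        if (strlen($word) < 3) {
            continue; // drop words shorter than 3 characters
        }
        $key = soundex($word);
        if (isset($bySound[$key])) {
            // Map the word as typed to the dictionary words that sound alike.
            $matches[$word] = $bySound[$key];
        }
    }

    return $matches;
}
```

For example, with a dictionary of ['really', 'computers', 'cheese'], the misspelt sentence 'Bob, I realy like computrs' maps "realy" to "really" and "computrs" to "computers". Soundex is coarse, so unrelated words can share a key and a second check is usually needed.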

Anything much beyond this is very theoretical and long winded, not to mention difficult to comprehend.
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: Writing something to parse natural language requests.

Post by Weirdan »

Does that mean you'd like to parse even such requests as ' "I like it" said <username>, speaking about the <thing>'? Are those <things> nouns? Do you know all the verbs you'd like to detect?
omniuni
Forum Regular
Posts: 738
Joined: Tue Jul 15, 2008 10:50 pm
Location: Carolina, USA

Re: Writing something to parse natural language requests.

Post by omniuni »

Hm. Brainstorming...

Given a list of possible <thing>s, and <negative>s, I'd start by creating a set of functions to test for language patterns. For example, if I had an array of $likePhrases that included options such as "*i*!<negative>*like*<thing>*", "*<thing>*is*i*!<negative>like*", etc., I should be able to recognize from the sentence structure whether a person likes something, and return a fact, or, if it has the negative, I could say "I'm sorry you don't like <thing>." As an interesting side note, where it becomes difficult is with things like "I don't hate", which would, in this syntax, be represented as "*i*<negative>*<negative>*<thing>*". Also, I'd have to check first if I even want to parse it against the $likePhrases filter! Ok, so it gets difficult anyway. Good Luck!!!
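
One simplified way to handle the <negative> part, including the "I don't hate" case, is to count negatives rather than write a pattern for each combination. This is not the wildcard syntax above, just a sketch of the counting idea; classifyOpinion() and the word lists are assumptions for illustration, and looping over every thing is only practical for a short list.

```php
<?php
// Hypothetical sketch: counts <negative> words so that a double negative
// ("don't hate") cancels out. $negatives is expected in lowercase.
function classifyOpinion(string $sentence, array $things, array $negatives): ?array
{
    $lower = strtolower($sentence);

    foreach ($things as $thing) {
        if (strpos($lower, strtolower($thing)) === false) {
            continue; // this thing is not mentioned
        }

        // Split into lowercase words, keeping apostrophes so "don't" survives.
        $words = preg_split('/[^a-z\']+/', $lower, -1, PREG_SPLIT_NO_EMPTY);

        // array_intersect keeps every occurrence from $words, so repeats count.
        $negativeCount = count(array_intersect($words, $negatives));

        // An even number of negatives is treated as "likes".
        return ['thing' => $thing, 'likes' => $negativeCount % 2 === 0];
    }

    return null; // no known thing found
}
```

With negatives of ["don't", 'not', 'never', 'hate', 'dislike'], "I don't hate cheese" comes out as liking cheese and "I don't like cheese" as not liking it. Counting ignores word order entirely, so it will misread plenty of real sentences.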
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Writing something to parse natural language requests.

Post by Christopher »

Better to Google this:

http://www.google.ca/search?hl=en&rlz=1 ... arch&meta=

You really don't want to do this in PHP. I am sure you can find some software that takes a string and returns a bunch of useful data that you can use to do something interesting. I notice that OpenNLP has a toolkit and there are others.
(#10850)
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Writing something to parse natural language requests.

Post by onion2k »

I tested a way of testing all the <thing>s in the end. It's faster than I thought it'd be... only takes 0.25s. I think I can get that down a lot. I can do:

[sql]SELECT *
FROM `tm_things`
WHERE 'Bob, I really like Dell computers. They are so dreamy!' LIKE CONCAT('% ', `title`, '%')
  AND LENGTH(`title`) > 4[/sql]

That returns "Dell computers". Unfortunately it also returns "computer" and "dream" though.
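
The extra rows appear because the LIKE pattern only requires a space before the title, so "computer" matches inside "computers" and "dream" inside "dreamy". One possible fix, sketched here in PHP as a post-filter on the rows the query returns, is to keep only titles that appear as whole words. The function name filterWholeWordMatches() is hypothetical, and this assumes the titles have already been fetched into an array.

```php
<?php
// Hypothetical post-filter for the titles returned by the query above.
function filterWholeWordMatches(string $message, array $titles): array
{
    $kept = [];
    foreach ($titles as $title) {
        // The title must not be directly preceded or followed by a word character.
        $pattern = '/(?<!\w)' . preg_quote($title, '/') . '(?!\w)/i';
        if (preg_match($pattern, $message) === 1) {
            $kept[] = $title;
        }
    }

    // Longest titles first, so multi-word things win over shorter ones.
    usort($kept, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });

    return $kept;
}
```

For the message in the query, passing in "Dell computers", "computer" and "dream" leaves only "Dell computers". The cost is one regex per returned row, which should stay small because the query has already narrowed the 110,000 titles down.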