Creating sentence patterns using regex

Posted: Mon Jun 04, 2007 8:15 pm
by icesha
Hello, I am new to PHP and am studying it on my own. I would like to ask how I can code sentence patterns using regex. I mean creating patterns for English sentences, so that when a user inputs a sentence or a group of sentences I can check the grammar of that input. Please help me; I need it very much. I would just like some ideas or samples to guide me along so I can study them.

Thank you very much for your kind understanding.
Good day!

Posted: Mon Jun 04, 2007 8:23 pm
by Ambush Commander
I seriously doubt it. Regular expressions are usually used for simple cases; something as complex as grammar will almost certainly have to be written by hand.

Posted: Mon Jun 04, 2007 8:26 pm
by superdezign
Grammar? Whoa, that seems like a bit much, especially for being new to PHP.
Regex doesn't determine whether words are verbs, nouns, or adjectives... It checks strings for pattern matches.
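
To illustrate that point, here is a minimal sketch. The pattern and the sample sentences are made up for this example: the regex only checks that the string starts with a capital letter and ends with sentence punctuation, so it accepts a scrambled sentence just as happily as a correct one.

```php
<?php
// Hypothetical pattern: a capital letter first, anything in between,
// and a '.', '!' or '?' at the very end.
$looksLikeSentence = '/^[A-Z].*[.!?]$/';

// preg_match() returns 1 when the pattern matches and 0 when it does not.
$r1 = preg_match($looksLikeSentence, 'The cat sleeps.');            // grammatical
$r2 = preg_match($looksLikeSentence, 'Cat the the sleeps sleeps.'); // scrambled
$r3 = preg_match($looksLikeSentence, 'the cat sleeps');             // wrong shape

echo "$r1 $r2 $r3\n";
```

The first two strings both match because they have the same character-level shape; only the third fails. Nothing in the pattern knows what a noun or a verb is.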

Of course, you may end up implementing regex into this solution of yours, but it'll take a lot of planning.
You'd first have to have a database of every word in the dictionary. Then, each word would have to be defined as a part of speech, and you'd probably want conjugated verbs grouped together. Then, you'd have to define all the types of sentence structures, taking into account adjectives with nouns and adverbs with verbs, and requiring each sentence to have a noun subject and a verb action.
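
A toy version of that plan might look like the sketch below. Everything in it is an assumption for illustration: the nine-word `$lexicon` stands in for the dictionary database, the single-letter tags are invented, and `$structure` describes just one sentence type. Words are looked up to get a string of tags, and the regex is then run over the tags rather than over the raw text.

```php
<?php
// Hypothetical mini-lexicon: word => part-of-speech tag.
// D = determiner, A = adjective, N = noun, V = verb, R = adverb.
$lexicon = [
    'the' => 'D', 'a' => 'D',
    'big' => 'A', 'small' => 'A',
    'dog' => 'N', 'cat' => 'N',
    'sleeps' => 'V', 'runs' => 'V',
    'quickly' => 'R',
];

// Turn a sentence into a string of tags; words not in the lexicon become '?'.
function tagSentence(string $sentence, array $lexicon): string
{
    // Lowercase, then split on anything that is not a letter or an apostrophe.
    $words = preg_split("/[^a-z']+/", strtolower($sentence), -1, PREG_SPLIT_NO_EMPTY);
    $tags = '';
    foreach ($words as $word) {
        $tags .= $lexicon[$word] ?? '?';
    }
    return $tags;
}

// One illustrative structure: optional determiner, any number of adjectives,
// a noun subject, a verb, and an optional adverb.
$structure = '/^D?A*NVR?$/';

function matchesStructure(string $sentence, array $lexicon, string $structure): bool
{
    return preg_match($structure, tagSentence($sentence, $lexicon)) === 1;
}

var_dump(matchesStructure('The big dog runs quickly.', $lexicon, $structure));
var_dump(matchesStructure('Dog the runs.', $lexicon, $structure));
```

A real lexicon would need many tags per word ("runs" can be a noun too), which is a large part of why the full problem is so hard.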

Then there are special rules for commas, colons, and semicolons you'd have to account for, as well as how to classify words that are misspelled or just not in your database. That's a bit much for starting out, though.

And in the English language?? That sounds like quite a task.


You may be better off looking for a pre-made solution. I doubt it'll be free, though.

If you are going to go after this though, I'd love to see the progress of it.

Posted: Mon Jun 04, 2007 9:22 pm
by feyd
While it probably is possible, the pattern would be massive. It's probably better to just break it all down into tokens. Each token has a set of things that may follow it based on what's come before.

It's quite complicated either way, but doing it in code is far, far less complicated than using regex alone.
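
A rough sketch of the token idea, under some loud assumptions: the input is already a list of part-of-speech tags, the tag letters and the `$follows` table are invented for this example, and each token is checked only against the one immediately before it (a simplification of "what's come before").

```php
<?php
// Hypothetical follow table: tag => tags that are allowed to come next.
// 'START' and 'END' are made-up markers for the sentence boundaries.
// D = determiner, A = adjective, N = noun, V = verb, R = adverb.
$follows = [
    'START' => ['D', 'A', 'N'],
    'D'     => ['A', 'N'],
    'A'     => ['A', 'N'],
    'N'     => ['V'],
    'V'     => ['R', 'END'],
    'R'     => ['END'],
];

// Returns the index of the first token that may not follow its predecessor,
// or -1 if the whole sequence is allowed. An index equal to the number of
// tags passed in means the sentence ended too early.
function firstBadToken(array $tags, array $follows): int
{
    $previous = 'START';
    $tags[] = 'END'; // check the sentence ending like any other token
    foreach ($tags as $i => $tag) {
        if (!in_array($tag, $follows[$previous] ?? [], true)) {
            return $i;
        }
        $previous = $tag;
    }
    return -1;
}

echo firstBadToken(['D', 'A', 'N', 'V'], $follows), "\n"; // determiner adjective noun verb
echo firstBadToken(['D', 'V'], $follows), "\n";           // verb straight after determiner
```

One thing this buys over a single big pattern is that the code can say where the sentence went wrong, not just that it failed to match.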