Text/English parsing + cleaning up input

Posted: Fri Feb 17, 2006 7:53 pm
by stsr11
I'm looking for some advice on cleaning up (syntax/grammar) English text uploaded to a site. Site users can paste or upload text as is, i.e. I want them to be able to submit anything and have my code clean it up.

I want to go way beyond

I need to be able to turn sentences (both those terminated by CR/LF and those ending in just a full stop) into paragraphs. I also want it to be able to deal with sentences wrapped using hyphens and join them into readable paragraphs.

There are also basic syntax errors which users submit without thinking, e.g. no caps at start of sentence, spaces after commas, full stops etc., extra spaces either side of hyphens, double punctuation, unacceptable chars etc., ie/i.e/i.e./e.g/eg. - all that lot.

I want to be able to correct all these in code.

Does anyone have any experience of this kind of thing or could someone point me in the direction of a PHP class library or code library to help?

Thanks

Seppo

Posted: Fri Feb 17, 2006 8:20 pm
by Christopher
You have listed a lot of things (with a few etc. thrown in there). Here is your list, organized. All of these things can probably be done with str_replace() or preg_replace() without needing a parser.

- I need to be able to turn sentences (both terminated by cr/lf and just fullstops) into paragraphs.

- I also want it to be able to deal with sentences wrapped using hyphens and join them into readable paragraphs.

- No caps at start of sentence,

- spaces after commas,

- extra spaces either side of hyphens,

- double punctuation,

- unacceptable chars etc., ie/i.e/i.e./e.g/eg.
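
As a rough illustration of the preg_replace() approach for the punctuation items in that list, here is a minimal sketch. The function name tidy_punctuation() is made up for this example, and several rules are guesses at what was meant: "unacceptable chars" is taken to mean control characters, "spaces after commas" to mean a missing space, and a hyphen with a space on only one side is treated as a broken compound. It will still get things wrong (it flattens "..." to ".", and capitalises after "etc."), so treat it as a starting point only.

```php
<?php
// Hypothetical helper (not from any library): tidies common punctuation slips.
function tidy_punctuation(string $text): string
{
    // Assumption: "unacceptable chars" means ASCII control characters
    // other than tab, LF and CR.
    $text = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);
    // Collapse runs of spaces and tabs into a single space.
    $text = preg_replace('/[ \t]+/', ' ', $text);
    // Remove spaces in front of punctuation: "word ," becomes "word,".
    $text = preg_replace('/ +([,.!?;:])/', '$1', $text);
    // Collapse doubled punctuation: "!!" becomes "!" (also flattens "...").
    $text = preg_replace('/([,.!?;:])\1+/', '$1', $text);
    // Add a missing space after a comma or semicolon that runs into a letter.
    $text = preg_replace('/([,;])(?=[A-Za-z])/', '$1 ', $text);
    // Normalise ie/i.e/i.e./eg/e.g/e.g. to "i.e." and "e.g.".
    $text = preg_replace_callback(
        '/\b(i\.?e|e\.?g)\b\.?/i',
        function ($m) {
            return strtolower($m[1][0]) === 'i' ? 'i.e.' : 'e.g.';
        },
        $text
    );
    // Assumption: a hyphen with a space on one side only is a broken
    // compound ("well- known"); a spaced " - " is left alone as a dash.
    $text = preg_replace('/(?<=\w)(?:- | -)(?=\w)/', '-', $text);
    // Capitalise the first letter of the text and of each sentence,
    // skipping the full stop that ends "i.e." or "e.g.".
    $text = preg_replace_callback(
        '/(^|(?<!i\.e)(?<!e\.g)[.!?]\s+)([a-z])/',
        function ($m) {
            return $m[1] . strtoupper($m[2]);
        },
        $text
    );
    return trim($text);
}

echo tidy_punctuation("hello ,world !!  this is a test,ie it works."), "\n";
```

The order of the rules matters: doubled punctuation is collapsed before the abbreviations are rewritten, so "ie.." does not end up as "i.e..".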

Posted: Mon Feb 20, 2006 8:52 am
by Maugrim_The_Reaper
nl2br() can help to format text typed into a textarea form. Correcting grammar is difficult. Especially in the new age of these leet people, the butchers of English grammar ;).
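
nl2br() keeps every line break the user typed, so hard-wrapped text stays hard-wrapped. For the paragraph-joining part of the question, one possible sketch is to unwrap the lines first and only then produce HTML. The function names rebuild_paragraphs() and paragraphs_to_html() are invented for this example, and it assumes that a blank line marks a paragraph break and that a hyphen at the end of a line is a wrapped word (so a real compound split at its hyphen, like "well-" / "known", would lose the hyphen).

```php
<?php
// Hypothetical helper: turns hard-wrapped text into a list of paragraphs.
function rebuild_paragraphs(string $text): array
{
    // Normalise CR/LF and lone CR line endings to LF.
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    // Re-join words hyphenated across a line break: "para-\ngraph".
    $text = preg_replace('/([a-z])-\n[ \t]*([a-z])/i', '$1$2', $text);
    // Assumption: one or more blank lines separate paragraphs.
    $blocks = preg_split('/\n(?:[ \t]*\n)+/', trim($text));
    $paragraphs = [];
    foreach ($blocks as $block) {
        // Inside a paragraph, a single line break becomes a space.
        $joined = trim(preg_replace('/\s*\n\s*/', ' ', $block));
        if ($joined !== '') {
            $paragraphs[] = $joined;
        }
    }
    return $paragraphs;
}

// Hypothetical helper: escapes each paragraph and wraps it in <p> tags.
function paragraphs_to_html(array $paragraphs): string
{
    $html = '';
    foreach ($paragraphs as $p) {
        $html .= '<p>' . htmlspecialchars($p, ENT_QUOTES, 'UTF-8') . "</p>\n";
    }
    return $html;
}

$raw = "This is a para-\r\ngraph that was\r\nwrapped.\r\n\r\nSecond one.";
echo paragraphs_to_html(rebuild_paragraphs($raw));
```

If the original line breaks should be preserved instead, nl2br(htmlspecialchars($text)) on the raw input is the simpler route.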