Text/English parsing + cleaning up input
Posted: Fri Feb 17, 2006 7:53 pm
I'm looking for some advice on cleaning up the syntax and grammar of English text uploaded to a site. Site users can paste or upload text as is, i.e. I want them to be able to submit anything and have my code clean it up.
I want to go way beyond
I need to be able to turn sentences (both those terminated by CR/LF and those ending in just a full stop) into paragraphs. I also want it to deal with words that were split with a hyphen at a line wrap, and join the lines back into readable paragraphs.
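To make the paragraph part concrete, here is a rough sketch of the kind of thing I mean. The function name joinWrappedLines is made up for illustration and does not come from any library. It assumes a blank line marks a paragraph break, that any single line break inside a paragraph is just a wrap, and that a hyphen at a line end followed by a lowercase ASCII letter is a split word rather than a real hyphen. Genuinely hyphenated words that happen to wrap would be joined wrongly.

```php
<?php
// Hypothetical helper (not from any library): rebuilds paragraphs
// from hard-wrapped text.
function joinWrappedLines(string $text): string
{
    // Normalise CR/LF and lone CR line endings to LF.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Re-join words split by a hyphen at a line end: "para-\ngraph" -> "paragraph".
    // Assumption: ASCII letters only, and the continuation starts in lowercase.
    $text = preg_replace('/([A-Za-z])-\n[ \t]*([a-z])/', '$1$2', $text);

    // Assumption: one or more blank lines separate paragraphs.
    $paragraphs = preg_split('/\n(?:[ \t]*\n)+/', trim($text));

    // Inside a paragraph, treat each remaining line break as a single space.
    foreach ($paragraphs as $i => $paragraph) {
        $paragraphs[$i] = trim(preg_replace('/\s*\n\s*/', ' ', $paragraph));
    }

    return implode("\n\n", $paragraphs);
}

echo joinWrappedLines("This is a sen-\r\ntence that was\r\nwrapped.\r\n\r\nSecond para-\r\ngraph.");
```

This only handles the line-joining side. Deciding where a paragraph should start when there are no blank lines at all is a harder problem that the sketch does not attempt.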
There are also basic syntax errors which users submit without thinking, e.g. no capital at the start of a sentence, bad spacing after commas, full stops etc., extra spaces either side of hyphens, double punctuation, unacceptable characters, and inconsistent abbreviations (ie/i.e/i.e./e.g/eg.) - all that lot.
I want to be able to correct all these in code.
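As a starting point, the fixes listed above can be roughed out with a chain of regular expressions. This is only a sketch under assumptions: tidyPunctuation is a made-up name, the rules are ASCII-only, and several of them are deliberately crude. Collapsing doubled punctuation also destroys "..." ellipses, adding a space after a full stop breaks things like domain names, and removing spaces around a hyphen will mangle a hyphen that was meant as a dash. Characters \x01 and \x02 are used as temporary placeholders so that "i.e." and "e.g." are not touched by the later rules, which assumes those control characters never appear in the input.

```php
<?php
// Hypothetical helper (not from any library): crude punctuation clean-up.
function tidyPunctuation(string $text): string
{
    // Collapse runs of spaces and tabs into one space.
    $text = preg_replace('/[ \t]+/', ' ', $text);

    // Protect ie / i.e / i.e. and eg / e.g / e.g. with placeholders
    // (lowercase only, so "IE" is left alone).
    $text = preg_replace('/\bi\.?e\b\.?/', "\x01", $text);
    $text = preg_replace('/\be\.?g\b\.?/', "\x02", $text);

    // Remove spaces before punctuation: "bad ,really" -> "bad,really".
    $text = preg_replace('/ +([,.;:!?])/', '$1', $text);

    // Collapse repeated punctuation: ".." -> ".", "!!" -> "!".
    $text = preg_replace('/([,.;:!?])\1+/', '$1', $text);

    // Add a missing space after punctuation (or a protected abbreviation).
    $text = preg_replace('/([,.;:!?\x01\x02])(?=[A-Za-z])/', '$1 ', $text);

    // Remove spaces either side of a hyphen between letters.
    $text = preg_replace('/(?<=[A-Za-z]) *- *(?=[A-Za-z])/', '-', $text);

    // Capitalise the first letter of the text and of each new sentence.
    $text = preg_replace_callback(
        '/(^|[.!?] )([a-z])/',
        function ($m) {
            return $m[1] . strtoupper($m[2]);
        },
        $text
    );

    // Restore the abbreviations in one consistent form.
    return str_replace(["\x01", "\x02"], ['i.e.', 'e.g.'], $text);
}

echo tidyPunctuation("this is  bad ,really bad..  ie. no caps,no spaces and well - known errors !!");
```

The order of the rules matters: the abbreviations have to be protected before the spacing and capitalisation rules run, otherwise "i.e." gets split into "i. e." and the word after it gets capitalised.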
Does anyone have any experience of this kind of thing or could someone point me in the direction of a PHP class library or code library to help?
Thanks
Seppo