Text/English parsing + cleaning up input


stsr11
Forum Newbie
Posts: 17
Joined: Thu Jul 15, 2004 6:57 pm

Text/English parsing + cleaning up input

Post by stsr11 »

I'm looking for some advice on cleaning up the syntax and grammar of English text uploaded to a site. Site users can paste or upload text as is, i.e. I want them to be able to submit anything and have my code clean it up.

I want to go way beyond

I need to be able to turn sentences (terminated either by CR/LF or just by full stops) into paragraphs. I also want it to be able to deal with sentences wrapped using hyphens and join them into readable paragraphs.

There are also basic syntax errors which users submit without thinking, e.g. no caps at the start of a sentence, spacing after commas and full stops, extra spaces either side of hyphens, double punctuation, unacceptable characters, inconsistent abbreviations (ie/i.e/i.e./e.g/eg.) and all that lot.

I want to be able to correct all these in code.

Does anyone have any experience of this kind of thing or could someone point me in the direction of a PHP class library or code library to help?

Thanks

Seppo
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Post by Christopher »

You have listed a lot of things (with a few "etc." thrown in there). Here is your list, organized. All of these things can probably be done with str_replace() or preg_replace() without needing a parser.

- I need to be able to turn sentences (terminated either by CR/LF or just by full stops) into paragraphs.

- I also want it to be able to deal with sentences wrapped using hyphens and join them into readable paragraphs.

- No caps at start of sentence,

- spaces after commas,

- extra spaces either side of hyphens,

- double punctuation,

- unacceptable chars etc., ie/i.e/i.e./e.g/eg.
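
As a rough sketch of how a few of these items could be chained with str_replace() and preg_replace(): the function name clean_text() and every rule in it are assumptions made for illustration, not a tested library. It assumes plain English text with paragraphs separated by blank lines, and it leaves the hyphen-spacing and abbreviation items alone.

```php
<?php
// Hypothetical helper: a minimal sketch, not a complete grammar fixer.
function clean_text(string $text): string
{
    // Normalise CR/LF and bare CR line endings to \n.
    $text = str_replace(["\r\n", "\r"], "\n", $text);

    // Re-join words split by a hyphen at a line wrap: "para-\ngraph" -> "paragraph".
    // Assumption: a hyphen directly before a newline is always a wrap.
    $text = preg_replace('/(\w)-\n\s*(\w)/', '$1$2', $text);

    // A blank line separates paragraphs; single newlines are treated as spaces.
    $paragraphs = preg_split('/\n\s*\n/', trim($text));

    foreach ($paragraphs as $i => $p) {
        // Collapse runs of whitespace (including single newlines) to one space.
        $p = preg_replace('/\s+/', ' ', trim($p));
        // Remove spaces before punctuation.
        $p = preg_replace('/\s+([,.;:!?])/', '$1', $p);
        // Collapse doubled punctuation; full stops are skipped so "..." survives.
        $p = preg_replace('/([,;:!?])\1+/', '$1', $p);
        // Add a space after punctuation that is directly followed by a letter.
        // Digits are skipped so "1,000" and "6:57" stay intact.
        $p = preg_replace('/([,;:!?])(?=[A-Za-z])/', '$1 ', $p);
        // Capitalise the first letter of the paragraph...
        $p = ucfirst($p);
        // ...and any letter following ". ", "! " or "? " after a word of 2+ letters.
        // This skips "i.e. " but is still fooled by abbreviations such as "etc.".
        $p = preg_replace_callback(
            '/(?<=[a-z]{2}[.!?] )[a-z]/',
            function ($m) {
                return strtoupper($m[0]);
            },
            $p
        );
        $paragraphs[$i] = $p;
    }

    return implode("\n\n", $paragraphs);
}
```

The rules run in a fixed order on purpose: whitespace is collapsed before the punctuation rules, and capitalisation comes last so it sees the final spacing. Each rule is independent, so any one of them can be dropped or tightened without touching the others.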
Maugrim_The_Reaper
DevNet Master
Posts: 2704
Joined: Tue Nov 02, 2004 5:43 am
Location: Ireland

Post by Maugrim_The_Reaper »

nl2br() can help to format text typed into a textarea form. Correcting grammar is difficult, especially in the new age of these leet people, the butchers of English grammar ;).
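
A minimal sketch of the nl2br() approach, assuming the text comes from a textarea and is echoed back into an HTML page. The function name format_for_html() is hypothetical, and escaping with htmlspecialchars() first is an added assumption so that any markup the user typed is displayed rather than rendered.

```php
<?php
// Hypothetical helper: prepare raw textarea input for display in HTML.
function format_for_html(string $raw): string
{
    // Escape HTML special characters first, then insert <br /> before each newline.
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}
```

Calling format_for_html() on the submitted text just before output keeps the stored copy untouched, so other clean-up rules can still be run on the original later.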