Detecting Plagiarism

Posted: Fri Jun 23, 2006 6:56 pm
by Benjamin
I need a way to detect duplicate text posted to a database. It would need to check each new post against everything already posted to detect full or partial duplicates. I would also need a way to calculate how similar two texts are, as a percentage.

I'm not sure how to approach this. I considered counting how often each word occurs in each document and comparing the counts, but I'm not sure that would work very well. Any ideas?
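
For reference, a minimal sketch of the word-count idea, assuming a reasonably recent PHP (7.0 or later). The function names word_counts() and word_count_similarity() are hypothetical, and scoring the counts with cosine similarity is an assumption made here for illustration, not something specified in the thread.

```php
<?php
// Hypothetical helper: map each word in $text to the number of times it occurs.
function word_counts(string $text): array
{
    // str_word_count(..., 1) returns the words found in the string as an array.
    return array_count_values(str_word_count(strtolower($text), 1));
}

// Hypothetical helper: cosine similarity of two word-count vectors, as a percentage.
function word_count_similarity(string $a, string $b): float
{
    $countsA = word_counts($a);
    $countsB = word_counts($b);

    // Dot product over the words that appear in both texts.
    $dot = 0;
    foreach ($countsA as $word => $n) {
        $dot += $n * ($countsB[$word] ?? 0);
    }

    // Sum of squared counts for each text (squared vector length).
    $sumSqA = 0;
    foreach ($countsA as $n) {
        $sumSqA += $n * $n;
    }
    $sumSqB = 0;
    foreach ($countsB as $n) {
        $sumSqB += $n * $n;
    }

    $denominator = sqrt($sumSqA) * sqrt($sumSqB);

    // Texts with no words at all are treated as 0% similar.
    return $denominator > 0 ? 100.0 * $dot / $denominator : 0.0;
}
```

Because only the counts are compared, word order is ignored: "dog bites man" and "man bites dog" score as essentially identical, which is probably the weakness the doubt above is pointing at.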

Posted: Fri Jun 23, 2006 6:58 pm
by feyd

Posted: Fri Jun 23, 2006 7:10 pm
by Benjamin
Hmm, it looks like the function you mentioned has a 255-character limit, but you put me on the right track. Maybe similar_text() will work.
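
A rough sketch of how similar_text() might be used for this, assuming the existing posts have already been loaded from the database into an array keyed by post ID. The helper name find_similar_posts() and the 80% threshold are hypothetical; only similar_text() itself, with its optional by-reference percentage argument, is PHP's own.

```php
<?php
// Hypothetical helper: return the existing posts whose similarity to $posted
// is at or above $threshold (in percent), most similar first.
function find_similar_posts(string $posted, array $existing, float $threshold = 80.0): array
{
    $matches = [];
    foreach ($existing as $id => $text) {
        // similar_text() returns the number of matching characters and
        // writes the similarity percentage into its third argument.
        similar_text($posted, $text, $percent);
        if ($percent >= $threshold) {
            $matches[$id] = $percent;
        }
    }

    // Sort by percentage, highest first, keeping the post IDs as keys.
    arsort($matches);
    return $matches;
}

// Example with made-up data: post 1 is an exact duplicate, post 2 is unrelated.
$existing = [
    1 => "hello world",
    2 => "zzz",
];
print_r(find_similar_posts("hello world", $existing));
```

The PHP manual gives the complexity of similar_text() as O(N**3) in the length of the longer string, so comparing every new post against every stored post this way may be too slow for long texts or a large table. Some cheap pre-filter (a length check, for example) would likely be needed first.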