Detecting Plagiarism
Posted: Fri Jun 23, 2006 6:56 pm
I need a way to detect duplicate text posted to a database. It would need to check each new post against everything already posted to detect duplicates (or partial duplicates). I would also need a way to calculate a percentage of how similar two documents are.
I'm not sure how to approach this. I considered counting how often each word occurs in each document and comparing the counts, but I'm not sure that would work very well. Any ideas?
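To make the word-count idea concrete, here is a rough sketch of what I have in mind. It is only one possible reading of the idea: the function names are made up for illustration, and using cosine similarity to turn the counts into a percentage is my assumption, not something I know to be the right measure.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercase word occurs in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def similarity_percent(text_a, text_b):
    """Cosine similarity of the two word-count vectors, scaled to 0-100.

    Assumption: cosine similarity is a reasonable way to compare counts.
    Word order is ignored entirely.
    """
    a = word_counts(text_a)
    b = word_counts(text_b)
    # Dot product over the words the two documents share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as matching nothing
    return 100.0 * dot / (norm_a * norm_b)


if __name__ == "__main__":
    posted = "The quick brown fox jumps over the lazy dog"
    candidate = "A quick brown fox jumped over a lazy dog"
    print(round(similarity_percent(posted, candidate), 1))
```

Because this only looks at counts, two documents with the same words in a different order score as identical, which is part of why I doubt it would work well on its own. It also means comparing a new post against every stored document one at a time.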