Matching Content
Posted: Mon Dec 04, 2006 1:59 pm
by GeXus
Say you wanted to take two pages and determine the percentage of their content that matches, such as x% of page A matches page B.
Does anyone have any insight into the best way of doing this?
Posted: Mon Dec 04, 2006 2:20 pm
by RobertGonzalez
How would you even begin to consider the logic behind that? How do you determine percent similarity?
Posted: Mon Dec 04, 2006 2:27 pm
by GeXus
I don't know. I would imagine you would need a set character length, so let's say 100.
1. You would count all of the characters from each source.
2. Determine whether any characters (grouped in order, 100 at a time) match from source to source.
3. Get the number that match, and determine the percent.
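The three steps above could be sketched roughly like this. It is only one reading of the idea, written in Python for illustration; the function name percent_match and the chunk_size parameter are made up for this example, and "match" is assumed to mean that a 100-character group from page A appears verbatim somewhere in page B.

```python
def percent_match(page_a: str, page_b: str, chunk_size: int = 100) -> float:
    """Return the percentage of page_a's fixed-size chunks found verbatim in page_b.

    Hypothetical helper: the name, the signature, and the verbatim-substring
    definition of a "match" are assumptions, not something the thread specifies.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Step 1: walk page A in order, cutting it into groups of chunk_size characters.
    # The last group may be shorter if the length is not a multiple of chunk_size.
    chunks = [page_a[i:i + chunk_size] for i in range(0, len(page_a), chunk_size)]
    if not chunks:
        return 0.0  # an empty page A has nothing to match

    # Step 2: count the groups that occur anywhere in page B.
    matched = sum(1 for chunk in chunks if chunk in page_b)

    # Step 3: turn the count into a percentage of page A.
    return 100.0 * matched / len(chunks)


if __name__ == "__main__":
    a = "a" * 100 + "b" * 100
    b = "a" * 100 + "c" * 100
    print(percent_match(a, b))  # first group matches, second does not
```

One thing to watch with fixed groups: a single extra character near the top of page A shifts every later group, so two nearly identical pages can score very low. A smaller chunk_size softens that, at the cost of more accidental matches.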
Posted: Mon Dec 04, 2006 2:28 pm
by RobertGonzalez
Have fun with that one, man...
