compare two articles for duplicate content

Posted: Wed Feb 15, 2012 12:00 am
by gprince66
Hi,

I want to compare two articles for duplicate content and highlight the duplicated words in yellow in the output of each article. My problem is matching only runs of three or more consecutive words while ignoring matches of just one or two words. I think I have looked at every page on php.net and tried every Google search known to man. I am new to PHP and would greatly appreciate any guidance or help.

Thanks in advance,

Gary

Code: Select all

// identify 3+, 4+, 5+, etc. consecutive matching words between 2 strings
// ignore 1 & 2 consecutive matching words between 2 strings

// string#1 = the brown dog barked all day
// string#2 = the brown dog slept all day

// 'the brown dog' = 3 consecutive matching words between the 2 strings
// 'all day' = only 2 consecutive matching words between the 2 strings

// echo both strings with duplicate 3+, 4+, 5+, etc. words highlighted in yellow

// get user input
$str1 = $_POST['text1'];
$str2 = $_POST['text2'];

// explode strings into separate words
$str1array = explode(" ", $str1);
$str2array = explode(" ", $str2);

// compare two arrays for duplicate 3+, 4+, 5+, etc. consecutive matching words
$dupwords = array_intersect($str1array, $str2array);

// echo both articles and their word count
echo '<br /><b>Article #1</b>&nbsp;-&nbsp;';
echo count_words($str1); 
echo '&nbsp;words<br />'; 
echo stripslashes($str1);
echo '<br /><br /><br /><b>Article #2</b>&nbsp;-&nbsp;';
echo count_words($str2); 
echo '&nbsp;words<br />';  
echo stripslashes($str2);
echo '<br /><br /><br /><br />';

Re: compare two articles for duplicate content

Posted: Wed Feb 15, 2012 3:32 am
by G l a z z
Hmm, I don't know exactly what you want to achieve here, but try this:

Code: Select all

// sample input (hard-coded here instead of read from $_POST)
$str1 = 'the brown dog barked all day';
$str2 = 'the brown dog slept all day';

// explode strings into separate words
$str1array = explode(" ", $str1);
$str2array = explode(" ", $str2);

$storage = array();

foreach($str1array as $item)
{
	if(isset($storage[$item])){
		$storage[$item] += 1;
	} else {
		$storage[$item] = 1;
	}
}

foreach($str2array as $item)
{
	if(isset($storage[$item])){
		$storage[$item] += 1;
	} else {
		$storage[$item] = 1;
	}
}

var_dump($storage);
It will output:

Code: Select all

array
  'the' => int 2
  'brown' => int 2
  'dog' => int 2
  'barked' => int 1
  'all' => int 2
  'day' => int 2
  'slept' => int 1

Re: compare two articles for duplicate content

Posted: Wed Feb 15, 2012 3:11 pm
by gprince66
Thanks Glazz,

I appreciate the help. Is there any way to adjust the code so it only counts runs of three or more duplicate words in a row between the two strings?
For the example above, the output should contain only 'the brown dog' and ignore the rest, since those words do not form a run of three in a row.
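
For reference, one way to approach this is to compare overlapping three-word windows instead of single words. The snippet below is only a sketch under a few assumptions: PHP 7 or later, words split on whitespace, and exact case-sensitive matching (so punctuation stays attached to its word). The function names markSharedRuns() and highlightWords() are made up for this example and are not PHP built-ins.

```php
<?php
// Hypothetical helper: flags every word that is part of a run of at least
// $minRun consecutive words appearing in both word lists.
function markSharedRuns(array $a, array $b, int $minRun = 3): array
{
    $markA = array_fill(0, count($a), false);
    $markB = array_fill(0, count($b), false);

    // Index every $minRun-word window of $b by its text.
    $windows = [];
    for ($j = 0; $j + $minRun <= count($b); $j++) {
        $windows[implode(' ', array_slice($b, $j, $minRun))][] = $j;
    }

    // A window of $a that also occurs in $b marks its words in both lists.
    // Overlapping windows merge, so longer runs end up fully marked.
    for ($i = 0; $i + $minRun <= count($a); $i++) {
        $key = implode(' ', array_slice($a, $i, $minRun));
        if (!isset($windows[$key])) {
            continue;
        }
        for ($k = 0; $k < $minRun; $k++) {
            $markA[$i + $k] = true;
            foreach ($windows[$key] as $j) {
                $markB[$j + $k] = true;
            }
        }
    }
    return [$markA, $markB];
}

// Hypothetical helper: wraps each stretch of flagged words in a yellow span.
function highlightWords(array $words, array $marks): string
{
    $html = '';
    $open = false;
    foreach ($words as $i => $word) {
        if ($open && !$marks[$i]) {          // flagged stretch just ended
            $html .= '</span>';
            $open = false;
        }
        if ($i > 0) {
            $html .= ' ';
        }
        if (!$open && $marks[$i]) {          // flagged stretch starts here
            $html .= '<span style="background:yellow">';
            $open = true;
        }
        $html .= htmlspecialchars($word);    // escape user-supplied text
    }
    return $open ? $html . '</span>' : $html;
}

// Sample strings from the first post.
$words1 = preg_split('/\s+/', 'the brown dog barked all day', -1, PREG_SPLIT_NO_EMPTY);
$words2 = preg_split('/\s+/', 'the brown dog slept all day', -1, PREG_SPLIT_NO_EMPTY);

list($marks1, $marks2) = markSharedRuns($words1, $words2, 3);

echo highlightWords($words1, $marks1), "<br />\n";
// <span style="background:yellow">the brown dog</span> barked all day
echo highlightWords($words2, $marks2), "<br />\n";
// <span style="background:yellow">the brown dog</span> slept all day
```

Because overlapping windows are merged, a longer shared run of four or five words is highlighted as one block, while 'all day' stays unmarked since it is only two words long. For full articles it would probably make sense to lowercase the words and strip punctuation before comparing, which this sketch does not do.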