finding duplicates in an array

Posted: Wed Mar 02, 2005 11:06 pm
by bluesman333
I have a list of about 11,000 companies and the state each belongs to in a txt file. About 300 of the company names are duplicated. I want to find the duplicate names and print them out, regardless of which state shows next to the name.

The important thing is that I need to know which ones have duplicates.

Here is what I have now. It is going to run forever. There has got to be a better way.


Code:

<?php
$file = file("companies.txt");
$num = count($file);
$i = 0;
print "<b>$num</b><br>";

// Split each tab-separated line into a company name and a state.
while ($i < $num) {
    $line = chop($file[$i]);
    $line = explode("\t", $line); // explode() stands in for split(), which was removed in PHP 7
    $companies[] = $line[0];
    $states[] = $line[1];
    $i++;
}

$new_array = $companies;

$num = count($companies);
$i = 0;
// Compare every name against every name: roughly $num * $num comparisons.
while ($i < $num) {
    $count = 0; // reset the match counter for each name
    $x = 0;
    while ($x < $num) {
        if ($new_array[$i] == $companies[$x]) {
            $count++;
            if ($count >= 2) {
                print "$new_array[$i]\t$states[$i]<br>";
            }
        }
        $x++;
    }
    $i++;
}
?>
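
The nested loops above compare every name with every other name, which is why the script is so slow on 11,000 lines. A minimal sketch of a single-pass alternative follows, assuming the file holds one "name<TAB>state" record per line. The sample lines and the names $statesByName and $duplicates are hypothetical and only stand in for the real companies.txt data.

```php
<?php
// Hypothetical sample standing in for file("companies.txt"):
// one "name<TAB>state" record per line.
$lines = array(
    "Acme\tNY\n",
    "Globex\tCA\n",
    "Acme\tTX\n",
    "Initech\tTX\n",
    "Globex\tCA\n",
);

// Group the states under each company name in a single pass.
$statesByName = array();
foreach ($lines as $line) {
    $fields = explode("\t", rtrim($line));
    $name   = $fields[0];
    $state  = isset($fields[1]) ? $fields[1] : '';
    $statesByName[$name][] = $state;
}

// Keep only the names that occur more than once.
$duplicates = array();
foreach ($statesByName as $name => $states) {
    if (count($states) > 1) {
        $duplicates[$name] = $states;
    }
}

// Print each duplicated name with every state it was listed under.
foreach ($duplicates as $name => $states) {
    print $name . "\t" . implode(', ', $states) . "<br>\n";
}
```

Because the company name is used as an array key, each line is looked up once instead of being compared against the whole list, so the work grows with the number of lines rather than with its square.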

Posted: Wed Mar 02, 2005 11:10 pm
by smpdawg
Use array_unique to create a new, clean (no duplicates) array. Then pass those two arrays through array_diff. You should end up with the duplicate entries.

http://us3.php.net/array

You mean this? It doesn't seem to work

Posted: Wed Mar 02, 2005 11:28 pm
by bluesman333

Code:

<?php

$companies = array('microsoft', 'google', 'msn', 'google', 'mlb', 'nba', 'mlb');
foreach ($companies as $val) {
    print "$val<br>";
}

print "<br>";
print "<br>";

$unique_array = array_unique($companies);
foreach ($unique_array as $val) {
    print "$val<br>";
}

print "<br>";
print "<br>";

$difference = array_diff($companies, $unique_array);
foreach ($difference as $val) {
    print "$val<br>";
}
?>
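
The last loop prints nothing because array_diff() compares values, and every value in $companies also occurs in $unique_array. A hedged sketch of one way to make the same idea work: array_unique() preserves the key of each first occurrence, so comparing keys with array_diff_key() leaves the entries that were dropped. This assumes PHP 5.1 or later, where array_diff_key() is available. The names $by_value and $by_key are illustrative only.

```php
<?php
$companies = array('microsoft', 'google', 'msn', 'google', 'mlb', 'nba', 'mlb');

// array_unique() keeps the first occurrence of each value and preserves its key.
$unique_array = array_unique($companies);

// Every value of $companies also occurs in $unique_array, so this is empty.
$by_value = array_diff($companies, $unique_array);

// Comparing keys instead leaves the entries that array_unique() dropped.
$by_key = array_diff_key($companies, $unique_array);

foreach ($by_key as $val) {
    print "$val<br>";
}
```

Note that a name appearing three times would show up twice in $by_key, once for each repeat after the first occurrence.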

Posted: Wed Mar 02, 2005 11:45 pm
by smpdawg
My brain wasn't working when I said that. 20 hours of programming had me say something that didn't make sense. Let me think about this one...

Posted: Thu Mar 03, 2005 12:19 am
by smpdawg
Try this example. I basically walk the unique array and, for each entry, remove the first matching entry from the array that has duplicates. Whatever is left over is the duplicates. The arrays are passed by value (copied), so the originals are not modified by the operation.

Code:

<?php

$companies = array('microsoft', 'google', 'msn', 'google', 'mlb', 'nba', 'mlb');
foreach ($companies as $val) {
    print "$val<br>";
}

print "<br>";
print "<br>";

$unique_array = array_unique($companies);
foreach ($unique_array as $val) {
    print "$val<br>";
}

print "<br>";
print "<br>";

$difference = diff($companies, $unique_array);
foreach ($difference as $val) {
    print "$val<br>";
}

// Array 1 must be the array with the duplicates.
function diff($array1, $array2) {
    reset($array2);
    $elem2 = current($array2);
    while ($elem2 !== false) {
        // Remove the first occurrence of this unique value from array 1.
        $match = array_search($elem2, $array1);
        if ($match !== false) {
            unset($array1[$match]);
        }
        $elem2 = next($array2);
    }
    return $array1;
}

?>
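
For comparison, here is a minimal sketch of a different technique from diff() above: counting occurrences with the built-in array_count_values() and keeping the names seen more than once. It reports each duplicated name once, together with how many times it appears. The variable names $counts and $duplicate_names are illustrative, not part of the example above.

```php
<?php
$companies = array('microsoft', 'google', 'msn', 'google', 'mlb', 'nba', 'mlb');

// array_count_values() maps each distinct value to the number of times it occurs.
$counts = array_count_values($companies);

// Collect the names seen more than once.
$duplicate_names = array();
foreach ($counts as $name => $count) {
    if ($count > 1) {
        $duplicate_names[] = $name;
    }
}

// Print each duplicated name with its occurrence count.
foreach ($duplicate_names as $name) {
    print "$name ({$counts[$name]})<br>";
}
```

This makes one pass to count and one pass to filter, whereas diff() calls array_search() once per unique value, which may matter on a list of 11,000 names.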