Page 1 of 1

Fuzzy Name Search - Could this be more efficient?

Posted: Wed Jan 13, 2010 7:19 pm
by Parantaja
pickle | Please use [ code=php ], [ code=text ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.


Let's say I have a site that houses statistics for pro athletes and I want users to be able to search for an athlete by name. I have created some code that allows for a little margin of error in spelling and it does work. Bu, I'm new to PHP (and programming in general).

Is this code fairly efficient? Or is there a lot of room for improvement?

Code: Select all

 
<?php
mysql_connect("localhost", "--REDACTED--", "--REDACTED--") or die(mysql_error());
mysql_select_db("--REDACTED--") or die(mysql_error());
mysql_query("SET NAMES 'utf8'");
 
$AllMatches = array();
$OtherMatches = array();
$ApproximateMatchDivTop=140;
$ApproximateMatchDivHeight=0;
$OtherMatchDivTop=140;
$OtherMatchDivHeight=0;
$PerfectMatchDivHeight=0;
 
include("z_menu.php");
?>
 
<div style="position:absolute; width:640px; height:30px; top:120px; left:320px">
<form method="post" action="<?php echo $_SERVER["PHP_SELF"]; ?>">
<input name="Mod" type="text" size="40" />
<input name="submit" type="submit" value="submit" />
</form>
</div>
 
 
<?php
if (isset($_POST[submit])) //*** -IF FORM SUBMITTED- ***//
{
$UserIn=mysql_real_escape_string(strip_tags($_POST["Mod"]));
 
 
//***************************  DETERMINE MATCHES  ***************************//
$ArrCount=0;
$result = mysql_query("SELECT DISTINCT `names`.`name`,`names`.`index`,`stats`.`sample`,`stats`.`source` FROM `names`,`stats` WHERE `sample`<>'mybanners/unavailable.jpg' && `stats`.`index`=`names`.`index`");
$modcount = mysql_num_rows($result);
for($count=0; $count<$modcount; $count++)
{ 
  $row = mysql_fetch_array ($result);
  similar_text(strtolower($UserIn), strtolower($row[0]), $percent);
  if ($percent>85)
  {
    if ($ArrCount==0)
    {
      $AllMatches[$ArrCount][0]=round($percent,2);
      $AllMatches[$ArrCount][1]=$row[0];
      $AllMatches[$ArrCount][2]=(int)$row[1];
      $AllMatches[$ArrCount][3]=$row[2];
      $AllMatches[$ArrCount][4]=$row[3];
      $test="unique";
      $ArrCount++;
    }
    if ($ArrCount>0)
    {
      for($subcount=0; $subcount<(count($AllMatches)); $subcount++)
      {
        if (in_array((int)$row[1],$AllMatches[$subcount]))
        {  
          $test="duplicate";
        }
      }
      if ($test=="unique")
      {
        $AllMatches[$ArrCount][0]=round($percent,2);
        $AllMatches[$ArrCount][1]=$row[0];
        $AllMatches[$ArrCount][2]=(int)$row[1];
        $AllMatches[$ArrCount][3]=$row[2];
        $AllMatches[$ArrCount][4]=$row[3];
        $ArrCount++;
      }
      else
      {
        $test="unique";
      }
    }
  }   
}
 
//***************************  SORT MATCHES  ***************************//
if (count($AllMatches)>0)
{
  $count=0;
  $PerfectMatchCount=0;
  foreach ($AllMatches as $key => $row)
  {
    $mod[$key]  = $row[1];
    $match[$key] = $row[0];
    if ($AllMatches[$count][0]==100)
    {
      $PerfectMatchCount++;
    }
    $count++;
  }
  array_multisort($match, SORT_DESC, $mod, SORT_ASC, $AllMatches);
  $ApproximateMatchCount=count($AllMatches)-$PerfectMatchCount;
}
 
//***************************  OTHER POSSIBLE MATCHES  ***************************//
$result=mysql_query("SELECT DISTINCT `names`.`name`,`names`.`index`,`stats`.`sample`,`stats`.`source` FROM `names`,`stats` WHERE `names`.`name` LIKE '".$UserIn." %' && `sample`<>'mybanners/unavailable.jpg' && `stats`.`index`=`names`.`index` ORDER BY `name` ASC ");
$OtherMatchCount=0;
while ($row=mysql_fetch_array($result))
{
  $test="unique";
  for($count=0; $count<(count($AllMatches)); $count++)
  {
    if (in_array((int)$row[1],$AllMatches[$count]))
    {  
      $test="duplicate";
    }
  }
  for($count=0; $count<(count($OtherMatches)); $count++)
  {
    if (in_array((int)$row[1],$OtherMatches[$count]))
    {  
      $test="duplicate";
    }
  }
  if ($test=="unique")
  {
    $OtherMatches[$OtherMatchCount][0]=$row[0];
    $OtherMatches[$OtherMatchCount][1]=(int)$row[1];
    $OtherMatches[$OtherMatchCount][2]=$row[2];
    $OtherMatches[$OtherMatchCount][3]=$row[3];
    $OtherMatchCount++;
  }
  else
  {
    $test="unique";
  }
}
 
//***************************  OUTPUT EXACT MATCHES  ***************************//
if ($PerfectMatchCount!=0)
{
  $PerfectMatchDivHeight=0;
  if ($PerfectMatchCount%6!=0)
  {
    $PerfectMatchDivHeight=285;
  }
  $PerfectMatchDivHeight=($PerfectMatchDivHeight+(floor($PerfectMatchCount/6))*285)+50;
  echo "<div style=\"position:absolute; width:1020px; height:".$PerfectMatchDivHeight."px; top:140px; left:118px\">";
  echo "<h3>Perfect Matches</h3>";
  $top=50;
  $left=0;
  for ($count=0; $count<$PerfectMatchCount; $count++)
  {
    echo "<div style=\"width:170px; height:275px; position:absolute; left:".$left."px; top:".$top."px; overflow:hidden;\">";
    echo $AllMatches[$count][1]."<br />".$AllMatches[$count][4];
   echo "<a href=\"http://www.site.com/z_details.php?name=".$AllMatches[$count][2]."\"><img src=\"z_thumbs/tn_".$AllMatches[$count][3]."\" width=\"100%;\" alt=\"".$AllMatches[$count][0]."\"/></a>\n";
    echo "</div>";
    $left=$left+170;
    if ((($count+1)%6==0)&&($count!=0))
    {
      $top=$top+285;
      $left=0;
    }
  }
  echo "</div>";
}
 
//***************************  OUTPUT APPROXIMATE MATCHES  ***************************//
if ($ApproximateMatchCount!=0)
{
  if ($ApproximateMatchCount%6!=0)
  {
    $ApproximateMatchDivHeight=285;
  }
  $ApproximateMatchDivHeight=($ApproximateMatchDivHeight+(floor($ApproximateMatchCount/6))*285)+50;
  if ($PerfectMatchCount!=0)
  {
    $ApproximateMatchDivTop=$PerfectMatchDivHeight+150;
  }
  echo "<div style=\"position:absolute; width:1020px; height:".$ApproximateMatchDivHeight."px; top:".$ApproximateMatchDivTop."px; left:118px;\">";
  echo "<h3>Approximate Matches</h3>";
  $left=0;
  $top=50;
  for ($count=0; $count<$ApproximateMatchCount; $count++)
  {
    echo "<div style=\"width:170px; height:275px; position:absolute; left:".$left."px; top:".$top."px; overflow:hidden;\">";
    echo $AllMatches[$count+$PerfectMatchCount][1]."<br />".$AllMatches[$count+$PerfectMatchCount][4];
    echo "<a href=\"http://www.site.com/z_details.php?name=".$AllMatches[$count+$PerfectMatchCount][2]."\"><img src=\"z_thumbs/tn_".$AllMatches[$count+$PerfectMatchCount][3]."\" width=\"100%;\" alt=\"".$AllMatches[$count+$PerfectMatchCount][0]."\"/></a>\n";
    echo "</div>";
    $left=$left+170;
    if (($count+1)%6==0)
    {
      $top=$top+285;
      $left=0;
    }
  }
  echo "</div>";
}
 
//***************************  OUTPUT 'STARTS WITH' MATCHES  ***************************//
if ($OtherMatchCount!=0)
{
  if ($OtherMatchCount%6!=0)
  {
    $OtherMatchDivHeight=285;
  }
  $OtherMatchDivHeight=($OtherMatchDivHeight+(floor($OtherMatchCount/6))*285)+50;
  if (($ApproximateMatchCount!=0) || ($PerfectMatchCount!=0))
  {
    $OtherMatchDivTop=$PerfectMatchDivHeight+190+$ApproximateMatchDivHeight;
  }
  echo "<div style=\"position:absolute; width:1020px; height:".$OtherMatchDivHeight."px; top:".$OtherMatchDivTop."px; left:118px;\">";
  echo "<h3>'Begins With' Matches</h3>";
  $left=0;
  $top=50;
  for ($count=0; $count<count($OtherMatches); $count++)
  {
    echo "<div style=\"width:170px; height:275px; position:absolute; left:".$left."px; top:".$top."px; overflow:hidden;\">";
    echo $OtherMatches[$count][0]."<br />".$OtherMatches[$count][3];
    echo "<a href=\"http://www.site.com/z_details.php?name=".$OtherMatches[$count][1]."\"><img src=\"z_thumbs/tn_".$OtherMatches[$count][2]."\" width=\"100%;\" alt=\"".$OtherMatches[0]."\"/></a>\n";
    echo "</div>";
    $left=$left+170;
    if (($count+1)%6==0)
    {
      $top=$top+285;
      $left=0;
    }
  }
  echo "</div>";
}
 
 
mysql_close;
} //*** END -IF FORM SUBMITTED- ***//
 
?>

pickle | Please use [ code=php ], [ code=text ], etc tags where appropriate when posting code. Your post has been edited to reflect how we'd like it posted. Please read: :arrow: Posting Code in the Forums to learn how to do it too.

Re: Fuzzy Name Search - Could this be more efficient?

Posted: Thu Jan 14, 2010 1:41 pm
by pickle
I think there's potential to do some approximate matching right in the query itself. You may be able to use a FULL TEXT index to find approximate matches or possibly use SOUNDEX

Re: Fuzzy Name Search - Could this be more efficient?

Posted: Mon Jan 18, 2010 10:36 am
by Parantaja
Apologies for not using the proper tags to post my code. I'll try not to let it happen again.

Thanks for the advice regarding soundex. I hadn't thought of that. If anyone else has any suggestions, I'd love to hear them as well.

Re: Fuzzy Name Search - Could this be more efficient?

Posted: Mon Jan 18, 2010 5:37 pm
by VladSun
Maybe Levenshtein or Damerau-Levenshtein distance...