math??? statistic!!!
Posted: Mon Mar 01, 2004 9:17 am
Hi all,
First of all, I'm not a mathematician and don't know much about statistics, so here is my problem:
I have to compare two strings containing names ($name1 (the name in the DB) and $name2 (the true name));
my code is ...
Code: Select all
<?php
connect bla bla to mysql ... // no problem here
connect bla bla to oracle ... // no problem here (9i)
(...)
// Returns 1 if the two words are identical, 0 otherwise.
function Compara($str1, $str2)
{
    if ($str1 == $str2)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}
while (ora_fetch_into($cursor, $results, ORA_FETCHINTO_ASSOC))
{
    $nome1 = $results['NOME_G'];
    //********************************
    $array_1 = explode(" ", $nome1);
    $array_2 = explode(" ", $nome2);
    $num1 = count($array_1) - 1;
    $num2 = count($array_2) - 1;
    $resultado = 0;       // number of matching words
    $palavramenores = 0;  // number of DB words skipped for being too short
    for ($i = 0; $i <= $num1; $i++)
    {
        for ($f = 0; $f <= $num2; $f++)
        {
            if (strlen($array_1[$i]) >= 3) // ignore words shorter than 3 characters
            {
                $xpto = Compara(($array_1[$i]), ($array_2[$f]));
                $resultado += $xpto;
            }
            else
            {
                $palavramenores += 1;
                break;
            }
        }
    }
    //------------------------------------------------------
    // This is the part where I really need help:
    // how can I calculate the probability that the two names refer to the same person?
    // Right now I'm computing the percentage of matching words, but that is not
    // the probability of it being the same person. For example, I'm looking for
    // "John Jonathan Montbaten Edwards" (4 words, the true name),
    // and my DB has a "John Edwards" and a "Jonathan Collins":
    // my perc gives 100% for the first and 50% for the second...
    // However, if my DB has a record with only one name, "Edwards",
    // that gives me the same 100%.
    //
    // It's obviously just math... but
Code: Select all
    $perc = (($resultado * 100) / (($num1 + 1) - $palavramenores));
    if ($num2 > $num1) {
        // the true name has more words than the DB name: subtract 2 points per extra word
        $perc = $perc - ($num2 - $num1) * 2;
    }
    $xpto_1 = round($perc, 2);
    //--------------------------------------------------------
    if ($perc > 39) // exclude low scores (39 or below); otherwise insert into the DB
    {
        $q_insert = "INSERT INTO `test_names` (`num_seq`, `name_bd`, `name_tested`, `perc`) VALUES (NULL, '".mysql_real_escape_string($nome1)."', '".mysql_real_escape_string($nome2)."', '".$xpto_1."')";
        $result = mysql_query($q_insert);
    }
}
echo "Finish...";
?>
Thanks for your kind attention,
Pedro
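
To make the "Edwards" problem concrete, here is a minimal sketch of one possible scoring rule, assuming a symmetric word-overlap measure (the Dice coefficient) is acceptable in place of a true probability. It divides by the word counts of both names, so a one-word DB record can no longer reach 100% against a four-word true name. The helpers name_words() and name_similarity() are hypothetical names made up for this illustration; they are not part of the script above, and the result is only a similarity score, not a statistical probability that the two names belong to the same person.

```php
<?php
// Hypothetical helper (not in the original script): split a name into
// lower-case words, dropping duplicates and words shorter than $minLen.
function name_words($name, $minLen = 3)
{
    $words = array();
    foreach (preg_split('/\s+/', trim($name), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (strlen($word) >= $minLen) {
            $words[strtolower($word)] = true; // array keys remove duplicates
        }
    }
    return array_keys($words);
}

// Hypothetical helper: Dice coefficient of the two word sets, as a percentage:
// 100 * 2 * (words in common) / (words in DB name + words in true name).
function name_similarity($dbName, $trueName, $minLen = 3)
{
    $a = name_words($dbName, $minLen);
    $b = name_words($trueName, $minLen);
    if (count($a) + count($b) == 0) {
        return 0.0; // nothing left to compare
    }
    $common = count(array_intersect($a, $b));
    return round(200.0 * $common / (count($a) + count($b)), 2);
}

$true = "John Jonathan Montbaten Edwards";
echo name_similarity("John Edwards", $true), "\n";     // 66.67
echo name_similarity("Edwards", $true), "\n";          // 40
echo name_similarity("Jonathan Collins", $true), "\n"; // 33.33
```

With this rule the three DB records from the example are ranked "John Edwards", then "Edwards", then "Jonathan Collins", instead of two of them tying at 100%. The cut-off value (40 in the script above) would have to be re-tuned for the new scale.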