Page 1 of 1

For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 12:55 pm
by enhancedrouting
Hi, I'm sure the answer to this is online somewhere, but I'm not sure what to search for. I apologize if this issue has already been addressed.

I have two arrays of words. I want to go through and match the words that occur in one array with the words that occur in the other array. Currently I am doing this with two foreach statements in php. Load time is around ten minutes for 10,000 entries.

These arrays start as tables in mySQL. Would it be faster to do the match/compare function in mySQL?

Re: For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 1:06 pm
by s.dot
What is your table structure?
It could probably be done in SQL.

As for doing it in PHP, have a look at the array_diff() function: http://us3.php.net/array_diff

Re: For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 1:17 pm
by John Cartwright
10 minutes for 10,000 iterations seems like way too long. You are probably doing something very inefficient.

Post your code.

Re: For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 2:26 pm
by enhancedrouting
Here's the relavent section:

Code: Select all


foreach ($wordfrequencyarray as $topword => $frequency)
{
   $antimatch = 1;	
   foreach($ancwordfrequencyarray as $ancword => $ancfrequency)
   {
      if($topword === $ancword)
      {
         $antimatch = 0;
         $percent = ($frequency / $counter) * 100;
         $ancpercent = ($ancfrequency / $anccounter) * 100;
         $difference = $percent / $ancpercent;
         $composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
         $compcount++;
         break 1;
      }
   }

   if ($antimatch == 1)
   {
      $composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
      $compcount++;
   }	
}
I am doing a social research survey. I am matching words in the survey, with words in a piece of sample data. The above method works, but it takes a really long time.

The above code iterates over every word in my survey, and then compares that word to every word in the sample data. Then it outputs all matches to a multidimensional array with some other statistical information. It works, but takes a long time.

The tables I'm populating $ancwordfrequencyarray and $wordfrequencyarray from look like this:

ID => Word => Frequency
0 => The => 946
1 => and => 515
2 => is => 400

Re: For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 3:57 pm
by John Cartwright
EDIT | NVM! I misread your post after re-reading it. I will leave this up here as I think of another solution.

I can't really follow your logic, since I do not know what your variables mean or the array structure. Are you trying to do something like the following?

Code: Select all

$frequencyWords = array('the', 'brown', 'fox', 'jumped');

$content = 'The quick brown fox jumped over the fence. The quick green jcart jumped over the wall.';

$words = preg_split('~[^a-z]+~i', strtolower($content), -1, PREG_SPLIT_NO_EMPTY); 
$wordCount = array_intersect_key(array_count_values($words), array_flip($frequencyWords));
       
echo '<pre>'. print_r($wordCount, true) .'</pre>';
Which returns something like
Array
(
[the] => 4
[brown] => 1
[fox] => 1
[jumped] => 2
)

Re: For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 4:15 pm
by Weirdan
to start with, you don't need inner foreach in this case as you may directly address entries in the second array by the word:

Code: Select all


foreach ($wordfrequencyarray as $topword => $frequency)
{
   $antimatch = 1;      
   if(array_key_exists($topword, $ancwordfrequencyarray))
   {
      $ancfrequency = $ancwordfrequencyarray[$topword];
      $ancword = $topword;
         $antimatch = 0;
         $percent = ($frequency / $counter) * 100;
         $ancpercent = ($ancfrequency / $anccounter) * 100;
         $difference = $percent / $ancpercent;
         $composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
         $compcount++;
   }

   if ($antimatch == 1)
   {
      $composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
      $compcount++;
   }    
}
assuming $ancwordfrequencyarray is a big array (in order of tens of thousands) this change alone should speed up things several thousands times, which would likely be enough for your application.

Re: For matching words is PHP or mySQL faster?

Posted: Tue Mar 01, 2011 4:56 pm
by AbraCadaver
Well crap Weirdan beat me too it, this is fairly similar, and yes it will save a lot of time. I have to take off, but I might come back to this one for some fun.

Code: Select all

foreach ($wordfrequencyarray as $topword => $frequency)
{
    $percent = ($frequency / $counter) * 100;

    if(isset($ancwordfrequencyarray[$topword]))
    {       
        $ancfrequency = $ancwordfrequencyarray[$topword];
        $ancpercent = ($ancfrequency / $anccounter) * 100;
        $difference = $percent / $ancpercent;
        $composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
    } else {
        $composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
    }
    $compcount++;
}
FYI... I don't see where $counter is defined and I don't see you using $compcount. And really no need for the $antimatch.

Re: For matching words is PHP or mySQL faster?

Posted: Wed Mar 02, 2011 9:59 am
by enhancedrouting
In the future, if I ask another question, I'll post the complete program, rather than the snippet I'm having trouble with. I apologize for the lack of clarity not doing so in the first place caused.

The solutions posted by Weirdan and AbraCadaver both worked. My program now executes in <30 seconds, and a fair amount of the remaining lag is a table sort script trying to processes a 1meg html output file. This is awesome. Thanks! :)