For matching words is PHP or mySQL faster?

Ye' old general discussion board. Basically, for everything that isn't covered elsewhere. Come here to shoot the breeze, shoot your mouth off, or whatever suits your fancy.
This forum is not for asking programming related questions.

Moderator: General Moderators

Post Reply
enhancedrouting
Forum Newbie
Posts: 3
Joined: Tue Mar 01, 2011 12:44 pm

For matching words is PHP or mySQL faster?

Post by enhancedrouting »

Hi, I'm sure the answer to this is online somewhere, but I'm not sure what to search for. I apologize if this issue has already been addressed.

I have two arrays of words. I want to go through and match the words that occur in one array with the words that occur in the other array. Currently I am doing this with two foreach statements in php. Load time is around ten minutes for 10,000 entries.

These arrays start as tables in mySQL. Would it be faster to do the match/compare function in mySQL?
User avatar
s.dot
Tranquility In Moderation
Posts: 5001
Joined: Sun Feb 06, 2005 7:18 pm
Location: Indiana

Re: For matching words is PHP or mySQL faster?

Post by s.dot »

What is your table structure?
It could probably be done in SQL.

As for doing it in PHP, have a look at the array_diff() function: http://us3.php.net/array_diff
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: For matching words is PHP or mySQL faster?

Post by John Cartwright »

10 minutes for 10,000 iterations seems like way too long. You are probably doing something very inefficient.

Post your code.
enhancedrouting
Forum Newbie
Posts: 3
Joined: Tue Mar 01, 2011 12:44 pm

Re: For matching words is PHP or mySQL faster?

Post by enhancedrouting »

Here's the relavent section:

Code: Select all


foreach ($wordfrequencyarray as $topword => $frequency)
{
   $antimatch = 1;	
   foreach($ancwordfrequencyarray as $ancword => $ancfrequency)
   {
      if($topword === $ancword)
      {
         $antimatch = 0;
         $percent = ($frequency / $counter) * 100;
         $ancpercent = ($ancfrequency / $anccounter) * 100;
         $difference = $percent / $ancpercent;
         $composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
         $compcount++;
         break 1;
      }
   }

   if ($antimatch == 1)
   {
      $composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
      $compcount++;
   }	
}
I am doing a social research survey. I am matching words in the survey, with words in a piece of sample data. The above method works, but it takes a really long time.

The above code iterates over every word in my survey, and then compares that word to every word in the sample data. Then it outputs all matches to a multidimensional array with some other statistical information. It works, but takes a long time.

The tables I'm populating $ancwordfrequencyarray and $wordfrequencyarray from look like this:

ID => Word => Frequency
0 => The => 946
1 => and => 515
2 => is => 400
Last edited by John Cartwright on Tue Mar 01, 2011 3:42 pm, edited 2 times in total.
Reason: Fixed indentation so I could actual read it ;)
User avatar
John Cartwright
Site Admin
Posts: 11470
Joined: Tue Dec 23, 2003 2:10 am
Location: Toronto
Contact:

Re: For matching words is PHP or mySQL faster?

Post by John Cartwright »

EDIT | NVM! I misread your post after re-reading it. I will leave this up here as I think of another solution.

I can't really follow your logic, since I do not know what your variables mean or the array structure. Are you trying to do something like the following?

Code: Select all

$frequencyWords = array('the', 'brown', 'fox', 'jumped');

$content = 'The quick brown fox jumped over the fence. The quick green jcart jumped over the wall.';

$words = preg_split('~[^a-z]+~i', strtolower($content), -1, PREG_SPLIT_NO_EMPTY); 
$wordCount = array_intersect_key(array_count_values($words), array_flip($frequencyWords));
       
echo '<pre>'. print_r($wordCount, true) .'</pre>';
Which returns something like
Array
(
[the] => 4
[brown] => 1
[fox] => 1
[jumped] => 2
)
User avatar
Weirdan
Moderator
Posts: 5978
Joined: Mon Nov 03, 2003 6:13 pm
Location: Odessa, Ukraine

Re: For matching words is PHP or mySQL faster?

Post by Weirdan »

to start with, you don't need inner foreach in this case as you may directly address entries in the second array by the word:

Code: Select all


foreach ($wordfrequencyarray as $topword => $frequency)
{
   $antimatch = 1;      
   if(array_key_exists($topword, $ancwordfrequencyarray))
   {
      $ancfrequency = $ancwordfrequencyarray[$topword];
      $ancword = $topword;
         $antimatch = 0;
         $percent = ($frequency / $counter) * 100;
         $ancpercent = ($ancfrequency / $anccounter) * 100;
         $difference = $percent / $ancpercent;
         $composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
         $compcount++;
   }

   if ($antimatch == 1)
   {
      $composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
      $compcount++;
   }    
}
assuming $ancwordfrequencyarray is a big array (in order of tens of thousands) this change alone should speed up things several thousands times, which would likely be enough for your application.
User avatar
AbraCadaver
DevNet Master
Posts: 2572
Joined: Mon Feb 24, 2003 10:12 am
Location: The Republic of Texas
Contact:

Re: For matching words is PHP or mySQL faster?

Post by AbraCadaver »

Well crap Weirdan beat me too it, this is fairly similar, and yes it will save a lot of time. I have to take off, but I might come back to this one for some fun.

Code: Select all

foreach ($wordfrequencyarray as $topword => $frequency)
{
    $percent = ($frequency / $counter) * 100;

    if(isset($ancwordfrequencyarray[$topword]))
    {       
        $ancfrequency = $ancwordfrequencyarray[$topword];
        $ancpercent = ($ancfrequency / $anccounter) * 100;
        $difference = $percent / $ancpercent;
        $composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
    } else {
        $composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
    }
    $compcount++;
}
FYI... I don't see where $counter is defined and I don't see you using $compcount. And really no need for the $antimatch.
mysql_function(): WARNING: This extension is deprecated as of PHP 5.5.0, and will be removed in the future. Instead, the MySQLi or PDO_MySQLextension should be used. See also MySQL: choosing an API guide and related FAQ for more information.
enhancedrouting
Forum Newbie
Posts: 3
Joined: Tue Mar 01, 2011 12:44 pm

Re: For matching words is PHP or mySQL faster?

Post by enhancedrouting »

In the future, if I ask another question, I'll post the complete program, rather than the snippet I'm having trouble with. I apologize for the lack of clarity not doing so in the first place caused.

The solutions posted by Weirdan and AbraCadaver both worked. My program now executes in <30 seconds, and a fair amount of the remaining lag is a table sort script trying to processes a 1meg html output file. This is awesome. Thanks! :)
Post Reply