For matching words is PHP or mySQL faster?
Moderator: General Moderators
-
enhancedrouting
- Forum Newbie
- Posts: 3
- Joined: Tue Mar 01, 2011 12:44 pm
For matching words is PHP or mySQL faster?
Hi, I'm sure the answer to this is online somewhere, but I'm not sure what to search for. I apologize if this issue has already been addressed.
I have two arrays of words. I want to go through and match the words that occur in one array with the words that occur in the other array. Currently I am doing this with two foreach statements in php. Load time is around ten minutes for 10,000 entries.
These arrays start as tables in mySQL. Would it be faster to do the match/compare function in mySQL?
I have two arrays of words. I want to go through and match the words that occur in one array with the words that occur in the other array. Currently I am doing this with two foreach statements in php. Load time is around ten minutes for 10,000 entries.
These arrays start as tables in mySQL. Would it be faster to do the match/compare function in mySQL?
Re: For matching words is PHP or mySQL faster?
What is your table structure?
It could probably be done in SQL.
As for doing it in PHP, have a look at the array_diff() function: http://us3.php.net/array_diff
It could probably be done in SQL.
As for doing it in PHP, have a look at the array_diff() function: http://us3.php.net/array_diff
Set Search Time - A google chrome extension. When you search only results from the past year (or set time period) are displayed. Helps tremendously when using new technologies to avoid outdated results.
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
- Contact:
Re: For matching words is PHP or mySQL faster?
10 minutes for 10,000 iterations seems like way too long. You are probably doing something very inefficient.
Post your code.
Post your code.
-
enhancedrouting
- Forum Newbie
- Posts: 3
- Joined: Tue Mar 01, 2011 12:44 pm
Re: For matching words is PHP or mySQL faster?
Here's the relavent section:
I am doing a social research survey. I am matching words in the survey, with words in a piece of sample data. The above method works, but it takes a really long time.
The above code iterates over every word in my survey, and then compares that word to every word in the sample data. Then it outputs all matches to a multidimensional array with some other statistical information. It works, but takes a long time.
The tables I'm populating $ancwordfrequencyarray and $wordfrequencyarray from look like this:
ID => Word => Frequency
0 => The => 946
1 => and => 515
2 => is => 400
Code: Select all
foreach ($wordfrequencyarray as $topword => $frequency)
{
$antimatch = 1;
foreach($ancwordfrequencyarray as $ancword => $ancfrequency)
{
if($topword === $ancword)
{
$antimatch = 0;
$percent = ($frequency / $counter) * 100;
$ancpercent = ($ancfrequency / $anccounter) * 100;
$difference = $percent / $ancpercent;
$composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
$compcount++;
break 1;
}
}
if ($antimatch == 1)
{
$composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
$compcount++;
}
}The above code iterates over every word in my survey, and then compares that word to every word in the sample data. Then it outputs all matches to a multidimensional array with some other statistical information. It works, but takes a long time.
The tables I'm populating $ancwordfrequencyarray and $wordfrequencyarray from look like this:
ID => Word => Frequency
0 => The => 946
1 => and => 515
2 => is => 400
Last edited by John Cartwright on Tue Mar 01, 2011 3:42 pm, edited 2 times in total.
Reason: Fixed indentation so I could actual read it ;)
Reason: Fixed indentation so I could actual read it ;)
- John Cartwright
- Site Admin
- Posts: 11470
- Joined: Tue Dec 23, 2003 2:10 am
- Location: Toronto
- Contact:
Re: For matching words is PHP or mySQL faster?
EDIT | NVM! I misread your post after re-reading it. I will leave this up here as I think of another solution.
I can't really follow your logic, since I do not know what your variables mean or the array structure. Are you trying to do something like the following?
Which returns something like
I can't really follow your logic, since I do not know what your variables mean or the array structure. Are you trying to do something like the following?
Code: Select all
$frequencyWords = array('the', 'brown', 'fox', 'jumped');
$content = 'The quick brown fox jumped over the fence. The quick green jcart jumped over the wall.';
$words = preg_split('~[^a-z]+~i', strtolower($content), -1, PREG_SPLIT_NO_EMPTY);
$wordCount = array_intersect_key(array_count_values($words), array_flip($frequencyWords));
echo '<pre>'. print_r($wordCount, true) .'</pre>';Array
(
[the] => 4
[brown] => 1
[fox] => 1
[jumped] => 2
)
Re: For matching words is PHP or mySQL faster?
to start with, you don't need inner foreach in this case as you may directly address entries in the second array by the word:
assuming $ancwordfrequencyarray is a big array (in order of tens of thousands) this change alone should speed up things several thousands times, which would likely be enough for your application.
Code: Select all
foreach ($wordfrequencyarray as $topword => $frequency)
{
$antimatch = 1;
if(array_key_exists($topword, $ancwordfrequencyarray))
{
$ancfrequency = $ancwordfrequencyarray[$topword];
$ancword = $topword;
$antimatch = 0;
$percent = ($frequency / $counter) * 100;
$ancpercent = ($ancfrequency / $anccounter) * 100;
$difference = $percent / $ancpercent;
$composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
$compcount++;
}
if ($antimatch == 1)
{
$composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
$compcount++;
}
}
- AbraCadaver
- DevNet Master
- Posts: 2572
- Joined: Mon Feb 24, 2003 10:12 am
- Location: The Republic of Texas
- Contact:
Re: For matching words is PHP or mySQL faster?
Well crap Weirdan beat me too it, this is fairly similar, and yes it will save a lot of time. I have to take off, but I might come back to this one for some fun.
FYI... I don't see where $counter is defined and I don't see you using $compcount. And really no need for the $antimatch.
Code: Select all
foreach ($wordfrequencyarray as $topword => $frequency)
{
$percent = ($frequency / $counter) * 100;
if(isset($ancwordfrequencyarray[$topword]))
{
$ancfrequency = $ancwordfrequencyarray[$topword];
$ancpercent = ($ancfrequency / $anccounter) * 100;
$difference = $percent / $ancpercent;
$composite[] = array($topword,$frequency,$percent,$ancfrequency,$ancpercent,$difference);
} else {
$composite[] = array($topword,$frequency,$percent,$nomatchnum,$nomatchnum,$nomatchalpha);
}
$compcount++;
}mysql_function(): WARNING: This extension is deprecated as of PHP 5.5.0, and will be removed in the future. Instead, the MySQLi or PDO_MySQLextension should be used. See also MySQL: choosing an API guide and related FAQ for more information.
-
enhancedrouting
- Forum Newbie
- Posts: 3
- Joined: Tue Mar 01, 2011 12:44 pm
Re: For matching words is PHP or mySQL faster?
In the future, if I ask another question, I'll post the complete program, rather than the snippet I'm having trouble with. I apologize for the lack of clarity not doing so in the first place caused.
The solutions posted by Weirdan and AbraCadaver both worked. My program now executes in <30 seconds, and a fair amount of the remaining lag is a table sort script trying to processes a 1meg html output file. This is awesome. Thanks!
The solutions posted by Weirdan and AbraCadaver both worked. My program now executes in <30 seconds, and a fair amount of the remaining lag is a table sort script trying to processes a 1meg html output file. This is awesome. Thanks!