I don't make a lot of topics, so I'm not sure whether this should be in PHP or Databases since it's a little of both.
Anyway, I'm writing a blog that I'm going to be constantly adding features to. One that just came to mind was a built-in spellchecker. However, I'm worried about performance.
The way that it would be set up is that after I make a post, it'd show me a preview of my post with everything parsed and such. Then, it'd run through every word (I'm considering omitting posted code) and check if it's in the dictionary. If it isn't, the word would turn into a link. If I click on the link, I'd be verifying that the word was spelled correctly and that it should be added to the dictionary. Otherwise, I'd go back to the edit screen and change it. Eventually, this would fill the dictionary up with a good amount of words, including words that aren't in the actual dictionary, but are still correctly spelled.
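A minimal sketch of the preview step described above. It assumes the dictionary has already been loaded into a PHP array; `linkUnknownWords()` and the `addword` query parameter are hypothetical names, not existing code. The text is split into alternating non-word and word pieces so each piece can be HTML-escaped on its own before unknown words are wrapped in links.

```php
<?php
// Sketch: link every word of $text that is missing from $known.
// "addword" is a hypothetical query parameter for "add this word to the dictionary".
function linkUnknownWords(string $text, array $known): string
{
    // Flip the lower-cased dictionary so each lookup is a cheap isset().
    // strtolower() only folds ASCII; use mb_strtolower() for other languages.
    $lookup = array_flip(array_map('strtolower', $known));

    // Split into alternating non-word / word pieces; odd indexes are words.
    $parts = preg_split('/(\w+)/u', $text, -1, PREG_SPLIT_DELIM_CAPTURE) ?: [];

    $html = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $isWord = ($i % 2 === 1);
        // Leave non-words, dictionary words and anything containing a digit alone.
        if (!$isWord || isset($lookup[strtolower($part)]) || preg_match('/\d/u', $part)) {
            $html .= $safe;
            continue;
        }
        $html .= '<a href="?addword=' . rawurlencode($part) . '">' . $safe . '</a>';
    }
    return $html;
}

echo linkUnknownWords('This is a tset & that', ['this', 'is', 'a', 'that']), "\n";
// This is a <a href="?addword=tset">tset</a> &amp; that
```

Escaping each piece separately matters: escaping the whole post first would turn `&` into `&amp;`, and a word regex would then see "amp" as a word.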
So, my dilemma is performance. Inserting into the database could be done by storing all confirmed words in a session variable and, upon submit, adding all of those words with a single query. But that's not the part I'm worried about. The part I'm concerned about is the checking process.
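A sketch of that batched insert, assuming PDO with MySQL and a hypothetical `dictionary` table whose `word` column has a UNIQUE index. `buildDictionaryInsert()` is an illustrative helper; `INSERT IGNORE` is MySQL-specific and skips words that are already stored.

```php
<?php
// Sketch: one multi-row INSERT for every word confirmed during the session.
// Assumes a hypothetical table `dictionary` with a UNIQUE column `word`.
function buildDictionaryInsert(array $confirmed): ?array
{
    // Normalize case and drop duplicates so each word is sent once.
    $words = array_values(array_unique(array_map('strtolower', $confirmed)));
    if (count($words) === 0) {
        return null; // nothing confirmed, so no query at all
    }
    // One "(?)" placeholder row per word: VALUES (?), (?), ...
    $rows = implode(', ', array_fill(0, count($words), '(?)'));
    return ['INSERT IGNORE INTO dictionary (word) VALUES ' . $rows, $words];
}

// Usage with PDO (not run here), assuming $_SESSION['confirmed'] holds the words:
// if ($q = buildDictionaryInsert($_SESSION['confirmed'] ?? [])) {
//     $pdo->prepare($q[0])->execute($q[1]);
// }
```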
Each word I check would run a query to look itself up in the database, which hardly seems efficient. The best idea I've had so far is to str_replace every occurrence of a word out of the content once it has been checked, so each distinct word is only queried once, but that still doesn't seem to cut it.
Any ideas?
Database Dictionary
- feyd
- Neighborhood Spidermoddy
- Posts: 31559
- Joined: Mon Mar 29, 2004 3:24 pm
- Location: Bothell, Washington, USA
Extract all the words into an array and compact the array to unique words. You can run this array through soundex() (or not) and eventually pass the array to the database using an IN() clause.
array_unique(), array_map()
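A sketch of that first step (the helper name `uniqueWords()` is hypothetical). It uses preg_match_all() to extract the words, array_map() with strtolower() to normalize case and array_unique() to keep one copy of each, so a word that appears ten times is looked up once. The optional soundex() pass is left out.

```php
<?php
// Sketch: reduce a post to a unique, lower-cased list of words.
function uniqueWords(string $post): array
{
    // \w+ with the u modifier also matches non-ASCII letters.
    preg_match_all('/\w+/u', $post, $m);
    // Lower-case (ASCII only; mb_strtolower() for other scripts), then dedupe.
    $words = array_unique(array_map('strtolower', $m[0]));
    // Skip tokens containing digits, since numbers are not dictionary words.
    $words = array_filter($words, fn (string $w): bool => !preg_match('/\d/', $w));
    return array_values($words);
}

echo implode(', ', uniqueWords('The cat saw the other cat in 2007.')), "\n";
// the, cat, saw, other, in
```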
- superdezign
- DevNet Master
- Posts: 4135
- Joined: Sat Jan 20, 2007 11:06 pm
I'm with you up until the IN() keyword. I was under the impression that IN was used for things such as finding if a variable existed in a list that I provide, rather than if something from the list that I provide is in the database.
Do you think you could give me a syntax example of using IN in this case?
Here's a simple example of the use of IN in SQL:
Code:
SELECT * FROM dictionary WHERE word IN ('hello', 'world')
The query returns only those words from your list that exist in the table, so any word missing from the result is not in the dictionary.
Documentation from MySQL:
http://dev.mysql.com/doc/refman/5.0/en/ ... unction_in
Hope this helps!
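Applied to the original problem, the list inside IN() is just the unique words from the post, so one query replaces one query per word. A sketch, assuming PDO and the `dictionary` table / `word` column from the example above; the function names are hypothetical. Placeholders keep the words out of the SQL string, and any word the query does not return still needs checking.

```php
<?php
// Build "SELECT ... IN (?, ?, ...)" with one placeholder per word.
function buildInQuery(array $words): array
{
    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    return ["SELECT word FROM dictionary WHERE word IN ($placeholders)", array_values($words)];
}

// Words from the post that the query did not return are the unknown ones.
// Assumes $words is already lower-cased; MySQL's default collations compare
// case-insensitively, so the returned rows are lower-cased before diffing.
function missingWords(array $words, array $found): array
{
    return array_values(array_diff($words, array_map('strtolower', $found)));
}

// Not run here: needs a live PDO connection.
function findUnknownWords(PDO $pdo, array $words): array
{
    if (count($words) === 0) {
        return []; // "IN ()" is a syntax error, so skip the query entirely
    }
    [$sql, $params] = buildInQuery($words);
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return missingWords($words, $stmt->fetchAll(PDO::FETCH_COLUMN));
}
```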
- CoderGoblin
- DevNet Resident
- Posts: 1425
- Joined: Tue Mar 16, 2004 10:03 am
- Location: Aachen, Germany
Any reason not to use pspell rather than a database? One reason would be if your provider doesn't have it installed, but if it does....
Also beware of soundex: it doesn't work well for some other languages (e.g. German).
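A quick illustration of the soundex caveat. PHP's soundex() only considers the ASCII letters A to Z and silently skips everything else, so the two German spellings below get different codes, while the pair of English names (taken from the PHP manual's soundex() examples) collides as intended.

```php
<?php
// soundex() drops non-ASCII letters such as "ß", so spelling variants of one
// German word can end up with different codes.
foreach (['Straße', 'Strasse', 'Euler', 'Ellery'] as $w) {
    echo $w, ' => ', soundex($w), "\n";
}
// Straße => S360
// Strasse => S362
// Euler => E460
// Ellery => E460
```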
The following code is one I knocked up a while ago as a quick and dirty demo of pspell. I know it has some problems, but it could be useful. If anyone is interested, I'll look at it tonight if I get the time; after that, unfortunately, I'm not available for a week.
Code:
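<?php
//ini_set('display_errors', 1);
//error_reporting(E_ALL);
mb_internal_encoding("UTF-8"); // not sure if needed for the regex check.
date_default_timezone_set('Europe/Berlin');
$suggestions = array();
$suggest = '';
if (!empty($_REQUEST['mystr'])) {
    $mystr = $_REQUEST['mystr'];
} else {
    $mystr = "this is my test to spell chack. Yet another spell check ";
}
// Dictionary language "de"; the 4th argument tells pspell the input is UTF-8.
$pspell_config = pspell_config_create("de", "", "", "utf-8");
if (isset($_REQUEST['mit_cmp']) && $_REQUEST['mit_cmp'] == 1) {
    pspell_config_runtogether($pspell_config, true); // accept compound words
}
pspell_config_mode($pspell_config, PSPELL_FAST);
$pspell_link = pspell_new_config($pspell_config);
// The u modifier lets \w match UTF-8 letters such as umlauts.
preg_match_all('/\w+/u', $mystr, $words);
if (!empty($words[0])) {
    // Check each distinct word once; skip anything containing a digit.
    foreach (array_unique($words[0]) as $val) {
        if (!preg_match('/[0-9]/', $val) && !pspell_check($pspell_link, $val)) {
            $suggestions[$val] = pspell_suggest($pspell_link, $val);
        }
    }
}
// Split the text into alternating non-word/word pieces (odd indexes are words),
// escape every piece and wrap each misspelled word in a red <span>. Doing it in
// one pass avoids re-matching words inside markup that was already inserted.
$count = 0;
$output = '<p id="correction">';
$parts = preg_split('/(\w+)/u', $mystr, -1, PREG_SPLIT_DELIM_CAPTURE) ?: array();
foreach ($parts as $i => $part) {
    $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
    if ($i % 2 == 1 && isset($suggestions[$part])) {
        // The Ignore/Change/All buttons step through these spans in order.
        $safe = '<span id="kor_' . $count++ . '" style="color:#F00">' . $safe . '</span>';
    }
    $output .= $safe;
}
$output .= '</p>';
if (count($suggestions)) {
    foreach ($suggestions as $key => $values) {
        $id = htmlspecialchars($key, ENT_QUOTES, 'UTF-8');
        $suggest .= "<div>\n <select id=\"sug_$id\" name=\"spellsug_$id\" style=\"display:none;\">";
        foreach ($values as $value) {
            $suggest .= "\n <option>" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</option>';
        }
        $suggest .= "\n </select>\n</div>\n";
    }
}
?>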
<html>
<head>
<script type="text/javascript">
var spell_current_suggest;
var spell_need_correction = [];
var current_spell_pos = -1;
var spell_remain = 0;
var base;
// Show the suggestion list for the given word and hide the previous one.
function spellSuggest(txt)
{
    if (spell_current_suggest) {
        spell_current_suggest.style.display = 'none';
    }
    spell_current_suggest = document.getElementById('sug_' + txt);
    if (spell_current_suggest) {
        spell_current_suggest.style.display = 'block';
    }
}
// Replace the flagged span at pos with the selected suggestion. Each call
// creates its own text node: re-inserting one node would only move it.
function spellReplace(pos)
{
    base.replaceChild(document.createTextNode(spell_current_suggest.value), spell_need_correction[pos]);
    spell_need_correction[pos] = null; // mark as done so spellNext() skips it
    spell_remain--;
}
function spellChange()
{
    if (!spell_current_suggest || spell_current_suggest.value == '') {
        return spellIgnore(); // no suggestion available
    }
    spellReplace(current_spell_pos);
    base.normalize();
    return spellNext();
}
function spellChangeAll()
{
    if (!spell_current_suggest || spell_current_suggest.value == '') {
        return spellIgnore();
    }
    var text = spellGetText(current_spell_pos);
    for (var i = current_spell_pos; i < spell_need_correction.length; i++) {
        if (spell_need_correction[i] && spellGetText(i) == text) {
            spellReplace(i);
        }
    }
    base.normalize();
    return spellNext();
}
function spellGetText(pos)
{
    return spell_need_correction[pos].textContent;
}
function spellIgnore()
{
    spell_remain--;
    return spellNext();
}
function spellNext()
{
    if (current_spell_pos >= 0 && spell_need_correction[current_spell_pos]) {
        spell_need_correction[current_spell_pos].style.backgroundColor = '';
    }
    if (spell_remain <= 0) {
        document.getElementById('spellignore').disabled = true;
        document.getElementById('spellchange').disabled = true;
        document.getElementById('spellall').disabled = true;
        if (spell_current_suggest) {
            spell_current_suggest.disabled = true;
        }
        return false;
    }
    // Advance to the next span that "All" has not already replaced.
    do {
        current_spell_pos++;
    } while (!spell_need_correction[current_spell_pos]);
    spell_need_correction[current_spell_pos].style.backgroundColor = '#FDD';
    spellSuggest(spellGetText(current_spell_pos));
    return false;
}
window.onload = function() {
    base = document.getElementById('correction');
    // Collect the red <span> elements that mark misspelled words.
    for (var i = 0; i < base.childNodes.length; i++) {
        if (base.childNodes[i].nodeType == 1 && base.childNodes[i].nodeName == 'SPAN') {
            spell_need_correction.push(base.childNodes[i]);
        }
    }
    spell_remain = spell_need_correction.length;
    spellNext();
};
</script>
</head>
<body>
<h1>Spelling</h1>
<h2>Original</h2>
<form name="spell" method="post">
<input name="mit_cmp" type="checkbox" value="1"> Compound Words<br />
<textarea name="mystr" rows="10" cols="80"><?php echo htmlspecialchars($mystr, ENT_QUOTES, 'UTF-8'); ?></textarea>
<input type="submit" value="Check"/>
</form>
<h2>Correction</h2>
<table>
<tr style="vertical-align:top;">
<td width="400">
<?php echo($output); ?>
</td>
<td>
<?php echo($suggest); ?>
</td>
<td>
<input type="button" id="spellignore" value="Ignore" onclick="spellIgnore();" /><br />
<input type="button" id="spellchange" value="Change" onclick="spellChange();" /><br />
<input type="button" id="spellall" value="All" onclick="spellChangeAll();" />
</td>
</tr>
</table>
</body>
</html>
- superdezign
- DevNet Master
- Posts: 4135
- Joined: Sat Jan 20, 2007 11:06 pm
Very interesting. I didn't even consider making suggestions... I was going to do a basic spellcheck like in Firefox, but I think I'll do a suggestion thing later on, too. ^_^
And, just a small suggestion in the code... instead of if(count($suggestions)), use if(!empty($suggestions)) just in case $suggestions is never set.
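A small illustration of the difference (the function `flagged()` is hypothetical). In the demo above `$suggestions` is initialized to `array()`, so count() is safe there, but if the variable were only assigned when a misspelling is found, empty() would still work, while count() would warn about the undefined variable and, on PHP 8, throw a TypeError for null.

```php
<?php
// $suggestions is only assigned inside the loop, so for a clean post the
// variable is never set at all.
function flagged(array $misspelled): string
{
    foreach ($misspelled as $word => $options) {
        $suggestions[$word] = $options;
    }
    // empty() does not warn on an undefined variable; count($suggestions)
    // would warn here and throw a TypeError on PHP 8.
    return !empty($suggestions) ? count($suggestions) . ' flagged' : 'all clear';
}

echo flagged([]), "\n";                      // all clear
echo flagged(['chack' => ['check']]), "\n";  // 1 flagged
```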
Edit: Thanks alimadzi. ^_^ That works nicely.
- CoderGoblin
- DevNet Resident
- Posts: 1425
- Joined: Tue Mar 16, 2004 10:03 am
- Location: Aachen, Germany
As previously stated, the whole thing was thrown together very quickly as a demo of pspell rather than as a real working tool. It also needs a check for how many words are found before the foreach, and I seem to remember that "change all" doesn't work 100%. To "productionize" it, it would also need to be usable without JavaScript.