Database Dictionary

superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

I don't make a lot of topics, so I'm not sure whether this should be in PHP or Databases since it's a little of both.

Anyway, I'm writing a blog that I'm going to be constantly adding features to. One that just came to mind is a built-in spellchecker. However, I'm worried about performance.


The way that it would be set up is that after I make a post, it'd show me a preview of my post with everything parsed and such. Then, it'd run through every word (I'm considering omitting posted code) and check if it's in the dictionary. If it isn't, the word would turn into a link. If I click on the link, I'd be verifying that the word was spelled correctly and that it should be added to the dictionary. Otherwise, I'd go back to the edit screen and change it. Eventually, this would fill the dictionary up with a good amount of words, including words that aren't in the actual dictionary, but are still correctly spelled.
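
Roughly what I have in mind for the linking step, as a sketch only. It assumes plain ASCII text with no markup in it, and link_unknown_words() and add_word.php are names made up for illustration:

```php
<?php
// Hypothetical helper: wrap every word that is not in $known in a link.
// add_word.php is a placeholder URL, not an existing script.
function link_unknown_words(string $text, array $known): string
{
    // Flip the list into a lookup table so each check is a single isset().
    $lookup = array_flip(array_map('strtolower', $known));

    return preg_replace_callback('/\b[a-z]+\b/i', function (array $m) use ($lookup): string {
        $word = $m[0];
        if (isset($lookup[strtolower($word)])) {
            return $word; // known word, leave it untouched
        }
        $safe = htmlspecialchars($word, ENT_QUOTES);
        return '<a href="add_word.php?word=' . urlencode($word) . '">' . $safe . '</a>';
    }, $text);
}

echo link_unknown_words('this is my tset', ['this', 'is', 'my']), "\n";
// this is my <a href="add_word.php?word=tset">tset</a>
```

Here the known words are passed in as an array; where that array comes from is exactly the part I'm unsure about.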

So, my dilemma is the performance of this. Inserting in the database could be done by storing all confirmed words into a session variable and, upon submit, adding all of those words through a single query. But that's not the part that I'm worried about. The part I'm concerned about is the checking process.
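
For the insert side, something along these lines should do, I think. It is only a sketch: build_insert() is a made-up helper, and the table and column names (dictionary, word) are assumptions:

```php
<?php
// Hypothetical helper: build one multi-row INSERT for all confirmed words.
// Table and column names (dictionary, word) are assumptions.
function build_insert(array $words): array
{
    $words = array_values(array_unique($words));
    if (!$words) {
        return ['', []]; // nothing to insert, so no query at all
    }
    // One "(?)" placeholder group per word.
    $rows = implode(', ', array_fill(0, count($words), '(?)'));
    return ["INSERT INTO dictionary (word) VALUES $rows", $words];
}

// On the real page this list would come from the session variable.
list($sql, $params) = build_insert(['devnet', 'pspell', 'devnet']);
echo $sql, "\n"; // INSERT INTO dictionary (word) VALUES (?), (?)
```

The SQL and the parameter list could then be handed to a prepared statement, so nothing has to be escaped by hand.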


Each word that I check would run its own query against the database, which hardly seems efficient. The best I've come up with is to str_replace every occurrence of a word out of the content once it's been checked, but that still doesn't seem to cut it.

Any ideas?
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Extract all the words into an array and compact the array to unique words. You can run this array through soundex() (or not) and eventually pass the array to the database using an IN() clause.

array_unique(), array_map()
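
A rough sketch of that, assuming plain ASCII text (unique_words() is just a name made up for the example):

```php
<?php
// Split text into lowercase words and drop the duplicates.
function unique_words(string $text): array
{
    preg_match_all('/\b[a-z]+\b/i', $text, $matches);
    // array_unique() keeps the original keys, so reindex with array_values().
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

$words = unique_words('The cat saw the other cat');
print_r($words);                       // the, cat, saw, other

// Optional: one soundex key per word, if you want to match on sound.
print_r(array_map('soundex', $words));
```

Whether you store and compare the soundex keys or the words themselves depends on how fuzzy you want the match to be.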
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

I'm with you up until the IN() keyword. I was under the impression that IN was used for things such as finding if a variable existed in a list that I provide, rather than if something from the list that I provide is in the database.

Do you think you could give me a syntax example of using IN in this case?
alimadzi
Forum Newbie
Posts: 9
Joined: Fri Jun 08, 2007 12:57 pm
Location: Boise, Idaho, USA

Post by alimadzi »

Here's a simple example of the use of IN in SQL:

Code: Select all

SELECT * FROM dictionary WHERE word IN ('hello', 'world')
Documentation from MySQL:

http://dev.mysql.com/doc/refman/5.0/en/ ... unction_in
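
Tying that back to feyd's suggestion, the PHP side might look something like the sketch below. build_lookup() is a hypothetical helper, the PDO usage is an assumption about how you talk to the database, and $found is a stand-in for the real query result:

```php
<?php
// Hypothetical helper: one SELECT for the whole word list.
// Callers should skip the query when $words is empty, since "IN ()" is not valid SQL.
function build_lookup(array $words): array
{
    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    return ["SELECT word FROM dictionary WHERE word IN ($placeholders)", $words];
}

$words = ['hello', 'wrold', 'world'];
list($sql, $params) = build_lookup($words);

// With PDO (assumed) the query would be run like this:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
//   $found = $stmt->fetchAll(PDO::FETCH_COLUMN);
$found = ['hello', 'world'];           // stand-in for the rows that came back

// Whatever was not returned is not in the dictionary.
$unknown = array_values(array_diff($words, $found));
print_r($unknown);                     // wrold
```

The query only returns the words that exist, so the misspelled ones are whatever is left over after the array_diff().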

Hope this helps!
CoderGoblin
DevNet Resident
Posts: 1425
Joined: Tue Mar 16, 2004 10:03 am
Location: Aachen, Germany

Post by CoderGoblin »

Any reason not to use pspell rather than a database? One obvious reason would be that your provider doesn't have it installed, but if it does...

Also beware of soundex(). It doesn't work well for some other languages (e.g. German).

The following code is something I knocked up a while ago as a quick and dirty demo of pspell. I know it has some problems, but it could be useful. I'll look at it tonight if I get the time and anyone is interested; after that I'm unfortunately not available for a week.

Code: Select all

<?php
//ini_set('display_errors', 1);  
//error_reporting((int) 8191);  
mb_internal_encoding("UTF-8"); // not sure if needed to regex check.

date_default_timezone_set('Europe/Berlin');

$suggestions=array();

if (!empty($_REQUEST['mystr'])) { // !empty() avoids an undefined-index notice on the first page load
    $mystr = $_REQUEST['mystr'];
} else {
    $mystr = "this is my test to spell chack. Yet another spell check ";
}
$output='<p id="correction">'.$mystr.'</p>';

$pspell_config = pspell_config_create("de");
if (isset($_REQUEST['mit_cmp']) && $_REQUEST['mit_cmp'] == 1) {
    pspell_config_runtogether($pspell_config, true);
}
pspell_config_mode($pspell_config, PSPELL_FAST); 
$pspell_link = pspell_new_config($pspell_config);


preg_match_all("/\b\w+\b/",$mystr,$words);
$count=0;
foreach ($words[0] as $val) {
    if (!preg_match('/[0-9]/',$val)) {
        if (!pspell_check($pspell_link, $val))  {
            $suggestions[$val] = pspell_suggest($pspell_link, $val);            
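            // NOTE: the end of this line was cut off in the post. The closing "$val</span>" and the
            // $output argument are an assumed reconstruction; the handler now calls spellSuggest(),
            // which is the function the script below actually defines.
            $output=preg_replace("/\b$val\b/","<span id=\"kor_".$count++."\" style=\"color:#F00\" onclick=\"spellSuggest('$val');\">$val</span>",$output);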
        }
    }
}

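$suggest=''; // initialise so the .= below and the echo in the template never hit an undefined variable
if (count($suggestions)) {
    foreach($suggestions as $key=>$values) {
        $suggest.="<div>\n  <select id=\"sug_$key\" name=\"spellsug_$key\" style=\"display:none;\">";
        if (count($values)) {
            // double quotes, so that \n is emitted as a real newline
            $suggest.="\n    <option>".implode("</option>\n    <option>",$values)."</option>";
        }
        $suggest.="  </select>\n  </div>\n";
    }
}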
?>
  
<html>
  <head>
    <script type="text/javascript">
        var spell_current_suggest; 
        var spell_need_correction = [];
        var current_spell_pos=-1;
        var spell_remain=0;
        var base;
        
        function spellSuggest(txt)
        {
            if (spell_current_suggest) {
                spell_current_suggest.style.display='none';
            }
            spell_current_suggest=document.getElementById('sug_'+txt);
            if (spell_current_suggest) {
                spell_current_suggest.style.display='block';
            }    
        }
         
        function spellChange()
        {
            if (spell_current_suggest.value != '') {
                new_text_node=document.createTextNode(spell_current_suggest.value);
                base.replaceChild(new_text_node,spell_need_correction[current_spell_pos]);
            }
            base.normalize();
            spell_remain--;    
            spellNext();            
        }
         
        function spellChangeAll()
        {
            var text=spellGetText(current_spell_pos);
            var remain=0;
            if (spell_current_suggest.value != '') {
                for (var i=current_spell_pos; i< spell_need_correction.length; i++) {
                  if (spellGetText(i) == text) {
                    // a DOM node can only live in one place, so create a fresh text node per replacement
                    base.replaceChild(document.createTextNode(spell_current_suggest.value),spell_need_correction[i]);
                    spell_remain--;
                  }
                }  
            }
            spellNext();
        }
         
        function spellGetText(pos)
        {
          var range_item=document.createRange();
          range_item.selectNode(spell_need_correction[pos].childNodes[0]);
          return range_item.toString();
        }
         
        function spellIgnore()
        {
          spell_remain--;
          spellNext();   
        }
         
        function spellNext()
        { 
          if (spell_remain == 0) {
            document.getElementById('spellignore').disabled=true;
            document.getElementById('spellchange').disabled=true;
            document.getElementById('spellall').disabled=true;   
            if (spell_current_suggest) {
                spell_current_suggest.disabled=true;
            }
          } else {
            if (current_spell_pos >= 0) {
              spell_need_correction[current_spell_pos].style.backgroundColor = '';
            }
            current_spell_pos++;
            spell_need_correction[current_spell_pos].style.backgroundColor = '#FDD';
            spellSuggest(spellGetText(current_spell_pos));          
          }
          return false;          
        }
                
        window.onload = function() {
          base=document.getElementById('correction');
          if (base.hasChildNodes()) {
            for (var i=0; i< base.childNodes.length; i++) {
              if (base.childNodes[i].nodeType==1) {
                if (base.childNodes[i].nodeName=='SPAN') {
                  spell_need_correction.push(base.childNodes[i]);
                }
              }  
            }    
          }      
          spell_remain=spell_need_correction.length;
          spellNext();
        }

        
    </script>
  </head>
  <body> 
    <h1>Spelling</h1>
    <h2>Original</h2>
    <p>
      <form name="spell" method="post">
        <input name="mit_cmp" type="checkbox" value=1>  Compound Words<br />
        <textarea name="mystr" rows=10 cols=80><?php echo(htmlspecialchars($mystr)); ?></textarea>
        <input type="submit" value="Check"/>
      </form>
    </p>
    <h2>Correction</h2>
    <table>
      <tr style="vertical-align:top;">
        <td width="400">
          <?php echo($output); ?>
        </td>
        <td> 
          <?php echo($suggest); ?>
        </td>
        <td> 
         <input type="submit" id="spellignore" value="Ignore" onclick="spellIgnore();"><br />
         <input type="submit" id="spellchange" value="Change" onclick="spellChange();" /><br />
         <input type="submit" id="spellall" value="All" onclick="spellChangeAll();"/>
        </td>
      </tr>  
    </table> 
  </body>    
</html>
superdezign
DevNet Master
Posts: 4135
Joined: Sat Jan 20, 2007 11:06 pm

Post by superdezign »

Very interesting. I didn't even consider making suggestions... I was going to do a basic spellcheck like in Firefox, but I think I'll do a suggestion thing later on, too. ^_^

And, just a small suggestion in the code... instead of if(count($suggestions)), use if(!empty($suggestions)) just in case $suggestions is never set.
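
To show what I mean, here's a tiny sketch. As far as I know, count() on a variable that was never assigned triggers at least a notice (and a TypeError on PHP 8), whereas empty() just returns true:

```php
<?php
// $suggestions is deliberately never assigned at this point.
if (!empty($suggestions)) {
    echo "has suggestions\n";
} else {
    echo "nothing to show\n"; // this branch runs, with no notice or warning
}

// Once the array has entries, the same check passes.
$suggestions = ['chack' => ['check', 'chalk']];
var_dump(!empty($suggestions)); // bool(true)
```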


Edit: Thanks alimadzi. ^_^ That works nicely.
CoderGoblin
DevNet Resident
Posts: 1425
Joined: Tue Mar 16, 2004 10:03 am
Location: Aachen, Germany

Post by CoderGoblin »

superdezign wrote:Very interesting. I didn't even consider making suggestions... I was going to do a basic spellcheck like in Firefox, but I think I'll do a suggestion thing later on, too.
As previously stated, the whole thing was thrown together very quickly as a demo of pspell rather than as a truly working tool. It also needs a check for how many words are found before the foreach, and I seem to remember that "change all" or something doesn't work 100%. To "productionize" it, it would also need to be usable without JavaScript. :wink: The main point of posting the code was to give an example people could build from.