Database Dictionary
Posted: Thu Jun 14, 2007 8:04 am
by superdezign
I don't make a lot of topics, so I'm not sure whether this should be in PHP or Databases since it's a little of both.
Anyway, I'm writing a blog that I'm going to be constantly adding features to. One that just came to mind is a built-in spellchecker. However, I'm worried about performance issues.
The way that it would be set up is that after I make a post, it'd show me a preview of my post with everything parsed and such. Then, it'd run through every word (I'm considering omitting posted code) and check if it's in the dictionary. If it isn't, the word would turn into a link. If I click on the link, I'd be verifying that the word was spelled correctly and that it should be added to the dictionary. Otherwise, I'd go back to the edit screen and change it. Eventually, this would fill the dictionary up with a good amount of words, including words that aren't in the actual dictionary, but are still correctly spelled.
So, my dilemma is the performance of this. Inserting in the database could be done by storing all confirmed words into a session variable and, upon submit, adding all of those words through a single query. But that's not the part that I'm worried about. The part I'm concerned about is the checking process.
Each word that I check would run its own query to look itself up in the database. This hardly seems efficient. The best I've come up with is to str_replace every occurrence of a word out of the content once it's checked, but that still doesn't seem to cut it.
Any ideas?
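For reference, the single-query insert of confirmed words described above could be sketched roughly like this. It only builds the SQL text and the parameter list; actually running it would need a prepared statement in whatever database layer the blog uses. The table name `dictionary`, the column name `word`, and the function name are all assumptions made up for the illustration.

```php
<?php
// Hypothetical helper: builds one multi-row INSERT for all confirmed words.
// Table "dictionary" and column "word" are assumed names.
function buildDictionaryInsert(array $confirmedWords): ?array
{
    // Normalise and de-duplicate so each word is inserted only once.
    // strtolower() is byte-based; non-ASCII words would need mb_strtolower().
    $words = array_values(array_unique(array_map('strtolower', $confirmedWords)));
    if (empty($words)) {
        return null; // nothing confirmed, so no query at all
    }
    // One "(?)" group per word; values are bound later, never concatenated.
    $placeholders = implode(', ', array_fill(0, count($words), '(?)'));
    return array(
        'sql'    => "INSERT INTO dictionary (word) VALUES $placeholders",
        'params' => $words,
    );
}

// Words collected in the session while previewing (sample data).
$query = buildDictionaryInsert(array('PHP', 'php', 'phpBB'));
```

Keeping the values out of the SQL string means a confirmed word containing a quote cannot break the query.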
Posted: Thu Jun 14, 2007 8:15 am
by feyd
Extract all the words into an array and compact the array to unique words. You can run this array through soundex() (or not) and eventually pass the array to the database using an IN() clause.
See: array_unique(), array_map()
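A minimal sketch of that suggestion, leaving soundex() out: collect the distinct words from the post, then build one lookup query with an IN() list. The function names, the `dictionary` table, and the `word` column are assumptions for illustration, and the query is only built here, not executed.

```php
<?php
// Hypothetical helper: pull the distinct words out of a post.
function extractUniqueWords(string $text): array
{
    // \w+ matches runs of letters, digits and underscores.
    preg_match_all('/\b\w+\b/', strtolower($text), $matches);
    // Drop anything that contains a digit; those are not dictionary words.
    $words = array_filter($matches[0], function ($w) {
        return !preg_match('/[0-9]/', $w);
    });
    // array_unique() keeps the first occurrence; array_values() reindexes.
    return array_values(array_unique($words));
}

// Hypothetical helper: one SELECT with an IN() list, one "?" per word.
// The caller should skip the query when $words is empty, since IN () is not valid SQL.
function buildLookupQuery(array $words): string
{
    $placeholders = implode(', ', array_fill(0, count($words), '?'));
    return "SELECT word FROM dictionary WHERE word IN ($placeholders)";
}

$words = extractUniqueWords('The cat saw the other cat 2 times');
$sql   = buildLookupQuery($words);
```

However long the post is, this costs one query, and repeated words are only sent once.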
Posted: Thu Jun 14, 2007 8:26 am
by superdezign
I'm with you up until the IN() keyword. I was under the impression that IN was used for things such as finding if a variable existed in a list that I provide, rather than if something from the list that I provide is in the database.
Do you think you could give me a syntax example of using IN in this case?
Posted: Thu Jun 14, 2007 9:32 am
by alimadzi
Here's a simple example of the use of IN in SQL:
Code: Select all
SELECT * FROM dictionary WHERE word IN ('hello', 'world')
Documentation from MySQL:
http://dev.mysql.com/doc/refman/5.0/en/ ... unction_in
Hope this helps!
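Since the IN() query only returns the words that are already in the dictionary, the misspelling candidates are whatever was asked for but did not come back. A rough sketch of that last step, using a simulated result set instead of a real database call (the function name and the row layout are assumptions):

```php
<?php
// Hypothetical helper: which of the checked words did the IN() query NOT return?
// $foundRows mimics fetched rows, e.g. array(array('word' => 'hello')).
function findUnknownWords(array $checkedWords, array $foundRows): array
{
    // Flatten the rows down to a plain list of known words.
    $known = array_column($foundRows, 'word');
    // array_diff() keeps the entries of the first array missing from the second.
    return array_values(array_diff($checkedWords, $known));
}

// Simulated result of: SELECT * FROM dictionary WHERE word IN ('hello', 'wrold')
$rows    = array(array('word' => 'hello'));
$unknown = findUnknownWords(array('hello', 'wrold'), $rows);
```

Only the words left in `$unknown` would need to be turned into links in the preview.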
Posted: Thu Jun 14, 2007 10:14 am
by CoderGoblin
Any reason not to use pspell rather than a database? I know one reason would be that your provider doesn't have it, but if it does...
Also beware of soundex. It doesn't work for some other languages (e.g. German).
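To illustrate the caution: soundex() was designed around English pronunciation and maps many different words onto the same code, so a match on the code alone says little about whether a word is spelled correctly. A tiny sketch (the function name is made up for the example):

```php
<?php
// Hypothetical helper: true when soundex() gives both words the same code.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

$sameCode      = soundsAlike('their', 'there'); // two different words, one code
$differentCode = soundsAlike('hello', 'world'); // clearly different sounds
```

That makes soundex a better fit for finding suggestions than for deciding whether a word is in the dictionary.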
The following code is one I knocked up a while ago as a quick and dirty demo of pspell... I know it has some problems but could be useful. Will look at it tonight if I get the time if anyone is interested and then unfortunately I'm not available for a week.
Code: Select all
<?php
//ini_set('display_errors', 1);
//error_reporting((int) 8191);
mb_internal_encoding("UTF-8"); // not sure if needed to regex check.
date_default_timezone_set('Europe/Berlin');
$suggestions=array();
$suggest=''; // initialise so the later .= and echo never hit an undefined variable
if (!empty($_REQUEST['mystr'])) {
$mystr = $_REQUEST['mystr'];
} else {
$mystr = "this is my test to spell chack. Yet another spell check ";
}
$output='<p id="correction">'.$mystr.'</p>';
$pspell_config = pspell_config_create("de");
if (isset($_REQUEST['mit_cmp']) && $_REQUEST['mit_cmp'] == 1) {
pspell_config_runtogether($pspell_config, true);
}
pspell_config_mode($pspell_config, PSPELL_FAST);
$pspell_link = pspell_new_config($pspell_config);
preg_match_all("/\b\w+\b/",$mystr,$words);
$count=0;
// preg_replace below already handles every occurrence, so visit each word only once
foreach (array_unique($words[0]) as $val) {
if (!preg_match('/[0-9]/',$val)) {
if (!pspell_check($pspell_link, $val)) {
$suggestions[$val] = pspell_suggest($pspell_link, $val);
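// NOTE: this line was cut off in the original post; the end of it ($val</span> onwards) is a best-guess reconstruction.
// The onclick now calls spellSuggest() with the word, since no suggest() function exists in the script below.
$output=preg_replace("/\b$val\b/","<span id=\"kor_".$count++."\" style=\"color:#F00\" onclick=\"javascript:spellSuggest('$val');\">$val</span>",$output);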
}
}
}
if (count($suggestions)) {
foreach($suggestions as $key=>$values) {
$suggest.="<div>\n <select id=\"sug_$key\" name=\"spellsug_$key\" style=\"display:none;\">";
if (count($values)) {
// Double quotes so that \n becomes a real newline instead of literal text
$suggest.="\n <option>".implode("</option>\n <option>",$values)."</option>";
}
$suggest.=" </select>\n </div>\n";
}
}
?>
<html>
<head>
<script type="text/javascript">
var spell_current_suggest;
var spell_need_correction = [];
var current_spell_pos=-1;
var spell_remain=0;
var base;
function spellSuggest(txt)
{
if (spell_current_suggest) {
spell_current_suggest.style.display='none';
}
spell_current_suggest=document.getElementById('sug_'+txt);
if (spell_current_suggest) {
spell_current_suggest.style.display='block';
}
}
function spellChange()
{
if (spell_current_suggest && spell_current_suggest.value != '') {
var new_text_node=document.createTextNode(spell_current_suggest.value);
base.replaceChild(new_text_node,spell_need_correction[current_spell_pos]);
}
base.normalize();
spell_remain--;
spellNext();
}
function spellChangeAll()
{
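var text=spellGetText(current_spell_pos);
if (spell_current_suggest && spell_current_suggest.value != '') {
for (var i=current_spell_pos; i< spell_need_correction.length; i++) {
// Skip spans that were already replaced, and use a fresh text node each time:
// a single node can only sit in one place, so reusing it would just move it around.
if (spell_need_correction[i].parentNode == base && spellGetText(i) == text) {
base.replaceChild(document.createTextNode(spell_current_suggest.value),spell_need_correction[i]);
spell_remain--;
}
}
}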
spellNext();
}
function spellGetText(pos)
{
var range_item=document.createRange();
range_item.selectNode(spell_need_correction[pos].childNodes[0]);
return range_item.toString();
}
function spellIgnore()
{
spell_remain--;
spellNext();
}
function spellNext()
{
if (spell_remain == 0) {
document.getElementById('spellignore').disabled=true;
document.getElementById('spellchange').disabled=true;
document.getElementById('spellall').disabled=true;
if (spell_current_suggest) {
spell_current_suggest.disabled=true;
}
} else {
if (current_spell_pos >= 0) {
spell_need_correction[current_spell_pos].style.backgroundColor = '';
}
current_spell_pos++;
spell_need_correction[current_spell_pos].style.backgroundColor = '#FDD';
spellSuggest(spellGetText(current_spell_pos));
}
return false;
}
window.onload = function() {
base=document.getElementById('correction');
if (base.hasChildNodes()) {
for (var i=0; i< base.childNodes.length; i++) {
if (base.childNodes[i].nodeType==1) {
if (base.childNodes[i].nodeName=='SPAN') {
spell_need_correction.push(base.childNodes[i]);
}
}
}
}
spell_remain=spell_need_correction.length;
spellNext();
}
</script>
</head>
<body>
<h1>Spelling</h1>
<h2>Original</h2>
<p>
<form name="spell" method="post">
<input name="mit_cmp" type="checkbox" value=1> Compound Words<br />
<textarea name="mystr" rows=10 cols=80><?php echo htmlspecialchars($mystr, ENT_QUOTES, 'UTF-8'); ?></textarea>
<input type="submit" value="Check"/>
</form>
</p>
<h2>Correction</h2>
<table>
<tr style="vertical-align:top;">
<td width="400">
<?php echo($output); ?>
</td>
<td>
<?php echo($suggest); ?>
</td>
<td>
<input type="submit" id="spellignore" value="Ignore" onclick="spellIgnore();"><br />
<input type="submit" id="spellchange" value="Change" onclick="spellChange();" /><br />
<input type="submit" id="spellall" value="All" onclick="spellChangeAll();"/>
</td>
</tr>
</table>
</body>
</html>
Posted: Thu Jun 14, 2007 11:00 am
by superdezign
Very interesting. I didn't even consider making suggestions... I was going to do a basic spellcheck like in Firefox, but I think I'll do a suggestion thing later on, too. ^_^
And, just a small suggestion in the code... instead of if(count($suggestions)), use if(!empty($suggestions)) just in case $suggestions is never set.
Edit: Thanks alimadzi. ^_^ That works nicely.
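A small sketch of why empty() is the safer check here: it treats a variable that was never set the same as an empty one and raises no warning, whereas count() on an unset variable complains (and on PHP 8 fails with a TypeError). The function name is made up for the demo.

```php
<?php
// Hypothetical demo function; $suggestions is deliberately never assigned here.
function previewHasSuggestions(): bool
{
    // empty() accepts an unset variable silently and reports it as empty.
    return !empty($suggestions);
}

$result = previewHasSuggestions();
```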
Posted: Thu Jun 14, 2007 12:36 pm
by CoderGoblin
superdezign wrote: Very interesting. I didn't even consider making suggestions... I was going to do a basic spellcheck like in Firefox, but I think I'll do a suggestion thing later on, too.
As previously stated, the whole thing was thrown together very quickly as a demo of pspell rather than as a truly working tool. It also needs a check on how many words were found before the foreach, and I seem to remember that "change all" (or something like it) doesn't work 100%. To "productionize" it, it would also need to be usable without JavaScript.

Main point of posting the code was to give an example people could build from.