Anagram Searching

Not for 'how-to' coding questions but PHP theory instead, this forum is here for those of us who wish to learn about design aspects of programming with PHP.

Moderator: General Moderators

Post Reply
User avatar
JayBird
Admin
Posts: 4524
Joined: Wed Aug 13, 2003 7:02 am
Location: York, UK
Contact:

Anagram Searching

Post by JayBird »

I have a DB that has around 250,000 english words in it.

I was thinking the other day how to create a script that would allow a user to enter some characters i.e. "servant"

It would then search the database and display other words, that can be made up of the same letters, i.e. for "servant" it could return "taverns".

Then, also you could have options to return all words that can be made from them letters, with a minimum length of 3 letters

e.g.

ant
tan
rent
rest

At first i thought it would be easy, then i started thinking about it and my brain melted! :lol:

Any ideas?
magicrobotmonkey
Forum Regular
Posts: 888
Joined: Sun Mar 21, 2004 1:09 pm
Location: Cambridge, MA

Post by magicrobotmonkey »

brute force it:

Code: Select all

//if its in the db:
$query = "select wordText from dict order by CHAR_LENGTH(wordText), wordText

while($thisDictWord = $res)
{
    if(count(array_intersect(str_split($testWord), str_spilt($thisDictWord)) >0))
        $anagramsї] = $thisDictWord;

}
you could also play with similar_text()
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

I would find it easier to find anagrams of the same length. As it reduces the brute force approach a lot.

I think the better approach would be to create a list of possible words and get the rows with given a very long where clause.

So basically

anagram
aangram
angaram
.
.

Code: Select all

Select * from dict Where wordText IN (anagram, aangram, angaram ...)

much less system load doing it this way. And make sure you got an index on wordText



By the way: Where did you get that database? Is it downloadable anywhere open source?
User avatar
JayBird
Admin
Posts: 4524
Joined: Wed Aug 13, 2003 7:02 am
Location: York, UK
Contact:

Post by JayBird »

AGISB wrote:By the way: Where did you get that database? Is it downloadable anywhere open source?
I have just uploaded a zip file containing 12 text files


compound_forms-wordlist-76,205-words.txt
consolidated list.txt
FILE main list -- 103,070 entries.txt
Moby compound forms -- 256,772 entries.txt
Moby single words -- 354,984 entries.txt
Orchy list -- 175, 171 entries.txt
Roger King's dictionary -- 191,625 entries.txt
Roget's Thesaurus list -- 17,474 entries.txt
Shorter Oxford English Dict -- 119,995 entries.txt
UK ACD, no accents -- 230,534 entries.txt
Unix dictionary -- 25,104 entries.txt
wordlist-234,936-words.txt


The file is currenltly uplaoding, it should be 8.18 MB (8,578,681 bytes)

http://www.infinit-e.com/wordlists.zip

Mark
User avatar
onion2k
Jedi Mod
Posts: 5263
Joined: Tue Dec 21, 2004 5:03 pm
Location: usrlab.com

Re: Anagram Searching

Post by onion2k »

Bech100 wrote:I have a DB that has around 250,000 english words in it.

I was thinking the other day how to create a script that would allow a user to enter some characters i.e. "servant"

It would then search the database and display other words, that can be made up of the same letters, i.e. for "servant" it could return "taverns".
This is actually really simple so long as you process everything beforehand. Make a script that inserts all the words into a database table.. and also inserts another string thats made up of the letters in the word in alphabetical order. So you'd have:

servant | aenrstv
taverns | aenrstv

Finding an anagram would just be a matter of taking the lookup word, converting it into alphabetical letter order, and pulling out all the entries that match.
Then, also you could have options to return all words that can be made from them letters, with a minimum length of 3 letters

e.g.

ant
tan
rent
rest

At first i thought it would be easy, then i started thinking about it and my brain melted! :lol:

Any ideas?
Again, using the alphabet order index words, you'd have to go through all the possible 3 letter combinations, then 4 letter, 5 letter, and so on.

Theres probably better ways to index the words. That sort of thing is always a challenge..
AGISB
Forum Contributor
Posts: 422
Joined: Fri Jul 09, 2004 1:23 am

Post by AGISB »

Thanks Mark. Downloading now.
timvw
DevNet Master
Posts: 4897
Joined: Mon Jan 19, 2004 11:11 pm
Location: Leuven, Belgium

Post by timvw »

building all combination with some input can be easily solved recursively..

untested code

Code: Select all

function buildcombo($chars)
{
      // only one char - thus only one solution
      if (strlen($chars) == 1) return $chars{0};

     $solutions = array();
     for ($i = 1; $i < strlen($chars); ++$i)
     &#123;
           $others = substr($chars, 0, $i);
            $others .= substr($chars, $i+1, strlen($chars));

            $subsolutions = buildcombo($others);

            foreach($subsolutions as $subsolution)
                    $solutions&#1111;] = $chars&#123;$i&#125; . $subsolution;
     &#125;
     return $solutions;
&#125;
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

should be able to do it with a regular expression pretty easy.. and no recursion needed.
Post Reply