Nearest Words

Questions about the MySQL, PostgreSQL, and most other databases, as well as using it with PHP can be asked here.

Moderator: General Moderators

Post Reply
newcheese
Forum Newbie
Posts: 2
Joined: Fri Dec 31, 2010 2:30 am

Nearest Words

Post by newcheese »

Hey everyone,

I need to make a "nearest words" function for my site. I am working with PHP and MySQL, and dealing with 100 000+ records. I have a WordID (auto increment) and Word field.

I'll explain what I have so far, which is FAR from ideal:

I have a words table, with a word id, organized in ascending/alphabetical order.

It's quite easy to find the nearest words this way, as the WordID +/- 10 can provide me with the 20 closest words to whatever query.

The problem lies with inserting new data. How can I insert in alphabetical order? Or update my table so that WordIDs stay sequential & Words remain in alphabetical order?

My temporary fix (because I can't come up with anything else, except maybe temp tables..) is to alter the table and order by word ascending, then drop the WordID column, and then re-add it. This meets my requirements but its slow, and it can't be a good idea to be dropping primary keys on the fly like that.

I tried using cursors, but it's also slow.

Any suggestions, ideas, or other ways to approach this problem? I'd love to have something as clean and efficient as the nearest word list urbandictionary has, for example.

Thanks
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Nearest Words

Post by Christopher »

Relational databases do not keep the data in any specific order. Use the the SQL clause 'ORDER BY Word' in your SELECT statements to sort query data. See the MySQL manual for more information.
(#10850)
newcheese
Forum Newbie
Posts: 2
Joined: Fri Dec 31, 2010 2:30 am

Re: Nearest Words

Post by newcheese »

I understand that, but then suppose I have the following list of words:

1 tiger
2 dog
3 elephant
4 kangaroo
5 lion
6 zebra
7 mouse
8 cat

an order by Word ascending would give me:

8 cat
2 dog
3 elephant
4 kangaroo
5 lion
7 mouse
1 tiger
6 zebra

how could I then choose the 4 closest words to lion (by which I mean, elephant, kangaroo, mouse and tiger)?

I can't do WordID +/- 2 because WordID doesn't all of a sudden become sequential after an order by.

This is the problem...
User avatar
Christopher
Site Administrator
Posts: 13596
Joined: Wed Aug 25, 2004 7:54 pm
Location: New York, NY, US

Re: Nearest Words

Post by Christopher »

Are you reading them into an array? If so they become:

0 8 cat
1 2 dog
2 3 elephant
3 4 kangaroo
4 5 lion
5 7 mouse
6 1 tiger
7 6 zebra

Remember that WordID is only a unique key. It is not something for ordering records.
(#10850)
Post Reply