
Removing duplicate strings

Posted: Wed Nov 02, 2005 5:56 pm
by HiddenS3crets
I've got a 17 MB dictionary file that has some duplicate words in it. Is there a way to strip it down to just the unique words?

I thought about loading the file into an array, then using array_unique() on it. Should I do it this way, or is there a more efficient approach?

Posted: Wed Nov 02, 2005 5:58 pm
by John Cartwright
Not sure how to do this strictly in MySQL, but with a PHP approach you've almost got it.
Simply gather all your words into an array, slap it with array_unique(), delete your old table, and insert your fresh unique array.
Remember, this process only needs to be done once.
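A minimal sketch of that approach, assuming the words live one per line in a file (the file names and the helper function name here are just examples, not anything from the thread):

```php
<?php
// Hypothetical helper: read a word list, drop duplicates, write it back out.
function dedupe_wordlist($in, $out)
{
    // file() returns the file as an array, one line per element;
    // these flags strip the newlines and skip blank lines.
    $words = file($in, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

    // Trim stray whitespace so "cat" and "cat " count as the same word.
    $words = array_map('trim', $words);

    // array_unique() keeps the first occurrence of each word.
    $words = array_unique($words);

    // Write the deduplicated list back, one word per line.
    file_put_contents($out, implode("\n", $words) . "\n");
}
```

Note that array_unique() does a linear scan per element internally only in old PHP versions; on a 17 MB file the main cost is simply holding the whole list in memory, so make sure memory_limit allows it.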

Posted: Wed Nov 02, 2005 6:00 pm
by HiddenS3crets
Jcart wrote:Not sure how to do this strictly in MySQL, but with a PHP approach you've almost got it.
Simply gather all your words into an array, slap it with array_unique(), delete your old table, and insert your fresh unique array.
Remember, this process only needs to be done once.
I'm not using MySQL; my word file is uploaded as a plain text file on the server. I need to get the contents and store them in an array... that should still work, though, right?

Posted: Wed Nov 02, 2005 6:09 pm
by timvw
Well, if you have the GNU text utilities around:

Code: Select all

timvw@madoka:~$ sort words.txt | uniq > uniqwords.txt

Posted: Wed Nov 02, 2005 6:17 pm
by HiddenS3crets
I'm not great at working with files in PHP; how would I add each line from a file to an array?

Posted: Wed Nov 02, 2005 6:24 pm
by feyd
file() ... you should also use array_map() to trim() all the elements, so you're working from a uniform set of entries.
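Spelled out, feyd's one-liner looks like this (the file name is just an example):

```php
<?php
// file() returns the file as an array, one line per element.
// By default each element still carries its trailing newline.
$lines = file('words.txt');

// trim() every element so newlines and stray spaces don't make
// otherwise-identical words compare as different strings.
$lines = array_map('trim', $lines);
```

From there, array_unique($lines) gives you the deduplicated list, as suggested earlier in the thread.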