Posted: Mon Mar 08, 2004 6:44 am
by m3mn0n
lol
Posted: Mon Mar 08, 2004 6:45 am
by JayBird
Slightly OT, but interesting nonetheless
Question
What parameters, if any, limit the number of different words available to us in English (or any other language)? Are we near to running out of words?
Jonathan Cope , London
Answers
The existence of polyglots suggests that the average person is far from running out of storage space in the brain. And while there are major structural constraints on word numbers, they too leave lots of room for expansion.
First, there is some effect from the number of distinct sounds (phonemes) in a language. Languages with fewer phonemes have some tendency to have longer words.
English has about 40 phonemes and many short words, while Hawaiian, with only 13 phonemes, has many words of three or four syllables. Any language can increase the potential size of its vocabulary by using longer words.
Much more limiting than the number of phonemes are a language's phonotactic patterns--the constraints on possible sequences of phonemes. In English, words can begin with sequences like "sp" or "st", but in Spanish they cannot, hence the vowel at the beginning of Español. In Greek, words can begin with sequences like "pn" or "ps", but in English they can't, so we pronounce Greek loan words like pneumatic and psychology without the p.
Even so, we are a long way from using up all the permitted shapes: AIDS, fax and ROM are all recent additions, but "snizz", "whask" and literally thousands of other possible English sequences are all unused.
D. Ladd , Department of Linguistics Edinburgh University
There is no sensible limit, in theory, to the number of possible words in a language, although the constituents which make up words (the sounds and syllables) are indeed strictly limited. The remarkable thing about language is that it makes infinite use of these finite means.
Take British English as an example. There are only 44 contrasting sound units (phonemes) in that dialect--consonants and vowels such as /p/, /t/, /k/, /e/, and /i/, which can combine in certain ways to make different words, such as /pit/, /pet/, /kip/, /pik/ (conventionally spelled pick), and so on. A large number of possibilities suggest themselves, therefore, but all languages have phonotactic rules limiting the ways these units combine. For example, in English we can have words beginning with the phoneme /h/, but there are none ending with it, or words ending with the /ng/ sound, but none beginning with it. There are only about 300 vowel plus consonant combinations making up the syllables of English--the most complex consisting of three consonants at the beginning of a syllable (in such words as string) and four at the end (in such words as twelfth).
With these limited resources, English then makes up words of increasing complexity--of two syllables (butter), three (discover), four (publication), five (innumerable), and so on. And so on? There are long words in English, all children know antidisestablishmentarianism, and the syllabic length grows significantly when we take compound items into account, such as science terms--neurolymphomatosis, deoxyribonucleic, and the like (if DNA was given in its fully explicit form, it is said to be more than 200 000 letters in length).
Therefore, there is no theoretical limit. Whatever you think is the longest word in the language, I can always make it longer by adding another element--an extra prefix, such as anti- or non-, or an extra element to make a new compound. Whether these words make any real sense is another matter. In practical terms, we just don't need so many words and we're also nowhere near running out of words.
So the number of actual words in English is relatively quite small. There are some half a million words recorded in the Oxford English Dictionary, and a similar number in Webster's Third New International Dictionary. However, the two books do not contain exactly the same words--many British dialect words do not appear in the US book (Webster's) and vice versa. There is as yet no "super dictionary" which includes all the words in English, including all dialect, slang, and specialised words. And when we reflect on the way in which English is spreading around the world, in the process borrowing thousands of words from other languages in such places as India, South Africa and Malaysia, it is obvious that keeping up with the vocabulary of the language is an enormous task. So nobody knows exactly how many words there are in English, although there are at least a million.
The only real limiting factor on the growth of a language's vocabulary is the power of the human imagination. People invent words all the time, although not all of them actually get into the standard language. A few years ago, on a BBC Radio 4 programme, I ran a competition in which listeners were asked to invent words to express concepts of importance to them. The winner was the word we need to express the feeling we have when we are at an airport waiting for our luggage to appear on the carousel, and everyone else's luggage is appearing except ours: we chose "bagonize".
David Crystal , Editor, The Cambridge Encyclopedia of Language Anglesey
Assuming that we strictly limit ourselves to only consonant-vowel-consonant forms and do not include such extras as tones and stress, the following provides a generous lower boundary to the number of words available in English.
There are more than 50 possible initial consonants (including combinations such as "tr" and "sk"). There are more than 10 distinct vowel sounds (consider: rad, raid, red, rid, ride, rude, rod, reed, road and add in nonwords such as roid (void) and rould (should)).
There are more than 40 terminal consonants (including "rt" and "lk"). This means that there are in excess of 20 000 (50 x 10 x 40) single syllables. If we limit ourselves to using only two syllables per word we would still have more than 400 million words to play with.
Francis Glassborow , Oxford
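The arithmetic in that last answer is easy to check. A minimal sketch, assuming only the three counts quoted in the post (the variable names are made up for illustration):

```php
<?php
// Counts quoted in the post above; they are stated lower bounds, not measurements.
$initialConsonants  = 50;
$vowels             = 10;
$terminalConsonants = 40;

// One consonant-vowel-consonant syllable: 50 x 10 x 40
$syllables = $initialConsonants * $vowels * $terminalConsonants;

// Two-syllable words: any syllable followed by any syllable
$twoSyllableWords = $syllables * $syllables;

echo number_format($syllables), " single syllables\n";
echo number_format($twoSyllableWords), " two-syllable words\n";
```

This prints 20,000 and 400,000,000, matching the figures in the post.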
Posted: Mon Mar 08, 2004 6:46 am
by malcolmboston
well, this was fascinating, thanks with the script as well, really clarified our theory, now take your expertise
here 
Posted: Mon Mar 08, 2004 6:49 am
by malcolmboston
in response to bech100
there is always the option that someone will do ImAC0d3R as their password, which obviously is nigh on impossible to reverse engineer. with my previous experiences with computers and their 'users' you would be amazed mate at the amount of time people have their password as "password". the way me and sami have 'introduced' would find the vast majority of passwords' real value
Posted: Mon Mar 08, 2004 6:52 am
by malcolmboston
Bechs post wrote:If we limit ourselves to using only two syllables per word we would still have more than 400 million words to play with.
still this would only take it around 360/370 minutes (which is around 6 hours) to crack
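For a sense of what that estimate implies, here is a rough sketch that works backwards from the two numbers in the thread (400 million candidates, about 360 minutes). The hashing speed is not stated anywhere in the posts, so the rate below is only what those two figures imply, not a benchmark:

```php
<?php
$candidates = 400000000; // figure quoted from the earlier post
$minutes    = 360;       // lower end of the estimate above

// Hashes per second needed to try every candidate in that time
$ratePerSecond = $candidates / ($minutes * 60);

printf("%.0f hashes per second\n", $ratePerSecond);
```

That works out to roughly 18,500 md5 computations per second for the whole list.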
Posted: Mon Mar 08, 2004 11:46 am
by Drachlen
Even though the chances are pretty slim of someone actually doing this successfully, you could try hashing your passwords differently:
Code:
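<?php
// md5() is a one-way hash, so this changes the stored value rather than encrypting it.
$password = "password";
// Plain md5: the same value any precomputed word list would contain
echo 'Old Password: ' . md5($password) . '<br>';
// Wrap the password in a fixed secret string and hash twice
$secret = "Suddenly the hacker needs an entirely new dictionary.";
echo 'New Password: ' . md5(md5($secret . $password . md5($password)));
?>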
As long as your hashing scheme isn't figured out, a ready-made md5 dictionary gets them nothing.
Posted: Mon Mar 08, 2004 2:03 pm
by d3ad1ysp0rk
md5 on its own is a hash with no key, but a keyed hash built on it is good enough, not sure the size of the keys.
Say you're using key5, and you type in "My dog"
maybe it'll come out as: 37885sdafd9874huoff4
but you type "My dog" with a key of 7, and it comes out as:
8734y54fnd4384nfjkre
two totally different values, because they were encrypted using an algorithm which uses a key, the best way to do it, the only secure way.
Anyone who proposes an encryption algorithm that uses just text transformations, such as this:
http://lps.no-ip.org/test.php?str=test
would be laughed at in the crypto world, it's just so insecure that a brute force program would take less than a day to crack it.
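One caveat worth illustrating: PHP's md5() has no key parameter, so the "same text, different key, different output" behaviour described above is closer to a keyed hash (HMAC). A minimal sketch, assuming PHP's hash_hmac() is available; the key strings are made up for illustration:

```php
<?php
// Plain md5: the same text always gives the same digest, no key involved
$plain = md5("My dog");

// hash_hmac() mixes a secret key into the digest
$withKey5 = hash_hmac('md5', "My dog", "key5"); // hypothetical key
$withKey7 = hash_hmac('md5', "My dog", "key7"); // hypothetical key

echo $plain, "\n", $withKey5, "\n", $withKey7, "\n";

// Same text, different keys, different values
var_dump($withKey5 === $withKey7); // bool(false)
```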
Posted: Mon Mar 08, 2004 3:45 pm
by Roja
You don't just use md5 on the password.
viewtopic.php?p=79397#79397
You want to have a time-based session auth as well. That way it can't be replayed.
However, yes, you are correct - you can in fact build an md5 dictionary.
Doing so gets very large, very fast - and the hope is that you have implemented some form of "five-wrong-attempts-and-block-the-ip" coding.
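The "five-wrong-attempts-and-block-the-ip" idea could look something like the sketch below. It is only an illustration: the function names are invented, the counter lives in a plain array, and a real site would keep it in a database or cache and expire old entries.

```php
<?php
// Hypothetical limit, taken from the "five wrong attempts" wording above
const MAX_ATTEMPTS = 5;

// Count one failed login for this IP address
function recordFailure(array &$failures, string $ip): void
{
    $failures[$ip] = ($failures[$ip] ?? 0) + 1;
}

// An IP is blocked once it has reached the limit
function isBlocked(array $failures, string $ip): bool
{
    return ($failures[$ip] ?? 0) >= MAX_ATTEMPTS;
}

$failures = [];
for ($i = 0; $i < 5; $i++) {
    recordFailure($failures, '203.0.113.7'); // documentation-range address
}

var_dump(isBlocked($failures, '203.0.113.7'));  // bool(true)
var_dump(isBlocked($failures, '198.51.100.2')); // bool(false)
```

With a limit like this, walking a large dictionary through the login form stops being practical, although it does nothing once the hashes themselves have leaked.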
Posted: Tue Mar 09, 2004 6:44 pm
by llanitedave
malcolmboston wrote:personally common sense tells me that my MD5 dictionary method should theoretically work, if i can find a list of every word or a huge amount of common english words then im gonna build it and experiment even though i know that it should work, lol but im not getting out my dictionary to start inputting words
now on to my next question that sami raised into my head
how exactly might one hack into my database now obviously i use both LAMP and WAMP and ive always wondered how exactly someone would in theory at least hack directly into the actual database?
like sami said to stop hackers you must learn to be one
It would still break down if anyone used a number or some combination of words -- a passphrase, instead of a dictionary word. Everyone will tell you that it's a really bad idea to use a word you can find in the dictionary for your password.
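Both halves of that argument can be shown with a toy version of the dictionary method. This is a sketch under obvious assumptions: the four-word list is a stand-in for a real word file, and lookup() is a made-up helper name.

```php
<?php
// Tiny stand-in word list; a real attempt would load a large word file
$words = ['password', 'letmein', 'dragon', 'monkey'];

// Precompute hash => word once, so each later lookup is a single array access
$dictionary = [];
foreach ($words as $word) {
    $dictionary[md5($word)] = $word;
}

// Return the word behind a hash, or null if the hash is not in the list
function lookup(array $dictionary, string $hash): ?string
{
    return $dictionary[$hash] ?? null;
}

var_dump(lookup($dictionary, md5('password'))); // string(8) "password"
var_dump(lookup($dictionary, md5('ImAC0d3R'))); // NULL
```

A dictionary word is found immediately; anything outside the list, such as a mixed-case string with digits or a passphrase, simply misses.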
Posted: Wed Mar 10, 2004 3:02 am
by malcolmboston
i know i stated that several times
however like i also said, from my days as technical support most people's passwords are standard words
me earlier wrote:
there is always the option that someone will do ImAC0d3R as their password, which obviously is nigh on impossible to reverse engineer. with my previous experiences with computers and their 'users' you would be amazed mate at the amount of time people have their password as "password". the way me and sami have 'introduced' would find the vast majority of passwords' real value
Posted: Wed Mar 10, 2004 3:10 am
by JayBird
you could force users to use an alphanumeric password, like any number of letters + 2 numbers.
Mark
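A check like that could be a single regular expression. The sketch below takes one reading of the rule (one or more letters followed by exactly two digits); the function name is made up, and a real policy would probably allow digits anywhere.

```php
<?php
// One reading of the rule: letters first, then exactly two digits.
// \A and \z anchor the whole string, so a trailing newline does not slip through.
function looksAlphanumeric(string $password): bool
{
    return preg_match('/\A[A-Za-z]+[0-9]{2}\z/', $password) === 1;
}

var_dump(looksAlphanumeric('password42')); // bool(true)
var_dump(looksAlphanumeric('password'));   // bool(false)
```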
Posted: Wed Mar 10, 2004 3:12 am
by malcolmboston
lol everyones being so damn picky
me and sami just wanted to see if it could be done
Posted: Wed Mar 10, 2004 3:55 am
by llanitedave
There are probably a number of near-unbreakable algorithms out there. The one I'm putting together goes something like this:
1. User requests login page
2. Server gets the current time, creates an md5 hash from it, and sends it as a cookie with the page, while also storing it as a variable.
3. User enters username and password. Javascript creates an md5 hash of the password, concatenates the result with the cookie hash, and generates an md5 hash of the combination, then submits it to the server.
4. Server receives the username and hashed password data, retrieves username and pre-hashed password from the database, concatenates the hashed password and stored variable, and hashes the combination.
5. Server compares hashed values arriving from client with its own.
Even if somebody has a line sniffer, it won't help them much, because it will change with every login.
Of course, if they've already got spyware logging your keystrokes, all bets are off.
Can anybody find any other weaknesses with the idea?
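The five steps can be sketched in PHP, with the browser's JavaScript step simulated by an ordinary function. This is an illustration of the scheme as described, not a vetted design: the function names are invented, and hash_equals() is assumed to be available.

```php
<?php
// Step 2: server derives a challenge from the current time, as in the post
function makeChallenge(int $now): string
{
    return md5((string) $now);
}

// Step 3: what the browser-side JavaScript would compute
function clientResponse(string $password, string $challenge): string
{
    return md5(md5($password) . $challenge);
}

// Steps 4-5: server combines the stored md5(password) with its own copy
// of the challenge and compares the result with what the client sent
function serverAccepts(string $storedHash, string $challenge, string $response): bool
{
    return hash_equals(md5($storedHash . $challenge), $response);
}

$storedHash = md5('ImAC0d3R'); // what the database would hold
$challenge  = makeChallenge(time());

var_dump(serverAccepts($storedHash, $challenge, clientResponse('ImAC0d3R', $challenge))); // bool(true)
var_dump(serverAccepts($storedHash, $challenge, clientResponse('password', $challenge))); // bool(false)

// A sniffed response does not match a later challenge
$sniffed = clientResponse('ImAC0d3R', makeChallenge(1078900000));
var_dump(serverAccepts($storedHash, makeChallenge(1078900001), $sniffed)); // bool(false)
```

Two things the sketch makes visible: the stored md5(password) is all that is needed to compute a valid response, so a leaked database is as good as the passwords themselves, and a challenge derived from the clock is guessable, so a random value would probably be a safer choice.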
Posted: Wed Mar 10, 2004 4:05 am
by JayBird
the operative word as always in the situation is "near-unbreakable"
Posted: Wed Mar 10, 2004 4:12 am
by malcolmboston
elite wrote:
1. User requests login page
2. Server gets the current time, creates an md5 hash from it, and sends it as a cookie with the page, while also storing it as a variable.
3. User enters username and password. Javascript creates an md5 hash of the password, concatenates the result with the cookie hash, and generates an md5 hash of the combination, then submits it to the server.
4. Server receives the username and hashed password data, retrieves username and pre-hashed password from the database, concatenates the hashed password and stored variable, and hashes the combination.
5. Server compares hashed values arriving from client with its own.
needless to say that would never ever get hacked
bech wrote:
Of course, if they've already got spyware logging your keystrokes, all bets are off.
too true, that's what happened to valve/sierra with half-life 2