Birthday attacks and hashed passwords

Discussions of secure PHP coding. Security in software is important, so don't be afraid to ask. And when answering: be anal. Nitpick. No security vulnerability is too small.

Moderator: General Moderators

Post Reply
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Birthday attacks and hashed passwords

Post by Ambush Commander »

Could someone please explain to me how birthday attacks have anything to do with hashed passwords?

The birthday attack states that the probability that, in a pool of randomly selected values, there are two values are the same is much higher than most people expect. However, we have no previous knowledge of what these two values are, or what the collision will be. We just know that there's a high chance that there will be one.

However, from the perspective of someone who has a hash and wants to find the password it was generated from, the birthday attack doesn't apply: you already have one half of the pair, whereas a birthday attack requires that you have no preconceptions on the concept. Attacks that let you take the hash and find the message are called preimage attacks, and are mostly non-existent right now.

Thus, while from a cryptography/encryption standpoint the collisions that MD5 and SHA1 have are fairly worrying, they are irrelevant with regards to password obfuscation, and although in the future we may find out that there are indeed preimage attacks against these, with a hearty helping of salt (preferably binary and long) to ward off rainbow tables they are just as secure as SHA-256.

Hmm...
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

You don't have to find the password that generated the hash only a collision. The salt greatly thwarts this as the original password is merely a component in the message being digested.

In a world where each of the hashes weren't weakened, SHA256 would still be stronger because of the length, permutations, required to cause a collision. Mathematically speaking, there are an infinite number of collisions for any given hash, however due to the limitations of memory and input limitations, the number of collisions is fairly small. The shorter a hash result, the more collisions are possible given the input limitations.

The birthday paradox comes from looking at smaller population samples then where the general statistics would take over. This is similar to the length of the hash result. I believe that's why the comparison has been made before.
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

You don't have to find the password that generated the hash only a collision. The salt greatly thwarts this as the original password is merely a component in the message being digested.
Right. I think I get that. So we'll expand our definition of a successful exploit of a password field to include another password which you can send through to get the exact same result. And will assume that with a salt this is an order of a magnitude more difficult. Then...
In a world where each of the hashes weren't weakened, SHA256 would still be stronger because of the length, permutations, required to cause a collision. Mathematically speaking, there are an infinite number of collisions for any given hash, however due to the limitations of memory and input limitations, the number of collisions is fairly small. The shorter a hash result, the more collisions are possible given the input limitations
Right.

Technically speaking, one must remember we are talking about passwords, not gigabyte large ISOs. For a 128-bit hashing algorithm, we need to consider passwords longer than sixteen bytes (and that's not considering that fact that most passwords only use 7-bits of a byte) in order to fill up the hash space. Realistically speaking, there's nothing stopping the user from using a 128 byte password (for which collisions have been found), and we can always make the salt as large as we want.

So... the trouble is finding a collision of this nature, where you have H(m) and seek to find m' where H(m) = H(m'), is a problem that, on average, will require 2^127 iterations, which is an extremely large number and unlikely to be overcome by Moore's law any time soon. The other possibility is a preimage attack, which as I've stated before, has not yet been created for any of the allegedly less-secure functions. Even though the keyspace is smaller, it doesn't make any difference, like a comparison between the age of the universe and two billion times the age of the universe: both are not feasible.

This is, however, not my original question. Just some musing; is my math wrong?
The birthday paradox comes from looking at smaller population samples then where the general statistics would take over. This is similar to the length of the hash result. I believe that's why the comparison has been made before.
Yes, they're quite similar, which is why I was I that was the case too. But is the difference between "two values that are the same" and "one value that is the same as a value you've already picked out" enough to call it a faulty comparison?

Still confused.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Ambush Commander wrote:Technically speaking, one must remember we are talking about passwords, not gigabyte large ISOs. For a 128-bit hashing algorithm, we need to consider passwords longer than sixteen bytes (and that's not considering that fact that most passwords only use 7-bits of a byte) in order to fill up the hash space. Realistically speaking, there's nothing stopping the user from using a 128 byte password (for which collisions have been found), and we can always make the salt as large as we want.
I'm talking about everything.
Ambush Commander wrote:So... the trouble is finding a collision of this nature, where you have H(m) and seek to find m' where H(m) = H(m'), is a problem that, on average, will require 2^127 iterations, which is an extremely large number and unlikely to be overcome by Moore's law any time soon. The other possibility is a preimage attack, which as I've stated before, has not yet been created for any of the allegedly less-secure functions. Even though the keyspace is smaller, it doesn't make any difference, like a comparison between the age of the universe and two billion times the age of the universe: both are not feasible.
They are only infeasible because of our limited perspective.
Ambush Commander wrote:This is, however, not my original question. Just some musing; is my math wrong?
If you're referring to a 128 bit hash, the number of permutations required to have all results is 2^128-1. While that may seem like a significant number, it's not all that computationally expensive to reach with readily available processors. Considering in basic processors are starting to overtake Moore's law, and that it's quite simple to interconnect massive numbers of processors together to perform more complex computations, it's seems silly to think it's far fetched.

As an aside, this is similar to me to the idea of longevity behind IPv6, which is also 128 bit. While it seems large now, in a few short years we'll be using more connected devices then ever before at an amazingly fast, ever growing rate. I don't think it will last nearly as long as IPv4 has.
Ambush Commander wrote:Yes, they're quite similar, which is why I was I that was the case too. But is the difference between "two values that are the same" and "one value that is the same as a value you've already picked out" enough to call it a faulty comparison?
Think of it this way: both are equations of one unknown. These equations are fairly simple to break down and solve.

The real problem is the weaknesses that have been discovered. Instead of having to compute all the combinations in a brute force attempt, we now only have to compute a subset. While the subset is still many computations, it's a tiny fraction of the number required for complete brute forcing.
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

So...

* If both algorithms were perfect, SHA256 would be more secure
* However, SHA1 and MD5 have weaknesses that cut down the amount of trials needed to brute force a collision (hash(unknown) = hash(unknown))
* The amount of trials needed to crack them, however, still is large, but it's on the edge of current computing
* Moore's law means that computationally infeasible attacks today could become feasible tomorrow

...problems of the future, hmm.
As an aside, this is similar to me to the idea of longevity behind IPv6, which is also 128 bit. While it seems large now, in a few short years we'll be using more connected devices then ever before at an amazingly fast, ever growing rate. I don't think it will last nearly as long as IPv4 has.
Possibly, but 2^128 is nothing to sniff at, especially considering that now it would be constrained by (presumably) physical limitations, not just processor speed. If every nanogram of matter on and in Earth was allotted an IP address, we'd still have lots left over.
Think of it this way: both are equations of one unknown. These equations are fairly simple to break down and solve.
I'm still not seeing that.

Collision: H(unknown message 1) = H(unknown message 2) Find u_1 and u_2 where this equation holds. Obviously, there's more than one pair of answers
First preimage: known_hash = H(unknown message)
Second preimage: H(known message) = H(unknown message)

The collision equation has two unknowns?
The real problem is the weaknesses that have been discovered. Instead of having to compute all the combinations in a brute force attempt, we now only have to compute a subset. While the subset is still many computations, it's a tiny fraction of the number required for complete brute forcing.
Allegedly, no preimage attacks for any of the hash functions have been discovered yet. I'm not sure what they mean by this: does it mean no one has ever successfully constructed a preimage, or does it mean that no one has ever been able to lower the amount of necessary computations to get the preimage? If it is the latter, then the weaknesses we've seen are, directly speaking, irrelevant, although they will probably be built upon to generate better attacks.

Sorry, I'm not trying to be dogmatic, I'm just trying to wrap my head around this crypto thing.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

Ambush Commander wrote:So...

* If both algorithms were perfect, SHA256 would be more secure
* However, SHA1 and MD5 have weaknesses that cut down the amount of trials needed to brute force a collision (hash(unknown) = hash(unknown))
* The amount of trials needed to crack them, however, still is large, but it's on the edge of current computing
* Moore's law means that computationally infeasible attacks today could become feasible tomorrow

...problems of the future, hmm.
Indeed.
Ambush Commander wrote:Possibly, but 2^128 is nothing to sniff at, especially considering that now it would be constrained by (presumably) physical limitations, not just processor speed. If every nanogram of matter on and in Earth was allotted an IP address, we'd still have lots left over.
True, it is quite large for us to comprehend at this point.
Ambush Commander wrote:I'm still not seeing that.

Collision: H(unknown message 1) = H(unknown message 2) Find u_1 and u_2 where this equation holds. Obviously, there's more than one pair of answers
First preimage: known_hash = H(unknown message)
Second preimage: H(known message) = H(unknown message)

The collision equation has two unknowns?
While collisions in the true sense are two unknowns, the collisions I've been referring to are preimage based. I don't know if the research being performed is preimage based or strictly collision based, but from what I've read about them it would suggest more preimage slanted. Unfortunately, terminology varies so it could be almost anything. :?
User avatar
Ambush Commander
DevNet Master
Posts: 3698
Joined: Mon Oct 25, 2004 9:29 pm
Location: New Jersey, US

Post by Ambush Commander »

Ah. I think I see now. Thanks a lot!
While collisions in the true sense are two unknowns, the collisions I've been referring to are preimage based. I don't know if the research being performed is preimage based or strictly collision based, but from what I've read about them it would suggest more preimage slanted. Unfortunately, terminology varies so it could be almost anything.
It appears that when you see "collision", it's almost always talking about collision attacks and not preimage attacks. This FAQ claims that the famed 2004 MD5 attacks where not preimage attacks, although I don't know enough about cryptography to go to the source and verify this myself.
User avatar
feyd
Neighborhood Spidermoddy
Posts: 31559
Joined: Mon Mar 29, 2004 3:24 pm
Location: Bothell, Washington, USA

Post by feyd »

For giggles I had PHP calculate the 2^128-1 (340,282,366,920,938,463,463,374,607,431,768,211,455), 2^256-1 (115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,935) and 2^512-1 (13,407,807,929,942,597,099,574,024,998,205,846,127,479,365,820,592,393,377,723,561,443,721,764,030,073,546,976,801,874,298,166,903,427,690,031,858,186,486,050,853,753,882,811,946,569,946,433,649,006,084,095).

They're quite large. 8O

;)

Sorry, this will break the forum layout for some users. :)
Post Reply