Posted: Thu Dec 08, 2005 8:34 am
by onion2k
If someone wanted to hack one of my sites, and they had $250k for the purpose, then a bribe would work far better than buying a supercomputer.

Posted: Thu Dec 08, 2005 8:56 am
by n00b Saibot
onion2k wrote:If someone wanted to hack one of my sites, and they had $250k for the purpose, then a bribe would work far better than buying a supercomputer.
umm... How much will you take :twisted:

Posted: Thu Dec 08, 2005 9:02 am
by Roja
shiflett wrote:
Roja wrote:Security researchers were able to create a specific collision, on demand, in a short period of time.
You seem to be implying that preimage attacks are possible. Do you have any proof?

Xiaoyun Wang's discoveries are a big deal, but let's not get carried away.
This is a little tricky.

A true preimage attack would allow *arbitrary* data to produce a collision. The work by Lenstra and Weger makes clear that (as of the paper's publication date) no true preimage attack method is known yet.

However, that paper shows a "poor man's" preimage - with a known input and output, using the known behaviors of the Merkle-Damgård construction, you can massage a new input to match the output of the original.

Or put another way, you can create a different message with the same hash result - but it isn't truly arbitrary, you have to work backwards to get it.

Then the question becomes one of effectiveness. An attacker can do the same for any hash function - even a function that has no known weaknesses. The whole goal of hash functions is to make the cost of doing that extremely high.

However, Daum and Lucks reported at Eurocrypt 2005 that the cost can be as low as a few hours on a normal PC. That's certainly within the realm of concern for any reasonable security researcher.

Daum and Lucks have shown meaningful PostScript and exe collisions. Wang, Weger, and Lenstra have shown meaningful X.509 collisions. Mikle has shown practical attacks against signatures. The list goes on.

Just because they haven't found a full preimage attack doesn't change the fact that the hash is compromised in a substantial way - and in a way that is specifically important to the original poster.
shiflett wrote:MD5 isn't a bad word.
On the contrary, my interpretation of the findings suggests it is. Encouraging a programmer to use MD5 as a security control - when we know it can be spoofed in mere hours on a standard PC - is definitely bad practice.

In my opinion, MD5 is a bad word.
shiflett wrote:(PHP's session mechanism uses MD5 to generate session identifiers. Is this insecure?)
I'd say it is, yes. In the case of session identifiers, the risk is somewhat lower, because the time it takes to produce a duplicate is longer than the session is likely to be useful for. In addition, the session most likely won't have nearly as much value as a specific protected/monitored file (as the original poster is asking about).

But would it be better to migrate to a stronger mechanism? Absolutely. Related is the fact that the PHP internals team recently discussed a robust hash and hash_file improvement for PHP6, which adds native (read: no module needed) hashing algorithms far beyond the current md5/sha1 choices. Clearly, there is a need, and thankfully, professionals like Ilia Alshanetsky are championing better hashing support on the PHP internals team.

Posted: Thu Dec 08, 2005 9:47 am
by shiflett
Roja wrote:However, that paper shows a "poor man's" preimage - with a known input and output, using the known behaviors of the Merkle-Damgård construction, you can massage a new input to match the output of the original.

Or put another way, you can create a different message with the same hash result - but it isn't truly arbitrary, you have to work backwards to get it.
Right, so it doesn't apply here. If an attacker has access to the original file as well as its hash, then the game's over, regardless of algorithm.

These findings are substantial (and impressive), and we're likely to see preimage attacks emerge eventually, but we're not there yet. I appreciate the need for forward-thinking, but it's also important to consider how weaknesses in MD5 affect each use.

For example, let's assume you compromise a database and find the following password for the chris account:

d6253b274f8631111574245eab840a9e

Let's also assume that using only this information, you're able to come up with a string (small enough to pass as a password) that generates the same MD5. Will submitting this string as the password necessarily grant you access to the chris account? Imagine that the MD5 is generated as follows:

Code: Select all

$salt = 'SHIFLETT';
$hash = md5($_POST['password'] . $salt);
That's a pretty slight modification, but it still renders the attack useless. Stronger techniques would be even harder to break:

Code: Select all

$salt = 'SHIFLETT';
$hash = md5($salt . md5($_POST['password'] . $salt));
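
To make the salt argument concrete, here is a minimal sketch. The helper names (salted_hash, password_matches) and the sample password are made up for illustration; only the salt value and the md5() construction come from the first snippet above. It shows that a string whose plain MD5 equals the stored hash is still rejected, because the application appends the salt before hashing whatever is submitted.

```php
<?php
// Hypothetical helpers for illustration; the salt and the
// md5(password . salt) construction are taken from the post.
const SALT = 'SHIFLETT';

/** Hash a password as md5(password . salt). */
function salted_hash(string $password, string $salt = SALT): string
{
    return md5($password . $salt);
}

/** Check a submitted password against a stored salted hash. */
function password_matches(string $submitted, string $storedHash, string $salt = SALT): bool
{
    // hash_equals() compares the two strings in constant time.
    return hash_equals($storedHash, salted_hash($submitted, $salt));
}

$stored = salted_hash('correct horse');   // what the database would hold

// The real password is accepted.
var_dump(password_matches('correct horse', $stored));   // bool(true)

// Stand-in for an attacker's find: a string whose *plain* MD5 equals
// the stored hash. Here we cheat and build it from the known inputs.
$plainPreimage = 'correct horse' . SALT;
var_dump(md5($plainPreimage) === $stored);               // bool(true)

// Submitting it fails, because the salt is appended again before hashing.
var_dump(password_matches($plainPreimage, $stored));     // bool(false)
```

The attacker would need a string that matches after the salt is appended, which requires knowing how the hash was built, not just its value.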
Roja wrote:I'd say it is, yes. In the case of session identifiers, the risk is somewhat lower, because the time it takes to produce a duplicate is longer than the session is likely to be useful for.
Let's assume that a preimage attack on MD5 is successful within 5 seconds. How would this weaken PHP's session mechanism?

(I'm trying to challenge you to look at not only the significance of recent research, but also the practical implications of it.)
Roja wrote:Related is the fact that the PHP internals team recently discussed a robust hash and hash_file improvement for PHP6, which adds native (read: no module needed) hashing algorithms far beyond the current md5/sha1 choices. Clearly, there is a need, and thankfully, professionals like Ilia Alshanetsky are championing better hashing support on the PHP internals team.
Yes, there is a need, because of the problems you've brought up. The reason for this need is that PHP can't predict why you need to hash something. Is it to normalize the format of random data, verify file integrity, or hash a password? For some uses, MD5 is no longer a good choice. This doesn't apply to every use, however, which is why I mentioned PHP's session identifier generation.

Hope that helps.

Posted: Thu Dec 08, 2005 11:28 am
by shiznatix
shiflett - when you md5 a md5 the chances of collision are even greater.

Roja please, if it is really that easy to spoof md5 then please prove it to me. You seem to have the know-how, so set up a test site and hack it with all these holes in the md5 hash. I want to see it done by someone who is bashing it so much.

md5 is not a "bad word"; it does its job and could be useful for shiflett's current situation.

Posted: Thu Dec 08, 2005 12:08 pm
by Roja
shiflett wrote:Right, so it doesn't apply here.
No, it does.
shiflett wrote:If an attacker has access to the original file as well as its hash, then the game's over, regardless of algorithm.
Not at all.

On a Linux box, with a program like Tripwire, which checks checksums (which is what the OP wants), the game is definitely not over. The attacker has the known input - let's say the bash binary. The attacker can get access to *view* (not change) the hash - the Tripwire store is on a CD.

If the hash algorithm isn't compromised, it's still safe. Because the attacker cannot create a collision - a second input with the same output as the original (known input), the knowledge of both does not help.

However, because collisions are now possible on MD5, it's possible to produce a second input with the same output. That's the danger, and it absolutely applies here.
shiflett wrote:These findings are substantial (and impressive), and we're likely to see preimage attacks emerge eventually, but we're not there yet. I appreciate the need for forward-thinking, but it's also important to consider how weaknesses in MD5 affect each use.
It's not forward thinking. It's the exact issue that was discussed in the paper - generating a second (hostile) input with the same hash as a known good (non-hostile) input. That's a collision.
shiflett wrote:Let's also assume that using only this information, you're able to come up with a string (small enough to pass as a password) that generates the same MD5. Will submitting this string as the password necessarily grant you access to the chris account?
If all you are checking is the checksum, and you aren't using a salt, *yes*. In the case of checking file integrity, you store the checksum - not the file contents. So you check a known good md5 (checksum) against the checksum of a questionable file. If they match, you've been fooled. That's what the papers show is already possible in a matter of hours.
shiflett wrote: Imagine that the MD5 is generated as follows:

Code: Select all

$salt = 'SHIFLETT';
$hash = md5($_POST['password'] . $salt);
That's a pretty slight modification, but it still renders the attack useless. Stronger techniques would be even harder to break:
Notice that now you are on a different topic. Now, instead of checking a checksum to ensure a file's contents are intact, you are checking the checksum of a password to validate identity. Two different processes. Further, you've used a salt to ensure the checksum varies - something most file integrity test suites (like Tripwire) do not do.

shiflett wrote:
Roja wrote:I'd say it is, yes. In the case of session identifiers, the risk is somewhat lower, because the time it takes to produce a duplicate is longer than the session is likely to be useful for.
Let's assume that a preimage attack on MD5 is successful within 5 seconds. How would this weaken PHP's session mechanism?
In the case of session identifiers, a preimage attack makes no substantial difference. A preimage attack only helps if you want to change *the content*. Session identifiers don't hash any content, they are simply a unique identifier.

The danger is in a collision. If I can generate a collision - a duplicate session_id - in a short period of time (5 seconds works), then I can become another user. Now, in a well-written application, you'd hope for multiple layers of defense, including further authentication on priv change, and the like. However, on the surface, ignoring other lines of defense that reduce the risk, the risk is that a duplicate session id could allow me to hijack a session.

Preimage attacks are a red herring here.
shiflett wrote:(I'm trying to challenge you to look at not only the significance of recent research, but also the practical implications of it.)
One, I'm not here to take a test. You challenged my statement, and I defended it.

But beyond that, I've explained why it is specifically applicable in this case - that's as practical as it gets. There is a practical risk in the specific scenario the original poster asked about, and I brought it up. You then argued that it was not in fact a risk in that scenario - by bringing up a different scenario - to which you then added additional protections (salt) to mitigate the risk. Just because there are workable and practical solutions to the problem in other situations does not change the fact that there IS a problem in the specific scenario the OP asked about.
shiflett wrote:Yes, there is a need, because of the problems you've brought up. The reason for this need is that PHP can't predict why you need to hash something. Is it to normalize the format of random data, verify file integrity, or hash a password? For some uses, MD5 is no longer a good choice. This doesn't apply to every use, however, which is why I mentioned PHP's session identifier generation.
This much we do agree on: There are some places where the damage from the attacks against MD5 won't be substantial.

However, I would argue that it's bad practice to suggest its use even in those cases. We know there are substantial weaknesses in the hashing algorithm. We know that there are collisions. Since the primary purpose of a hash function is to have unique values generated from an input (without collisions), using a hashing function that HAS collisions is using the wrong tool for the job.

Better, the alternatives are extremely competitive in terms of speed and overhead. While Feyd's sha256 script for php is obviously slower than the native md5 function, the native version is extremely similar in terms of overhead. So given the choice between an algorithm known to be broken in its primary purpose, and an alternative that we have a strong confidence in, we only have to consider impact, and with impact low, there is little reason to support it.

While there have been proven collisions in md5 and sha-1, there have not (to my knowledge) been proven breaks in sha256.

So, let's turn this around a bit. Why, as a security researcher, are you advocating the use of a broken algorithm in the specific scenario where it has been shown to be weak, when there are suitable alternatives?
shiznatix wrote:Roja please, if it is really that easy to spoof md5 then please prove it to me. You seem to have the know-how, so set up a test site and hack it with all these holes in the md5 hash. I want to see it done by someone who is bashing it so much.
I'm not bashing it. I'm sharing relevant facts from researchers who have proven it. You want proof, then read the papers and the links I included in the last post. They show three different examples of actual documents that YOU can run md5 against, and see the same hash, with different inputs.

That's proof, and that's exactly what the original poster was trying to verify did not happen. See the problem with that?

Posted: Fri Dec 09, 2005 12:16 am
by AGISB
Here is the problem. I think that a file check is just useful for accidental overwrite or something similar done by the site staff. If an intruder can modify the contents of an include folder outside the directory tree you've got way bigger problems than a possible md5 collision.

Therefore MD5 is very ok for this task as it is fast and secure enough for the task at hand.

Posted: Fri Dec 09, 2005 4:21 am
by acidHL
Wow, didn't expect to start such a heated debate! 8O

The page that contains the original hash for comparison would either be held in a database or on a page encoded by Zend or ionCube.

I know there's no such thing as 100% secure but if I can make it harder at minimal performance impact then all the better.

Posted: Fri Dec 09, 2005 4:50 am
by shiznatix
what I would do:

md5 the original file. Store that hash in a database

when it goes to be included, md5 it first and check the hash against the one in the database. Act on that information however you like.
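
A rough sketch of those two steps, assuming the stored hash has already been fetched from the database (the lookup itself is left out). The function names file_is_unchanged and checked_include are invented for this example. The algorithm defaults to 'md5' as described above, and passing 'sha256' is a drop-in alternative; hash_file() accepts either.

```php
<?php
/**
 * Return true when the file's current digest matches the one recorded earlier.
 * $expectedHash is the hex digest stored when the file was known to be good.
 */
function file_is_unchanged(string $path, string $expectedHash, string $algo = 'md5'): bool
{
    if (!is_readable($path)) {
        return false;                      // missing or unreadable counts as a failure
    }
    $actual = hash_file($algo, $path);     // hex digest of the file's current contents
    if ($actual === false) {
        return false;
    }
    return hash_equals($expectedHash, $actual);
}

/** Include $path only if it still matches the stored digest. */
function checked_include(string $path, string $expectedHash, string $algo = 'md5'): bool
{
    if (!file_is_unchanged($path, $expectedHash, $algo)) {
        error_log("Integrity check failed for $path");
        return false;                      // the caller decides how to react
    }
    include $path;
    return true;
}
```

Hashing on every include costs one full read of the file per request, so the choice of algorithm makes little difference next to the I/O.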

If you are making a website for the CIA or something, maybe look into other security ideas. But if you're just making a website for yourself or a client, you will be peachy keen.

Posted: Fri Dec 09, 2005 10:07 am
by Maugrim_The_Reaper
/me sides with Roja.

But I get what Chris is stating also. I just think it's time to move forward from MD5 and consider advocating its eventual replacement (gotta happen some day after all...).

Posted: Fri Dec 09, 2005 11:07 am
by Chris Corbyn
As far as the OP is concerned here, if you're worried about collisions don't compare a hash. Either encode it (base64_encode() ?) or just compare the raw contents.... I know you're then dealing with more data but at least you know it's the correct data ;)
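
Comparing raw contents could look something like the sketch below. It assumes a trusted reference copy of the file is kept somewhere the web server can read; the function name files_identical is made up. It reads both files in chunks so large files are not loaded into memory at once.

```php
<?php
/** Byte-for-byte comparison of two files; true only if they are identical. */
function files_identical(string $pathA, string $pathB): bool
{
    clearstatcache();                      // avoid stale cached sizes
    if (!is_readable($pathA) || !is_readable($pathB)) {
        return false;
    }
    if (filesize($pathA) !== filesize($pathB)) {
        return false;                      // different sizes can never match
    }

    $a = fopen($pathA, 'rb');
    $b = fopen($pathB, 'rb');
    if ($a === false || $b === false) {
        return false;
    }

    try {
        // Sizes are equal, so both streams reach EOF together.
        while (!feof($a)) {
            if (fread($a, 8192) !== fread($b, 8192)) {
                return false;
            }
        }
        return true;
    } finally {
        fclose($a);
        fclose($b);
    }
}
```

The trade-off is storage: the reference copy is as large as the file itself, where a checksum is a few dozen bytes.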

In terms of the collision discussion... /me keeps out :P

Posted: Fri Dec 09, 2005 12:17 pm
by Roja
A friend helped me to realize that I was too busy advocating, and not doing my (usually) good job of being pragmatic. So..

There are a number of solutions to the problem. In order of security:

1. Do nothing, hope for the best
2. Check the filesize/timestamp (but not the contents)
3. Check a checksum of the contents
4. Check the full contents
5. Check the full contents, stored in a readonly source (cdrom, etc)
6. Store the readonly source offline, disconnected, at a separate location, complete with checksums for faster verification
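
To illustrate why #2 sits below #3 in that list, here is a small sketch. The function names are invented, and SHA-256 is used for the checksum only as an example choice. It overwrites a temporary file with different content of the same length, resets the timestamp with touch(), and then runs both checks.

```php
<?php
/** Option 2: compare only size and modification time. */
function metadata_unchanged(string $path, int $expectedSize, int $expectedMtime): bool
{
    clearstatcache(true, $path);           // PHP caches stat() results per path
    return is_file($path)
        && filesize($path) === $expectedSize
        && filemtime($path) === $expectedMtime;
}

/** Option 3: compare a checksum of the contents. */
function contents_unchanged(string $path, string $expectedSha256): bool
{
    return is_readable($path)
        && hash_equals($expectedSha256, hash_file('sha256', $path));
}

// Record the "known good" state of a throwaway file.
$path = tempnam(sys_get_temp_dir(), 'lvl');
file_put_contents($path, 'AAAA');
clearstatcache(true, $path);
$size  = filesize($path);
$mtime = filemtime($path);
$sum   = hash_file('sha256', $path);

// Tamper: same length, different bytes, timestamp put back.
file_put_contents($path, 'BBBB');
touch($path, $mtime);

var_dump(metadata_unchanged($path, $size, $mtime)); // bool(true)  - tampering missed
var_dump(contents_unchanged($path, $sum));          // bool(false) - tampering caught

unlink($path);
```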

We really got 'caught up' in #3, mostly because it's the common solution, and the most balanced. It balances the need to reduce risk (security) with lower overhead in storage and processing.

Unfortunately, #3 is admittedly "under discussion" in the security world. In the last two years, some dramatic results have cast doubt on "how secure" some of the checksum choices are. For Information Security professionals, it's incredibly interesting to discuss the details of it - much like the specific RBI for a baseball player is interesting to sports fans. That doesn't change the fact that there are other options, and even #3 has multiple "pretty secure" choices (SHA-256, etc.).

It's just rare that I.S. practitioners get to discuss such mundane things with an educated audience.

So, yes, there are checksums that you can use, and there are alternatives, all with different levels of risk/impact. Your mileage may vary, choose wisely.

Posted: Sat Dec 10, 2005 6:57 am
by acidHL
Thanks for the input guys.
Even if it did get a bit... heated at times :D

I shall discuss some of your suggestions with the other programmer on this project.

Posted: Tue Dec 13, 2005 2:44 pm
by shiflett
I don't think this debate is at all heated, but it is a bit interesting because of the lack of clarity surrounding this particular topic (due to recent and ongoing discoveries).

(Note: It sucks that I just had to log in without SSL protection in order to reply, because I'm at ApacheCon, and there are probably a few dozen people who just saw my password.)
Roja wrote:On a Linux box, with a program like Tripwire, which checks checksums (which is what the OP wants), the game is definitely not over. The attacker has the known input - let's say the bash binary. The attacker can get access to *view* (not change) the hash - the Tripwire store is on a CD.
External storage of the Tripwire database is considered a best practice. If you're interested and want to learn more, I think the Linux Security Cookbook details how this approach changes your typical workflow.
Roja wrote:If the hash algorithm isn't compromised, it's still safe.
I think this depends upon how you define safe. In fact, you and I are each arguing both sides here - in the overall discussion at hand, I'm suggesting that not knowing the exact method used to create the MD5 hash presents a substantial obstacle to an attacker. You're noting that here. :-)
Roja wrote:It's not forward thinking. It's the exact issue that was discussed in the paper - generating a second (hostile) input with the same hash as a known good (non-hostile) input. That's a collision.
The paper you're referring to does not describe a preimage attack, and you again seem to be suggesting that it does.

The pragmatic reason for moving away from MD5 (at least using it plainly, without the use of a salt or any similar technique), is that a preimage attack is imminent. It is not because a preimage attack is possible now. That's why I mentioned forward-thinking. I did not suggest that forward-thinking is bad.
Roja wrote:If all you are checking is the checksum, and you aren't using a salt, *yes*.
Now you're having to qualify your statement, so I think my point has been made. :-)
Roja wrote:Notice that now you are on a different topic.
Nope, I just generated an MD5 hash.
Roja wrote:Now, instead of checking a checksum to ensure a file's contents are intact, you are checking the checksum of a password to validate identity. Two different processes.
Can you clarify what you mean? It's the exact same process - comparing expected output (with known input) with actual output.
Roja wrote:The danger is in a collision. If I can generate a collision - a duplicate session_id - in a short period of time (5 seconds works), then I can become another user.
I unfairly painted you into a corner here, but you're wrong, and this was my original point. (I didn't expect for you to try to defend this - I just meant to make a point.)

You're glossing over something substantial and missing the point. If you can generate a collision, can you really generate a duplicate session identifier? Think about that a bit more, and I think you'll understand the flaw in your statement. Generating collisions doesn't help you determine the output - in fact, the output is what you're trying to duplicate, so that must be already known.
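
A toy model may help here. This is NOT how PHP's session extension is implemented; the store, the function names, and the use of random_bytes() are all assumptions made for illustration. The point it shows is that the server only looks up the exact identifier string the client presents. Nothing the client sends is hashed, so two inputs that hash alike give an attacker nothing to submit.

```php
<?php
/** Create an identifier by hashing unpredictable data (toy stand-in). */
function new_session_id(): string
{
    return md5(random_bytes(32));          // 32 hex characters
}

/** Look up a session by the identifier string the client presented. */
function find_session(array $sessions, string $presentedId): ?array
{
    // A plain key lookup: no hashing of client input happens here.
    return $sessions[$presentedId] ?? null;
}

// Hypothetical in-memory session store, keyed by identifier.
$sessions = [];
$id = new_session_id();
$sessions[$id] = ['user' => 'chris'];

// The legitimate client presents the identifier it was given.
var_dump(find_session($sessions, $id) !== null);

// An attacker has to present that same string. Without knowing it,
// a fresh guess is just another identifier that is not in the store.
var_dump(find_session($sessions, new_session_id()) !== null);
```

To hijack the session the attacker needs the identifier itself, which means guessing or stealing it; the hash input never enters into the check.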
Roja wrote:One, I'm not here to take a test.
I'm just here to help people, including those who want to challenge me. :-)
Roja wrote:Why, as a security researcher, are you advocating the use of a broken algorithm in the specific scenario where it has been shown to be weak, when there are suitable alternatives?
Primarily because I'm being pragmatic (and taking the current problem into consideration, since this isn't a generic hashing discussion).

Until the new hash extension is included in PHP, PHP developers don't have very good alternatives (mhash is probably the best). Using PHP code written by someone else that claims to implement a more secure algorithm places a lot of trust in that implementation. It might be fine, but I feel pretty confident that the md5() function has no flaws other than ones that exist in the algorithm itself. A flawed implementation of something else can potentially be much worse, and I prefer to know my risks.