(Okay, the actual title is not this, but it's too long for the input field.)
Questions about this have been asked too many times, and I've always felt unsatisfied by the popular answers, but lacked the time to clearly explain the rationale behind using "salt and pepper". This article tries to remedy the situation by going over the popular choices and discussing the security problems with them. I've seen bad examples of the listed schemes used "in the wild" (discussing them here might be a bit against the forum rules), and almost every time the system would have been secure had it used proper hashing and a trick or two from the "paranoia" section. I cannot stress enough how much those steps increase security, and since they are quite painless to implement, I don't see a reason not to use them.
Comments, thoughts, and fact and grammar nitpicks are very much welcome.
I plan to add links explaining some of the concepts mentioned and also to articles and real projects advocating this or that technique.
Maybe a chart crossing attack and defense techniques will be useful too, what do you think?
Do you think this needs expanding (or shrinking) in some direction?
Risk mitigation strategies for storing login credentials
When storing sensitive data it is never enough to rely on the security of the storage environment alone. While developing and deploying 100% secure software is a nice dream to have, any security-aware developer should have contingency plans for when security is compromised. In this article we will examine some well-known solutions and propose some additional steps for hardening the security of stored login credentials.
1. Secrets and security layers.
Passwords (or rather username/password pairs) as a means of authentication are the proverbial "something you know" in Information Security. The level of security they provide is considered sufficient for most current web applications. Passwords are essentially a shared secret between the user and the web application. The problem with "something you know" is that one wouldn't want someone else to know it as well, but on the other hand the user MUST send his password to the server, and the server MUST somehow store it so it can check later whether Bob-the-user provided the right password. A good mechanism for secure transport of login credentials is HTTPS, which ensures that only Bob and the server ever see the plaintext of the password.
We must note that there are good passwords and bad passwords, and that the security of a login is in as much danger from a weak password as from an SQL injection vulnerability. We will not discuss password strength policies, but it must be noted that enforcing at least some basic password rules is an essential step in securing the login system.
For brevity we will talk about usernames and passwords, but do not forget that there might be more components to an authentication system, for example the "forgotten password" secret question/answer combo, password reminder texts, etc. Whatever security measures you take against disclosure of username/password combos, you must take against such "peripheral" data as well.
Applications in general work in layers, and web applications have the additional strata of web and database servers, underlying OSes with their specific file systems, etc. Each of these layers bears security burdens on its own and may prove to be the security point of failure in case of a malicious attack. This does not mean that the other layers must not try to reduce the impact of a security breach.
If an attacker gains shell access to the server, he has a huge level of control over the system, and there is very little to be done to mitigate the damage. On the bright side though, attacks usually penetrate only the much "lighter" layers of defense, and the damage they cause is limited.
In this article we will generally assume the worst: Mallory, a skilled malicious attacker, has access to one or all login credential records in the database, and possibly to other resources on the system. We will also assume that the most precious information on the server is the said login credentials, and that our primary objective is to protect them. We will now assess the levels of security breach Mallory is capable of and how to further limit his options.
We must not forget that even in the mythical 100% secure system, which Mallory cannot exploit, Adam the admin also has full access to the database, and Bob the user would very much prefer that his secret not be shared with Adam.
2. Plain text passwords
Also known as the "quadruple XOR encryption", this is the zero point in secure storage, and should generally never be done. Apart from Mallory, the passwords are visible to Adam the admin and Isabella the ISP. Now Bob the user is well aware that his private messages can be read by Adam and Isabella, but he totally wouldn't like them to also know his most precious password. The problem with Adam and Isabella also lies in the choice of a secure password transport mechanism, which is outside the scope of this article, so we won't discuss it further.
3. Encrypted passwords
Encrypted passwords have the benefit of not being obvious to the naked eye, but also the intrinsic weakness that encryption is reversible. If not Mallory, then at least Adam can easily learn the plaintext, as he knows the encryption key. Moreover, since the attacker may control the plaintext of a password, either by registering his own account or by guessing or online bruteforcing the password of a single existing account, he may mount a known-plaintext attack in an attempt to reveal the key that is being used (a weak or home-grown cipher will often fall to this).
4. Hashed passwords
Hashes are one-way functions. Given a plaintext password they produce a hash value that cannot feasibly be reversed into the original password. By storing the hashes in the database, we can check Bob's credentials by hashing the password he supplies and comparing the result to the hash of his real password that we keep in the database. Since no plaintext password is ever recorded in the database, this seems like a better solution than keeping the plaintext. And so it is, but this solution still has some unpleasant properties.
First, if Bob and Cindy both choose 123456 as their password, Adam the admin can easily see that their passwords match, because the hashes in the database are the same.
Second, if Mallory has Bob's hash, he can check it against a precomputed rainbow table for the hash function, which will most likely give him the plaintext, Bob's password.
Third, if Mallory manages to obtain the entire table with login credentials, he can efficiently reveal the passwords of many users by simultaneously checking them in a rainbow table.
Apart from a rainbow table, Mallory can also try a dictionary attack and of course the plain oldschool bruteforce. Note that these attacks will be carried out offline, at the leisure of Mallory's distributed password cracking botnet.
This is the currently most popular method, and only recently do existing open source projects try to migrate to salted passwords.
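The unsalted scheme from this section, together with its weaknesses, can be sketched roughly as follows. This is only an illustration: the `login_table` dictionary, the function names and the choice of SHA-256 are assumptions made for the example, not part of any particular application.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Unsalted HASH(password), as described in this section."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical login table: username -> stored hash.
login_table = {
    "bob": hash_password("123456"),
    "cindy": hash_password("123456"),
}

def check_login(username: str, password: str) -> bool:
    """Hash the supplied password and compare it to the stored hash."""
    stored = login_table.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, hash_password(password))

def dictionary_attack(stolen_hashes: dict, wordlist: list) -> dict:
    """Offline attack: hash every candidate once, then match ALL accounts."""
    lookup = {hash_password(word): word for word in wordlist}
    return {
        user: lookup[stored]
        for user, stored in stolen_hashes.items()
        if stored in lookup
    }
```

Note that `login_table["bob"] == login_table["cindy"]` here, which is exactly what Adam the admin can see, and that a single pass over the wordlist is enough to test every account in the stolen table at once.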
5. Salted hashed passwords
This is the method most authors advise, and the rationale behind it is quite sound.
The reason bruteforce works (and rainbow tables and dictionary attacks are just optimisations of the bruteforce attack) is that users tend to choose bad passwords, ones which are short and found in a common dictionary. A good way to circumvent this, apart from enforcing fascist password policies, is to add a piece of "good" data to the password before hashing it. By "good" we mean that it is long enough and built from diverse character sets. This is like magically making the password "better" without actually making the user remember 30 characters of line noise. This technique is known as "salting the hash".
This is the point where authors stop agreeing and start moving in three main directions:
I. The school of doublehashing
The reasoning is that there are precomputed dictionaries for HASH(password), but not ones for HASH(HASH(password)), so here's an easy way to defeat password stealers. While this is true, the method still has the unpleasant property that Bob's and Cindy's doubly hashed passwords are still the same.
Apart from that, from a cryptological point of view, double hashing is considered insecure, although I suspect that in reality this would not impact anyone but an enemy of the NSA or something. I am not a cryptologist though, so I listen to what the experts are saying, which is not to do it. You may use this mantra in your daily meditations: Never never ever ever double double hash.
II. The school of constant salts
They propose keeping HASH(const_salt + password), where the salt is the same for every user. The hashes are again safe from rainbow table lookups, but the "same password" = "same hash" problem with Bob and Cindy persists.
III. The school of personal salts
If we generate a salt value for every user, HASH(user_salt + password) will be different for Bob and Cindy even if their passwords match. Lookups in rainbow tables will not work, so this seems the best strategy. We must keep the generated user_salt in the login table though, while with the first two methods no additional info was needed.
We must note that mixed strategies like HASH(const_salt + HASH(password)) and HASH(user_salt + HASH(password)), apart from maybe going going against against the aforementioned mantra, are essentially not different from schools II and III.
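For comparison, the three schools can be sketched side by side. This is a minimal illustration assuming SHA-256 as HASH and simple string concatenation; `CONST_SALT`, `new_user_salt` and the other names are made up for the example.

```python
import hashlib
import secrets

# Hypothetical application-wide salt; in school II it is the same for every user.
CONST_SALT = "hypothetical-application-wide-salt"

def h(data: str) -> str:
    """HASH(data), here assumed to be SHA-256."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def double_hash(password: str) -> str:
    """School I: HASH(HASH(password))."""
    return h(h(password))

def const_salted(password: str) -> str:
    """School II: HASH(const_salt + password)."""
    return h(CONST_SALT + password)

def new_user_salt() -> str:
    """Random per-user salt, stored next to the hash in the login table."""
    return secrets.token_hex(16)

def user_salted(user_salt: str, password: str) -> str:
    """School III: HASH(user_salt + password)."""
    return h(user_salt + password)
```

With schools I and II, Bob and Cindy still end up with identical stored values for identical passwords; only `user_salted`, called with two different salts, tells them apart.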
All of these strategies prevent precomputed attacks, but are still vulnerable to offline dictionary and bruteforce attacks, depending on the level of access Mallory has to the system. Assuming the exact hashing scheme is known (as is the case with the popular open source applications), Mallory has a range of options:
For example, with II, Mallory may expect that Adam the admin was lazy and didn't change the default salt while configuring the application. He may also carry out a known-plaintext attack in order to bruteforce the salt value. Thirdly, if he manages to get read access to the server's filesystem, he can examine the application sources and reveal the salt value.
With III, if Mallory has access to the credentials in the database, he can also read the stored user_salt and carry out a dictionary or a bruteforce attack against the account.
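The attack against school III can be sketched as below: once Mallory holds both the stored hash and the user_salt of a record, the salt no longer slows him down for that account. The function names and the use of SHA-256 are assumptions made for the illustration.

```python
import hashlib

def user_salted(user_salt: str, password: str) -> str:
    """School III: HASH(user_salt + password), assumed to be SHA-256."""
    return hashlib.sha256((user_salt + password).encode("utf-8")).hexdigest()

def crack_record(user_salt: str, stored_hash: str, wordlist: list):
    """Offline dictionary attack against ONE stolen record.

    Returns the recovered password, or None if no candidate matches.
    """
    for candidate in wordlist:
        if user_salted(user_salt, candidate) == stored_hash:
            return candidate
    return None
```

The loop has to be repeated for every record, because each account has its own salt, but nothing in the stolen table is missing for the attack to proceed.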
6. The advanced gourmet course: using salt and pepper
Since the schools of personal and constant salts have different strengths and weaknesses, it is only logical to combine them in the hope that the result will be better. Let's examine a system that uses something like HASH(const_salt + password + user_salt):
1. The double-salted (or salted and peppered if you so prefer) hash is safe from rainbow table lookups.
2. Bob's and Cindy's hashes will be different, even if their passwords are the same.
3. Even if Mallory has full read access to the database, he will not be able to launch a dictionary or a bruteforce attack against the hashes, as he will be missing the const_salt, which is not kept in the database.
4. Even with all salts in hand, Mallory can bruteforce only one account at a time.
Mallory's only options now are to try a known-plaintext attack against the const_salt (which is easily thwarted by choosing a long and "ugly" salt) or to obtain read access to the sources.
With a sufficiently "good" const_salt, we may skip the special user_salt column and use an already existing column, such as the username or the email column, without affecting the security. Still, database space is free, so we might as well use it.
To put things in perspective, by using salt and pepper, the credentials can be stolen only if Mallory has read access to the database AND the filesystem AND bruteforce power (requiring N times more processing power, where N is the number of users, for dictionary or bruteforce attacks against all accounts).
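A minimal sketch of the salt and pepper scheme from this section might look like this. It assumes SHA-256 as HASH; `CONST_SALT` stands for a hypothetical value that would live in the application's configuration or sources, never in the database, and the function names are invented for the example.

```python
import hashlib
import hmac
import secrets

# Hypothetical "pepper": kept in the application's config or sources,
# NOT in the database. The value below is a placeholder assumption.
CONST_SALT = "hypothetical-long-and-ugly-pepper-from-config"

def peppered_hash(password: str, user_salt: str, const_salt: str = CONST_SALT) -> str:
    """HASH(const_salt + password + user_salt)."""
    data = const_salt + password + user_salt
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def register(password: str):
    """Return the (user_salt, hash) pair to store in the login table."""
    user_salt = secrets.token_hex(16)
    return user_salt, peppered_hash(password, user_salt)

def verify(password: str, user_salt: str, stored_hash: str) -> bool:
    """Recompute the hash from the supplied password and compare."""
    return hmac.compare_digest(stored_hash, peppered_hash(password, user_salt))
```

A database dump gives Mallory `user_salt` and the hash, but without `CONST_SALT` he cannot even test a single candidate password offline.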
7. 123456 Paranoia Lane, Gotham city
There are a few easy steps we may take in order to protect the login credentials better, and they lie in hardening the "nearby" security layers. We will not reiterate the well known recommendations for secure coding (validate and escape everything that comes from the user, simple, eh?) and will instead look at some additional non-kosher techniques. Remember, the best way with security is to be as paranoid as possible!
1. Choose good long salt (and pepper!) values. This is quite obvious, but still there are authors and coders out there who put artificial limits on user salts, like CHAR( 8 ) columns containing only alphanumeric characters. Database space is free, so don't save on the cheap at the cost of the expensive.
2. Use better hash functions. Although the recent "breaks" of MD5 and SHA1 do not seem to affect their usage as password hashes, better attacks may appear any moment now, so use another one for your new applications. SHA256 is the currently suggested hashing algorithm.
3. Use a "quirky" hashing technique - hash the first two chars of the user_salt and concat this hash to the whole salt'n'pepa'n'password before hashing it again with ANOTHER hash function - you get the idea. Be creative, this will require Mallory to have read access to the hashing code in order to see the exact hashing scheme, and then write custom password cracking code instead of using an already existing one.
4. "Don't do security through obscurity" is actually a misquote. Don't do security ONLY through obscurity. Use obscurity as another security layer when appropriate. Give "ugly" names to the columns and tables used in the login process. Even if Mallory finds an SQL injection vulnerability, he must still guess those ugly names, or at least take many additional steps to learn them if the database system has means of querying for table metadata. If possible, deny your database user access to the metadata tables.
5. Usernames are one half of the username/password combo required for login. I haven't yet seen a security system that readily displays a user's password, but many of those I've seen happily show usernames to the public in forum threads, private messages, member profiles, etc. Do you know which is the most popular password? No, not 123456, it comes a distant second. The most popular choice is to have your username as a password. Of course you need to display some id of the user to the other users of the system, so use a separate "Display name" column, and either strongly warn or require the user to choose a display name different from his username.
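Steps 1, 3 and 5 above can be sketched in a few lines. Everything here is an assumption made for illustration: the function names, the salt length, and the particular "quirk" (SHA-512 for the inner hash, SHA-256 for the outer one). The quirk is just one possible reading of step 3 and adds obscurity rather than cryptographic strength.

```python
import hashlib
import secrets

def new_salt(num_bytes: int = 32) -> str:
    """Step 1: a long random salt (2 * num_bytes hex characters)."""
    return secrets.token_hex(num_bytes)

def quirky_hash(const_salt: str, user_salt: str, password: str) -> str:
    """Step 3: one possible "quirky" scheme.

    Hash the first two chars of user_salt with one function, prepend that
    to the salt'n'pepper'n'password string, then hash with ANOTHER function.
    """
    prefix = hashlib.sha512(user_salt[:2].encode("utf-8")).hexdigest()
    data = prefix + const_salt + password + user_salt
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def credentials_acceptable(username: str, display_name: str, password: str) -> bool:
    """Step 5: reject username-as-password and username-as-display-name."""
    name = username.casefold()
    return password.casefold() != name and display_name.casefold() != name
```

For example, `credentials_acceptable("bob", "Bobby", "BOB")` is rejected because the password is just the username in a different case.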