2017-10-19

Client Explanations: Password Storage

If you’re allowing your users to log in to your website, it’s imperative that you protect their passwords with an appropriate hashing scheme.

Hashing… what’s that? We encrypt our passwords, isn’t that enough?

Hashing and encryption are conceptually similar, but the main difference that’s relevant for this discussion is that hashing is a one-way transformation. The password goes into the hashing process, and out comes a fixed-length sequence of hexadecimal characters (0 through 9, and A through F).

It’s then practically impossible to take that hex sequence and work backwards to the original password.

For example,

"password" -> [ SHA256 ] -> "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

Here, SHA256 is the name of a widely used hashing technique.
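To see this for yourself, here is a minimal sketch using Python’s standard hashlib module. The helper name sha256_hex is made up for this illustration.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Return the SHA256 hash of a password as a lowercase hex string."""
    # Hash functions work on bytes, so the text is encoded first.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


print(sha256_hex("password"))
# 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8
```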

Now that you’ve converted the password to 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8, you store that in your database.

But wait, if we store that random stuff, how do we log the user in later?

When the user logs in later, you run the same hashing process and compare the outputs.

When “password” is processed by the SHA256 process, it will always generate 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8. So all you need to do is run the hash again, and compare the outputs. If they match, then the user knew the correct password.
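The login check described above could look roughly like this. It is only a sketch: the stored_hashes dictionary is a hypothetical stand-in for your user table, the check_login name is invented for the example, and plain SHA256 is used only to mirror the example in this article.

```python
import hashlib
import hmac

# Hypothetical stand-in for the user table: username -> stored hash.
stored_hashes = {
    "alice": "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8",
}


def check_login(username: str, attempt: str) -> bool:
    """Hash the attempted password and compare it with the stored hash."""
    stored = stored_hashes.get(username)
    if stored is None:
        return False  # unknown user
    attempt_hash = hashlib.sha256(attempt.encode("utf-8")).hexdigest()
    # compare_digest compares in constant time, which avoids leaking
    # how many leading characters matched.
    return hmac.compare_digest(stored, attempt_hash)


print(check_login("alice", "password"))  # True
print(check_login("alice", "letmein"))   # False
```

Notice that the original password is never stored or recovered. Only the two hashes are compared.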

Why can’t I use encryption?

The problem is that encryption is reversible: if you know the encrypted data and the encryption key, you can work backwards to the password.

Why is that a problem?

Good password storage is designed to protect against the scenario where an attacker gains access to your user database. If they’re able to work out the passwords, they can then attack your users directly.

How then does hashing protect us better in this scenario than encryption?

If the attacker has the technical wherewithal to acquire your user database, you also have to assume they got access to the encryption key. So if you are only encrypting passwords, it’s trivial for the attacker to decrypt all the passwords for your users.

If you’re properly hashing passwords, the only thing the attacker can do is start guessing passwords, running them through the hash process, and seeing if they match.
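That guessing process can be sketched in a few lines. The candidate list here is a tiny made-up sample, and guess_password is a name invented for the illustration. A real attacker would work through lists containing millions of leaked passwords.

```python
import hashlib

# A hash taken from a stolen database (the SHA256 example from this article).
stolen_hash = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

# Made-up sample of common passwords to try.
candidate_passwords = ["123456", "letmein", "qwerty", "password"]


def guess_password(target_hash, candidates):
    """Hash each candidate and return the first one whose hash matches."""
    for guess in candidates:
        if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hash:
            return guess
    return None  # none of the guesses matched


print(guess_password(stolen_hash, candidate_passwords))  # password
```

The attacker never reverses the hash. They simply find an input that produces the same output, which is why common passwords fall quickly.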

We have logins on our site, but we’re not a bank or anything. We’re not really protecting any secret data. Does this really matter?

That may be, but users often re-use their passwords. So the password they use to log in to your support portal is also the password they use for their email account, their social media, and/or their bank. You don’t want to be responsible for a user getting their identity stolen because an attacker gained access through your website.

How could they get our user database?

Most often it’s through a technical flaw like SQL injection or unprotected backups.

It could also be stolen by a current or former employee, or an employee could be tricked into providing access through a social engineering attack.

Once they steal the database, they can start working at guessing passwords. Your “user is locked out after three failed attempts” feature is now irrelevant, because the attacker is going straight at a local copy of your database and not running through your application code.

But that will take a long time, right?

Ideally.

However, when users choose common or short passwords, it’s much easier for the attacker to guess. But that’s a conversation for another time.

But this brings up why it’s important to choose a good hashing process. Some hashing processes, including general-purpose ones like SHA256, are fast, and allow an attacker to guess millions of passwords per second on ordinary hardware. With specialized hardware such as graphics cards they can guess far more.

But other password hashing techniques are specifically designed to be slow, and/or difficult to run on that specialized hardware. PBKDF2 and BCrypt are commonly used because they offer a variable “work factor” that can configure how long the hashing process takes.

With these, it can take hundreds of years of computing time to break a sufficiently long password.
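As a rough sketch of what a configurable work factor looks like, here is PBKDF2 via Python’s standard hashlib.pbkdf2_hmac. The function names and the iteration count are assumptions chosen for the example, so tune the count for your own hardware. The random salt is standard practice with PBKDF2 but is not covered in this article; it makes two users with the same password end up with different hashes.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; higher means slower guessing.
ITERATIONS = 600_000


def hash_password(password, iterations=ITERATIONS):
    """Return (salt, iterations, digest) to store alongside the user."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Re-run PBKDF2 with the stored salt and work factor, then compare."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(digest, expected_digest)


salt, iterations, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, iterations, digest))  # True
print(verify_password("wrong guess", salt, iterations, digest))                   # False
```

Storing the iteration count next to each hash lets you raise the work factor later without invalidating existing accounts.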

OK, that sounds great. But when I store the passwords encrypted, I can work out what it is and tell the user when they forget. What do I do now if I can’t get the original password?

You’re right that you can’t do this with hashing. And you shouldn’t be emailing passwords around anyway.

Instead you’ll have to email the user a link with a reset token.
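One possible shape for such a reset token is sketched below. Everything here is an assumption for illustration: the in-memory reset_tokens dictionary stands in for a database table, the one-hour lifetime is arbitrary, and the function names are invented. The token that gets emailed is random and single-use, and only its hash is kept, so a stolen database does not hand over working reset links.

```python
import hashlib
import hmac
import secrets
import time

TOKEN_LIFETIME_SECONDS = 3600  # assumed lifetime: one hour

# Hypothetical stand-in for a database table:
# username -> (hash of token, expiry timestamp)
reset_tokens = {}


def _hash_token(token):
    # A fast hash is acceptable here because the token is long and random,
    # unlike a user-chosen password.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def create_reset_token(username, now=None):
    """Create a token to put in the emailed link; store only its hash."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    reset_tokens[username] = (_hash_token(token), now + TOKEN_LIFETIME_SECONDS)
    return token


def redeem_reset_token(username, token, now=None):
    """Return True once for a valid, unexpired token; False otherwise."""
    now = time.time() if now is None else now
    entry = reset_tokens.get(username)
    if entry is None:
        return False
    token_hash, expires_at = entry
    if now > expires_at:
        del reset_tokens[username]  # expired tokens are discarded
        return False
    if not hmac.compare_digest(token_hash, _hash_token(token)):
        return False
    del reset_tokens[username]  # single use: remove after success
    return True
```

When the token is redeemed, the user picks a new password, which is then hashed and stored in place of the old hash.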

This also means you’ll want to validate the user has control over the email address they provide during signup. An email verification needs to be part of the signup flow.