Why You Shouldn't be using SHA1 or MD5 to Store Passwords

There are a lot of badly coded sites out there, and far too many sites still seem to be falling prey to SQL Injection vulnerabilities resulting in a lot of high profile leaks of user data.

I wrote quite some time ago on The Importance of Salting Stored Passwords And How To Do So Correctly, but whilst the underlying message remains correct, the techniques for doing so have been outpaced by technology.

Although still widely used, checksum algorithms such as SHA1 and MD5 are no longer sufficiently secure.

In this post we'll be exploring why you shouldn't be using MD5/SHA1 and how you should be storing passwords.

 

Should Passwords be Encrypted?

This is largely semantics, but one that needs addressing given that the mainstream press quite often get it wrong.

Passwords should never, ever, be stored encrypted. Encrypting something implies that it can be decrypted; if the key is discovered your stored password can be reverted back to plain text.

The password should be salted and then hashed. Hashing is a one-way process, the only way to 'recover' your password is to guess the password and then, using the same salt, run it through the process used to generate the original hash.

It may seem a subtle difference, but at the technical level, the challenges involved in brute-forcing a hash differ to that of decrypting ciphertext.

With encryption the key needs to be kept secret (but will still need to be stored within the app - how else are you going to verify submitted passwords?), with hashing the salt is considered public knowledge as the correct password is required to generate the correct hash.

Some also like to treat the salt as a secret, but this should never be the only defence, at best it's security by obscurity and at worst it can give a false sense of security.

 

Salted Hashes

To cover some well trodden ground, a salted password hash is essentially a cryptographic calculation of your password combined with an arbitrary string.

Before salting was used so widely, attackers used to generate Rainbow Tables. Rainbow tables are (at a very basic level) essentially a database of hashed passwords. You look the hash up in the database and if there's a match, you have the original password. They were computationally intensive to generate, but once created were incredibly quick to use.

Salting makes this far harder; because the generated hash is based on both your password and the salt, the attacker would need to generate Rainbow tables for your specific salt. The natural extension of this, of course, was to use a different salt per user so that the attacker would need to generate different Rainbow Tables for every user. Suddenly the effort wasn't worth it most of the time.

This, in fact, is the entire basis of good password storage. A good designer will work on the assumption that one day the password list will be compromised. So aim to make attacking the hashes so computationally intensive that most just won't bother.

You can never guarantee that passwords are irrecoverable, but you can make it so hard to do that it's just not worth the time and energy.

 

Things Have Changed

In the past, we relied on CPU's to perform the calculations needed to brute-force a hash. Nowadays, we can use much faster GPU's thanks to libraries such as CUDA and OpenCL.

Although some still do, It's still usually not worth generating Rainbow Tables, but the reasons for this have changed. It's un-necessary when we can perform the exact same calculations on the fly (and not have to find the vast amount of space that Rainbow Tables typically occupy). The exception being when a system-wide salt is being used instead of per-user salting.

This dramatic increase in processing speed is exactly why checksum based hashes are no longer fit for the purpose of storing passwords. Historically, they were used because they were convenient, quick to generate and it was reasonably expensive to try and attack them. Now, with more powerful servers, we can afford to use something that's more computationally expensive and as our attackers have the power to try millions of checksum calculations a second we need to.

The more work is needed to generate a hash, the more work an attacker needs to do to try and brute-force it. As we'll see later in this post, the time needed to brute-force different hashes can differ greatly.

 

Attack Methods

I've used the term brute-force a few times, but only in a loose sense. A true brute force would be to try every combination between 'a' and 'zzzzzzzzzzzzzzz' (including numbers and special characters).

It's an option, but most attackers don't bother with this except as a last resort. Most will begin with a simple dictionary attack, trying a list of known passwords first. Some of the password lists available on the net are absolutely huge, so there's a good chance of success, especially when checked against a large user base.

The next step is often to apply rules to the dictionary; from simple things like switching numbers for letters (so hello becomes h3ll0 or h3770) to appending characters, capitalizing the first letter and combining words. If there's a pattern for generating 'strong' passwords, someone has probably written a rule for it.

As a result, if you've concatenated two dictionary words (even if you'd never normally see them together) there's a very good chance your password will be discovered quickly. There are even predefined rule sets for common keyboard patterns, so passwords such as 'qazwsx123' will fall almost as easily as 'goodd0nut'.

 

The Storage Schema makes a huge difference

I decided to create some test hashes using different schemas, so that I could show the time difference involved in generating a guess. To aid this, my dictionary contains one word - the correct password, and the salt is known.

The hashes I used (as this is about computational speed, the password is password!)

MD5
-------
76c8245727d68a781e98288f147184b5:5b0f4bf8fec9b53879a5bb929172c153

SHA1
----------
fecbb8d3eb7c9af4fcf61a8722a52d84428ab519:7ac4cd7db7a9d9d7abb5c7ebb46aa9265c941f87


BCrypt
-----------
$2a$10$ppsVmqeIhoD.XjE1Zdmj4.fjH2HZblKGycmpgU3SA2UB4cbo4nPLG

 

Using oclHashcat-Plus I ran a check against each to see the hash rate. Unfortunately due to a driver error, I couldn't run a check against a Bcrypt hash, so used the weaker md5crypt instead.

 

Hashing Mechanism Tries per second
MD5 5,357,000
SHA1 3,673,000
md5Crypt 12114

 

As you can see, switching to the more computationally expensive md5crypt has a huge effect on the hashing rate. Something we'll see more prominently when testing CPU based hash cracking.

 

CPU Based Cracking

The numbers below are using the CPU to crack the exact same hashes as before, this time I was able to include BCrypt (seems to be a CUDA issue...).

 

Hashing Mechanism Time to generate
MD5 0.008s
SHA1 0.012s
BCrypt (cost 10) 0.012s

 

With a cost of 10, the BCrypt hash took about the same amount of time to crack as a SHA1 hash, but as soon as the cost was increased, the time needed began to increase dramatically.

 

BCrypt Cost Time to generate
11 0.208s
12 0.400s
15 3.144s
18 24.998s
20 1m 40s

 

Although processing with a GPU will always be faster, it's the relative increase in time that we're really interested in. By setting the cost to 11, it takes 17x longer to generate a single guess than with a SHA1 hash.

The popular 'rockyou' wordlist has 14344391 words in it, so when trying against MD5 hashes, based on the stats above, it'll take 31 hours for a CPU to go through the rockyou list (in reality, it's faster though). 

With a Bcrypt hash using a cost of 12 it'll take 66 days. A GPU will handle them much more quickly, but what you should take away from this is that you can massively increase an attackers workload and still take less than a second to hash passwords.

 

But Password is an obvious Password

The statistics above aren't really affected by how obvious the password is, they reflect how long it takes to generate each guess. How quickly the password will be cracked if it appears in a wordlist partially depends on where in the wordlist the credential is found. What password complexity does, is decreases the likelihood of guesses matching anything in the wordlist or those generated by the simpler rules.

As random as a password may appear, it's not impossible that someone else has thought of the exact same password. If they've been unlucky (or stupid) and that password has been compromised, that password may be lurking on a list somewhere. 

We're all prone to using our own systems, even if we don't realise it, and the mechanisms that you use to help make strong memorable passwords are precisely those which are likely to render your password susceptible to rule based attacks.

Good random passwords do improve your chances of falling into the 'not worth the effort' group, but the chances of that are raised exponentially if the password is hashed with a suitable mechanism. If it'll take me two days to generate all possible permutations of a password stored with MD5 I might consider it worth it. If it's going to take 70 days to crack the same password stored with BCrypt I may decide not to bother.

When an old GPU can generate millions of MD5 / SHA1 hashes a second, it's clear that those mechanisms are no longer suitable for password storage (if, in fact they ever were).

 

What sites should be doing

From the stats above, it should be reasonably clear that using Blowfish (BCrypt) is the best way forward. It's not quite as simple to implement in PHP as SHA1 and MD5 are, but it's far from difficult.

 

function bCrypt($pass,$cost)
{
      $chars='./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';

      // Build the beginning of the salt
      $salt=sprintf('$2a$%02d$',$cost);
      // Seed the random generator

      mt_srand();

      // Generate a random salt
      for($i=0;$i<22;$i++) $salt.=$chars[mt_srand(0,63)];

     // return the hash
    return crypt($pass,$salt);
}

// Usage
// Set the password in a variable
$pass = 'Password';
$hash = bCrypt($pass,12);

// We've now got the hash and can store it in a db, or in a file.

// To check the password
if ($hash == crypt($pass,$hash)){
echo "Logged in";
}else{
echo "access denied"
}

 

We don't even need to store the salt itself in the DB, we simply pass the correct hash as the salt and crypt does the rest.

 

Conclusion

Although I couldn't measure BCrypt attempts on my GPU, it should be clear that MD5 and SHA1 just don't make the cut when faced with modern hardware. An attacker doesn't even actually need to buy a graphics card, you can use a 4 GPU cluster for an hour on Amazon Web services for the princely sum of $2. It's also quite easy to distribute the cracking workload with tools like oclHashcat-plus, so multiple servers can work on the same hashset at the same time.

The hashing speeds I got from my NVIDIA GPU pale in comparison to those achievable with ATI graphics cards (ATI cards have lots of small shaders, perfect for this kind of work) so the harder each hash is to calculate the better as an attacker will probably be using ATI.

It's impossible to ensure that passwords can never be calculated based on hashes, but you can make it incredibly difficult for an attacker to do efficiently. As with anything, it's not about being 100% secure, but being more secure than your neighbour: Most attackers will just move onto the next SQL dump rather than wasting their time on your userbase.