One of the most important responsibilities you have is keeping your systems secure from those who have no right to access them. Access is typically protected through some sort of authentication model, with the username/password combination reigning as king. You may have heard about clear text passwords being stored and wondered what the fuss is all about. Let me clear that up for you in this piece.
Securing a system involves the user typing in the email/user-id, along with their password, which a server then checks before permitting access. To make this check, you have to keep a record of the user-id and the password. Makes sense.
The problem is that humans are horrendously lazy. When confronted with having to enter a password, most
will opt to use one they’ve used before, one that is easily remembered and where their fingers will just
tap over the keys without even thinking about it.
The issue you are now faced with is that the password you are storing on behalf of the user to access your systems will most likely also unlock other services, such as their online banking and shopping: basically their whole online life. Who else within your organization can see that data?
When you start to analyze your organizational data flow, you’ll find the number of eyeballs that can easily view this data is alarmingly high: from the support staff managing your production data, to the DevOps crew managing code, to the DBA maintaining the data, to the IT staff managing servers.
What you have effectively done, at the very least, is taken a piece of rather personal and sensitive
data from a user, and with a megaphone, shouted this to your company.
This is all before we look at external breach opportunities, through fair means or foul (SQL injection attacks, SQL overrun, data leakage, etc.).
Pretty bad, eh?
The premise stated at the start was that we had to store the user-id and the password so we can make
that determination of authenticity upon each login. What if we challenged that? What if we don’t store
the actual password? You can’t leak what you don’t have. This could reduce a lot of our data security
stress. Fortunately, this is a well-trodden path and a very easy one to walk.
Instead of storing the actual password, you store the result of a mathematical formula that is performed on the password. This formula should be one where the chance of two different passwords producing the same answer is infinitesimal: so remote that we can rely on it, especially since we also have the user-id to factor in to offset any collisions.
Let us look at a really simple (and poor) example algorithm, using the password “alan”.
We could assign a number to each letter, add up all the numbers, and store the result. So if a=1, b=2 etc, “alan” would result in 1+12+1+14 = 28. When someone presented the password “alan” we would run the formula again, compare that result with the one we stored at creation, and if it was 28, then we know the password presented was the right one. Storing 28 is more secure than “alan” because it isn’t at first obvious what sequence of characters produces 28.
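The letter-sum scheme described above can be sketched in a few lines of Python. This is only an illustration of the toy algorithm; the function name `letter_sum` is made up for this example, and the scheme is deliberately not secure.

```python
def letter_sum(password: str) -> int:
    """Toy 'hash' from the text: a=1, b=2, ... z=26, all added together.

    Illustrative only. Characters outside a-z are ignored (an assumption
    made for this sketch; the text only discusses lowercase letters).
    """
    return sum(ord(ch) - ord("a") + 1 for ch in password.lower() if "a" <= ch <= "z")


# Computed once, when the user first sets the password.
stored = letter_sum("alan")
print(stored)  # 28

# At login, run the formula again and compare with the stored value.
print(letter_sum("alan") == stored)  # True
print(letter_sum("alex") == stored)  # False
```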
These types of algorithms are commonly known as hashing or checksum algorithms. They are often used to
fingerprint a file to make sure its contents have not changed. Any small change in the file, even a
single character update, would produce a different checksum value.
You may have seen this in some websites that offer files for download. They will publish the hashcode so when you retrieve the file, you can run the calculation yourself, and if it matches then you can be confident the file you downloaded has not been manipulated mid-download.
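As a rough sketch of that download check, the following Python uses the standard `hashlib` module to calculate a file's SHA-256 checksum and compare it with a published value. The function names and the choice of SHA-256 are assumptions for illustration; a site may publish checksums using a different algorithm.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_checksum: str) -> bool:
    """Compare the calculated checksum with the one the website published."""
    return file_sha256(path) == published_checksum.strip().lower()
```

You would call `matches_published(Path("download.zip"), "<checksum from the website>")`, where both the file name and the checksum are placeholders for your own values.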
These hashing algorithms are common and available in most mainstream languages and databases. A common
one that is used in password hashing is the SHA256 algorithm. This performs a calculation on the input
data and, irrespective of the input size, always produces a 64-character hexadecimal result (256 bits).
Our previous example turned “alan” into 28. Running the same password through the SHA256 algorithm instead produces a 64-character hexadecimal string that looks nothing like the input.
Now our original algorithm would still produce 28 for the password “nala”, which is why it is a very poor one. Run “nala” through the SHA256 algorithm, however, and the result is completely different from the one for “alan”.
So this clearly is a lot more robust than our simple addition algorithm.
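To see the two SHA256 results for yourself, a minimal Python sketch using the standard `hashlib` module is enough. The helper name `sha256_hex` is made up for this example, and UTF-8 encoding of the password is an assumption.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The letter-sum algorithm gives 28 for both of these passwords;
# SHA-256 gives two unrelated-looking 64-character results.
for password in ("alan", "nala"):
    print(password, sha256_hex(password))
```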
The key to the hashing algorithm is that it cannot be reverse engineered. Given the hashcode, there is no practical way to mathematically reverse it to produce the original content. The only way to figure out what the original content was is a brute force attack, where you keep guessing the content, perform the calculation and see if it matches. Huge effort.
How difficult is it to integrate into your code? Simple. For example, in MySQL this is trivial. You simply use the SHA2() function in your SQL statements:
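The guess-and-compare loop behind a brute force attack might look something like the sketch below. The word list and the function name `guess_password` are hypothetical; a real attacker would need a vastly larger set of guesses, which is where the effort comes from.

```python
import hashlib
from typing import Iterable, Optional


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def guess_password(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each guess in turn and return the first one that matches."""
    for candidate in candidates:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None  # none of the guesses produced the target hash


# Hypothetical leaked hash and a tiny, made-up list of guesses.
leaked_hash = sha256_hex("alan")
wordlist = ["password", "letmein", "nala", "alan"]
print(guess_password(leaked_hash, wordlist))  # alan
```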
SELECT SHA2('alan', 256)
Incorporating this into a SQL statement as part of your authentication check is no effort at all (assuming you are using a database for user storage). It really isn’t any more complicated than that.
Now that you have seen how easy it is to create a hash, there is a technique for making the hash that little bit more secure and even harder to guess, called salting. This is where you add some additional piece of content prior to calculation that makes the input unique to that user. For example, it is common to include the username as part of the password prior to calculation. That way, if two users happen to use the same password, the resulting hashcodes will be different because their usernames differ.
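Here is a minimal sketch of such an authentication check. It assumes an in-memory SQLite database standing in for MySQL, and the table name `users`, its columns, and the helper functions are all hypothetical. The hash is calculated in Python here; with MySQL the same comparison could be done inside the statement using SHA2(?, 256).

```python
import hashlib
import sqlite3


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def create_user(db: sqlite3.Connection, user_id: str, password: str) -> None:
    """Store only the hash of the password, never the password itself."""
    db.execute(
        "INSERT INTO users (user_id, password_hash) VALUES (?, ?)",
        (user_id, sha256_hex(password)),
    )


def check_login(db: sqlite3.Connection, user_id: str, password: str) -> bool:
    """Hash the presented password and look for a matching row."""
    row = db.execute(
        "SELECT 1 FROM users WHERE user_id = ? AND password_hash = ?",
        (user_id, sha256_hex(password)),
    ).fetchone()
    return row is not None


# Hypothetical schema for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, password_hash TEXT NOT NULL)")
create_user(db, "alan@example.com", "alan")

print(check_login(db, "alan@example.com", "alan"))  # True
print(check_login(db, "alan@example.com", "nala"))  # False
```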
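The username-as-salt idea can be sketched as follows. The `:` separator and the function name `salted_hash` are assumptions made for this example. A random per-user salt stored alongside the hash is a common alternative to using the username.

```python
import hashlib


def salted_hash(username: str, password: str) -> str:
    """Mix the username into the input before hashing (username as salt)."""
    salted_input = f"{username}:{password}"  # separator chosen for illustration
    return hashlib.sha256(salted_input.encode("utf-8")).hexdigest()


# Two users with the same password end up with different stored hashes.
print(salted_hash("alice", "alan") == salted_hash("bob", "alan"))  # False

# The same user and password always produce the same hash, so login still works.
print(salted_hash("alice", "alan") == salted_hash("alice", "alan"))  # True
```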
So now that you know how easy it is to get out of the password storing business, how can you tell if a
website is storing your password in clear text? Well there are certain clues that will tell you if the
site is being responsible.
As part of a password reset process, if the website does any of the following then beware and change your password immediately:
Emails you your original password in clear text for you to retype
Asks you to provide given characters of your original password (for example: what is the 3rd and 5th character of your password?)
You can also guess whether they are using a salt. If you change your email or your username and they don’t ask you to repeat your password, then chances are they are not using a username-based salt, since recalculating the salted hash requires the original password. That may also point towards them storing your password in a raw format.
As you have discovered, there are very simple steps that can be taken to make your user management more
secure, and here is the really exciting thing: it costs nothing. It saddens us that we are still finding
examples of readable passwords in databases in our due-diligence engagements at MacLaurin.
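To illustrate why a username change forces a password prompt when the username is the salt, here is a small sketch. The in-memory dictionary standing in for the user table, the `rename_user` function, and the example credentials are all hypothetical.

```python
import hashlib


def salted_hash(username: str, password: str) -> str:
    return hashlib.sha256(f"{username}:{password}".encode("utf-8")).hexdigest()


def rename_user(store: dict, old_name: str, new_name: str, password: str) -> bool:
    """Change a username; the password is needed to rebuild the salted hash."""
    if store.get(old_name) != salted_hash(old_name, password):
        return False  # unknown user or wrong password
    del store[old_name]
    # The old hash is useless under the new name, so it must be recalculated.
    store[new_name] = salted_hash(new_name, password)
    return True


store = {"alan": salted_hash("alan", "s3cret")}
print(rename_user(store, "alan", "alan2", "s3cret"))     # True
print(store["alan2"] == salted_hash("alan2", "s3cret"))  # True
```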
Storing passwords in clear-text is like smoking in restaurants — no longer socially acceptable as everyone knows how harmful it is to you and those around you.
There is no excuse for not taking that next small logical coding step to remove a whole world of hurt.