Due to some complaints from the nice gentlemen over at Reddit, I have rewritten parts of the article to clarify that I do not in any way promote sloppy use of cryptography.
By "casual cryptography", I refer to the mindset in which this very basic introductory article is written. Cryptography is serious business and should not be taken lightly when implemented. But a basic understanding of the concepts, however shallow, is valuable. By writing this article, I wish to awaken your interest in security, so if you find yourself intrigued, just continue learning.
Cryptography is, for most web developers, like a vast and endless ocean of intimidating, unabsorbable information. Which is sad, since cryptography is so useful and in many situations absolutely necessary. I will not even try to explain why, since I consider developers who don't care about the security and integrity of their end users a lost cause.
For the rest of you, who do care, read on.
To begin with, I will try to give you a hint of the more difficult aspects of cryptography. I feel this is important for the general understanding of the kind of things that are possible to do with the mathematics involved. But don't worry! It is just a brief overview.
To pass a secret message from a sender to a recipient, the message should be encrypted before it is sent, and decrypted at the other end. The encryption/decryption is done using "keys". A key is basically a few more or less random bytes.
In the simplest case, a single key is used for both encryption and decryption. This creates a problem, since the key has to be passed around to whoever might need to read the message. That creates a greater risk of the wrong people getting access to the key.
By exploiting some weird mathematical properties of prime numbers, the concept of public/private key pairs was developed. The idea is to use one key for the encryption of a message, and another key for the decryption. Let's call them A and B. Those keys always come in pairs, and are selected carefully to match each other. When a message is encrypted with key A, the message can only be decrypted with key B, and vice versa.
As the name implies, a public/private key pair is divided into a public key and a private key. The private key is kept strictly secret by its owner. The public key, on the other hand, is openly shared with anyone who might want it. When someone wants to send a secret message, they would therefore encrypt the message with the recipient's public key. The message is then only accessible with the private key from the same pair, which means only the intended recipient can decrypt it.
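As a rough illustration of the paragraph above, here is a minimal sketch using PHP's OpenSSL extension. It assumes the extension is installed and configured; the 2048-bit RSA key size, the OAEP padding choice and all variable names are assumptions made for this example, not something the article prescribes.

```php
<?php
// Generate a fresh RSA key pair for the recipient (assumed size: 2048 bits).
$recipientKey = openssl_pkey_new([
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
    'private_key_bits' => 2048,
]);
if ($recipientKey === false) {
    throw new RuntimeException('Key generation failed; is OpenSSL configured?');
}

// The public half can be exported as PEM text and shared with anyone.
$recipientPublicPem = openssl_pkey_get_details($recipientKey)['key'];

// Sender: encrypt with the recipient's public key.
$message = 'Meet me at noon';
openssl_public_encrypt($message, $ciphertext, $recipientPublicPem, OPENSSL_PKCS1_OAEP_PADDING);

// Recipient: decrypt with the matching private key.
openssl_private_decrypt($ciphertext, $decrypted, $recipientKey, OPENSSL_PKCS1_OAEP_PADDING);
```

RSA can only encrypt short messages directly, so real systems typically use it to protect a random single key that in turn encrypts the actual message.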
Let's say someone is sending a message where it is very important that the receiver can be sure it comes from the expected sender. The sender can then "sign" the message by encrypting it with their private key. The recipient will then decrypt the message using the sender's public key. If the decryption is successful, the recipient can be sure that the message was indeed encrypted with the sender's private key, and therefore has not been altered after it was signed by the sender.
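Signing can be sketched in a similar way with PHP's OpenSSL extension (assumed to be installed and configured). Note that `openssl_sign()` signs a hash of the message rather than encrypting the whole message, but the idea is the same: only the private key can produce the signature, and anyone with the public key can check it. The key size, the SHA-256 choice and the variable names are assumptions for this example.

```php
<?php
// Generate a key pair for the sender (assumed size: 2048 bits).
$senderKey = openssl_pkey_new([
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
    'private_key_bits' => 2048,
]);
if ($senderKey === false) {
    throw new RuntimeException('Key generation failed; is OpenSSL configured?');
}
$senderPublicPem = openssl_pkey_get_details($senderKey)['key'];

// Sender: sign the message with the private key.
$message = 'I owe you 10 dollars';
openssl_sign($message, $signature, $senderKey, OPENSSL_ALGO_SHA256);

// Recipient: verify with the sender's public key (1 = valid, 0 = invalid).
$valid = openssl_verify($message, $signature, $senderPublicPem, OPENSSL_ALGO_SHA256);

// An altered message no longer matches the signature.
$tampered = openssl_verify('I owe you 1000 dollars', $signature, $senderPublicPem, OPENSSL_ALGO_SHA256);
```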
An interesting property of this kind of encryption is that the operations can be stacked: a message can be encrypted with any number of keys and later decrypted, as long as every layer is undone with its matching key. This means a message can be both signed with the sender's private key and encrypted with the recipient's public key, or signed by several different persons, or encrypted with multiple keys, or encrypted and signed in any imaginable combination.
Public/private key pairs are used by SSL (and its successor TLS), which the HTTPS protocol is built on, and by PGP.
This article is intended for web developers, and as such we rarely have much use for public/private key pairs. But don't despair! There is another class of much simpler one-way encryption you can start using today.
The concept of digestion focuses on reducing any stream of data to a fixed-size, unintelligible but deterministic "hash". There are a number of algorithms that do this. Their goal is to produce a hash that is as unpredictable as possible with regard to the original input. In other words, given the hash, it should be as difficult as possible to find the original input.
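A quick way to see the "fixed size" and "deterministic" properties is to hash a few inputs of very different lengths. This is only a small sketch using PHP's built-in `sha1()`; the input strings are arbitrary examples.

```php
<?php
// A one-character input and a roughly 21 000-character input.
$short = sha1('a');
$long  = sha1(str_repeat('The quick brown fox. ', 1000));

// Both hashes are 40 hexadecimal characters (160 bits) long.
$shortLength = strlen($short);
$longLength  = strlen($long);

// Deterministic: the same input always gives the same hash.
$again = sha1('a');

// Unpredictable: a tiny change in the input gives a different hash.
$nearby = sha1('b');
```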
With a perfect digestion algorithm there should be no other way to find the original input than a simple brute force search, where you feed the digestion algorithm different input values until a matching hash is found. With long enough hashes, that could easily take thousands of years even on the world's fastest supercomputers.
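To make the brute force idea concrete, here is a toy sketch that searches a deliberately tiny input space: all 4-digit PIN codes. The function name `bruteForcePin` and the PIN scenario are made up for illustration. A realistic input space is astronomically larger, which is the whole point.

```php
<?php
// Hypothetical helper: try every 4-digit PIN until one hashes to the target.
function bruteForcePin(string $targetHash): ?string
{
    for ($i = 0; $i <= 9999; $i++) {
        // Candidates are '0000', '0001', ... '9999'.
        $candidate = str_pad((string) $i, 4, '0', STR_PAD_LEFT);
        if (hash_equals($targetHash, sha1($candidate))) {
            return $candidate;
        }
    }
    return null; // The original input was not a 4-digit PIN.
}

$found = bruteForcePin(sha1('4821'));
```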
Two commonly used algorithms are md5 and sha1.
There are many other stronger digestion algorithms, so you might want to put some more research into this, to find out which ones are available for your current programming environment.
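In PHP, one way to find out what is available is the built-in `hash_algos()` function, and `hash()` lets you pick an algorithm by name. The choice of sha256 below is just an example, not a recommendation from the article.

```php
<?php
// List every digestion algorithm this PHP installation supports.
$available = hash_algos();

// Use one of them by name, if present (sha256 is chosen here as an example).
$digest = in_array('sha256', $available, true)
    ? hash('sha256', 'hello world')
    : null;
```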
Just remember that even md5 is far better than storing plain text, though it is no longer considered a strong algorithm.
The hash of a password (or any other data) is usually thought of as the fingerprint of that password. Any two distinct passwords should always give different hashes. This is of course impossible to guarantee, since there might be more possible passwords than there are possible hash values. However, finding two distinct passwords with the same hash is extremely unlikely.
To protect the integrity of your end users, it is wise to never store any passwords in plain text, but to digest them into a hash before they are stored. When the user enters their password, it is therefore not compared to a password stored in the database, but digested into a hash and compared to the stored hash. This means that if the hashes match, we can be practically sure the original passwords also match.
<?php if($EnteredPassword === $StoredPassword) LogIn(); /* before: plain text comparison */ ?>
<?php if(hash_equals($StoredHash, sha1($EnteredPassword))) LogIn(); /* after: hash comparison */ ?>
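The one-liners above only show the login check. A slightly fuller sketch, including the moment the hash is stored, could look like this. The in-memory `$users` array stands in for a real database, and the function names `register` and `checkLogin` are made up for this example. It compares hashes with `hash_equals()`, which avoids the surprises of PHP's loose `==` comparison on strings that look like numbers.

```php
<?php
// Hypothetical in-memory "database": username => stored hash.
$users = [];

// Store only the hash, never the password itself.
function register(array &$users, string $username, string $password): void
{
    $users[$username] = sha1($password);
}

// Digest the entered password and compare it to the stored hash.
function checkLogin(array $users, string $username, string $enteredPassword): bool
{
    if (!isset($users[$username])) {
        return false;
    }
    return hash_equals($users[$username], sha1($enteredPassword));
}

register($users, 'steve', 'steve123');
```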
Since you no longer store any actual plain text passwords in your database, you can worry a bit less about what would happen to the passwords if your server gets hacked, or stolen, or abducted by aliens. But are the passwords really safe now? Not really.
Let's assume the unthinkable happens and your database gets stolen. The thief doesn't have the actual passwords in there, so he can't simply read the username/password and log in. Let's also assume that you used a strong hashing algorithm, forcing the thief to test potential passwords one by one until a matching hash is found.
What does this mean? If all users had chosen strong random passwords, they would be fine, but this is not a fairy tale, so they haven't. The passwords are full of "steve123", "password" and obscenities. Trust me, I've worked with passwords stored in plain text (my client made me do it), and I've seen it all. This means that the thief can take a dictionary of common words, digest it word-by-word into a dictionary of hashes and start comparing it to the stolen hashes. He will easily find lots of matches. He might even just do a Google search for the hash and find it listed together with the original password.
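The dictionary attack described above takes very little code. The sketch below uses made-up user names, passwords and a four-word "dictionary" purely for illustration. Each word is digested once, and the resulting lookup table is then reused against every stolen hash.

```php
<?php
// Hypothetical stolen table: username => unsalted sha1 hash.
$stolenHashes = [
    'steve' => sha1('steve123'),
    'anna'  => sha1('password'),
    'bob'   => sha1('x9$Lq!7vTz'),
];

// A tiny stand-in for a dictionary of common passwords.
$dictionary = ['123456', 'password', 'steve123', 'letmein'];

// Digest each word once into a hash => word lookup table.
$lookup = [];
foreach ($dictionary as $word) {
    $lookup[sha1($word)] = $word;
}

// Compare every stolen hash against the precomputed table.
$cracked = [];
foreach ($stolenHashes as $user => $hash) {
    if (isset($lookup[$hash])) {
        $cracked[$user] = $lookup[$hash];
    }
}
```

Only the strong random password survives; the two common ones are recovered with a single pass over the dictionary.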
A password salt is a random string that is unique for each user. It is stored in plain text, but instead of the password hash, we now store the hash of the password + the salt.
<?php if(hash_equals($StoredHash, sha1($EnteredPassword . $StoredSalt))) LogIn(); ?>
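The salt has to be generated when the password is first stored. Here is a minimal sketch, assuming PHP 7 or later for `random_bytes()`. The function names `createUser` and `checkPassword`, the array layout and the 16-byte salt length are assumptions for this example.

```php
<?php
// Create the record to store for a new user: a random salt plus the salted hash.
function createUser(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 32 hex characters, unique per user
    return [
        'salt' => $salt,
        'hash' => sha1($password . $salt),
    ];
}

// Digest the entered password together with the stored salt and compare.
function checkPassword(array $user, string $enteredPassword): bool
{
    return hash_equals($user['hash'], sha1($enteredPassword . $user['salt']));
}

$steve = createUser('steve123');
```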
We can still do the same kind of dictionary attack if we again digest each word in the dictionary, but this time together with the stored salt. But a dictionary attack is now impractical, because the whole dictionary has to be digested individually for each single password + salt hash. With n users that is n times more work. The thief might be happy to wait for a few hours, but not 100 000 hours.
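The extra work can be made visible by counting hash operations. In this sketch (user names, passwords and the four-word dictionary are all made up), two users share the same password but get different hashes, so the thief can no longer digest the dictionary once and reuse it.

```php
<?php
$dictionary = ['123456', 'password', 'steve123', 'letmein'];

// Build a hypothetical stolen table with a unique random salt per user.
$users = [];
$passwords = ['steve' => 'steve123', 'anna' => 'steve123', 'bob' => 'x9$Lq!7vTz'];
foreach ($passwords as $name => $password) {
    $salt = bin2hex(random_bytes(16));
    $users[$name] = ['salt' => $salt, 'hash' => sha1($password . $salt)];
}

// The thief has to digest the dictionary again for every single user.
$hashesComputed = 0;
$cracked = [];
foreach ($users as $name => $user) {
    foreach ($dictionary as $word) {
        $hashesComputed++;
        if (hash_equals($user['hash'], sha1($word . $user['salt']))) {
            $cracked[$name] = $word;
            break;
        }
    }
}
// Worst case: count($users) * count($dictionary) hash operations.
```

Weak passwords are still found, but the cost now grows with the number of users instead of being paid once.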
We cannot solve the problem completely, but we can easily cause the thief a lot of trouble, until it becomes impractical to use a dictionary. When blackmailing or social engineering becomes more feasible than actually cracking the password, you can feel reasonably safe.
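One way to cause the thief even more trouble, which goes beyond what this article covers, is PHP's built-in `password_hash()` (available since PHP 5.5). It generates a random salt by itself, embeds it in the returned string, and uses a deliberately slow algorithm so that every guess costs more. This is only a pointer for further reading, not the method described above.

```php
<?php
// The returned string contains the algorithm, cost, salt and hash.
$stored = password_hash('steve123', PASSWORD_DEFAULT);

// Verification reads the salt and settings back out of the stored string.
$ok  = password_verify('steve123', $stored);
$bad = password_verify('steve124', $stored);
```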