Written by Roy van Rijn (royvanrijn.com) on Jan 1, 2011 17:17:25

Rainbow tables, reverse hash lookup

Today I’ve been looking into rainbow tables. These are tables used to do a reverse lookup for a hash function. For example MD5, or Windows LAN Manager. Usually these tables are used to find passwords if the hash is known. Now I’m not looking for a method to crack somebodies computer, but the technology and algorithms involved are very advanced and might be usefull in other fields as well!

Hashing

First off, lets talk about ‘hashing’, what is hashing? Well, a hash-function is a one-way function which turns some data (usually text) into a hashcode. For example… passwords:

"test"     -> "098f6bcd4621d373cade4e832627b4f6"
"password" -> "5f4dcc3b5aa765d61d8327deb882cf99"

Every good website which takes security seriously will only store these MD5 hashes in their databases, not the real passwords. So even if their database is compromised the attackers don’t have anything, because the hash-function is only one-way.

Attacking a hash

Where there is security there will be crackers trying to break it. How would you go about attacking, reversing, a hash? The earliest form was just to create huge tables of:

PLAIN_TEXT -> HASH

And then check if the hash is your hash. If you have a match, you’ll get into the system!

The problem is that these tables take a very long time to compute and you’ll end up with so much data you won’t be able to store it. This made hashing pretty safe.

Rainbow tables

A couple of generations further down the line, crackers now use rainbow tables. It takes the best of both worlds having a small(-ish) table on disk, and doing minimal computations. So how do they work..?

The reduce function

Lets assume we start with a random piece of text. For example we want to crack all passwords of length 5 and consisting of [ABC…XYZ0123456789]. We can now calculate the hash value. Instead of storing this single pair, we do a little trick. We use something that is called a ‘reduce function’. This is a selfmade one-way function that turns a hash back into a password! But not the original password (it isn’t a reverse hash-function) but just into some other password.

Why would we do that? Lets continue, we take our first random password, generate a hash, and then we reduce it back to another password. This is then again hashed, reduced, hashed, reduced for lets say 1000 times. We’ll end up with:

Hash:   "random"                           -> "7ddf32e17a6ac5ce04a8ecbf782ca509"
Reduce: "7ddf32e17a6ac5ce04a8ecbf782ca509" -> "ienw3"
Hash:   "ienw3"                            -> "b9322e367ad002d5adf7ca60b8b61e86"
        ... (1000 times) ...
Hash:   "o1gti"                            -> "27aa4cbd3653a4617e0aec76ba3af9a4"

The trick is, we throw away everything except the first input “random” and the final hash “27aa4cbd3653a4617e0aec76ba3af9a4”! This is the only part we need to store.

Using a rainbow table

How can we use the data above to reverse hashes you might ask? To use this table we need the input hash. For example “b9322e367ad002d5adf7ca60b8b61e86”. First we check if this hash is in the database. If it is, we are very lucky, we can just re-generate that particulair chain and we know the plain text input!

If we don’t find anything (which is very likely) we apply our reduce function to the input-hash and then hash that result. Now we check the hashes again, regenerate the chain and find out the answer. This can be repeated until we hit our set limit (1000) in that case, if no match has been found, we can’t reverse it.

False alarms

There is one problem in this algorithm. If you take a input-hash and then do a couple of reduce/hashes, and then you find a match… it might be a false alarm! The problem is that there might be input strings that result in the same hash. If this happens two chains will end up into one chain.

If we would have done a couple of reduce/hashes from our input and find a match for endpoint “52cafa6b5e4a6509e6ed2b8e6976d780”, the original chain might not have contained our input value. When we construct the complete chain it is possible that our input-hash isn’t in the chain…!

The ‘rainbow’ of the rainbow table

The reason they call it a rainbow table has something to do with reducing the amount of false alarms. Until now we’ve talked about having one reduce-function. What if we would have multiple reduce functions? Then we could create multiple small tables, which would help reduce a little bit.

Philippe Oechslin had a great idea. He used a different reduce algorithm for each step in the chain. So his tables are build using:

input chain 1 -> hash -> reduce1 -> hash -> reduce2 -> .... -> reduceN -> hash
input chain 2 -> hash -> reduce1 -> hash -> reduce2 -> .... -> reduceN -> hash
input chain 3 -> hash -> reduce1 -> hash -> reduce2 -> .... -> reduceN -> hash

(If you color the reduce functions you’ll end up with a pretty rainbow pattern).

How would you reverse a hash using this method? Well, the first step is the same, check if your input-hash is present in the stored hashes.
Next we apply:

reduce1000 -> hash -> check
reduce999 -> hash -> reduce1000 -> hash -> check
reduce998 -> hash -> reduce999 -> hash -> reduce1000 -> hash -> check

(etc)

The big advantage is that similair hashes (collisions) will most likely use different reduce algorithms so they won’t end up in the same chain.

Conclusion

Using rainbow tables you greatly decrease the amount of stored values. It isn’t log(O), there is a bit of computation needed to do the lookups, but that as well is kept to a minimum. This results in a very fast method to crack passwords. But I think it can be used in other fields of computer science as well. There are a lot of situations where you’d like to have a very big hash-lookup table, and when it becomes too big, this can be used to reduce storage but maintain fairly good lookup times.

Salting

Almost every article about hashing and rainbow tables end with a short alinea about salting. You can do salting in a couple of different ways, but the idea is usually the same. The easiest form of salting is having on ‘salt’ for a complete database.

One salt per database

How does this work? Well, lets just generate a completely random piece of text: “thisisoursaltanditisverylarge”. Now every time we store a new user we do “MD5(password + salt)”. Because the password itself may be weak, we apply our large “database-salt” to it, and then we calculate the hash.

Now if you want to crack a hash in this system it is almost impossible. Unless you find out the salt, then you could re-create a complete rainbow table and crack all the passwords.

Using a user value as salt

An even better solution is to use a user-value as salt, for example their username or date of birth, or maybe their registration date/time. Now if somebody cracks the database and finds all the data they’ll have to create a new rainbow table for each seperate user (!!!). This is even more secure and preferred over the database-salt.

Just generate a random sequence…

But the single best way of salting your database is to generate a large random salt for each user. You can just store this salt in the database next to the hash of the password. This is better then, for example, the username, because there are just less collisions. For example usernames like “root” or “admin” aren’t very uncommon aren’t they? So creating a rainbow table with “root” as salt might be worth the trouble. But creating one with a large random number just for a single user? That is hard and they’ll probably quickly give up.

Other uses for this algorithm

I haven’t been able to come up with a good other use for this algorithm yet, but I have the feeling tons of problems could potentially benefit from it! Can you come up with one?