This section of the course has been dealing with combinatorics, the mathematics of counting problems. In this lecture we look at an application of combinatorics to computer security.
A password is a string that convinces a computer system that a particular user is authorized. Perhaps the most common way that users get unauthorized access to a computer is by discovering and employing (cracking) the password of some authorized user. Many users choose passwords that are easily guessed, like their own name or "12345".
Computer systems generally defend against the most obvious attacks by locking out someone who makes repeated attempts to access them with incorrect passwords. However, the system needs to keep a record somewhere of which passwords are correct, and this record could be compromised by an attacker. To make the record less useful, systems typically store not the password p itself but a cryptographic hash h(p), the result of applying some hard-to-invert function h to it. When an authorized user enters their password p, the system computes h(p) and checks whether it is on the list of stored hashes. Even knowing a valid h(p), the attacker does not have an obvious way to compute p and enter it to gain access.
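The store-the-hash idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 from the standard hashlib module stands in for the unspecified function h, and the names stored_hashes and is_authorized are made up for the example. Real systems typically use salted, deliberately slow hash functions instead.

```python
import hashlib

def h(password):
    """Stand-in for the hash function h(p); SHA-256 is an assumption here."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# The system keeps only the hashes, never the passwords themselves.
# (Hypothetical example passwords.)
stored_hashes = {h("correct horse battery staple"), h("Tr0ub4dor&3")}

def is_authorized(attempt):
    """Hash the attempt and check whether the result is on the stored list."""
    return h(attempt) in stored_hashes
```

An attacker who steals stored_hashes sees only 64-character hexadecimal strings, not the passwords that produced them.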
However, the attacker can now make a brute force attack by taking a large number of candidate passwords, computing the hash of each one, and seeing whether any is on the list of accepted hashes. To foil this attack, the authorized user needs to have a password that is not on any list that the attacker can search within a reasonable time.
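The attack just described might look like the following sketch. Again SHA-256 is an assumed stand-in for h, and the candidate list and leaked hashes are invented for illustration.

```python
import hashlib

def h(password):
    """Stand-in for the hash function h(p); SHA-256 is an assumption here."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def brute_force(candidates, leaked_hashes):
    """Return every candidate whose hash appears among the leaked hashes."""
    return [p for p in candidates if h(p) in leaked_hashes]

# Hypothetical leaked record: one weak password and one that is not on the list.
leaked = {h("12345"), h("xq7!vRz2")}
candidates = ["password", "12345", "letmein", "alice"]

cracked = brute_force(candidates, leaked)
print(cracked)
```

Only the password that appears on the candidate list is recovered; the other hash stays uncracked, which is exactly the property the user wants.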
We thus measure the strength of a password by the size of a list of candidates that the attacker would need to generate to include it. The attacker will start with words in the dictionary, or passwords known to have been used in the past. These might number, for example, in the hundreds of thousands. One estimate of the rate at which an attacker can try passwords is ten million per second, so such a list can be searched in well under a second and these passwords are quite weak.
What about using a randomly chosen eight-letter string from the lower-case alphabet {a, ..., z}? The list of all such strings, as we now know, has size 26^8, which is about 209 billion (2.09 times 10^11). At ten million guesses per second, it would take about six hours to search this list.
Many systems, therefore, encourage or require users to use longer passwords, or passwords that contain upper-case letters, digits, or punctuation along with lower-case letters. The theory behind this last idea is that a random string from a larger alphabet comes from a set that would take longer to search. For example, if we choose an eight-character string over an 80-character alphabet at random, the attacker would have to search a list of size 80^8, which is about 1.68 times 10^15. Guessing ten million of these per second would take the attacker 168 million seconds, or about five years.
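The arithmetic for both alphabets can be checked with a short script. The function names are made up for this sketch; the guessing rate is the ten-million-per-second estimate used above.

```python
GUESSES_PER_SECOND = 10_000_000  # attacker's rate, as estimated in the text

def search_space(alphabet_size, length):
    """Number of strings of the given length over the alphabet (product rule)."""
    return alphabet_size ** length

def seconds_to_search(space, rate=GUESSES_PER_SECOND):
    """Time to try every string in the space at the given guessing rate."""
    return space / rate

lower = search_space(26, 8)   # eight lower-case letters
mixed = search_space(80, 8)   # eight characters from an 80-character alphabet

lower_hours = seconds_to_search(lower) / 3600
mixed_years = seconds_to_search(mixed) / (365 * 24 * 3600)

print(lower, round(lower_hours, 1))   # 208827064576 5.8
print(mixed, round(mixed_years, 1))   # 1677721600000000 5.3
```

These are worst-case times for searching the whole list; on average the attacker finds a random password after searching about half of it.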
The problem with this idea, as Randall Munroe points out in the xkcd cartoon I distributed in class, is that users do not choose passwords at random, because such passwords are too hard to remember. A common method of getting a password that meets the not-all-lower-case requirement is to take a relatively obscure English word and then alter it, for example by replacing "i"'s with "1"'s, or "a"'s with "@" signs. Randall calculates that one system along these lines produces about 2^28 possible passwords, which is less than a billion. And these passwords are not that easy to remember, though they are easier than randomly-generated strings.
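To see why such alterations add so little, here is a sketch of how an attacker could enumerate them. The substitution table is hypothetical and contains only the two replacements mentioned above; this is not a reconstruction of Randall's calculation.

```python
from itertools import product

# Hypothetical substitution table: each listed letter may stay or be replaced.
SUBSTITUTIONS = {"i": "i1", "a": "a@"}

def variants(word):
    """List every string obtained by optionally substituting each letter."""
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*choices)]

v = variants("mania")
print(len(v))   # 8
```

The word "mania" has three letters that can be substituted, so by the product rule it has only 2 * 2 * 2 = 8 variants. Each optional substitution merely doubles the attacker's list.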
Randall suggests taking a sequence of four common English words, such as "correct horse battery staple". With a word list of size 2^11 = 2048, this gives 2^44 or about 1.76 times 10^13 passwords, which would take our attacker 1.76 million seconds or about three weeks. (Randall argues that most users should be more worried about an attacker who could guess only 1000 keys per second.) Such a phrase, he argues, is easy to memorize because you can make up a story about it after the fact, and people remember stories more easily than they remember arbitrary strings of characters.
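A minimal sketch of this scheme follows, assuming a placeholder word list of 2048 made-up entries in place of real English words. The function names are invented for the example, and Python's standard secrets module is used so that the choice is actually random.

```python
import secrets

def passphrase(word_list, n_words=4):
    """Pick n_words independently and uniformly at random from word_list."""
    return " ".join(secrets.choice(word_list) for _ in range(n_words))

def count_passphrases(list_size, n_words=4):
    """Number of possible phrases, by the product rule."""
    return list_size ** n_words

# Placeholder list of 2^11 = 2048 entries; a real list would hold common words.
words = [f"word{i}" for i in range(2048)]

phrase = passphrase(words)
total = count_passphrases(len(words))               # 2^44

days_fast = total / 10_000_000 / 86400              # ten million guesses per second
years_slow = total / 1000 / (365 * 86400)           # 1000 guesses per second

print(phrase)
print(total, round(days_fast, 1), round(years_slow))
```

The count of 2^44 holds only if the words are chosen at random by the computer; a phrase the user picks by hand comes from a much smaller set.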
A few years ago I gave a talk on passwords in the UMass Theory Seminar, based on a talk by Manuel Blum, one of the giants of theoretical computer science and my Ph.D. advisor's Ph.D. advisor. Here are my slides for that talk -- much of it should make sense to you, as I had a general audience in mind. Blum presented a method whereby a human could memorize a set of sentences or stories and use them to generate a series of passwords, one for each account they have, all of them reasonably hard to crack even if the adversary knows exactly how they are being generated.
Last modified 6 November 2017