We'll see in today's lecture that there are n! possible permutations of an
n-element set. A **sorting algorithm** must determine which of these
possible orders the input is in. A **comparison-based** sorting
algorithm must decide this on the basis of comparisons between pairs of
elements. We can represent a comparison-based sorting algorithm by a
**decision tree**. I'll draw on the board a depth-1 decision tree that
decides the order of two elements, and a depth-3 tree that decides the order
of three elements. Each internal node of the tree represents a comparison
between two of the elements, and each leaf node represents an inferred order
of the elements. For n=2, we compare a against b and thus determine whether
the true order is ab or ba. For n=3, we can compare a against b. If a comes
first, the real order might be abc, acb, or cab. We can test b against c, then
test a against c if c comes first. The other half of the decision tree is
similar.
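
As a rough sketch, the n=3 tree can be written out as nested comparisons. The function name `sort3` and the helper `first` are made up for this illustration, and the hidden true order is passed in as a string such as "cab". The b-first half is my reconstruction of the "similar" half, obtained by swapping a and b.

```python
from itertools import permutations

def sort3(order):
    """Walk the n=3 decision tree for a hidden order; return (answer, comparisons used)."""
    count = 0

    def first(x, y):
        # One comparison: which of x and y comes earlier in the hidden order?
        nonlocal count
        count += 1
        return x if order.index(x) < order.index(y) else y

    if first("a", "b") == "a":        # remaining orders: abc, acb, cab
        if first("b", "c") == "b":
            answer = "abc"
        elif first("a", "c") == "a":
            answer = "acb"
        else:
            answer = "cab"
    else:                             # same subtree with a and b swapped
        if first("a", "c") == "a":
            answer = "bac"
        elif first("b", "c") == "b":
            answer = "bca"
        else:
            answer = "cba"
    return answer, count

# Run the tree on all six possible orders.
results = {"".join(p): sort3("".join(p)) for p in permutations("abc")}
for order, (answer, used) in results.items():
    print(order, answer, used)
```

Each of the six orders reaches its own leaf, using two comparisons for abc and bac and three for the others.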

A binary tree of depth d can have at most 2^{d} leaves, and a correct
tree sorting n elements must have at least n! leaves, one for each possible
order. So we need 2^{d} >= n!, that is, d >= log_{2}(n!). This gives us
the **Sorting Lower Bound Theorem**, which says that
any comparison-based sorting algorithm takes at least log_{2}(n!)
comparisons in the worst case.
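
Since the number of comparisons is an integer, the bound in practice is the smallest d with 2^{d} at least n!. Here is a small sketch that tabulates it for the cases used in the exercises; the function name `min_comparisons` is just a label chosen for this example.

```python
import math

def min_comparisons(n):
    """Smallest d with 2**d >= n!, i.e. log2(n!) rounded up, in exact integer arithmetic."""
    return (math.factorial(n) - 1).bit_length()

for n in range(2, 6):
    print(n, math.factorial(n), min_comparisons(n))
# prints:
# 2 2 1
# 3 6 3
# 4 24 5
# 5 120 7
```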

**Writing Exercise 1:** For n=4 we must use at least five
comparisons because log_{2}(24) is a real number between 4 and 5.
Describe a decision tree of depth five that is correct for sorting four
elements a, b, c, and d. (Note that there are six possible comparisons, so you
will always reach your conclusion despite there being at least one pair you
have not compared directly.)

**Solution:** Call the element of a pair that comes first in the true order
the winner of the comparison. Begin by comparing a against b,
then c against d. Then compare the winners of these two comparisons for your
third comparison. You have now identified the first element. For your fourth
comparison, compare the two elements that lost directly to the first element --
the winner is the second element. Finally, compare the last two if they have
not already been compared -- this distinguishes the third and fourth elements
and determines the order in at most five comparisons.

More specifically, consider the branch of the tree where, in the first three comparisons, a defeats b, c defeats d, and a defeats c. There are three orders that can make this happen: abcd, acbd, and acdb. For this branch of the tree, have the fourth comparison test b against c. If b wins we can conclude that abcd was the order; otherwise test b against d for the fifth comparison.

This describes one eighth of the tree. The other seven eighths of the tree can be obtained from this first eighth by simple substitutions. To get the subtree where c defeats a in the third comparison, take the subtree described above and switch a with c and also b with d. You now have a subtree giving one quarter of the entire tree. To get the subtree where d defeats c in the second comparison, take the first quarter and swap c and d. This now gives you half the tree. To get the other half, take the first half and swap a with b.
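
The strategy above can be sketched in code and checked against all 24 orders. This is only an illustration: the name `sort4` and the helper `beats` are invented here, and instead of writing out eight subtrees the sketch relabels the elements after the third comparison, which plays the same role as the substitutions.

```python
from itertools import permutations

def sort4(order):
    """Sort a, b, c, d with the five-comparison tree; return (answer, comparisons used)."""
    count = 0

    def beats(x, y):
        # One comparison: does x come before y in the hidden order?
        nonlocal count
        count += 1
        return order.index(x) < order.index(y)

    # Comparisons 1 and 2: a against b, c against d.
    w1, l1 = ("a", "b") if beats("a", "b") else ("b", "a")
    w2, l2 = ("c", "d") if beats("c", "d") else ("d", "c")
    # Comparison 3: winner against winner; relabel so that w1 is the overall first.
    if not beats(w1, w2):
        w1, l1, w2, l2 = w2, l2, w1, l1
    # Comparison 4: the two elements that lost directly to w1.
    if beats(l1, w2):
        answer = w1 + l1 + w2 + l2      # w2 and l2 were already compared
    elif beats(l1, l2):                 # comparison 5
        answer = w1 + w2 + l1 + l2
    else:
        answer = w1 + w2 + l2 + l1
    return answer, count

results = {"".join(p): sort4("".join(p)) for p in permutations("abcd")}
print(max(used for _, used in results.values()))   # prints 5
```

Eight of the 24 orders are settled after four comparisons (one per eighth of the tree), and the rest need the fifth.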

There are at least two other ways to determine the order in at most five
comparisons. One is **mergesort** -- test a against b and c against d
to get two sorted lists of two elements, then merge the two lists with at most
three more comparisons. Another method is to sort a, b, and c using either two
or three comparisons as I did on the board, then use two more comparisons to
find out where d fits into the list. (If the order of a, b, and c is abc, for
example, test d against b, then test d against a if d comes before b, or
against c if d comes after b.)
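
Both alternatives can be checked the same way. The sketch below is one possible rendering, not the only one: `Oracle`, `merge_sort4`, and `insertion_sort4` are names made up for this example, and the oracle simply counts how many comparisons are asked about a hidden order.

```python
from itertools import permutations

class Oracle:
    """Answers comparisons about a hidden order and counts them."""
    def __init__(self, order):
        self.order, self.count = order, 0

    def before(self, x, y):
        self.count += 1
        return self.order.index(x) < self.order.index(y)

def merge_sort4(oracle):
    # Two comparisons to build two sorted pairs, then at most three to merge them.
    left = ["a", "b"] if oracle.before("a", "b") else ["b", "a"]
    right = ["c", "d"] if oracle.before("c", "d") else ["d", "c"]
    out = []
    while left and right:
        source = left if oracle.before(left[0], right[0]) else right
        out.append(source.pop(0))
    return "".join(out + left + right)

def insertion_sort4(oracle):
    # Sort a, b, c with the three-element tree (two or three comparisons).
    if oracle.before("a", "b"):
        if oracle.before("b", "c"):
            lst = ["a", "b", "c"]
        else:
            lst = ["a", "c", "b"] if oracle.before("a", "c") else ["c", "a", "b"]
    else:
        if oracle.before("a", "c"):
            lst = ["b", "a", "c"]
        else:
            lst = ["b", "c", "a"] if oracle.before("b", "c") else ["c", "b", "a"]
    # Two more comparisons place d: the middle element first, then one neighbor.
    if oracle.before("d", lst[1]):
        pos = 0 if oracle.before("d", lst[0]) else 1
    else:
        pos = 2 if oracle.before("d", lst[2]) else 3
    lst.insert(pos, "d")
    return "".join(lst)

worst = {}
for method in (merge_sort4, insertion_sort4):
    counts = []
    for p in permutations("abcd"):
        oracle = Oracle("".join(p))
        assert method(oracle) == oracle.order
        counts.append(oracle.count)
    worst[method.__name__] = max(counts)
print(worst)   # prints {'merge_sort4': 5, 'insertion_sort4': 5}
```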

**Writing Exercise 2:** For n=5 the lower bound is seven comparisons
because log_{2}(120) is a real number a little smaller than 7. Describe
a decision tree of depth seven that is correct for sorting five elements a, b,
c, d, and e. (Hint: For each node you can associate the set of orders of the
elements that reach that node. To get the shallowest tree you want the new
comparison to divide this set as nearly in half as possible. I'll illustrate
this on the board for the sample trees with n=2 and n=3.)

**Solution:** We begin with the same first three comparisons as before: a against b, c against d, and winner against winner. Each of these comparisons divides the set of permutations reaching it exactly in half, from the original 120 to 60, 30, and 15. Here are the fifteen orders for which a defeats b, c defeats d, and a defeats c:

abcde, abced, abecd, aebcd, eabcd, acbde, acbed, acebd, aecbd, eacbd, acdbe, acdeb, acedb, aecdb, and eacdb.

We will create a tree of depth four to distinguish among these fifteen orders. Just as in the n=4 case, this will give one eighth of the tree and we can get the other seven eighths by substitutions.

We need to find a comparison that splits these fifteen orders into a group of eight and a group of seven. The only one that works is c against e:

c first: abcde, abced, acbde, acbed, acebd, acdbe, acdeb, acedb

e first: abecd, aebcd, eabcd, aecbd, eacbd, aecdb, eacdb
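
The claim that c against e is the only comparison giving an 8-7 split can be checked by brute-force counting. This is a quick sketch; the variable names `orders`, `splits`, and `balanced` are chosen only for this example.

```python
from itertools import combinations, permutations

# The fifteen orders in which a defeats b, c defeats d, and a defeats c.
orders = [
    "".join(p) for p in permutations("abcde")
    if p.index("a") < p.index("b")
    and p.index("c") < p.index("d")
    and p.index("a") < p.index("c")
]

# For each pair x, y: how many orders put x first, and how many put y first.
splits = {}
for x, y in combinations("abcde", 2):
    x_first = sum(o.index(x) < o.index(y) for o in orders)
    splits[x + y] = (x_first, len(orders) - x_first)

# A usable fourth comparison must leave at most 8 orders on each side.
balanced = [pair for pair, sizes in splits.items() if max(sizes) <= 8]
print(len(orders), splits["ce"], balanced)   # prints 15 (8, 7) ['ce']
```

The next closest pair is b against e, which splits the orders 6 to 9 and so leaves too many orders on one side for three further comparisons.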

In the c-first case, we can split these eight into two groups of four by testing d against e. We can then split the d-first group of these (abcde, acbde, acdbe, acdeb) by testing b against d, and split the e-first group (abced, acbed, acebd, acedb) by testing b against e. Once we have reduced the possibilities to two, there must be a final comparison distinguishing the two remaining orders.

So all we need is to deal with the e-first case. We split the seven orders into a group of three and a group of four by testing b against c. Then we split the b-first subset (abecd, aebcd, eabcd) by testing a against e, and split the c-first subset (aecbd, eacbd, aecdb, eacdb) also by testing a against e. Once again, the final comparison is clear except in the case of eabcd, which we identify in only six comparisons.
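
Putting the pieces together, here is a sketch of the whole depth-seven tree, checked against all 120 orders. The name `sort5` and the helper `before` are invented for this example, the relabeling after the third comparison stands in for the substitutions, and the particular seventh comparisons are my own choice, since the text only says that some distinguishing comparison exists.

```python
from itertools import permutations

def sort5(order):
    """Sort a..e with the depth-seven tree; return (answer, comparisons used)."""
    count = 0

    def before(x, y):
        # One comparison: does x come before y in the hidden order?
        nonlocal count
        count += 1
        return order.index(x) < order.index(y)

    # Comparisons 1-3, then relabel so that A beats B, C beats D, and A beats C.
    A, B = ("a", "b") if before("a", "b") else ("b", "a")
    C, D = ("c", "d") if before("c", "d") else ("d", "c")
    if not before(A, C):
        A, B, C, D = C, D, A, B
    E = "e"
    if before(C, E):                      # comparison 4: c first, eight orders
        if before(D, E):                  # comparison 5: d first
            if before(B, D):              # comparison 6
                ans = [A, B, C, D, E] if before(B, C) else [A, C, B, D, E]
            else:
                ans = [A, C, D, B, E] if before(B, E) else [A, C, D, E, B]
        else:                             # e before d
            if before(B, E):              # comparison 6
                ans = [A, B, C, E, D] if before(B, C) else [A, C, B, E, D]
            else:
                ans = [A, C, E, B, D] if before(B, D) else [A, C, E, D, B]
    else:                                 # e first, seven orders
        if before(B, C):                  # comparison 5: b first
            if before(A, E):              # comparison 6
                ans = [A, B, E, C, D] if before(B, E) else [A, E, B, C, D]
            else:
                ans = [E, A, B, C, D]     # settled after six comparisons
        else:
            if before(A, E):              # comparison 6
                ans = [A, E, C, B, D] if before(B, D) else [A, E, C, D, B]
            else:
                ans = [E, A, C, B, D] if before(B, D) else [E, A, C, D, B]
    return "".join(ans), count

results = {"".join(p): sort5("".join(p)) for p in permutations("abcde")}
print(max(used for _, used in results.values()))   # prints 7
```

Eight of the 120 orders, one per eighth of the tree, finish in six comparisons; these are the images of eabcd under the substitutions.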

Last modified 6 December 2007