We'll see in today's lecture that there are n! possible permutations of an n-element set. A sorting algorithm must determine which of these possible orders the input is in. A comparison-based sorting algorithm must decide this on the basis of comparisons between pairs of elements. We can represent a comparison-based sorting algorithm by a decision tree. I'll draw on the board a depth-1 decision tree that decides the order of two elements, and a depth-3 tree that decides the order of three elements. Each internal node of the tree represents a comparison between two of the elements, and each leaf node represents an inferred order of the elements. For n=2, we compare a against b and thus determine whether the true order is ab or ba. For n=3, we first compare a against b. If a comes first, the real order might be abc, acb, or cab. We then test b against c: if b comes first the order is abc, and if c comes first we test a against c to decide between acb and cab. The other half of the decision tree, where b comes before a, is similar.
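As an illustration that is not part of the original notes, here is a minimal Python sketch of the n=3 tree just described. The class Oracle and the function sort3 are hypothetical names: an Oracle hides a true order of the letters and answers "does x come before y?", counting each question, so the sketch can check that all 6 orders are identified with at most 3 comparisons.

```python
from itertools import permutations


class Oracle:
    """Hypothetical helper: hides a true order and answers 'does x come before y?'."""

    def __init__(self, order):
        self.pos = {ch: i for i, ch in enumerate(order)}
        self.count = 0  # number of comparisons asked so far

    def before(self, x, y):
        self.count += 1
        return self.pos[x] < self.pos[y]


def sort3(o):
    """Depth-3 decision tree for three elements a, b, c."""
    if o.before("a", "b"):              # a first: abc, acb, or cab
        if o.before("b", "c"):
            return "abc"
        return "acb" if o.before("a", "c") else "cab"
    # b first: the mirror image, with the roles of a and b swapped
    if o.before("a", "c"):
        return "bac"
    return "bca" if o.before("b", "c") else "cba"


for p in permutations("abc"):
    truth = "".join(p)
    o = Oracle(truth)
    assert sort3(o) == truth and o.count <= 3
```

Each leaf is reached after two or three questions, which matches the depth-3 tree drawn on the board.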
A tree of depth d can have at most 2^d leaves, and a correct tree sorting n elements must have at least n! leaves. Comparing these two numbers gives us the Sorting Lower Bound Theorem, which says that any comparison-based sorting algorithm takes at least log2(n!) comparisons in the worst case.
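The bound is easy to tabulate. This sketch (the function name comparison_lower_bound is hypothetical) finds the smallest depth d with 2^d >= n! using exact integer arithmetic, which is the same as rounding log2(n!) up; it reproduces the values 1, 3, 5, and 7 for n = 2 through 5.

```python
import math


def comparison_lower_bound(n):
    """Smallest d with 2**d >= n!, i.e. log2(n!) rounded up, using exact integers."""
    leaves_needed = math.factorial(n)
    # For m >= 1, (m - 1).bit_length() is the least d with 2**d >= m.
    return (leaves_needed - 1).bit_length()


for n in range(2, 6):
    print(n, math.factorial(n), comparison_lower_bound(n))
```

Working with integers avoids any floating-point doubt about whether log2(n!) lands exactly on a whole number.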
Writing Exercise 1: For n=4 we must use at least five comparisons because log2(24) is a real number between 4 and 5. Describe a decision tree of depth five that is correct for sorting four elements a, b, c, and d. (Note that there are six possible comparisons, so on every path you will reach your conclusion while at least one pair has never been compared directly.)
Solution: Begin by comparing a against b,
then c against d. Then compare the winners of these two comparisons for your
third comparison. You have now identified the first element. For your fourth
comparison, compare the two elements that lost directly to the first element --
the winner is the second element. Finally, compare the last two if they have
not already been compared -- this distinguishes the third and fourth elements
and determines the order in at most five comparisons.
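Here is the same strategy as a minimal Python sketch, again using a hypothetical Oracle that hides the true order and counts questions (sort4 is also a hypothetical name). Rather than writing out all eight branches, it renames the elements after the third comparison, which amounts to the substitutions described for this tree. Checking all 24 orders confirms that none needs more than five comparisons.

```python
from itertools import permutations


class Oracle:
    """Hypothetical helper: hides a true order and answers 'does x come before y?'."""

    def __init__(self, order):
        self.pos = {ch: i for i, ch in enumerate(order)}
        self.count = 0  # number of comparisons asked so far

    def before(self, x, y):
        self.count += 1
        return self.pos[x] < self.pos[y]


def sort4(o):
    """Five-comparison tree for a, b, c, d following the strategy above."""
    # Comparisons 1 and 2: each pair yields a winner (comes first) and a loser.
    w1, l1 = ("a", "b") if o.before("a", "b") else ("b", "a")
    w2, l2 = ("c", "d") if o.before("c", "d") else ("d", "c")
    # Comparison 3: winner against winner identifies the first element.
    # x and y are the two elements that lost directly to it; z lost to y.
    if o.before(w1, w2):
        first, x, y, z = w1, l1, w2, l2
    else:
        first, x, y, z = w2, l2, w1, l1
    # Comparison 4: the winner of x against y is the second element.
    if o.before(x, y):
        return first + x + y + z        # y already beat z, so we are done
    # Comparison 5: x and z have never been compared.
    return first + y + (x + z if o.before(x, z) else z + x)


worst = 0
for p in permutations("abcd"):
    truth = "".join(p)
    o = Oracle(truth)
    assert sort4(o) == truth
    worst = max(worst, o.count)
print(worst)  # 5
```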
More specifically, consider the branch of the tree where in the first three
comparisons, a defeats b, c defeats d, and a defeats c. There are three orders
that can make this happen: abcd, acbd, and acdb. For this branch of the tree,
have the fourth comparison test b against c. If b wins we can conclude that
abcd was the order, and otherwise test b against d for the fifth comparison.
This describes one eighth of the tree. The other seven eighths of the tree
can be obtained from this first eighth by simple substitutions. To get the
subtree where c defeats a in the third comparison, take the subtree described
above and switch a with c and also b with d. You now have a subtree giving
one quarter of the entire tree. To get the subtree where d defeats c in the
second comparison, take the first quarter and swap c and d. This now gives you
half the tree. To get the other half, take the first half and swap a with b.
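The substitution argument can be checked mechanically. In this sketch (orders_reaching and relabel are hypothetical helper names), the set of orders reaching a node is computed by filtering all permutations against the comparison outcomes so far. Applying a swap of letters to the first eighth should give exactly the branch it is claimed to describe, and all eight branches should hold three orders each.

```python
from itertools import permutations


def orders_reaching(letters, outcomes):
    """Orders of `letters` consistent with outcomes, given as (winner, loser) pairs."""
    result = set()
    for p in permutations(letters):
        order = "".join(p)
        if all(order.index(w) < order.index(l) for w, l in outcomes):
            result.add(order)
    return result


def relabel(orders, swap):
    """Apply a letter substitution (a dict; unlisted letters stay put) to each order."""
    return {"".join(swap.get(ch, ch) for ch in order) for order in orders}


first_eighth = orders_reaching("abcd", [("a", "b"), ("c", "d"), ("a", "c")])
print(sorted(first_eighth))  # ['abcd', 'acbd', 'acdb']

# Switching a with c and b with d gives the branch where c defeats a.
assert relabel(first_eighth, {"a": "c", "c": "a", "b": "d", "d": "b"}) == \
    orders_reaching("abcd", [("a", "b"), ("c", "d"), ("c", "a")])

# Swapping c and d gives the branch where d defeats c and then a defeats d.
assert relabel(first_eighth, {"c": "d", "d": "c"}) == \
    orders_reaching("abcd", [("a", "b"), ("d", "c"), ("a", "d")])

# All eight branches hold three orders each and together cover all 24 orders.
branches = []
for w1, l1 in [("a", "b"), ("b", "a")]:
    for w2, l2 in [("c", "d"), ("d", "c")]:
        for w3, l3 in [(w1, w2), (w2, w1)]:
            branches.append(orders_reaching("abcd", [(w1, l1), (w2, l2), (w3, l3)]))
assert all(len(branch) == 3 for branch in branches)
assert set().union(*branches) == orders_reaching("abcd", [])
```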
There are at least two other ways to determine the order in at most five
comparisons. One is mergesort -- test a against b and c against d
to get two sorted lists of two elements, then merge the two lists with at most
three more comparisons. Another method is to sort a, b, and c using either two
or three comparisons as I did on the board, then use two more comparisons to
find out where d fits into the list. (If the order of a, b, and c is abc, for
example, test d against b, then test d against a if d wins or against c if d loses.)
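Both alternatives can be tested the same way. In this sketch, merge is the standard merge of two sorted lists and insert is ordinary binary insertion (test the middle element, then continue on one side); Oracle, mergesort4, and insertion4 are hypothetical names. Binary insertion of c into the sorted pair a, b reproduces the two-or-three-comparison tree from the board, and inserting d into a sorted list of three always takes exactly two comparisons.

```python
from itertools import permutations


class Oracle:
    """Hypothetical helper: hides a true order and answers 'does x come before y?'."""

    def __init__(self, order):
        self.pos = {ch: i for i, ch in enumerate(order)}
        self.count = 0  # number of comparisons asked so far

    def before(self, x, y):
        self.count += 1
        return self.pos[x] < self.pos[y]


def merge(o, left, right):
    """Standard merge; one comparison per element placed while both lists are nonempty."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if o.before(left[i], right[j]):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]


def mergesort4(o):
    ab = ["a", "b"] if o.before("a", "b") else ["b", "a"]
    cd = ["c", "d"] if o.before("c", "d") else ["d", "c"]
    return "".join(merge(o, ab, cd))    # at most three more comparisons


def insert(o, sorted_list, x):
    """Binary insertion: place x into a sorted list by repeatedly testing the middle."""
    lo, hi = 0, len(sorted_list)
    while lo < hi:
        mid = (lo + hi) // 2
        if o.before(x, sorted_list[mid]):
            hi = mid
        else:
            lo = mid + 1
    return sorted_list[:lo] + [x] + sorted_list[lo:]


def insertion4(o):
    ab = ["a", "b"] if o.before("a", "b") else ["b", "a"]
    abc = insert(o, ab, "c")            # one or two more comparisons
    return "".join(insert(o, abc, "d")) # exactly two more comparisons


for p in permutations("abcd"):
    truth = "".join(p)
    for algorithm in (mergesort4, insertion4):
        o = Oracle(truth)
        assert algorithm(o) == truth and o.count <= 5
```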
Writing Exercise 2: For n=5 the lower bound is seven comparisons because log2(120) is a real number a little smaller than 7. Describe a decision tree of depth seven that is correct for sorting five elements a, b, c, d, and e. (Hint: With each node you can associate the set of orders of the elements that reach that node. To get the shallowest tree you want each new comparison to divide this set as nearly in half as possible. I'll illustrate this on the board for the sample trees with n=2 and n=3.)
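The hint can be made concrete with a small helper. This sketch (split_counts is a hypothetical name) lists, for every possible comparison, how many of the orders at a node fall on each side. On the six orders of a, b, and c every comparison splits them 3/3; among the three orders with a before b, testing a against b again is useless (3/0), while b against c, the choice made in the n=3 tree, splits them 1/2.

```python
from itertools import combinations, permutations


def split_counts(orders, letters):
    """Map each comparison (p, q) to (#orders with p first, #orders with q first)."""
    counts = {}
    for p, q in combinations(letters, 2):
        p_first = sum(1 for order in orders if order.index(p) < order.index(q))
        counts[(p, q)] = (p_first, len(orders) - p_first)
    return counts


all_orders = ["".join(p) for p in permutations("abc")]
print(split_counts(all_orders, "abc"))   # every comparison splits the six orders 3/3

a_before_b = [order for order in all_orders if order.index("a") < order.index("b")]
print(split_counts(a_before_b, "abc"))   # a vs b is 3/0, a vs c is 2/1, b vs c is 1/2
```

Here a against c would split those three orders just as evenly as b against c; either gives a tree of depth three.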
We begin with the same first three comparisons as before: a against b, c
against d, and winner against winner. Each of these comparisons divides the
set of permutations reaching it exactly in half, from the original 120 to
60, 30, and 15. Here are the fifteen orders for which a defeats b, c
defeats d, and a defeats c:
abcde, abced, abecd, aebcd, eabcd, acbde, acbed, acebd, aecbd, eacbd,
acdbe, acdeb, acedb, aecdb, and eacdb.
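The counts and the list can be double-checked by brute force. This sketch (consistent is a hypothetical helper name) filters all 120 orders of a through e against the first zero, one, two, and three comparison outcomes and compares the final survivors with the fifteen orders listed above.

```python
from itertools import permutations


def consistent(order, outcomes):
    """True if, for every (winner, loser) pair, the winner comes earlier in order."""
    return all(order.index(winner) < order.index(loser) for winner, loser in outcomes)


outcomes = [("a", "b"), ("c", "d"), ("a", "c")]  # a defeats b, c defeats d, a defeats c
orders = ["".join(p) for p in permutations("abcde")]

# Number of orders reaching the node after 0, 1, 2, and 3 comparisons.
sizes = [sum(1 for order in orders if consistent(order, outcomes[:k])) for k in range(4)]
print(sizes)  # [120, 60, 30, 15]

listed = ("abcde abced abecd aebcd eabcd acbde acbed acebd aecbd eacbd "
          "acdbe acdeb acedb aecdb eacdb").split()
survivors = {order for order in orders if consistent(order, outcomes)}
assert survivors == set(listed)
```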
We will create a tree of depth four to distinguish among these fifteen
orders. Just as in the n=4 case, this will give one eighth of the tree,
and we can get the other seven eighths by substitutions.
We need to find a comparison that splits these fifteen orders into a group
of eight and a group of seven, because each group must then be settled by
three more comparisons and 2^3 = 8. The only one that works is c against e:
c first: abcde, abced, acbde, acbed, acebd, acdbe, acdeb, acedb
e first: abecd, aebcd, eabcd, aecbd, eacbd, aecdb, eacdb
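The claim that c against e is the only usable comparison can be verified by tallying all ten possible comparisons over the fifteen orders. In this sketch a comparison counts as usable when neither side exceeds 2^3 = 8 orders, since three comparisons then remain for each side.

```python
from itertools import combinations

listed = ("abcde abced abecd aebcd eabcd acbde acbed acebd aecbd eacbd "
          "acdbe acdeb acedb aecdb eacdb").split()

# For each pair pq: (#orders with p first, #orders with q first).
splits = {}
for p, q in combinations("abcde", 2):
    p_first = sum(1 for order in listed if order.index(p) < order.index(q))
    splits[p + q] = (p_first, len(listed) - p_first)

print(splits)
usable = [pair for pair, sides in splits.items() if max(sides) <= 8]
print(usable)  # ['ce']
```

The runners-up are b against d (10/5) and b against e (6/9); either would leave a side with more than eight orders.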
In the c-first case, we can split these eight into two groups of four
by testing d against e. We can then split the d-first group of these
(abcde, acbde, acdbe, acdeb) by testing b against d, and split the e-first
group (abced, acbed, acebd, acedb) by testing b against e. Once we have
reduced the possibilities to two, there must be a final comparison
distinguishing any two different orders.
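The last step relies on a simple fact: if two orders differ, some pair that is adjacent in the first order appears reversed in the second (otherwise the second order would agree with the whole chain of the first and be identical to it). This sketch (distinguishing_pair is a hypothetical name) finds such a pair and recovers a final comparison for each of the four pairs of orders left in the c-first case.

```python
def distinguishing_pair(order1, order2):
    """Return a pair (p, q), adjacent in order1, that order2 puts the other way round."""
    for p, q in zip(order1, order1[1:]):
        if order2.index(p) > order2.index(q):
            return p, q
    return None  # only reached when the two orders are identical


for first, second in [("abcde", "acbde"), ("acdbe", "acdeb"),
                      ("abced", "acbed"), ("acebd", "acedb")]:
    print(first, second, distinguishing_pair(first, second))
```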
So all we need is to deal with the e-first case. We split the seven orders
into a group of four and a group of three by testing b against c. Then we
split the b-first subset (abecd, aebcd, eabcd) by testing a against e, and
split the c-first subset (aecbd, eacbd, aecdb, eacdb) also by testing a
against e. Once again, the final comparison is clear except in the case of
eabcd, which we identify in only six comparisons.
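Putting the pieces together, this sketch sorts five elements with at most seven comparisons. The function canonical_branch encodes the four-comparison subtree just described, for the branch where a beat b, c beat d, and a beat c; sort5 renames the elements after the first three comparisons so that every branch looks like that one, which is the same substitution trick as in the n=4 case. Oracle, canonical_branch, and sort5 are hypothetical names. Running it on all 120 orders checks every leaf of the tree exhaustively.

```python
from itertools import permutations


class Oracle:
    """Hypothetical helper: hides a true order and answers 'does x come before y?'."""

    def __init__(self, order):
        self.pos = {ch: i for i, ch in enumerate(order)}
        self.count = 0  # number of comparisons asked so far

    def before(self, x, y):
        self.count += 1
        return self.pos[x] < self.pos[y]


def canonical_branch(ask):
    """Four-comparison subtree for the branch a < b, c < d, a < c (canonical names)."""
    if ask("c", "e"):                          # c first: eight orders remain
        if ask("d", "e"):                      # abcde, acbde, acdbe, acdeb
            if ask("b", "d"):
                return "abcde" if ask("b", "c") else "acbde"
            return "acdbe" if ask("b", "e") else "acdeb"
        if ask("b", "e"):                      # abced, acbed, acebd, acedb
            return "abced" if ask("b", "c") else "acbed"
        return "acebd" if ask("b", "d") else "acedb"
    if ask("b", "c"):                          # abecd, aebcd, eabcd
        if ask("a", "e"):
            return "abecd" if ask("b", "e") else "aebcd"
        return "eabcd"                         # found after only six comparisons
    if ask("a", "e"):                          # aecbd, aecdb
        return "aecbd" if ask("b", "d") else "aecdb"
    return "eacbd" if ask("b", "d") else "eacdb"


def sort5(o):
    # Comparisons 1-3: a vs b, c vs d, then winner against winner.
    w1, l1 = ("a", "b") if o.before("a", "b") else ("b", "a")
    w2, l2 = ("c", "d") if o.before("c", "d") else ("d", "c")
    if o.before(w1, w2):
        first, x, y, z = w1, l1, w2, l2
    else:
        first, x, y, z = w2, l2, w1, l1
    # Rename so the branch reads "a beat b, c beat d, a beat c".
    name = {"a": first, "b": x, "c": y, "d": z, "e": "e"}
    result = canonical_branch(lambda p, q: o.before(name[p], name[q]))
    return "".join(name[ch] for ch in result)


worst = 0
for p in permutations("abcde"):
    truth = "".join(p)
    o = Oracle(truth)
    assert sort5(o) == truth
    worst = max(worst, o.count)
print(worst)  # 7
```

Each of the eight branches has exactly one leaf at depth six (the analogue of eabcd), so eight of the 120 orders are identified in six comparisons and the rest in seven.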
Last modified 6 December 2007