CMPSCI 250 Discussion #9: The Problem of Sorting

David Mix Barrington

7 November 2007

We'll see in today's lecture that there are n! possible permutations of an n-element set. A sorting algorithm must determine which of these possible orders the input is in. A comparison-based sorting algorithm must decide this on the basis of comparisons between pairs of elements. We can represent a comparison-based sorting algorithm by a decision tree. I'll draw on the board a depth-1 decision tree that decides the order of two elements, and a depth-3 tree that decides the order of three elements. Each internal node of the tree represents a comparison between two of the elements, and each leaf node represents an inferred order of the elements. For n=2, we compare a against b and thus determine whether the true order is ab or ba. For n=3, we can compare a against b. If a comes first, the real order might be abc, acb, or cab. We can test b against c, then test a against c if c comes first. The other half of the decision tree is similar.
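The depth-3 tree for three elements can be written out as nested tests. This is only a sketch of the tree described above: the names sort_three, first, and worst_case_comparisons are made up for illustration, and first(x, y) is assumed to return True when x comes before y.

```python
from itertools import permutations

def sort_three(a, b, c, first):
    """Decision tree of depth 3; first(x, y) is True when x precedes y."""
    if first(a, b):
        # Remaining orders: abc, acb, cab.
        if first(b, c):
            return [a, b, c]
        return [a, c, b] if first(a, c) else [c, a, b]
    # Other half of the tree: the same shape with a and b swapped.
    if first(a, c):
        return [b, a, c]
    return [b, c, a] if first(b, c) else [c, b, a]

def worst_case_comparisons():
    """Try all 3! inputs and report the largest number of comparisons used."""
    worst = 0
    for p in permutations((1, 2, 3)):
        calls = []
        def first(x, y):
            calls.append((x, y))
            return x < y
        assert sort_three(*p, first) == [1, 2, 3]
        worst = max(worst, len(calls))
    return worst
```

Each return statement corresponds to one leaf of the tree, and there are six of them, one for each order of three elements.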

A tree of depth d can have at most 2^d leaves, and a correct tree sorting n elements must have at least n! leaves. Comparing these two numbers gives 2^d >= n!, which is the Sorting Lower Bound Theorem: any comparison-based sorting algorithm takes at least log2(n!) comparisons in the worst case.
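To see what the bound says for small n, here is a short sketch that computes the smallest depth d with 2^d >= n!. The function name comparison_lower_bound is just a label chosen here. It uses integer bit lengths rather than floating-point logarithms so the rounding up is exact.

```python
import math

def comparison_lower_bound(n):
    """Smallest d with 2**d >= n!, that is, log2(n!) rounded up."""
    # For an integer x >= 1, (x - 1).bit_length() is the ceiling of log2(x).
    return (math.factorial(n) - 1).bit_length()

for n in range(2, 8):
    print(n, math.factorial(n), comparison_lower_bound(n))
```

For n=4 and n=5 this gives the values five and seven used in the two writing exercises.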

Writing Exercise 1: For n=4 we must use at least five comparisons because log2(24) is a real number between 4 and 5. Describe a decision tree of depth five that is correct for sorting four elements a, b, c, and d. (Note that there are six possible comparisons, so you will always reach your conclusion despite there being at least one pair you have not compared directly.)

Solution: Begin by first comparing a against b, then c against d. Then compare the winners of these two comparisons for your third comparison. You have now identified the first element. For your fourth comparison, compare the two elements that lost directly to the first element -- the winner is the second element. Finally, compare the last two if they have not already been compared -- this distinguishes the third and fourth elements and determines the order in at most five comparisons.

More specifically, consider the branch of the tree where in the first three comparisons, a defeats b, c defeats d, and a defeats c. There are three orders that can make this happen: abcd, acbd, and acdb. For this branch of the tree, have the fourth comparison test b against c. If b wins we can conclude that abcd was the order, and otherwise test b against d for the fifth comparison.

This describes one eighth of the tree. The other seven eighths of the tree can be obtained from this first eighth by simple substitutions. To get the subtree where c defeats a in the third comparison, take the subtree described above and switch a with c and also b with d. You now have a subtree giving one quarter of the entire tree. To get the subtree where d defeats c in the second comparison, take the first quarter and swap c and d. This now gives you half the tree. To get the other half, take the first half and swap a with b.
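The substitution trick translates directly into code: after each of the first three comparisons we rename the elements so that we are always in the branch where a defeats b, c defeats d, and a defeats c. The following is a sketch of this solution, not the only way to write it. The names sort_four, first, and comparison_counts are invented here, and first(x, y) is assumed to be True when x comes before y.

```python
from itertools import permutations

def sort_four(a, b, c, d, first):
    """Sort four items with at most five calls to first(x, y)."""
    if not first(a, b):          # comparison 1
        a, b = b, a              # swap a with b
    if not first(c, d):          # comparison 2
        c, d = d, c              # swap c with d
    if not first(a, c):          # comparison 3: winner against winner
        a, b, c, d = c, d, a, b  # swap a with c and also b with d
    # Now a precedes b, c precedes d, and a precedes c,
    # so the order is abcd, acbd, or acdb.
    if first(b, c):              # comparison 4
        return [a, b, c, d]
    if first(b, d):              # comparison 5
        return [a, c, b, d]
    return [a, c, d, b]

def comparison_counts():
    """Run sort_four on all 24 inputs; return the number of comparisons used by each."""
    counts = []
    for p in permutations(range(4)):
        calls = []
        def first(x, y):
            calls.append((x, y))
            return x < y
        assert sort_four(*p, first) == sorted(p)
        counts.append(len(calls))
    return counts
```

In each eighth of the tree the order corresponding to abcd is found after four comparisons and the other two orders after five, so no input needs more than five.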

There are at least two other ways to determine the order in at most five comparisons. One is mergesort -- test a against b and c against d to get two sorted lists of two elements, then merge the two lists with at most three more comparisons. Another method is to sort a, b, and c using either two or three comparisons as I did on the board, then use two more comparisons to find out where d fits into the list. (If the order of a, b, and c is abc, for example, test d against b, then test d against a if d wins or against c if d loses.)
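The second method's last step is a binary search for d's position in a sorted list of three. Here is a minimal sketch of just that step, assuming the first three elements are already in order x, y, z. The names insert_fourth and first are hypothetical, and first(u, v) is assumed to be True when u comes before v.

```python
from itertools import permutations

def insert_fourth(x, y, z, d, first):
    """Place d into the sorted list x, y, z using exactly two comparisons."""
    if first(d, y):
        # d belongs somewhere before the middle element.
        return [d, x, y, z] if first(d, x) else [x, d, y, z]
    # d belongs somewhere after the middle element.
    return [x, y, d, z] if first(d, z) else [x, y, z, d]

# Check every input: sort the first three by other means, then insert the fourth.
for p in permutations(range(4)):
    x, y, z = sorted(p[:3])
    assert insert_fourth(x, y, z, p[3], lambda u, v: u < v) == [0, 1, 2, 3]
```

Testing d against the middle element first is what keeps the cost at two comparisons. Starting from either end could take three.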

Writing Exercise 2: For n=5 the lower bound is seven comparisons because log2(120) is a real number a little smaller than 7. Describe a decision tree of depth seven that is correct for sorting five elements a, b, c, d, and e. (Hint: For each node you can associate the set of orders of the elements that reach that node. To get the shallowest tree you want the new comparison to divide this set as nearly in half as possible. I'll illustrate this on the board for the sample trees with n=2 and n=3.)

Solution: We begin with the same first three comparisons as before: a against b, c against d, and winner against winner. Each of these comparisons divides the set of permutations reaching it exactly in half, from the original 120 to 60, 30, and 15. Here are the fifteen orders for which a defeats b, c defeats d, and a defeats c:

abcde, abced, abecd, aebcd, eabcd, acbde, acbed, acebd, aecbd, eacbd, acdbe, acdeb, acedb, aecdb, and eacdb.

We will create a tree of depth four to distinguish among these fifteen orders. Just as in the n=4 case, this will give one eighth of the tree and we can get the other seven eighths by substitutions.

We need to find a comparison that splits these fifteen orders into a group of eight and a group of seven. The only one that works is c against e:

c first: abcde, abced, acbde, acbed, acebd, acdbe, acdeb, acedb

e first: abecd, aebcd, eabcd, aecbd, eacbd, aecdb, eacdb
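The claim that c against e is the only comparison giving an 8-7 split can be checked by brute force. The sketch below lists the fifteen orders and counts, for each of the ten pairs, how many orders put each element first. The names before, orders, splits, and balanced are chosen here for illustration.

```python
from itertools import combinations, permutations

def before(order, x, y):
    """True when x appears earlier than y in the given order."""
    return order.index(x) < order.index(y)

# The fifteen orders in which a defeats b, c defeats d, and a defeats c.
orders = ["".join(p) for p in permutations("abcde")
          if before(p, "a", "b") and before(p, "c", "d") and before(p, "a", "c")]

# For each pair xy, record (number of orders with x first, number with y first).
splits = {}
for x, y in combinations("abcde", 2):
    x_first = sum(before(o, x, y) for o in orders)
    splits[x + y] = (x_first, len(orders) - x_first)

# A comparison is good enough only if its larger side has eight orders.
balanced = [pair for pair, (m, k) in splits.items() if max(m, k) == 8]
print(splits)
print(balanced)
```

Any comparison whose larger side has nine or more orders would leave too many possibilities to separate with the three comparisons that remain, since three comparisons can distinguish at most eight orders.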

In the c-first case, we can split these eight into two groups of four by testing d against e. We can then split the d-first group of these (abcde, acbde, acdbe, acdeb) by testing b against d, and split the e-first group (abced, acbed, acebd, acedb) by testing b against e. Once we have reduced the possibilities to two, there must be a final comparison distinguishing the two remaining orders.

So all we need is to deal with the e-first case. We split the seven orders into a group of three and a group of four by testing b against c. Then we split the b-first subset (abecd, aebcd, eabcd) by testing a against e, and split the c-first subset (aecbd, eacbd, aecdb, eacdb) also by testing a against e. Once again, the final comparison is clear except in the case of eabcd, which we identify in only six comparisons.
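Putting the pieces together gives a complete depth-seven tree. The sketch below follows the comparisons named above for the fourth, fifth, and sixth levels. The particular seventh comparison in each branch is my own choice of a test that separates the two remaining orders. As in the n=4 case, renaming the elements after each of the first three comparisons stands in for the substitutions. The names sort_five, first, and comparison_counts are hypothetical.

```python
from itertools import permutations

def sort_five(a, b, c, d, e, first):
    """Sort five items with at most seven calls to first(x, y)."""
    if not first(a, b):                  # comparison 1
        a, b = b, a
    if not first(c, d):                  # comparison 2
        c, d = d, c
    if not first(a, c):                  # comparison 3
        a, b, c, d = c, d, a, b
    # Now a precedes b, c precedes d, a precedes c: fifteen orders remain.
    if first(c, e):                      # comparison 4: eight orders
        if first(d, e):                  # comparison 5: abcde acbde acdbe acdeb
            if first(b, d):              # comparison 6
                return [a, b, c, d, e] if first(b, c) else [a, c, b, d, e]
            return [a, c, d, b, e] if first(b, e) else [a, c, d, e, b]
        # e precedes d: abced acbed acebd acedb
        if first(b, e):                  # comparison 6
            return [a, b, c, e, d] if first(b, c) else [a, c, b, e, d]
        return [a, c, e, b, d] if first(b, d) else [a, c, e, d, b]
    # e precedes c: seven orders
    if first(b, c):                      # comparison 5: abecd aebcd eabcd
        if first(a, e):                  # comparison 6
            return [a, b, e, c, d] if first(b, e) else [a, e, b, c, d]
        return [e, a, b, c, d]           # found after only six comparisons
    # c precedes b: aecbd eacbd aecdb eacdb
    if first(a, e):                      # comparison 6
        return [a, e, c, b, d] if first(b, d) else [a, e, c, d, b]
    return [e, a, c, b, d] if first(b, d) else [e, a, c, d, b]

def comparison_counts():
    """Run sort_five on all 120 inputs; return the number of comparisons used by each."""
    counts = []
    for p in permutations(range(5)):
        calls = []
        def first(x, y):
            calls.append((x, y))
            return x < y
        assert sort_five(*p, first) == sorted(p)
        counts.append(len(calls))
    return counts
```

Running comparison_counts confirms that every one of the 120 inputs is sorted correctly with at most seven comparisons, matching the lower bound.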

Last modified 6 December 2007