# Practice Exam Solutions for Second Midterm

#### 6 November 2006

Question text is in black, solutions are in blue.

```
Q1: 10 points
Q2: 10 points
Q3: 20 points
Q4: 20 points
Q5: 25 points
Q6: 15 points
Total: 100 points
```

• Question 1 (10):

True or false with justification: Let A be an algorithm that operates on a binary tree, without changing the tree, as follows: A first spends O(1) time processing the root node of the tree, then recursively calls A on the left and the right subtrees. Then A can be substantially sped up on general binary trees using dynamic programming.

FALSE. Dynamic programming exploits overlap between the subproblems of a problem. Here the subproblems that are identified through the recursive calls involve processing subtrees of the tree, and different subtrees of the same tree do not overlap unless one is a subtree of the other. Memoization will not help at all as the algorithm will be called only once on each subtree.
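To make this concrete, here is a minimal Java sketch (the class and method names are my own, not from the question) of such an algorithm A, with a counter added. The counter confirms that A is called exactly once per subtree, so there are no repeated subproblems for a memo table to exploit:

```java
// Hypothetical illustration: a preorder pass of the kind described in the
// question. process() runs exactly once per node, so caching results per
// subtree could never avoid any work.
class Node {
    Node left, right;
    Node(Node l, Node r) { left = l; right = r; }
}

class A {
    static int visits = 0;  // counts how many times process() runs

    static void process(Node t) {
        if (t == null) return;
        visits++;                 // O(1) work at the root of this subtree
        process(t.left);          // recurse on the left subtree
        process(t.right);         // recurse on the right subtree
    }
}
```

On a complete tree with 7 nodes, `visits` ends at 7: one call per subtree, which is exactly the situation where memoization saves nothing.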

• Question 2 (10):

True or false with justification: Let k be any positive integer constant greater than 1. Then the recurrence T(n) = kT(n/k) + O(n), with T(1) = O(1), has the same big-O solution for any such k. (You may assume that T(n) is evaluated only when n is a power of k.)

TRUE. We are familiar with this recurrence for k=2 as the Mergesort recurrence, whose solution is T(n) = O(n log n). For any integer constant k > 1 we will have:

• T(n) = kT(n/k) + O(n)
• T(n) ≤ kT(n/k) + cn (for some constant c)
• T(n) ≤ k[kT(n/k²) + cn/k] + cn
• T(n) ≤ k[k[kT(n/k³) + cn/k²] + cn/k] + cn
• ...
• T(n) ≤ k^(logₖn)·T(1) + cn[1 + k/k + k²/k² + ...]
• T(n) ≤ O(n) + logₖn copies of cn
• T(n) = O(n log n)

As long as k > 1 the recursion bottoms out this way, and as long as k is O(1), logₖn = Θ(log n), so the big-O solution is O(n log n) for every such k.
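As a sanity check (my own, not part of the exam answer), we can evaluate the recurrence exactly for several values of k, taking c = 1 and T(1) = 1, and confirm that T(n)/(n log₂ n) stays bounded by a constant in every case:

```java
// Hypothetical numeric check: evaluate T(n) = k*T(n/k) + n with T(1) = 1,
// for n a power of k, and compare against n*log2(n). The ratio stays
// bounded for every constant k, matching the O(n log n) claim.
class RecurrenceCheck {
    static double T(long n, int k) {
        if (n <= 1) return 1;
        return k * T(n / k, k) + n;   // the "+ O(n)" term, with c = 1
    }

    static double ratio(long n, int k) {
        return T(n, k) / (n * (Math.log(n) / Math.log(2)));
    }
}
```

For example, with k = 2 and n = 2²⁰ the exact solution is n(log₂n + 1) = 21n, so the ratio is 21/20 = 1.05; larger k only shrinks the ratio by the constant factor logₖ2.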

• Question 3 (20):

Consider the following Java method (assume that it will only be called with 0 ≤ k and k ≤ n):

```
int choose(int n, int k) {
    if ((k == 0) || (k == n)) return 1;
    return choose(n - 1, k) + choose(n - 1, k - 1);
}
```

Explain why the running time will not be polynomial in n in general. Describe (in code or in English) how to revise the algorithm to be polynomial time in n. Carefully determine and justify a big-O bound on the running time of your improved version.

The recursion will call the method `choose` on pairs of arguments (i,j) with i ≤ n and j ≤ k, once it is originally called on (n,k). But there will be multiple calls with the same pair of arguments -- for example, if n is even and the original call is to (n,n/2), then both the subcalls to (n-1,n/2) and (n-1,n/2-1) will result in separate calls to (n-2,n/2-1). Each of these two calls will result in two separate calls to (n-4,n/2-2), making four calls with those arguments. Similarly we get (at least) eight total calls to (n-6,n/2-3), 16 calls to (n-8,n/2-4), and eventually 2^(n/2) calls to (0,0) just from calls to (2,1). The total number of calls, and hence the running time, is therefore exponential in n.

Memoization would make only one call to each of these O(n²) pairs of arguments, and the processing of each pair of arguments requires only O(1) time if we exclude the time for the recursive calls. Thus the total time for the memoized algorithm (which records the value for each pair of arguments in a table the first time it is computed) is O(n²).
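A memoized version of the method might look like this (the table setup with a -1 sentinel is one possible arrangement, not the only one; `long` guards against overflow for larger n):

```java
// Memoized choose: each pair (n, k) is computed at most once, so the
// running time drops from exponential to O(n^2) table entries of O(1)
// work each. An entry of -1 marks a value not yet computed.
class Choose {
    static long memoChoose(int n, int k) {
        long[][] memo = new long[n + 1][k + 1];
        for (long[] row : memo) java.util.Arrays.fill(row, -1);
        return go(n, k, memo);
    }

    private static long go(int n, int k, long[][] memo) {
        if (k == 0 || k == n) return 1;
        if (memo[n][k] != -1) return memo[n][k];   // reuse the earlier result
        memo[n][k] = go(n - 1, k, memo) + go(n - 1, k - 1, memo);
        return memo[n][k];
    }
}
```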

Similarly, we could fill out the table of results without recursion as follows:

```
int dpChoose(int n, int k) {
    int[][] table = new int[n + 1][k + 1];
    for (int i = 0; i <= n; i++) {
        table[i][0] = 1;                 // i choose 0 is 1
        if (i <= k) table[i][i] = 1;     // i choose i is 1
    }
    for (int j = 1; j <= k; j++)
        for (int m = j + 1; m <= n; m++)
            table[m][j] = table[m - 1][j - 1] + table[m - 1][j];
    return table[n][k];
}
```

This code's running time is dominated by the two nested loops at the end, which take O(nk) time; since k ≤ n, this is O(n²).

• Question 4 (20):

Let G be a directed graph and let s1,...,sk and t1,...,tk be any 2k distinct vertices of G. We want to know whether there exist k paths P1,...,Pk such that:

1. Each Pi starts at si and ends at ti (Whoops! This is a mistake -- I meant to say that each path starts at a different si, and ends at a different tj, but I didn't mean to say that Pi had to connect si to ti. The problem I assigned is solvable in polynomial time (it's a special case of "multicommodity flow"), but it's too hard for an exam.), and
2. No edge of G appears in more than one of the paths Pi.

Describe an algorithm to solve this problem in a time that is polynomial in m, the number of edges in G. Determine the big-O running time of your algorithm in terms of m, n (the number of vertices in G) and k (the number of paths to be found).

Here's the solution for the problem I meant to assign. Take the graph, make a new source s and a new sink t, connect s to each of the nodes si, connect each of the nodes ti to t, and add loops at any other sources or sinks so that s and t are the only source and the only sink. Give every edge a capacity of 1.

We can prove that there is a flow of value k through this new graph if and only if the desired k edge-disjoint paths exist in the original graph. First, note that any integer flow in this graph must place a unit flow on each edge in some subset of the edges, and if the flow has value k this subset must include all k edges out of s and all k edges into t. Every other node must have the same number of unit-flow edges into and out of it. We can prove by induction on j that a flow of value j must consist of j edge-disjoint paths from j of the si's to j of the ti's. For j=0 there are no edges, and these form zero edge-disjoint paths. For the case of j, consider any si with an edge out of it and follow a path of unit-flow edges starting there. This path can only stop at t, so it goes through one of the ti's. If we delete this path, there is still a flow of j-1 from s to t, so by the inductive hypothesis there are j-1 paths whose union forms the rest of the edges. The new path we just constructed makes j paths, completing the induction.

Finally, we need to show that if the k edge-disjoint paths exist in the old graph, a flow of size k exists in the new graph. If we place a unit flow on each edge of each path, and on the 2k edges out of s and into t, we have a flow of size k -- it meets the conservation constraints because there are an equal number of unit-flow edges into and out of each vertex other than s or t. As for the running time: each augmenting-path phase takes O(m) time, and since every capacity is 1 the flow value increases by at least 1 per phase, so at most k phases suffice, for a total of O(km) time (building the new graph adds only O(m + k) more).
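The reduction can be sketched in code. Below is one possible arrangement (the class, method names, and adjacency-matrix representation are my own): a BFS-based augmenting-path max-flow in the style of Edmonds-Karp, plus the super-source/super-sink construction described above. The k edge-disjoint paths exist exactly when the returned flow value is k:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the reduction: add a super-source s and super-sink
// t, give every edge capacity 1, and run a BFS-based augmenting-path max
// flow. A max flow of value k certifies k edge-disjoint paths.
class DisjointPaths {
    // cap[u][v] = capacity of edge u -> v (0 if absent); modified in place.
    static int maxFlow(int[][] cap, int s, int t) {
        int n = cap.length, total = 0;
        while (true) {
            int[] parent = new int[n];
            java.util.Arrays.fill(parent, -1);
            parent[s] = s;
            Queue<Integer> q = new ArrayDeque<>();
            q.add(s);
            while (!q.isEmpty() && parent[t] == -1) {  // BFS for a path
                int u = q.remove();
                for (int v = 0; v < n; v++)
                    if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; q.add(v); }
            }
            if (parent[t] == -1) return total;         // no augmenting path left
            int bottleneck = Integer.MAX_VALUE;        // min capacity on the path
            for (int v = t; v != s; v = parent[v])
                bottleneck = Math.min(bottleneck, cap[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) {
                cap[parent[v]][v] -= bottleneck;       // use up forward capacity
                cap[v][parent[v]] += bottleneck;       // add residual (back) edge
            }
            total += bottleneck;
        }
    }

    // Build the augmented graph: vertex n is the new s, vertex n+1 the new t.
    static int countDisjointPaths(int[][] adj, int[] sources, int[] sinks) {
        int n = adj.length;
        int[][] cap = new int[n + 2][n + 2];
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                if (adj[u][v] != 0) cap[u][v] = 1;     // unit capacities
        for (int si : sources) cap[n][si] = 1;         // s -> each s_i
        for (int ti : sinks) cap[ti][n + 1] = 1;       // each t_i -> t
        return maxFlow(cap, n, n + 1);
    }
}
```

(The loops at other sources and sinks mentioned above are unnecessary in this formulation, since a vertex with no incoming or outgoing capacity simply carries no flow.)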

• Question 5 (25):

Consider the following algorithm strangeSort, which sorts n `Comparable` items in a list A. (I should have added the assumption that the n items are all distinct; otherwise the algorithm below fails to terminate when given a list of two or more items that are all equal.)

1. If n ≤ 1, return A unchanged
2. For each item x in A, scan A and count how many other items in A are less than x
3. Put the items with counts less than n/2 in a list B
4. Put the other items in a list C
5. Recursively sort B and C using strangeSort
6. Append the sorted C to the sorted B and return the result
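Under the added distinctness assumption, the steps above can be transcribed into Java as follows (the class name and use of `List` are my own choices, not from the exam):

```java
import java.util.ArrayList;
import java.util.List;

// Direct transcription of strangeSort. Assumes all items are distinct, as
// noted above; with duplicates the partition can fail to shrink the list.
class StrangeSort {
    static <T extends Comparable<T>> List<T> strangeSort(List<T> a) {
        int n = a.size();
        if (n <= 1) return a;                        // step 1
        List<T> b = new ArrayList<>(), c = new ArrayList<>();
        for (T x : a) {
            int count = 0;                           // step 2: count items below x
            for (T y : a)
                if (y.compareTo(x) < 0) count++;
            if (count < n / 2.0) b.add(x);           // step 3: small half into B
            else c.add(x);                           // step 4: the rest into C
        }
        List<T> result = new ArrayList<>(strangeSort(b));  // step 5
        result.addAll(strangeSort(c));               // step 6: sorted B then C
        return result;
    }
}
```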

• (a,10) Prove by induction on n that strangeSort correctly sorts all lists of length n, with smaller items first.

Clearly if n ≤ 1 the list is already sorted, and thus step 1 returns the sorted version of the list as desired. Assume now (as the inductive hypothesis) that strangeSort correctly sorts all lists of length n/2, with smaller items first. Consider the operation of strangeSort on a list of size n. In step 2, each item is assigned a distinct count from 0 through n-1 (this is where we use the assumption that the items are distinct), and in step 3 the smallest n/2 items are put in B. The two lists B and C thus each have n/2 items, so by the inductive hypothesis the recursive calls sort B and C correctly. Since every item in B is smaller than every item in C, the append operation in step 6 creates a sorted list of n elements. This completes the inductive step, under the assumption that n is a power of 2.

• (b,15) Formulate a recurrence for the running time T(n) of strangeSort on an input list of size n. Solve this recurrence to get the best possible big-O bound on T(n) -- you may assume if you like that n is a power of 2.

Step 1 means that T(1) = O(1). Step 2 requires n scans of the entire list of n elements and so takes O(n²) time. Steps 3 and 4 each move n/2 items and so take O(n) time. Step 5 makes two calls to strangeSort with arguments of size n/2 and so takes time 2T(n/2). Step 6 also takes O(n) time. The total time is thus 2T(n/2) + O(n²) + O(n) = 2T(n/2) + O(n²).

This recurrence was solved to T(n) = O(n²) in the book -- we repeat the derivation here:

• T(n) = 2T(n/2) + O(n²)
• T(n) ≤ 2T(n/2) + cn² (for some c)
• T(n) ≤ 4T(n/4) + 2c(n/2)² + cn²
• T(n) ≤ 8T(n/8) + 4c(n/4)² + 2c(n/2)² + cn²
• ...
• T(n) ≤ nT(1) + cn²[2/n + 4/n + ... + 1/4 + 1/2 + 1]
• T(n) ≤ O(n) + 2cn²
• T(n) = O(n²)

• Question 6 (15):

Let G be a directed graph with exactly one source s, exactly one sink t, and a positive integer capacity on each edge. Explain carefully why we know that there is a maximum-size flow in G from s to t that sends an integer flow over each edge of G.

We have proved that the Ford-Fulkerson algorithm -- which repeatedly finds an augmenting path, sends the maximum possible augmenting flow across it, and then recalculates the residual graph -- achieves a maximum flow in any graph with positive integer capacities. (It adds an integer amount of flow in each phase, and after at most C phases, where C is the total capacity out of s, it must reach a point where the residual graph has no path from s to t, meaning that there is a saturated cut of the graph.)

By induction on the number of phases, the edge labels in the residual graph are always integers. This is because they start out as the integer capacities of the graph, and the flow added along each augmenting path is always equal to one of the edge labels in the previous residual graph. Adding this integer flow can only change the residual graph's edge labels by an integer, so they stay integers. At the end of the algorithm, the flow over each edge is the difference between the capacity of that edge and its label in the residual graph -- since this is the difference of two integers, it is an integer.