Second Midterm Exam Solutions

22 November 2006

Question text is in black, solutions in blue.

Directions:

• Answer the problems on the exam pages.
• There are six problems for 100 total points. Actual scale was A = 84, C = 48.
• No books, notes, calculators, or collaboration.
• The exam had a time limit of 120 minutes, though it was not intended that you would need all the time.
• Questions 1 and 2 are "true/false with explanation" -- you get five points for a correct boolean answer, and up to five additional points for a convincing justification.

```
Q1: 10 points
Q2: 10 points
Q3: 20 points
Q4: 20 points
Q5: 20 points
Q6: 20 points
Total: 100 points
```

• Question 1 (10): True or false with justification: Let L be a list of n items, all distinct, and let k be an integer with k < n. Our goal is to find the item in L that has exactly k items in L less than it. But the only method we have available is called `Split`: Given a list of m items it returns two lists, one with the smallest floor(m/2) (Java "m/2") items in the input (in some unknown order) and the other list with the rest.

If `Split` takes O(m) time on a list of size m, then we can create a list containing only our desired item in O(n) time, no matter what k is.

TRUE. Once we split a list of size m in two, we know which half contains the desired element. If k < floor(m/2) it is the element of the lower half with k elements less than it, and if k ≥ floor(m/2) it is the element of the upper half with k - floor(m/2) elements less than it. So we get a recurrence of T(n) = T(n/2) + O(n), which solves to T(n) = O(n) because substitution gives us T(n) ≤ cn + cn/2 + cn/4 + ... ≤ 2cn. Some of you said that the O(log n) splits took O(n) each rather than O(m); this leads to a correct but not optimal bound of O(n log n). Getting a bound of O(n log n) does not prove that a bound of O(n) is false!
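The halving argument can be sketched in Python. This is only an illustration under stated assumptions: `split` here is a hypothetical stand-in that simulates the black box by sorting (so it does not itself run in linear time), and `select` is a name introduced for this sketch.

```python
def split(items):
    """Stand-in for the exam's Split: the smallest floor(m/2) items, then the rest.

    Sorting is just a convenient way to simulate the black box; it costs
    O(m log m), not the O(m) that the problem assumes.
    """
    ordered = sorted(items)
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]


def select(items, k):
    """Return the item with exactly k smaller items, using only split().

    Assumes the items are distinct and 0 <= k < len(items).
    """
    while len(items) > 1:
        lower, upper = split(items)
        if k < len(lower):
            items = lower        # target is among the smallest floor(m/2) items
        else:
            k -= len(lower)      # floor(m/2) smaller items are discarded
            items = upper
    return items[0]
```

Each pass keeps at most about half of the current list, which is where the geometric sum cn + cn/2 + cn/4 + ... comes from.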

• Question 2 (10): True or false with justification: Let G be a network flow diagram, let f be a valid flow in G, and let Δ be a positive constant. Define G_f(Δ) to be the graph consisting of those edges in the residual graph G_f that have labels of at least Δ. If there is no path in G_f(Δ) from s to t, then the size of f is within Δ of the maximum possible flow in G.

FALSE. The statement would be true if it said "within mΔ" rather than "within Δ" (where m is the number of edges of G), but it's not enough to observe this -- we have to find an example where the stated claim fails. The point is that there may be more than one edge of the residual graph showing the potential of an extra flow less than Δ, and it might be possible to add all these flows. A simple counterexample to the claim is a network with four nodes s, a, b, and t, edges (s,a), (s,b), (a,t), and (b,t) each of capacity 2, a zero flow for f, and Δ = 3. The residual graph has four edges of capacity 2 and so G_f(Δ) has no edges at all, and in particular no path from s to t. But f has size 0 while the maximum flow has size 4 (two units along each of the paths s-a-t and s-b-t), so the size of f is not within Δ = 3 of the maximum.

• Question 3 (20): Let L and ε be two real numbers such that L = 2^k ε, where k is a non-negative integer. We have an L by L square region of desert, enclosed by a fence, in which we know that there is a lion. The following recursive algorithm R(L) will allow us to capture the lion:
• If L = ε, capture the lion by placing a cage over the entire fenced area. Otherwise,
• Build a fence of length L bisecting the region from north to south.
• Determine which half of the region contains the lion.
• Build a fence of length L/2 bisecting that half-region from east to west.
• Apply the algorithm R(L/2) to the enclosed quarter-region containing the lion.

Write and solve a recurrence to determine the exact (not big-O) length of fence required to capture the lion with algorithm R(L). Do not count the fence enclosing the entire region at the beginning. (Hint: It may help to first solve the problem for L = ε, L = 2ε, and L = 4ε.)

The intended recurrence for the amount of fence is F(L) = 3L/2 + F(L/2), with a base case of F(ε) = 0 (since no more fence is needed in the case where L = ε). From this we can derive F(2ε) = 3ε, F(4ε) = 9ε, and F(8ε) = 21ε. If we are lucky or clever we may notice the pattern that F(L) = 3L - 3ε, and prove that this formula is true because it satisfies the base case and the inductive case of the recurrence: F(L) = 3L/2 + F(L/2) = 3L/2 + (3L/2 - 3ε) by the IH, which is 3L - 3ε as desired.

What if we aren't lucky or clever enough to see the pattern? We can derive:

• F(L) = 3L/2 + F(L/2)
• F(L) = 3L/2 + 3L/4 + F(L/4)
• F(L) = 3L/2 + 3L/4 + 3L/8 + F(L/8)
• ...
• F(L) = 3L/2 + ... + 3L/2^k + F(L/2^k)
• F(L) = 3L/2 + ... + 3ε + F(ε)
• F(L) = 3ε(2^(k-1) + ... + 1) + 0
• F(L) = 3ε(2^k - 1) = 3L - 3ε
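As a quick sanity check, the recurrence can be evaluated directly and compared with the closed form 3L - 3ε. The sketch below assumes ε is an integer (default 1) and L is a power of two times ε, so that all divisions are exact; `fence_length` is a name chosen for this illustration.

```python
def fence_length(L, eps=1):
    """Fence built by R(L), following F(L) = 3L/2 + F(L/2) with F(eps) = 0.

    Assumes L = 2**k * eps for an integer eps, so every halving is exact.
    """
    if L == eps:
        return 0                 # base case: the cage needs no extra fence
    # One fence of length L plus one of length L/2, then recurse on L/2.
    return L + L // 2 + fence_length(L // 2, eps)
```

With ε = 1, calling `fence_length(2 ** k)` for small k and comparing with `3 * 2 ** k - 3` reproduces the values F(2ε) = 3ε, F(4ε) = 9ε, and F(8ε) = 21ε.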

• Question 4 (20): Let w1, ..., wk be a set of words and let vi for each i be the positive point value of word wi. Given an input string z of length n, we want to find a set of non-overlapping occurrences of some of the wi's within z such that the total point value of the words in the set is as large as possible. (If {wi} were the set of all English words and z were "cattledog", for example, we would need to determine whether the value of "cat" plus "led" was greater than that of "cattle" plus "dog" or "cat" plus "dog".) (Note also that there is no restriction on using a word wi more than once as long as there is no overlap.)

Describe an algorithm that will determine the optimal set of words. Determine the big-O running time of your algorithm as a function of n and k, and verify that it is polynomial in them.

I intended this problem to be solved similarly to Weighted Interval Scheduling, but unlike some of you I didn't notice that it is WIS, viewed correctly. Here's the reduction (writing m for the number of words, called k in the question) -- first check for every i and j whether there is a copy of word wi starting at letter j of z. If there is, create a job for the scheduling problem, starting at time j, ending at time j + length(wi), and having value vi. (This end time is the position of the word's last letter plus one. Why the "+1"? If one word ends at letter p of z and another begins at letter p, we can't use both, so we want the corresponding jobs to overlap.) Now we submit this WIS problem, with O(mn) jobs, to the WIS algorithm, which solves it in time O(mn log(mn)) by dynamic programming. It takes us O(mn^2) to do all the checks for inclusion of words in z, or O(mnq) if q is an upper bound on the length of the words wi. Actually we can do the scheduling step in O(mn) because we don't really need the sort -- the occurrences come to us with their end times already computed, and these are integers of size O(n), so it's easy to bucket them by end time in O(mn) time.

If we don't notice this reduction, we can still solve the problem by dynamic programming. We let f(j) be the optimal score from the first j letters of z, with f(0) = 0. If there is no instance of a scoring word ending at letter j, then f(j) = f(j-1). Otherwise, for each such word wi we calculate vi + f(j - length(wi)), take the maximum of all these totals and f(j-1), and set f(j) to this maximum. This takes n phases, and in each phase we have to compare up to m + 1 values to find the maximum. Thus we have O(mn) time if we can do the string comparison operations to test for inclusion in O(1). More realistically, these take O(n) time each, or O(q) if q is an upper bound on the size of the scoring words.
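The dynamic program above might look as follows in Python. This is a sketch, not the graded solution: `best_score` and the `choice` table (used to recover one optimal set of occurrences) are names introduced here, words are assumed non-empty, and each string comparison costs time proportional to the word length.

```python
def best_score(z, words):
    """Maximum total value of non-overlapping word occurrences in z.

    `words` maps each (non-empty) scoring word to its positive point value.
    f[j] is the optimal score using only the first j letters of z.
    """
    n = len(z)
    f = [0] * (n + 1)
    choice = [None] * (n + 1)    # word chosen to end at letter j, if any
    for j in range(1, n + 1):
        f[j] = f[j - 1]          # option: no chosen word ends at letter j
        for w, v in words.items():
            start = j - len(w)
            if start >= 0 and z[start:j] == w and v + f[start] > f[j]:
                f[j] = v + f[start]
                choice[j] = w
    # Walk back through the table to recover one optimal set of occurrences,
    # reported as (0-based start index, word) pairs.
    picked, j = [], n
    while j > 0:
        if choice[j] is None:
            j -= 1
        else:
            picked.append((j - len(choice[j]), choice[j]))
            j -= len(choice[j])
    return f[n], picked[::-1]
```

For instance, with the hypothetical values `{"cat": 3, "led": 5, "cattle": 4, "dog": 2}` and z = "cattledog", the pair "cat" plus "led" (total 8) beats "cattle" plus "dog" (total 6).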

• Question 5 (20): The roads in the region including Point A and Point B are represented by an undirected graph with n nodes (including A and B) and m edges, where an edge means that a road may be traversed in either direction. An accident on a road prevents drivers from using that road in either direction. Transportation planners want to know the smallest number of simultaneous accidents which, in the worst case, could prevent all travel from Point A to Point B.

Describe an algorithm to determine this number, and determine its running time as a function of m and n. (I intended to insist that this running time be polynomial in m and n, but I didn't.) If you make use of a standard algorithm from the book, you need not describe it in detail but you must indicate how you are transforming your given input into an input suitable for that algorithm, and what you are doing with its output.

This problem is clearly similar to Network Flow -- the problem is to make the reduction precise. We make a network by setting s to A, setting t to B, making directed edges in each direction for each road, and then removing the edges into s and out of t so that s is a source and t is a sink. Each edge gets capacity 1. We find the size of the optimal flow in this network, and call this number k. This turns out to be our answer.

But the Network Flow problem says nothing about roads or accidents, so we need an argument that k is the correct number. We must show that k accidents can separate A from B, and that k-1 accidents cannot. The first is true because there must be a cut of size k in the network, and if we kill all k edges in that cut there cannot be a remaining path from A to B because this would be an augmenting path in the flow network for the maximum flow. If we have only k-1 accidents, there must still be a path from A to B in the remaining graph. This is because if there were not, the set of nodes reachable from A and the set of nodes not reachable would form a cut, and since only the k-1 removed edges could go over the cut its size would be at most k-1, whereas we know that the minimum cut has size k.

The network flow graph has n nodes and O(m) edges (since there are fewer than 2m directed edges). The capacity out of s in the network flow graph is at most n-1 = O(n), since there is at most one edge of capacity 1 to each other vertex. So there are O(n) phases of Ford-Fulkerson, each involving a BFS of O(m) time, and the running time of the entire algorithm is O(mn).
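The reduction can be sketched in Python as a unit-capacity maximum flow found by BFS augmenting paths. Several details here are assumptions made for brevity: nodes are numbered 0..n-1, `min_accidents` is a hypothetical name, A and B are distinct, and residual capacities are kept in an adjacency matrix, so each BFS costs O(n^2) rather than the O(m) an adjacency list would give. The sketch also leaves the edges into s and out of t in place, which does not change the value of the maximum flow.

```python
from collections import deque


def min_accidents(n, roads, a, b):
    """Smallest number of roads whose removal disconnects a from b (a != b).

    `roads` is a list of undirected pairs (u, v). Each road becomes two
    directed unit-capacity edges; the answer is the maximum a-b flow value.
    """
    cap = [[0] * n for _ in range(n)]     # residual capacities
    for u, v in roads:
        cap[u][v] += 1
        cap[v][u] += 1
    flow = 0
    while True:
        parent = [-1] * n
        parent[a] = a
        queue = deque([a])
        while queue and parent[b] == -1:  # BFS in the residual graph
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[b] == -1:
            return flow                   # no augmenting path is left
        v = b
        while v != a:                     # push one unit along the path found
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1
```

On a four-node "diamond" with roads A-1, A-2, 1-B, and 2-B, the function reports that two accidents are needed.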

• Question 6 (20): We are given a weighted directed graph G with n nodes and m edges, where the weights may be positive, negative, or zero. Describe (in some detail) an algorithm to determine whether a negative cycle exists in G. (Recall that a cycle in a directed graph is a directed simple path of one or more edges from some vertex to itself. A negative cycle is one where the sum of the weights of the edges in the path is negative.)

Determine the big-O running time of your algorithm in terms of m and n -- it should be polynomial in them.

We use the Bellman-Ford algorithm to compute the quantity C(x,y,i) for each vertex x, each vertex y, and each integer i with 0 ≤ i ≤ n, equal to the minimum cost of any path from x to y that uses i or fewer hops. We do this by dynamic programming: For i=0 we know that the cost is 0 if x=y and infinite otherwise. For general i we set C(x,y,i) to the minimum of C(x,y,i-1) and the sum C(x,z,i-1) + L(z,y) for each edge (z,y) in G. (Here L(z,y) is the cost of the edge (z,y).) For a fixed x, this gives us the correct value of all the C(x,y,i) in O(m + n) time from the values of all the C(x,y,i-1). The n rounds therefore take O(n(m + n)) time for each x, and O(n^2 (m + n)) time over all n choices of x, which is polynomial in m and n.

Now we must use this information to determine whether there is a negative cycle. The solution is that there is a negative cycle if and only if for at least one pair (x,y), we have C(x,y,n) < C(x,y,n-1). We must prove the "if and only if". If there is no negative cycle, we know that any path of n or more hops from x to y has a corresponding path with fewer hops from x to y and an equal or lower cost. So there cannot be an n-hop path that has less cost than all paths with n-1 or fewer hops. Conversely, if there is a negative cycle there is no single minimum-cost path between any two nodes on the cycle, because we can find a path with less cost by taking the negative cycle one more time. But if C(x,y,n-1) = C(x,y,n) for every x and every y, this means that an examination of all the edges has not found any improvement in cost, and this would keep happening as we looked at C(x,y,n+1), C(x,y,n+2), and so on forever. So there would be minimum-cost paths from each x to each y, contradicting the existence of a negative cycle.
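A minimal Python sketch of this test follows. It assumes nodes numbered 0..n-1 and an edge list of (z, y, cost) triples, and `has_negative_cycle` is a name chosen for this illustration. It keeps only the two most recent layers C(x, ·, i-1) and C(x, ·, i) for one source x at a time, and runs the n rounds once per source.

```python
import math


def has_negative_cycle(n, edges):
    """Return True if the directed graph contains a negative cycle.

    `edges` is a list of (z, y, cost) triples. For a fixed source x,
    cost[y] after round i holds C(x, y, i).
    """
    for x in range(n):
        cost = [math.inf] * n
        cost[x] = 0                   # C(x, x, 0) = 0, everything else infinite
        for i in range(1, n + 1):
            new_cost = cost[:]        # start from C(x, y, i-1)
            for z, y, c in edges:
                if cost[z] + c < new_cost[y]:
                    new_cost[y] = cost[z] + c
            if i == n and new_cost != cost:
                return True           # C(x, y, n) < C(x, y, n-1) for some y
            cost = new_cost
    return False
```

Trying every source matters because a negative cycle may not be reachable from an arbitrary starting vertex.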