# CMPSCI 311: Theory of Algorithms

### Solutions posted Thursday 18 December 2003

Questions are in black, solutions in blue.

• Question 1 (10): (true/false with justification) Let G be a flow diagram (a directed graph with edge capacities) and let k be a positive number. Then there is a flow through G of size k if and only if the edges out of the start node have capacities adding to at least k and the edges into the finish node have capacities adding to at least k.

FALSE. It is possible for there to be a bottleneck (a cut of size less than k) in the middle of the diagram. For example, let k=2 and let the only three edges in the graph be (s,x,2), (x,y,1), and (y,t,2). There is capacity of 2 out of s and into t, but the maximum flow in the diagram is 1 because the sets {s,x} and {y,t} form a cut with capacity 1.
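The counterexample can be checked mechanically. The following is a minimal sketch, assuming edges are given as (from, to, capacity) triples; `max_flow` is a hypothetical helper that uses BFS augmenting paths (the Edmonds-Karp variant of Ford-Fulkerson), and it assumes the source and sink are different nodes.

```python
from collections import defaultdict, deque

def max_flow(edges, source, sink):
    """Edmonds-Karp sketch: augment along shortest residual paths until none remain."""
    # residual[u][v] is the capacity still available from u to v.
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # make sure the reverse entry exists
    total = 0
    while True:
        # BFS for a path from source to sink with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # Walk back from the sink to recover the path.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        # Push as much flow as the tightest edge on the path allows.
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck

# The three-edge diagram from the solution above.
edges = [("s", "x", 2), ("x", "y", 1), ("y", "t", 2)]
print(max_flow(edges, "s", "t"))  # prints 1
```

The capacities out of s and into t are both 2, yet the computed flow is 1, matching the cut argument.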

• Question 2 (10): (true/false with justification) Let T(n) be a function obeying the recurrence T(n) = 5T(n/5) + a with initial condition T(1) = b, where a and b are positive numbers. Then T(n) = Θ(n log n).

FALSE. By the Master Theorem T(n) = Θ(n). We compare n^(log_5 5) = n with the overhead term a = O(n^0) and find that we are in the case where the former dominates.
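As a cross-check that does not rely on the Master Theorem, the recurrence can be unrolled directly. This is a sketch that assumes n is an exact power of 5, written n = 5^j:

```latex
\begin{align*}
T(5^j) &= 5\,T(5^{j-1}) + a, \qquad T(1) = b \\
T(5^j) &= 5^j\,b + a\left(5^{j-1} + \dots + 5 + 1\right)
        = 5^j\,b + a\,\frac{5^j - 1}{4} \\
T(n)   &= \left(b + \frac{a}{4}\right) n - \frac{a}{4} = \Theta(n)
\end{align*}
```

The closed form is linear in n, so T(n) grows strictly more slowly than n log n.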

• Question 3 (10): (true/false with justification) Let N be an NP-procedure with the property that on inputs of size n, the number of possible different guesses that N can make in its guess phase is O(n^3). Then the decision problem of N is in the class P. (Recall that this decision problem is to take x as input and decide whether there exists a guess of N on input x that will cause its evaluate phase to return "true".)

TRUE. We can solve the decision problem deterministically in polynomial time as follows. Generate each of the O(n^3) possible guesses, check each to see whether the verification phase says "yes" on it, and say yes iff it ever does. This is correct by the definition of the decision problem for N, and it is polynomial time because its total time is O(n^3) times the polynomial time needed for each verification, and a polynomial times a polynomial is a polynomial.
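The deterministic simulation is just a loop over all guesses. Below is a minimal sketch; `decide`, `triple_guesses`, and `sums_to_zero` are hypothetical names, and the example NP-procedure (guess three positions in a list, accept if the entries sum to zero) is only an illustration chosen because it has O(n^3) guesses.

```python
from itertools import combinations

def decide(x, guesses, evaluate):
    """Return True iff some guess makes the evaluate phase accept x."""
    return any(evaluate(x, g) for g in guesses(x))

# Illustrative NP-procedure (an assumption, not from the exam):
# guess three positions in a list of integers, O(n^3) guesses in all.
def triple_guesses(x):
    return combinations(range(len(x)), 3)

# Evaluate phase: accept if the three chosen entries sum to zero.
def sums_to_zero(x, guess):
    i, j, k = guess
    return x[i] + x[j] + x[k] == 0

print(decide([4, -1, 7, -3], triple_guesses, sums_to_zero))  # True: 4 + (-1) + (-3) = 0
print(decide([1, 2, 3, 4], triple_guesses, sums_to_zero))    # False
```

The total work is the number of guesses times the cost of one evaluation, which is the product of two polynomials.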

• Question 4 (10): (true/false with justification) Let A be an array of n items, each of which is a character in the set {a,b,c,...,z}. Then A can be sorted in O(n) time.

TRUE. Since the items to be sorted come from a set of size 26 = O(1), the general comparison-sorting lower bound of Ω(n log n) does not apply. There are several linear-time methods to sort these characters. One is to make one pass through the input to count the number of letters of each type, then fill the output sequentially with the correct number of each letter.
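The counting method described above can be sketched as follows. The function name `sort_letters` is hypothetical, and the sketch assumes every item is a lowercase letter (anything else raises `KeyError`).

```python
from string import ascii_lowercase

def sort_letters(items):
    """Sort a list of lowercase letters in O(n) time by counting."""
    # Pass 1: count how many times each of the 26 letters occurs.
    counts = {c: 0 for c in ascii_lowercase}
    for c in items:
        counts[c] += 1
    # Pass 2: write out each letter the right number of times, in order.
    output = []
    for c in ascii_lowercase:
        output.extend([c] * counts[c])
    return output

print("".join(sort_letters(list("banana"))))  # prints aaabnn
```

Each pass is linear in n plus a constant 26, so the whole sort is O(n).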

• Question 5 (45): The problem MST-DECISION is to take as input a weighted undirected graph G (with n vertices) and a number k, and output "true" if and only if G has a spanning tree of weight at most k.

• (a,10) Describe an NP-procedure for this problem, proving that it is in the class NP.

In the guess phase we choose a set of n-1 edges of G. (Many of you said "guess a spanning tree", which would require somehow knowing that what you were guessing was actually a spanning tree.) In the verification phase we check whether these edges form a connected graph on all n vertices (DFS or BFS in O(n) time), and whether the sum of their weights is at most k. A connected graph with n vertices and n-1 edges is a tree, so the first check confirms that the guess is a spanning tree. If both checks succeed we say "yes", otherwise we say "no". There exists a way to guess a set of edges making the verifier say "yes" if and only if the answer to the MST-DECISION instance should be "true".
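A minimal sketch of the verification phase follows. The representation is an assumption made for illustration: vertices are numbered 0 to n-1 with n at least 1, `weights` maps each edge `(u, v)` to its weight, and the guess is a list of such edges. The name `verify` is hypothetical.

```python
def verify(n, weights, k, guess):
    """Accept iff the guess is n-1 distinct edges of G that connect all
    n vertices and have total weight at most k."""
    chosen = set(guess)
    if len(chosen) != n - 1 or not all(e in weights for e in chosen):
        return False
    # Build adjacency lists for the guessed edges only.
    adjacent = {v: [] for v in range(n)}
    for u, v in chosen:
        adjacent[u].append(v)
        adjacent[v].append(u)
    # DFS from vertex 0; the guess is connected iff every vertex is reached.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adjacent[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n and sum(weights[e] for e in chosen) <= k

# A triangle on 0, 1, 2 with a pendant vertex 3.
weights = {(0, 1): 1, (1, 2): 2, (0, 2): 4, (2, 3): 1}
print(verify(4, weights, 4, [(0, 1), (1, 2), (2, 3)]))   # True: spanning tree of weight 4
print(verify(4, weights, 10, [(0, 1), (1, 2), (0, 2)]))  # False: three edges, but vertex 3 is unreached
```

The second call shows why guessing "n-1 edges" is not enough on its own: the connectivity check is what rules out a cycle plus an unreached vertex.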

• (b,20) Prove that MST-DECISION is in the class P. If you use a standard algorithm from the course, give enough detail to convince me that you know how and why it works.

The general idea is to use a poly-time algorithm to construct a minimum spanning tree, add up its weight, and return "true" if this weight is at most k. (There exists a spanning tree of weight at most k if and only if the MST's weight is at most k.)

The Prim algorithm maintains a spanning tree of a subset of the vertices (initially one arbitrary vertex) and a table giving for each other vertex the weight of the cheapest edge, if any, that connects it to the tree. We add a vertex to the tree by choosing the minimum entry of the table and then updating the table with any changes caused by other edges incident to the new vertex. After n-1 such adds (if we can make them, otherwise no spanning tree exists) we have a minimum spanning tree. The time is certainly polynomial, with the exact time depending on the data structures. If you spend O(n) to process the table for each edge you have O(ne), for example.
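The table-based version of Prim's algorithm, combined with the comparison against k, might look like the sketch below. The names `prim_weight` and `mst_decision` are hypothetical, and the graph representation (vertices 0 to n-1 with n at least 1, a dict from edge `(u, v)` to weight) is an assumption. This variant scans the table once per added vertex, so it runs in O(n^2 + e) time.

```python
import math

def prim_weight(n, weights):
    """Return the weight of a minimum spanning tree, or None if the
    graph is not connected."""
    adjacent = {v: [] for v in range(n)}
    for (u, v), w in weights.items():
        adjacent[u].append((v, w))
        adjacent[v].append((u, w))
    in_tree = [False] * n
    cheapest = [math.inf] * n  # the table: cheapest known edge into the tree
    cheapest[0] = 0            # start from vertex 0 at no cost
    total = 0
    for _ in range(n):
        # Choose the minimum table entry among vertices not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]),
                key=lambda v: cheapest[v])
        if cheapest[u] == math.inf:
            return None  # nothing left is reachable from the tree
        in_tree[u] = True
        total += cheapest[u]
        # Update the table using the edges incident to the new vertex.
        for v, w in adjacent[u]:
            if not in_tree[v] and w < cheapest[v]:
                cheapest[v] = w
    return total

def mst_decision(n, weights, k):
    """True iff the graph has a spanning tree of weight at most k."""
    total = prim_weight(n, weights)
    return total is not None and total <= k

weights = {(0, 1): 1, (1, 2): 2, (0, 2): 4}
print(prim_weight(3, weights))      # prints 3
print(mst_decision(3, weights, 2))  # prints False
```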

The Kruskal algorithm sorts the edges and keeps adding the cheapest edge that does not complete a cycle. This is easy to describe, but requires that you keep a union-find data structure to tell whether a new edge creates a cycle. (Of course if all you care about is polynomial time or not you can check this cycle with a DFS each time, and your time is still O(e^2).)
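For comparison, here is a minimal sketch of Kruskal's algorithm with a simple union-find. The name `kruskal` and the graph representation (vertices 0 to n-1, a dict from edge `(u, v)` to weight) are assumptions; the union-find uses path halving only, without union by rank, to keep the sketch short.

```python
def kruskal(n, weights):
    """Return the edges of a minimum spanning tree, or None if the
    graph is not connected."""
    parent = list(range(n))  # union-find forest: each vertex starts alone

    def find(v):
        # Follow parent links to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Consider the edges from cheapest to most expensive.
    for (u, v), w in sorted(weights.items(), key=lambda item: item[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # different components: no cycle
            parent[root_u] = root_v   # merge the two components
            tree.append((u, v))
    return tree if len(tree) == n - 1 else None

weights = {(0, 1): 1, (1, 2): 2, (0, 2): 4}
print(kruskal(3, weights))  # prints [(0, 1), (1, 2)]
```

The edge (0, 2) is rejected because by the time it is considered, 0 and 2 already lie in the same component.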

• (c,15) Suppose that every edge of G has weight 1, and that the number of edges in G is e. Describe an algorithm that runs in O(e) time and produces a minimum spanning tree of G if any spanning tree exists. (If there is no spanning tree or if any edge has a weight other than 1, the algorithm should discover this. Also remember that there can be more than one minimum spanning tree if they are tied for the smallest total cost, and that in this case your algorithm may return any of them.)

In this case if there are any spanning trees at all (if the graph is connected) then each of them has the same weight n-1 because any spanning tree on n nodes has exactly n-1 edges. So we just have to find a spanning tree, and the simplest way is to do a DFS or BFS in O(n+e) time and return the tree edges of the search forest. Of course BFS or DFS also confirms the connectedness of the graph. A couple of other nitpicks: we need to cover the possibility that e = o(n), and the possibility that the input contains an edge with weight other than 1. We can take care of both of these by scanning all the edges at the beginning, counting them and rejecting the input if there are fewer than n-1, and also rejecting if any has a bad weight.
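A sketch of this procedure, under assumed conventions: vertices are numbered 0 to n-1 with n at least 1, edges are (u, v, weight) triples, and the hypothetical function `unit_weight_mst` returns `None` whenever the input is rejected or no spanning tree exists.

```python
from collections import deque

def unit_weight_mst(n, edges):
    """Return the n-1 edges of a spanning tree, or None if the input is
    rejected (too few edges, a weight other than 1, or a disconnected graph)."""
    # Initial scan: count the edges and check every weight.
    if len(edges) < n - 1 or any(w != 1 for _, _, w in edges):
        return None
    # From here on e >= n-1, so O(n + e) work is O(e).
    adjacent = [[] for _ in range(n)]
    for u, v, _ in edges:
        adjacent[u].append(v)
        adjacent[v].append(u)
    # BFS from vertex 0, recording the edge that first reaches each vertex.
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    tree = []
    while queue:
        u = queue.popleft()
        for v in adjacent[u]:
            if not seen[v]:
                seen[v] = True
                tree.append((u, v))
                queue.append(v)
    # The search reached every vertex iff it recorded exactly n-1 tree edges.
    return tree if len(tree) == n - 1 else None

edges = [(0, 1, 1), (1, 2, 1), (2, 0, 1), (2, 3, 1)]
print(unit_weight_mst(4, edges))  # prints [(0, 1), (0, 2), (2, 3)]
```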

• Question 6 (40): As in an exercise on Homework 9, King Arthur has a large number n of knights and some of the knights are quarrelling with one another. Arthur has an undirected graph where the nodes represent the knights and there is an edge between nodes x and y if and only if knights x and y are quarrelling with each other.

• (a,10) Arthur must select five of his knights for a quest, and does not want to have two knights on the quest quarrelling with each other. Indicate how, given his undirected graph as input and using time polynomial in n, he can decide whether this is possible.

We need a set of five nodes in the graph that are independent, that is, such that none of the ten possible edges among them exist. To do this we must check all possible sets of five nodes in the worst case, which we can do with a program with five loops that takes time O(n^5). Once we have a set we check the ten edges and return the set if all are missing.
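The five nested loops can be written compactly with `itertools.combinations`. This is a sketch under assumed conventions: knights are numbered 0 to n-1, the quarrels are given as a list of pairs, and `find_quest_party` is a hypothetical name.

```python
from itertools import combinations

def find_quest_party(n, quarrels, size=5):
    """Return a tuple of `size` mutually non-quarrelling knights, or None."""
    quarrelling = {frozenset(pair) for pair in quarrels}
    # Try every set of `size` knights: O(n^5) sets when size is 5.
    for party in combinations(range(n), size):
        # Check every pair within the set: ten pairs when size is 5.
        if all(frozenset(pair) not in quarrelling
               for pair in combinations(party, 2)):
            return party
    return None

print(find_quest_party(6, [(0, 1)]))  # prints (0, 2, 3, 4, 5)
```

Because the size is the constant 5, the running time is polynomial in n; the same loop with a size that grows with n would not be.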

• (b,20) Suppose that Arthur wants to decide, given a number k in the range from 0 to n, whether he can select k knights for a quest without including a quarrelling pair. Prove that this problem is NP-complete, where the input is considered to be both his graph and k. This means you must prove that his problem is in the class NP and reduce a known NP-complete problem to it.

To see that this problem is in NP we must describe an NP-procedure for it. The procedure guesses a set of nodes and then verifies that (a) there are exactly k nodes, and (b) none of the (k choose 2) possible edges among these nodes exist. The verification takes O(n^2) time in the worst case when we have to look at all the edges.

To see that King Arthur's problem is NP-complete we must also reduce a known NP-complete problem to it. (We don't reduce it to a known NP-complete problem, as that only shows again that it is in NP. To show his problem hard, we need to show that it can be used to do something hard, not that something hard can be used to do it.)

If we take IND-SET from the exercises as our known NP-complete problem, our reduction is actually the identity function. Given the quarrelling graph G and a number k, there is a non-quarrelling set of k knights if and only if the input (G,k) is a "yes" instance for the problem IND-SET. So we have the reduction by mapping (G,k) to G and k.

To reduce CLIQUE to King Arthur's problem, we map (G,k) to the graph H and the same number k, where H is the complement of G (same vertices, edge present if and only if no edge is present in G). Then there is a clique of size k in G iff there is a non-quarrelling set of k knights in H.
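The complement construction is easy to carry out in O(n^2) time. A minimal sketch, assuming vertices are numbered 0 to n-1 and edges are given as pairs; `complement_instance` is a hypothetical name.

```python
from itertools import combinations

def complement_instance(n, edges, k):
    """Map a CLIQUE instance (G, k) to an instance (H, k) of Arthur's
    problem, where H is the complement of G."""
    present = {frozenset(e) for e in edges}
    # H has an edge exactly where G has none.
    complement = [pair for pair in combinations(range(n), 2)
                  if frozenset(pair) not in present]
    return n, complement, k

# G: a triangle on 0, 1, 2 plus an isolated vertex 3.
print(complement_instance(4, [(0, 1), (1, 2), (0, 2)], 3))
# prints (4, [(0, 3), (1, 3), (2, 3)], 3)
```

In this example the clique {0, 1, 2} of G has no edges among its members in H, so it is a non-quarrelling set of three knights there.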

• (c,10) Now assume that each knight is quarrelling with at most one other knight. Show that Arthur can find at least n/2 knights to go on the quest. Show that in the worst case, if n is even, he cannot do better than this.

With this restriction the quarrelling graph must consist of some number of pairs of vertices connected by an edge, plus some number of isolated nodes. (If knight x quarrels with y, then y quarrels with x and neither may quarrel with any other knight.) The worst case with even n is when there are n/2 pairs and no isolated nodes. In this case Arthur may select n/2 knights by taking one from each pair. (In the general case he can always get at least n/2 by taking one from each pair and including all the isolated nodes.) But n/2 is the best he can do because if he picks more than that he must include both members of some pair. (If the average number of knights he picks in a pair is greater than one, some pair must have two knights picked.)
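The selection rule "one from each pair, plus all the isolated nodes" can be sketched in a few lines. The name `pick_knights` is hypothetical, knights are assumed to be numbered 0 to n-1, and the sketch relies on the stated restriction that each knight appears in at most one quarrelling pair.

```python
def pick_knights(n, quarrels):
    """Keep every isolated knight and the lower-numbered knight of each pair."""
    # Drop the higher-numbered member of each quarrelling pair.
    excluded = {max(u, v) for u, v in quarrels}
    return [x for x in range(n) if x not in excluded]

print(pick_knights(6, [(0, 1), (2, 3), (4, 5)]))  # prints [0, 2, 4]
print(pick_knights(5, [(0, 3)]))                  # prints [0, 1, 2, 4]
```

The first call is the worst case for n = 6: three disjoint pairs, so exactly n/2 = 3 knights are chosen.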

• Question 7 (25): Given a non-negative integer n, we want to determine how many strings of 1's, 2's, and 3's add up to n. For example, with n=3 we have the strings 111, 12, 21, and 3 -- four of them. With n=5 we have thirteen: 11111, 1112, 1121, 113, 1211, 122, 131, 2111, 212, 221, 23, 311, and 32.

• (a,10) Let T(n) be the number of such strings adding to n. Give a recurrence for T(n) when n is positive, based on the fact that the first digit in the string may be 1, 2, or 3. What is T(0)?

The base case T(0) = 1 follows from the fact that there is exactly one string of length 0, and its digits add to 0. In the general case, a non-empty string adding to a positive number n may be (a) a 1 followed by a string adding to n-1, (b) a 2 followed by a string adding to n-2, or (c) a 3 followed by a string adding to n-3. Thus the total number T(n) of strings adding to n is exactly T(n-1) + T(n-2) + T(n-3).

To complete the recurrence we need three consecutive base cases. Along with T(0) = 1 we could include T(-1) = 0 and T(-2) = 0, because a string cannot add to a negative number. More conventionally, we could observe directly that T(1) = 1 and T(2) = 2, and only use the recurrence for n at least 3.

• (b,15) Describe an efficient algorithm to compute T(n). What is its big-O running time as a function of n? (Hint: you may want to describe a simple but inefficient recursive algorithm and describe how it may be improved.)

The recurrence leads to a recursive algorithm where we return the value of T(n) directly if n is less than 3 and otherwise return T(n-1) + T(n-2) + T(n-3) calculated by recursion. This is correct but inefficient because of recomputation -- the time taken is about T(n) itself, which is exponential in n.

We can do it in O(n) time, efficiently enough for the problem, by either (a) memoizing the above recursive computation by storing each T(i) in a table and not recomputing it, or (b) computing this same table bottom-up by using a loop with the key statement being `T(i) = T(i-1) + T(i-2) + T(i-3).`
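The bottom-up table of option (b) might look like the following sketch; `count_strings` is a hypothetical name, and the base cases T(0) = 1, T(1) = 1, T(2) = 2 are the ones derived in part (a).

```python
def count_strings(n):
    """Number of strings of 1's, 2's, and 3's whose digits add up to n."""
    table = [1, 1, 2]  # T(0), T(1), T(2)
    for i in range(3, n + 1):
        # Key statement: T(i) = T(i-1) + T(i-2) + T(i-3).
        table.append(table[i - 1] + table[i - 2] + table[i - 3])
    return table[n]

print([count_strings(n) for n in range(6)])  # prints [1, 1, 2, 4, 7, 13]
```

The values for n = 3 and n = 5 agree with the four and thirteen strings listed in the question. The loop performs O(n) additions.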

There are even faster ways to compute T(n). We could raise the appropriate 3 by 3 integer matrix to the nth power by repeated squaring, using O(log n) matrix multiplications, for example.
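One possible choice of matrix is the one that maps the vector (T(i+2), T(i+1), T(i)) to (T(i+3), T(i+2), T(i+1)). The sketch below powers it by repeated squaring; the names `mat_mult` and `count_strings_fast` are hypothetical, and the count of O(log n) refers to matrix multiplications, not to bit operations on the growing integers.

```python
def mat_mult(a, b):
    """Multiply two 3 by 3 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def count_strings_fast(n):
    """Compute T(n) with O(log n) matrix multiplications."""
    result = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # identity matrix
    base = [[1, 1, 1], [1, 0, 0], [0, 1, 0]]    # one step of the recurrence
    power = n
    while power:
        if power & 1:
            result = mat_mult(result, base)
        base = mat_mult(base, base)
        power >>= 1
    # result is now base^n. Applied to (T(2), T(1), T(0)) = (2, 1, 1) it
    # gives (T(n+2), T(n+1), T(n)); the last component is the answer.
    return result[2][0] * 2 + result[2][1] * 1 + result[2][2] * 1

print([count_strings_fast(n) for n in range(6)])  # prints [1, 1, 2, 4, 7, 13]
```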