# Solutions to First Midterm Exam

#### Solutions posted 9 October 2006

• There are seven problems for 100 total points. Actual scale was A = 85, C = 55.
• The exam had a time limit of 120 minutes, though it was not intended that you would need all the time.
• Questions 1, 2, and 3 are "true/false with explanation" -- you get five points for a correct boolean answer, and up to five additional points for a convincing justification.
• Question text is in black, solutions in blue.

```
Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 20 points
Q5: 15 points
Q6: 15 points
Q7: 20 points
Total: 100 points
```

• Questions 1 and 2 concern the weighted interval scheduling problem. Here the input is a set of n jobs, each with a start time, finish time, and reward. Our task is to schedule some of these jobs on a single machine. The output is a non-overlapping subset of the jobs, and the goal is to maximize the total reward from the jobs in the set.

• Question 1 (10): True or false with justification: The following greedy algorithm always gets the maximum possible reward: Sort the jobs by reward and schedule them one by one starting with the highest reward, rejecting any that overlap with jobs already scheduled.

FALSE. It is quite possible for the single highest-reward job to overlap two or more other jobs that could otherwise be scheduled together for more total reward. A simple example is (start 0, finish 2, reward 2), (start 1, finish 4, reward 3), (start 3, finish 5, reward 2). This greedy algorithm schedules only the middle job and scores 3, while you could schedule the other two jobs and get 4. The counterexample also shows that this greedy algorithm can fail even in the special case of Question 2, since rewards are proportional to length.
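The counterexample can be checked mechanically. The following is a minimal sketch, not part of the exam: the function names are made up for illustration, and it assumes that a job finishing at time t does not overlap a job starting at time t.

```python
from itertools import combinations

# Jobs are (start, finish, reward) triples, taken from the counterexample above.
JOBS = [(0, 2, 2), (1, 4, 3), (3, 5, 2)]

def overlaps(a, b):
    """Assumption: two jobs overlap if each starts strictly before the other finishes."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_by_reward(jobs):
    """Take jobs in decreasing order of reward, skipping any that overlap a chosen job."""
    chosen = []
    for job in sorted(jobs, key=lambda j: j[2], reverse=True):
        if all(not overlaps(job, other) for other in chosen):
            chosen.append(job)
    return chosen

def best_by_brute_force(jobs):
    """Try every subset and return the best total reward among non-overlapping subsets."""
    best = 0
    for r in range(len(jobs) + 1):
        for subset in combinations(jobs, r):
            if all(not overlaps(a, b) for a, b in combinations(subset, 2)):
                best = max(best, sum(j[2] for j in subset))
    return best

greedy_total = sum(j[2] for j in greedy_by_reward(JOBS))
optimal_total = best_by_brute_force(JOBS)
print(greedy_total, optimal_total)  # 3 4
```

The brute-force check is exponential in the number of jobs, so it is only useful for verifying small examples like this one.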

• Question 2 (10): True or false with justification: Now assume that the reward of each job is proportional to its length. The following greedy algorithm always maximizes the total reward: First schedule the job that starts earliest, then keep scheduling the earliest-starting job that does not overlap with jobs already scheduled.

FALSE. The first job could overlap with a long, valuable job. A simple example has only two jobs: (start 0, finish 2, reward 2) and (start 1, finish 10, reward 9). The greedy algorithm takes the first job for 2 while we could schedule the second job alone and get 9.
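Here is a small sketch of the earliest-start greedy rule run on this two-job example. The function names are illustrative only, and jobs are again treated as overlapping only when each starts strictly before the other finishes.

```python
def overlaps(a, b):
    """Assumption: jobs that merely touch at an endpoint do not overlap."""
    return a[0] < b[1] and b[0] < a[1]

def greedy_by_start(jobs):
    """Take jobs in increasing order of start time, skipping any that overlap a chosen job."""
    chosen = []
    for job in sorted(jobs, key=lambda j: j[0]):
        if all(not overlaps(job, other) for other in chosen):
            chosen.append(job)
    return chosen

# (start, finish, reward), with reward equal to length as in the example above.
jobs = [(0, 2, 2), (1, 10, 9)]
picked = greedy_by_start(jobs)
print(picked, sum(j[2] for j in picked))  # [(0, 2, 2)] 2
```

Scheduling the second job alone would give 9, so the greedy result of 2 is not optimal.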

• Question 3 (10): True or false with justification: Suppose that in the Huffman algorithm to create an optimal prefix code, there is one letter of the alphabet that has a frequency (probability) of 0.5 or greater. Then that letter will definitely be assigned a one-bit code word.

TRUE. The Huffman algorithm repeatedly combines the two lowest-frequency letters in the alphabet into a single letter, adding the two frequencies to get a new frequency. As long as there are three or more letters in the alphabet, our high-frequency letter must be the most frequent and thus cannot be one of the two lowest. So it is not combined until there are only two letters left, which means that it definitely goes on level one of the tree (the first level below the root) and gets a one-bit code.
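The argument can be illustrated with a short sketch that tracks only code lengths (depths in the Huffman tree). This is one possible implementation rather than the one from lecture. The function name and the sample frequencies are made up, and frequencies are given as integer counts out of 100 to avoid floating-point ties.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {letter: code length} for a Huffman code built from {letter: frequency}."""
    tiebreak = count()  # unique counter so heap entries never compare the dicts
    heap = [(f, next(tiebreak), {letter: 0}) for letter, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Combine the two lowest-frequency "letters" into one.
        f1, _, depths1 = heapq.heappop(heap)
        f2, _, depths2 = heapq.heappop(heap)
        # Every letter under the new node moves one level deeper.
        merged = {letter: d + 1 for letter, d in {**depths1, **depths2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

# Letter "a" has frequency 50 out of 100, i.e. probability 0.5.
lengths = huffman_code_lengths({"a": 50, "b": 20, "c": 15, "d": 15})
print(lengths["a"])  # 1
```

In this run "a" is not merged until only two entries remain, which is exactly the situation described in the solution.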

• Question 4 (20): Let G = (V,E) be a connected, undirected graph where each edge has a positive weight and let X be a non-empty proper subset of V (a set of vertices that is not equal to either ∅ or to V). Consider the set Z of edges that have one endpoint in X and one not in X, and let e be an edge in Z that has smaller weight than any other edge in Z. What does the Cut Property say about e and minimum spanning trees for G? Prove that this Cut Property is true.

The Cut Property says that this edge e must be contained in any minimum spanning tree for G. (There could easily be multiple MST's, for example if there are multiple equally costly ways to connect up X and V-X.)

Let u be the endpoint of e in X and let v be the endpoint of e in V-X. Let T be any spanning tree that does not contain e. The graph T ∪ {e} has exactly one cycle, formed by e and the unique path from u to v that exists in T. This path in T must include at least one edge e' = (u', v') that has one endpoint in X and one in V-X, and e' ≠ e because e is not in T. Let U = (T - {e'}) ∪ {e}. U has a smaller total weight than T because e' has weight larger than e (by the assumption on e). U is a spanning tree -- it has n-1 edges and we can show that it is connected. Any vertex in G has a path to either u' or v' in T - {e'}, and the part of the cycle (of T ∪ {e}) without e' forms a path in (T ∪ {e}) - {e'} from u' to v'. So there is a path in U from any vertex to any other vertex. Since U is a spanning tree of smaller weight than T, T is not a minimum spanning tree, and so every minimum spanning tree must contain e.
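The weight comparison in the exchange step can be written out as follows, where w(·) is notation introduced here (not in the exam) for the weight of an edge or the total weight of a set of edges:

$$
w(U) = w(T) - w(e') + w(e) < w(T), \quad \text{since } w(e) < w(e').
$$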

Some students didn't remember what the Cut Property was -- it did form the major part of one lecture so I think it was fair game. I tried to give you enough of the setting so that you could remember the property if you were familiar with any of the MST arguments in the book.

• Question 5 (15): Let G = (V,E) be a connected undirected graph and let v be a vertex in G. Let T be the depth-first search tree of G starting from v, and let U be the breadth-first search tree of G starting from v. Prove that the depth of T is at least as great as the depth of U.

Let the depth of U be d and let w be a vertex on level d of U. We know that the BFS tree from v indicates the shortest-path distance from v to every node (counting each edge as distance 1). Thus there is no path in G of length less than d from v to w. If the depth of T were less than d, there would be a path in G of length less than d from v to w, given by the path in T. This is impossible, so T cannot have depth less than d.
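As a concrete illustration, the sketch below computes both tree depths on a 4-cycle. The adjacency-list representation, the function names, and the example graph are choices made here, not part of the exam. The DFS visits neighbors in the order listed, and a different order could give a different (but never smaller than BFS) depth.

```python
from collections import deque

def bfs_depth(graph, root):
    """Depth of the BFS tree: the largest shortest-path distance from root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return max(depth.values())

def dfs_depth(graph, root):
    """Depth of the DFS tree, visiting neighbors in adjacency-list order."""
    depth = {root: 0}

    def visit(u):
        for w in graph[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                visit(w)

    visit(root)
    return max(depth.values())

# A 4-cycle a-b-c-d-a.
graph = {
    "a": ["b", "d"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "a"],
}
print(bfs_depth(graph, "a"), dfs_depth(graph, "a"))  # 2 3
```

The recursive DFS is fine for small examples, though a large graph would need an explicit stack to avoid hitting Python's recursion limit.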

Many people correctly noted (from the homework) that if G is itself a tree, then T and U are the same tree and thus have the same depth. They then went on to the case where G has a cycle, as in another homework problem. If there is a cycle, and x is the first node in the cycle encountered (by both searches), then one of the neighbors of x in the cycle will have depth one greater than that of x in U but a larger depth in T. However, this does not in itself show anything about the depth of the trees T and U, because this depends on the deepest node in each tree and these neighbors of x might not be those nodes.

• Question 6 (15): Recall that an independent set of an undirected graph G is a set X of vertices such that no edge of G has both endpoints in X. Describe an algorithm that takes G and a number k as input, and finds an independent set of size k in G if one exists. Let n be the number of vertices in G. Analyze the big-O running time of your algorithm as a function of both n and k -- for full credit your algorithm should take $O(n^k k^2)$ time.

This problem turned out to be harder for you than I intended -- the right way to go about it was a brute-force algorithm, which we have not emphasized in the class so far. Neither BFS nor DFS is very helpful. Many people gave incorrect algorithms that might find an independent set but could easily fail to find one that was there. I gave significant partial credit for these if there was a correct timing analysis. (I was disappointed with the large number of people who made no attempt to time their algorithm.)

The simple brute-force algorithm I had in mind was to consider each of the (n choose k) subsets of V with size k, and check each one to see whether it is an independent set. The latter test involves looking at the (k choose 2) possible pairs of nodes in the set, and seeing whether any of them has an edge. This takes $O(k^2)$ time, and since (n choose k) = $O(n^k)$, the total time is $O(n^k k^2)$.
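A minimal sketch of this brute-force search follows. The function name and the graph representation (vertices numbered 0 to n-1, edges as pairs) are assumptions made for illustration. It stores the edges in a hash set so that each pair test takes expected constant time, and an adjacency matrix would serve the same purpose.

```python
from itertools import combinations

def find_independent_set(n, edges, k):
    """Return an independent set of size k as a tuple of vertices, or None if none exists."""
    edge_set = {frozenset(e) for e in edges}
    # Consider each of the (n choose k) subsets of size k.
    for candidate in combinations(range(n), k):
        # Check the (k choose 2) pairs inside the candidate.
        if all(frozenset(pair) not in edge_set
               for pair in combinations(candidate, 2)):
            return candidate
    return None

# A 5-cycle 0-1-2-3-4-0, whose largest independent set has size 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(find_independent_set(5, edges, 2))  # (0, 2)
print(find_independent_set(5, edges, 3))  # None
```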

Some people found a slightly better algorithm, placing nodes on a stack as they were found to form an independent set with the nodes already on the stack. This algorithm spends at most $O(n)$ time trying to increase the stack from size 0 to 1, at most $O(2n^2)$ trying to increase it from 1 to 2, and so on, until the dominant term has $O(k n^k)$ time trying to increase it from k-1 to k. (To try to add a new node, you must check the k-1 nodes on the stack to see if any has an edge to the new node.) Since this is also $O(n^k k^2)$, it is acceptable.
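One way such a stack-based search might look is sketched below. It is a reconstruction under assumptions, not any particular student's answer. It uses recursion to manage the backtracking, an adjacency matrix for constant-time edge tests, and only tries vertices with a larger index than the top of the stack so that no set is examined twice.

```python
def find_independent_set_backtracking(n, edges, k):
    """Return an independent set of size k as a list of vertices, or None if none exists."""
    adjacent = [[False] * n for _ in range(n)]
    for u, v in edges:
        adjacent[u][v] = adjacent[v][u] = True

    stack = []  # always holds an independent set

    def extend(start):
        if len(stack) == k:
            return True
        for v in range(start, n):
            # Checking v against the nodes already on the stack costs O(k).
            if all(not adjacent[v][u] for u in stack):
                stack.append(v)
                if extend(v + 1):
                    return True
                stack.pop()  # backtrack: v did not lead to a set of size k
        return False

    return list(stack) if extend(0) else None

# A star with center 0 and leaves 1, 2, 3: the search must back out of choosing 0.
star = [(0, 1), (0, 2), (0, 3)]
print(find_independent_set_backtracking(4, star, 3))  # [1, 2, 3]
```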

Many people put the nodes in a priority queue sorted by degree and either added low-degree nodes to the proposed independent set, eliminating nodes with edges to the added nodes, or deleted high-degree nodes in the hope of winding up with k nodes of degree 0. Both of these methods can fail to find an independent set that exists -- I can show you examples. The point is that you had no justification to say that this algorithm would always work, so no real reason to think it was correct. You still got a fair number of points if you described such an algorithm clearly and timed it correctly.

• Question 7 (20): Here we have a scheduling problem slightly different from the others we have seen. Each job has a length and a preferred finish time, and we must schedule all the jobs on a single machine without overlapping. (There is a single start time at which all the jobs become available at once.) If we complete a job before the preferred time, we get a reward equal to the time we have saved. If we complete it after the preferred time, we pay a penalty equal to the amount of time that we are late. Our total reward (which we want to maximize) is the sum of all rewards for early completion minus the sum of all penalties for late completion.

• (a,5) Argue carefully that there is an optimal schedule that has no idle time (actually, we want to argue that every optimal schedule has no idle time). That is, every optimal schedule always has one job start at the same time the previous job finishes. (Hint: Show how to change any schedule with idle time to another schedule that has no idle time and gets strictly more reward.)

Consider any schedule S with a block of idle time of length x in front of some job J. Make a new schedule S' by switching J and the block of idle time, leaving all the other jobs in the same place. The only reward that changes is that of J, which increases by x, so S cannot have been optimal.

If we continue making such swaps with every block of idle time that is in front of a job, we keep increasing the reward and we eventually reach a schedule where there is no idle time because all those blocks occur after all those jobs. This is then a schedule T with no idle time that is better than S. Many of you constructed this T directly from S, by sliding all jobs after each idle time block forward by the length of the block. One thing I insisted on if you did this was that you consider the effect on all the jobs -- in fact you increase the net reward for every job you move forward.
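The sliding construction can be sketched as follows. The representation of a schedule as (start, length, preferred finish) triples, the function names, and the sample numbers are all assumptions made for illustration. The common release time is taken to be 0.

```python
def net_reward(schedule):
    """Sum over jobs of (preferred finish - actual finish); negative terms are penalties."""
    return sum(pref - (start + length) for start, length, pref in schedule)

def remove_idle_time(schedule, release_time=0):
    """Keep the jobs in the same order but start each one when the previous one finishes."""
    compact = []
    t = release_time
    for _, length, pref in sorted(schedule):  # sorted by start time
        compact.append((t, length, pref))
        t += length
    return compact

# Two jobs with idle time before each: (start, length, preferred finish).
schedule = [(1, 2, 4), (5, 3, 6)]
compact = remove_idle_time(schedule)
print(compact)                                     # [(0, 2, 4), (2, 3, 6)]
print(net_reward(schedule), net_reward(compact))   # -1 3
```

Every job in the example finishes earlier after compaction, so each job's individual reward goes up, matching the requirement to consider the effect on all the jobs.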

• (b,15) Consider the greedy algorithm where we sort the jobs by length and schedule them in that order, shortest job first, with no idle time. Prove that this achieves a net reward at least as great as that of any schedule. (Note: In lecture we proved that a different greedy algorithm is optimal for a different goal, that of minimizing the maximum lateness. You cannot simply quote that result because it does not apply to this algorithm or to these rewards and penalties. But a similar exchange argument will work in this new case.)

As in lecture, we consider any schedule that has no idle time (using 7a) and is not ordered by length, and consider one of its inversions where job J is just before job I but is longer than I. We improve this schedule by switching I and J, leaving all the other jobs exactly where they are. We increase the reward of I by the length of J, and decrease the reward of J by the length of I. Since J is longer than I, this is a positive net gain.

Repeating this process leads us to a schedule that is sorted by length. Only in such a schedule can we not improve the reward by swapping to remove an inversion.
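For a small instance, the claim can be checked against every possible ordering. The sketch below uses made-up job data and function names. It only considers schedules with no idle time, which part (a) justifies.

```python
from itertools import permutations

def net_reward(order):
    """Reward of running jobs back to back from time 0; each job is (length, preferred finish)."""
    t = 0
    total = 0
    for length, pref in order:
        t += length          # completion time of this job
        total += pref - t    # reward if early, penalty if late
    return total

# Hypothetical instance: (length, preferred finish time).
jobs = [(4, 5), (1, 9), (3, 2), (2, 7)]
shortest_first = sorted(jobs, key=lambda j: j[0])
best = max(net_reward(p) for p in permutations(jobs))
print(net_reward(shortest_first), best)  # 3 3
```

Trying all orderings takes time proportional to n!, so this is a sanity check for tiny inputs rather than an alternative to the exchange argument.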

Note that with this reward system, the preferred finish times have no effect on the schedule, because moving a preferred finish time of a job adds or subtracts the same amount to the net reward of any schedule.
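To make this explicit, using notation introduced here rather than in the exam, write $d_j$ for the preferred finish time of job $j$ and $C_j$ for the time at which a given schedule completes it. Early rewards and late penalties are both captured by the signed quantity $d_j - C_j$, so the net reward is

$$
R = \sum_{j=1}^{n} (d_j - C_j) = \sum_{j=1}^{n} d_j - \sum_{j=1}^{n} C_j .
$$

The first sum is the same for every schedule, so maximizing $R$ is equivalent to minimizing the sum of completion times, which depends only on the job lengths and their order.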