# Solutions to Final Exam, Fall 2006

#### 11 January 2006

Question text is in black, solutions in blue.

```
Q1: 15 points
Q2: 15 points
Q3: 10 points
Q4: 20 points
Q5: 10 points
Q6: 20 points
Q7: 10 points
Q8: 20 points
Total: 120 points
```

• Questions 1 and 2 deal with the following scheduling problem. We have n jobs J1,..., Jn, each with a duration di and a reward ri, and a single machine that will be available for some amount of time t. We need to schedule the jobs in some order, without knowing t, in order to maximize the total reward from the jobs we complete. If our time runs out while we are running a job Ji, we get a proportional partial reward for the time x we spent on it -- this partial reward is x·ri/di.

• Questions 4, 6, and 8 deal with variants of the BIN-PACKING problem. It may be helpful for you to have the definition of the original BIN-PACKING problem: The input is a set of n items, each with a positive size, and a positive bin size b. The output of the optimization version of the problem is the minimum number of size-b bins needed to store the n items. The decision version of the problem takes an additional positive integer k as input, and the output is a boolean telling whether the items can be packed into k bins of size b. You may assume that this decision problem is NP-complete.

• Question 1 (15):

Give an algorithm, with running time polynomial in n, that determines an order of the jobs that will achieve the optimal reward for all values of t. (You do not need to prove here that your order has this property; that is Question 2.) State and justify a polynomial bound on the running time.

For each job Ji, compute zi = ri/di, the rate at which we accumulate reward while doing job Ji. Then order the jobs by decreasing zi, breaking ties arbitrarily.

The computation of the rates takes O(n) time, and sorting the jobs by rate takes O(n log n) time if we use Mergesort, for example. Total time to find a correct order is thus O(n) + O(n log n) = O(n log n).
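This greedy order is short enough to sketch directly (a toy implementation; the representation of jobs as hypothetical (duration, reward) pairs is mine):

```python
def schedule(jobs):
    """Order jobs, given as (duration, reward) pairs, by decreasing
    reward rate r/d. Python's sort is stable, so ties are broken by
    the original (arbitrary) order, which the argument allows."""
    return sorted(jobs, key=lambda j: j[1] / j[0], reverse=True)
```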

• Question 2 (15):

Prove that no other order on the jobs can achieve a greater value than your order from Question 1, for any possible value of t. (Hint: You can show that any other order either achieves the same value as yours does, or can be altered to get a greater value for some t.)

We first claim that any two orders that are nonincreasing by rate score the same reward for any t. We prove this by an exchange argument -- we can get from any one such order to any other by a sequence of swaps of adjacent jobs with the same rate. Let the jobs be ordered J1,..., Jn by nonincreasing rate, assume that Ji and Ji+1 have the same rate z, and consider the order J1,..., Ji-1, Ji+1, Ji, Ji+2,..., Jn obtained by swapping Ji and Ji+1. We need to prove that for any t, these two orders achieve the same reward. (Then the general result will follow by induction on these single exchanges.)

If t is less than the sum of the first i-1 durations, or greater than the sum of the first i+1 durations, then the two orders complete exactly the same jobs and finish by devoting the same amount of time to the same final job. Thus in these cases they achieve the same reward. Now consider some t between these two sums, so that t = (sum of first i-1 durations) + y. Both orders complete the first i-1 jobs, and each then spends a time of y on jobs Ji and/or Ji+1, which each have rate z. So each order achieves (sum of first i-1 rewards) + yz, and thus both achieve the same reward.

Now we must prove that any order that is not nonincreasing by rate fails to achieve the optimal reward for some value of t. Again we use an exchange argument. Suppose that the order J1,..., Jn fails to be nonincreasing and let Ji and Ji+1 be the first pair of adjacent jobs such that zi < zi+1. We will show that swapping Ji and Ji+1 gets a greater reward for some t. Let t be the sum of the first i-1 durations, plus the smaller of the two numbers di and di+1, which we may call d. The original order gets a reward of (sum of first i-1 rewards) + dzi, while the swapped order gets (sum of first i-1 rewards) + dzi+1, which is strictly greater by our hypothesis.

As above, we can check that swapping two adjacent items that are out of order by rate has no effect on the total reward for t's that are outside the range from (sum of first i-1 durations) to (sum of first i+1 durations), and increases the reward for t's within that range. Thus if we take any order and perform swaps of adjacent jobs until we get an order that is nonincreasing by rate, we never hurt the reward and we sometimes help it, so our final order is optimal for all values of t.
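For a small made-up instance we can sanity-check the optimality claim by brute force over all orders (the instance and names here are mine, not from the exam):

```python
from itertools import permutations

def reward(jobs, t):
    """Reward for running (duration, reward) jobs in the given order,
    with proportional partial credit for the job cut off at time t."""
    total = 0.0
    for d, r in jobs:
        x = min(d, t)          # time actually spent on this job
        total += x * r / d     # proportional (possibly partial) reward
        t -= x
        if t <= 0:
            break
    return total

jobs = [(2, 6), (1, 1), (3, 3), (2, 2)]   # hypothetical instance
by_rate = sorted(jobs, key=lambda j: j[1] / j[0], reverse=True)
for t in (0.5, 1, 2.5, 4, 6, 8, 100):
    best = max(reward(list(p), t) for p in permutations(jobs))
    assert abs(reward(by_rate, t) - best) < 1e-9   # rate order is optimal
```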

• Question 3 (10):

True or false with justification: Let A be an algorithm that operates on a list of size n as follows. If n ≤ 1, then A takes only O(1) time. Otherwise A spends O(n) time to split the list into two pieces, each of which has size at most 2n/3. It then calls itself recursively on the two pieces. Then any such A has a running time that is polynomial in n.

TRUE. The recurrence is T(n) ≤ 2T(2n/3) + O(n), with base case T(O(1)) = O(1), and we saw in lecture that this has a solution of T(n) = O(n^(log_{3/2} 2)), roughly O(n^1.71), which is polynomial.

Many of you wrote that the solution was O(n log n), which is true if we interpret the word "split" in the problem statement to mean that the sizes of the two pieces add to n. If we assume that T(m) ≤ cm log m for all m smaller than n, and then compute T(n) ≤ T(s) + T(n-s) + dn where s is between n/3 and 2n/3, we get the following: T(n) is at most cs log s + c(n-s) log(n-s) + dn ≤ cn log(2n/3) + dn = cn log n - cn log(3/2) + dn, which is at most cn log n once we choose c ≥ d/log(3/2).
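To see the recurrence's growth concretely, here is a quick numeric check (a sketch only; it models the O(n) split cost as exactly n and pessimistically gives both pieces size 2n/3):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Worst-case cost: two recursive calls on pieces of size at most
    2n/3, plus linear splitting work."""
    if n <= 1:
        return 1
    return 2 * T((2 * n) // 3) + n

# T(n) grows like n^(log_{3/2} 2), about n^1.71, so it stays below n^2
assert all(T(n) <= n * n for n in range(6, 2000))
```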

• Question 4 (20):

Consider the special case of the BIN-PACKING problem where the size of each item is a positive integer and the bin size is a constant positive integer, called b. Given n items, we need to find the exact minimum number of bins needed to store them all. (The problem is no longer NP-complete in this special case, unless P = NP.) Describe an algorithm to solve this problem that has a running time that is polynomial in n if b is considered to be a constant. State and justify a polynomial bound on your running time.

My intended solution here was to use dynamic programming, in the same way that we solved the subset-sum problem in lecture when the size of each object was an integer bounded by a polynomial in n. We can represent any assignment of objects to bins by a vector (n1,...,nb) where ni is the number of bins that have exactly i units in them. There are O(n^b) of these vectors.

Let's make a table with a boolean for each one of these vectors. Originally we have (0,...,0) true and all the other entries false. This represents all the possible arrangements of the first 0 items. Assuming our table represents all the arrangements of the first i items, we scan the table, determine all the ways we can make a valid arrangement by adding the (i+1)st item, and make a new table containing true entries for all those vectors. For example, if b=3 and we are considering the arrangement (4,7,2) with a new item of size 2, we could get a new arrangement (4,8,2) by putting the new item in a new bin, or (3,7,3) by adding the new item to a bin with one unit already in it.

After we have processed all n items, we have a list of all the valid arrangements of n items and we can read off the one that uses the fewest bins. This takes n passes through the O(n^b)-size table, for total time O(n^(b+1)), a polynomial as long as b is constant.
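A sketch of this dynamic program in Python, storing the reachable (n1,...,nb) vectors as a set of tuples rather than a boolean table (function and variable names are mine):

```python
def min_bins(sizes, b):
    """Exact minimum number of size-b bins for integer item sizes.
    A state is a tuple (n_1, ..., n_b): n_i bins filled to exactly i."""
    states = {tuple([0] * b)}
    for s in sizes:
        nxt = set()
        for st in states:
            # option 1: open a new bin for this item
            v = list(st)
            v[s - 1] += 1
            nxt.add(tuple(v))
            # option 2: add it to an existing bin with load j, if it fits
            for j in range(1, b - s + 1):
                if st[j - 1] > 0:
                    v = list(st)
                    v[j - 1] -= 1
                    v[j + s - 1] += 1
                    nxt.add(tuple(v))
        states = nxt
    return min(sum(st) for st in states)
```

For instance, min_bins([2, 2, 2, 1, 1, 1], 3) returns 3, since each size-2 item can share a bin with a size-1 item.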

Most of you took a greedy approach to the problem, filling the first bin as full as possible, then the next bin as full as possible from the remaining items, and so forth. You got significant partial credit for presenting and analyzing such an algorithm, but these algorithms are not correct in general. There may be many ways to fill the next bin using the remaining items, and some ways may be better than others with regard to leaving a set of items that will fit into a particular number of remaining bins. For example, let b = 3 and let our item sizes be 2, 2, 2, 1, 1, 1. We can fit these items into three bins but if we fill the first bin greedily with the three 1's we wind up using four bins. A natural heuristic will solve this example but we can easily make more complicated ones. (If your algorithm never makes use of the fact that b = O(1), for example, then it is either incorrect or proves that P = NP by solving the general case where b is polynomial in n -- we asserted in class that BIN-PACKING is strongly NP-complete.)

• Question 5 (10):

True or false with justification: The SQUARE-TILING problem has as input a set of n2 tiles, each of which has one of n2 possible colors. (It is possible that not all the colors are used.) The output is a boolean saying whether the tiles can be put in an n by n square such that each tile is used exactly once and such that no color appears more than once in any row or in any column. Then there is a polynomial-time reduction from SQUARE-TILING to HAMILTON-CIRCUIT.

TRUE. Any decision problem in NP reduces to HAMILTON-CIRCUIT, because HAMILTON-CIRCUIT is NP-complete. It is easy to show that SQUARE-TILING is in the class NP, because given an alleged valid tiling we have only to check that each tile is used exactly once, that no color appears twice in the same row, and that no color appears twice in the same column. Thus the problem does not require us to determine whether SQUARE-TILING is in P, is NP-complete, or neither.

As it turns out, SQUARE-TILING is in P. If there are n+1 or more tiles with the same color, then by the Pigeonhole Principle there cannot possibly be a valid tiling. If there are no more than n of any given color, then we can always do it. Place the tiles in any order where tiles of each color occur consecutively, and colors with n tiles come first. Then place the tiles in the square starting with location (0,0), then (1,1),..., (n-1,n-1), (1,0), (2,1),..., (n-1,n-2), (0,n-1), (2,0), (3,1),..., (1,n-1), and so forth. The colors with n tiles, if any, are put on complete diagonals. Any other color occupies at most n-1 consecutive entries in this ordering, and any such run of consecutive entries falls in distinct rows and distinct columns.
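This construction can be sketched directly (a toy implementation, with names of my own choosing; the t-th entry of diagonal d lands in row (d+t) mod n, column t, which matches the ordering described above):

```python
from collections import Counter

def square_tiling(colors, n):
    """Arrange n*n tiles (a list of colors) in an n-by-n square with no
    color repeated in any row or column, or return None if impossible."""
    counts = Counter(colors)
    if any(c > n for c in counts.values()):
        return None  # Pigeonhole: n+1 tiles of one color force a repeat
    # tiles of each color consecutive, colors with the most tiles first
    order = [color
             for color, c in sorted(counts.items(), key=lambda kv: -kv[1])
             for _ in range(c)]
    grid = [[None] * n for _ in range(n)]
    for idx, color in enumerate(order):
        d, t = divmod(idx, n)            # diagonal d, position t on it
        grid[(d + t) % n][t] = color
    return grid
```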

• Question 6 (20):

The RECTANGLE problem is the following variation of BIN-PACKING. We are given a set of n jobs, each with a positive integer size. Let S be the sum of the n sizes. The problem is to determine whether there exist integers b and k such that b > 1, k > 1, bk = S, and the items can be divided into k sets each of which has total size b. (The name RECTANGLE comes from thinking of each job of size si as a 1 by si rectangle, and asking whether these rectangles can be packed into any single rectangle without wasted space, other than the trivial 1 by S solution.) Prove that the RECTANGLE problem is NP-complete.

We first show that the RECTANGLE problem is in the class NP. If we are given a division of the jobs into sets of equal size, we only have to verify that each job appears in exactly one set and that the sets all have the same size (and that b and k are each greater than 1). This takes O(n) time, and there exists an arrangement that we will verify if and only if the answer to the RECTANGLE question is "yes".

We can prove RECTANGLE to be NP-complete by reducing SUBSET-SUM to it, though there are complications. If we are given a SUBSET-SUM instance of n items of total size s and a target t, the natural solution is to make a job for each item and add two new jobs so that, in order to divide the set of jobs in half, we must put the original jobs adding to t with one new job and the original jobs adding to s-t with the other.

The complication, though, is that we must make sure there is no other way to divide the new set of jobs into more than two equal pieces. For example, if we just did the simplest thing and made our two new jobs have size t and s-t, it might be possible to divide the jobs into three groups each of size 2s/3. This would mean that we convert an insoluble SUBSET-SUM instance into a soluble RECTANGLE instance, making our reduction invalid.

However, we can fix this easily by making the new jobs so large that no division into more than two sets is possible. For example, if they are size 2s+t and 3s-t, the total size is 6s and we can't possibly have three or more sets because the new items are too big to fit into a bin of size 2s. Now we can fit the jobs into two sets if and only if we put the 3s-t job with original jobs of size t and the other new job with original jobs of size s-t.
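A sketch of this reduction, together with an exponential brute-force decider used only to spot-check tiny instances (all names here are mine, not from the exam):

```python
from itertools import product

def subset_sum_to_rectangle(items, t):
    """Map a SUBSET-SUM instance (items, target t) to a RECTANGLE
    instance by adding two oversized jobs of sizes 2s+t and 3s-t."""
    s = sum(items)
    return list(items) + [2 * s + t, 3 * s - t]

def rectangle(jobs):
    """Brute-force RECTANGLE decider (exponential; for checking only)."""
    total = sum(jobs)
    for k in range(2, total):
        if total % k != 0:
            continue
        b = total // k
        if b <= 1:
            break
        # try every assignment of jobs to the k sets
        for assign in product(range(k), repeat=len(jobs)):
            loads = [0] * k
            for job, grp in zip(jobs, assign):
                loads[grp] += job
            if all(load == b for load in loads):
                return True
    return False
```

For example, [1, 2, 3] with target 4 is a "yes" SUBSET-SUM instance (1+3 = 4) and maps to a "yes" RECTANGLE instance, while [2, 2] with target 3 maps to a "no" instance.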

It is natural to try to reduce the given general BIN-PACKING problem to RECTANGLE, and this can be done similarly to the reduction above with the same complication to address. Given n items of total size s, a bin size b, and a target number of bins k, we make a job for each item and then make k new jobs of size z (where z is to be determined) and bk - s new jobs of size 1. If we can put the original items into k bins, then we can add the size-1 jobs to get k sets with exactly b units each in them, then add a new item of size z to each bin to get k bins with exactly b+z units in each. Now we want to pick z so that there is no other way to divide the jobs equally into a number of sets other than k. If we pick z > bk, we have z > k(z+b)/(k+1) and the new jobs of size z are too big to fit into k+1 or more sets.

But there's one more problem -- there could be a way to divide the jobs into fewer than k bins by putting more than one size-z job in the same bin. This can't happen if k is a prime number, so we can adjust the jobs to fix this by picking a prime number p with p ≥ k, and adding p-k new jobs of size z+b. Then any division of the p jobs of size z or z+b into fewer than k bins would leave a gap too large to be filled with the original jobs (which total to less than z).

Of course the reduction from SUBSET-SUM or NUMBER-PARTITION works fine, so we don't need this prime-number trick to solve the problem. I just wanted to show that the reduction from general BIN-PACKING is possible.

• Question 7 (10):

True or false with justification: Let R be a resource to which I want access, and suppose that in any round of a protocol I have at least a 1/n chance of getting R. Further suppose that the events of success in each round are independent. Then my chance of succeeding at least once in the first n rounds is at least 1/2.

TRUE. The chance that I fail n times in a row is at most ((n-1)/n)^n, because we take the product of the probabilities of the n different failure events and each of those is at most (1 - 1/n). We saw in lecture that this number ((n-1)/n)^n = (1 - 1/n)^n is less than 1/e for any positive n, and since 1/e < 1/2 the success probability is more than 1/2.
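A quick numeric check of the key inequality (1 - 1/n)^n < 1/e (a sketch; the function name is mine):

```python
import math

def failure_bound(n):
    """Upper bound on the probability of failing all n rounds, when each
    round independently fails with probability at most 1 - 1/n."""
    return (1 - 1 / n) ** n

# (1 - 1/n)^n increases toward 1/e but never reaches it, and 1/e < 1/2
assert all(failure_bound(n) < 1 / math.e < 0.5 for n in range(1, 10000))
```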

A disappointing number of you got to the correct answer of "TRUE" by a completely bogus argument. You said that since you have a 1/n chance of success in each of the n attempts, your total chance of success is n(1/n) = 1. You may have been thinking of the Union Bound, which tells us only that the probability of at least one success is at most this sum of 1. This is true but doesn't help us for this problem, since what we care about is whether that probability is at least 1/2.

You probably don't really believe that if you flip a fair coin once, you are guaranteed to get heads at least once. (If you do believe this, stay away from Foxwoods.) Note that this is exactly the n=2 case of the reasoning that most of you used on this problem -- the two trials each have success probability 1/2 but the chance of at least one success is 3/4, not 1.

• Question 8 (20):

Now consider the variant of BIN-PACKING where the bin size is b (not necessarily an integer) and each item has size less than b/3. This version is still NP-complete, though we won't prove this here. Your problem is to give an algorithm that approximates the optimal packing into bins, and prove a bound on the quality of your approximation. In particular, prove that if your algorithm uses 3a+1 bins for some integer a, then the optimal algorithm uses at least 2a+1 bins.

As in Discussion #11, we can use the simple algorithm where we keep one bin open at a time, look at each item in turn, and put it into the current bin if and only if it fits. If it doesn't fit, we open a new bin. Every bin except the last one we use must be filled to more than 2b/3, since we could only close it if a new item, of size less than b/3, failed to fit.

Thus if our algorithm uses 3a+1 bins, the first 3a bins contain a total size of more than 3a(2b/3) = 2ab. The optimal algorithm could not possibly fit this much size into 2a bins of size b, so it must use at least 2a+1 bins as desired.
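The algorithm itself is only a few lines (a sketch; sizes may be non-integer, since b need not be an integer, and the names are mine):

```python
def next_fit(sizes, b):
    """Next-fit packing: keep one bin open at a time and start a new
    bin whenever the current item does not fit."""
    bins = 0
    space = 0    # remaining room in the currently open bin
    for s in sizes:
        if s > space:    # item doesn't fit: close this bin, open a fresh one
            bins += 1
            space = b
        space -= s
    return bins
```

Since every item is smaller than b/3, a bin is closed only when it is already more than 2b/3 full, which is exactly the fact the bound above uses.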