# Solutions to Final Exam

### Directions:

• Answer the problems on the exam pages.
• There are eight problems for 120 total points plus 5 extra credit. Actual scale is A = 105, C = 70.
• If you need extra space use the back of a page.
• No books, notes, calculators, or collaboration.
• The first six questions are true/false, with five points for the correct boolean answer and up to five for a correct justification.
• When the answer to a question is a number, you may give your answer in the form of an expression using arithmetic operations, powers, falling powers, or the factorial function. Probabilities may be given as either fractions or decimals.

```
Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 10 points
Q6: 10 points
Q7: 40+5 points
Q8: 20 points
Total: 120+5 points
```

Question text is in black, solutions in blue.

Correction in green made 18 December 2009.

• Question 1 (10): True or false with justification: Zane's Noodle Bowl has adopted a new policy for customers to choose from the thirteen kinds of vegetables they may put in their soup. A customer gets exactly five helpings of vegetables, and more than one helping may be of the same kind. (So, for example, one choice would be "two helpings of bean sprouts, two helpings of pea pods, and one helping of carrots".) Then of all the ways to choose five helpings of vegetables, over 25% have five different kinds of vegetables.

FALSE. There are ((13 + 5 - 1) choose 5) = (17 choose 5) ways to choose a multiset of five elements from thirteen possibilities. There are (13 choose 5) ways to choose a set of five elements. There are various ways to calculate the ratio, but (13 choose 5) = 1287 and (17 choose 5) = 6188, almost five times 1287 and clearly more than four times 1287.
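The two counts above can be checked with a short sketch using Python's standard `math.comb`. The variable names are illustrative and not part of the exam.

```python
from math import comb

# Multisets of size 5 drawn from 13 kinds: (13 + 5 - 1) choose 5.
multisets = comb(13 + 5 - 1, 5)
# Choices in which all five helpings are different kinds: 13 choose 5.
all_distinct = comb(13, 5)

ratio = all_distinct / multisets
print(multisets, all_distinct, ratio)
```

The ratio comes out a little under 21%, below the 25% claimed in the question.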

• Question 2 (10): True or false with justification: Before their game on 17 May 2009, the Toronto Blue Jays had won 23 games and lost 13, and the New York Yankees had won 16 games and lost 17. If we assume that the Blue Jays win each of their games independently with probability b (and lose with probability 1 - b), and that the Yankees win each of their games independently with probability y (and lose with probability 1 - y), then we may conclude from these results with confidence of 95% or more that b > 1/2, and we may not conclude with 95% confidence that y < 1/2.

FALSE. The given assertion would be justified, according to what we've learned, if the Blue Jays' number of wins was more than two standard deviations above their expected value if b = 1/2, and the Yankees' number of wins was less than two standard deviations below their expected number if y = 1/2. The latter is actually true, but to prove the given assertion false it suffices to show that the former is false. If b = 1/2, the number of Blue Jays wins would be a binomial random variable with expected value nb = 18 and variance nb(1-b) = 9, and thus a standard deviation of 3. The observed value, 23, is less than two standard deviations away from 18.
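As a rough check of the two-standard-deviation test described above, the following sketch computes how many standard deviations each team's win total lies from its expected value under a win probability of 1/2. The helper name `z_score` is an illustrative assumption.

```python
from math import sqrt

def z_score(wins, losses, p=0.5):
    """Standard deviations between observed wins and the binomial mean n*p."""
    n = wins + losses
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return (wins - mean) / sd

blue_jays_z = z_score(23, 13)   # n = 36, mean 18, sd 3
yankees_z = z_score(16, 17)     # n = 33, mean 16.5

print(blue_jays_z, yankees_z)
```

Both values have absolute value less than 2, so neither record supports a 95% conclusion under this test.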

• Question 3 (10): True or false with justification: Consider any two-player simultaneous-move game where Players A and B each have a choice of two options and there is a 2 × 2 matrix giving the payoff for A in each of the four possible situations. Then the optimal strategy for each player is a mixed strategy, where each option is taken with some positive probability.

FALSE. Depending on the payoff matrix, one or both players might have a dominant strategy that they should play all the time, and hence the probability of the other strategy in the optimal mixed strategy would be zero, not positive.

For example, suppose the payoff to A for A choosing i and B choosing j were just i + j. Then clearly A does better by choosing the larger of his two numbers, and B does better by choosing the smaller of hers, each with probability 1.
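The example above can be made concrete with a small sketch. It assumes, purely for illustration, that each player's two options are the numbers 1 and 2 and that the game is zero-sum, so B wants to minimize A's payoff. It checks that the maximin and minimax values coincide, which means pure strategies are optimal.

```python
options_a = (1, 2)   # assumed option values for A (not given in the exam)
options_b = (1, 2)   # assumed option values for B (not given in the exam)

def payoff(i, j):
    """Payoff to A when A chooses i and B chooses j."""
    return i + j

# A picks the row whose worst case is best; B picks the column whose
# worst case (largest payoff to A) is smallest.
maximin = max(min(payoff(i, j) for j in options_b) for i in options_a)
minimax = min(max(payoff(i, j) for i in options_a) for j in options_b)

best_a = max(options_a, key=lambda i: min(payoff(i, j) for j in options_b))
best_b = min(options_b, key=lambda j: max(payoff(i, j) for i in options_a))
print(maximin, minimax, best_a, best_b)
```

Because the two values agree, the game has a saddle point: A always plays the larger number and B the smaller.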

• Question 4 (10): True or false with justification: Suppose I shuffle a standard 52-card deck and deal one card to you and one card to me. I offer you a bet where I will pay you \$16 if the two cards have the same rank, and you will pay me \$1 if they do not. Then this bet is "actuarially fair", meaning that its expected value for each of us is 0.

TRUE. There are (52 choose 2) = 51*26 pairs of cards, and there are 13*(4 choose 2) = 13*6 pairs where both cards are of the same rank. We can compute the probability of both cards being of the same rank as (13*6)/(51*26) = 1/17. (We could also observe that given my card, exactly 3 of the 51 possibilities for your card have the same rank as mine, giving a probability of 3/51 = 1/17.) My expected payoff is thus (1/17)(-16) + (16/17)(1) = 0, and of course your expected payoff is 0 as well because this is a zero-sum game.
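The probability and expected value above can also be confirmed by brute force. This sketch enumerates every unordered pair of cards, representing a card as a hypothetical `(rank, suit)` tuple, and uses exact fractions.

```python
from fractions import Fraction
from itertools import combinations

# A deck as (rank, suit) pairs: 13 ranks, 4 suits.
deck = [(rank, suit) for rank in range(13) for suit in range(4)]
pairs = list(combinations(deck, 2))

same_rank = sum(1 for a, b in pairs if a[0] == b[0])
p_same = Fraction(same_rank, len(pairs))

# Expected value of the bet for "you": win 16 on a match, lose 1 otherwise.
expected_for_you = p_same * 16 + (1 - p_same) * (-1)
print(len(pairs), same_rank, p_same, expected_for_you)
```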

• Question 5 (10): True or false with justification: Suppose I throw a single fair six-sided die three times. Let A be the event that the first two throws produce two different numbers, and let D be the event that each of the three throws produces a different number. Then Pr(D | A) is exactly twice as large as Pr(¬D | A).

TRUE. We can compute Pr(A) = 5/6 and Pr(D) = 5/9, the latter because there are 6*6*6 sequences of three numbers and 6*5*4 of these have three different numbers. Pr(D|A) is defined to be Pr(D∩A)/Pr(A), but Pr(D∩A) is just Pr(D) because the event D is a subset of the event A. So Pr(D|A) is (5/9)/(5/6) = 2/3, and clearly Pr(¬D|A) = 1 - Pr(D|A) = 1/3. We could also observe that once the first and second dice are thrown with different numbers, there are four possible throws of the third die that cause D to happen, and two that do not.
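A quick enumeration of all 216 sequences of three throws gives the same conditional probabilities. This is a sketch, and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=3))   # all 6*6*6 outcomes

# A: first two throws differ. D: all three throws differ (a subset of A).
in_A = [t for t in throws if t[0] != t[1]]
in_D_and_A = [t for t in in_A if len(set(t)) == 3]

p_D_given_A = Fraction(len(in_D_and_A), len(in_A))
p_notD_given_A = 1 - p_D_given_A
print(p_D_given_A, p_notD_given_A)
```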

• Question 6 (10): True or false with justification: In the example of Question 5, also let B be the event that the first and third throws give two different numbers, and let C be the event that the second and third throws give two different numbers. Then Pr(D) = Pr(A) + Pr(B) + Pr(C) - Pr(A ∩ B) - Pr(A ∩ C) - Pr(B ∩ C).

FALSE. Many of you observed that this expression is an incorrect version of the three-set inclusion/exclusion formula for A ∪ B ∪ C. If in fact D were equal to A ∪ B ∪ C, we could prove the statement false by showing that the missing term, Pr(A ∩ B ∩ C), is nonzero, which it is. But actually D is equal to A ∩ B ∩ C, so we must rule out the possibility that the two "errors" still lead to a true statement. So we have to evaluate Pr(A ∩ B) -- the other two terms are similar. For A ∩ B to be true, we must have the first number differ from both the second and the third, though the second and third may be equal. Once we choose the first number, there is a 5/6 chance the second differs from the first and a 5/6 chance the third differs from the first, and since these events are independent the probability that both happen is (5/6)(5/6) = 25/36.

The alleged equation can now be evaluated using the facts derived here and in the solution to Question 5: 5/9 = 5/6 + 5/6 + 5/6 - 25/36 - 25/36 - 25/36, which is false because the right-hand side evaluates to 5/12, not 5/9. If we added the missing term, we would find that Pr(A ∪ B ∪ C) = 35/36, the chance that the three numbers are not all the same.
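Both sides of the alleged equation, and the probability of the union, can be checked by enumerating the 216 outcomes. The helper `pr` below is a hypothetical name introduced for this sketch.

```python
from fractions import Fraction
from itertools import product

throws = list(product(range(1, 7), repeat=3))

def pr(event):
    """Exact probability of an event given as a predicate on a throw triple."""
    return Fraction(sum(1 for t in throws if event(t)), len(throws))

A = lambda t: t[0] != t[1]
B = lambda t: t[0] != t[2]
C = lambda t: t[1] != t[2]

lhs = pr(lambda t: A(t) and B(t) and C(t))      # Pr(D)
rhs = (pr(A) + pr(B) + pr(C)
       - pr(lambda t: A(t) and B(t))
       - pr(lambda t: A(t) and C(t))
       - pr(lambda t: B(t) and C(t)))
union = pr(lambda t: A(t) or B(t) or C(t))
print(lhs, rhs, union)
```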

• Question 7 (40+5): Donna, a trombonist with the UMass Marching Band, has been assigned extra marching practice. She begins the drill facing north. Her instructor gives her a series of commands, each either L for "left face" or R for "right face". On L, she turns 90 degrees to the left 90% of the time and turns 90 degrees to the right 10% of the time. On R, she turns right 90% of the time and turns left 10% of the time. We are concerned only with the direction she is facing after such a series of commands.

• (a,5) What further assumptions do we need to model Donna's movements as a Markov Decision Process?

The most important assumption is that Donna's actions are independent of each other, and that the probability that she turns left or right depends only on her state and the command, not on any other prior history.

• (b,5) Draw a diagram of this MDP, with a state for each direction Donna might be facing.

The diagram has four states, N, E, S, and W. From N to E there are two edges, one labeled "R,0.9" and one labeled "L,0.1". There are two similar edges from E to S, from S to W, and from W to N. From N to W there are two edges, one labeled "L,0.9" and the other labeled "R,0.1", and there are two similar edges from W to S, from S to E, and from E to N.

• (c,10) For each state, compute the probability that Donna is in that state after the command sequence LLLL.

The key observation is that Donna is facing the correct direction (north in this instance) if she has made an even number of mistakes, and the opposite direction if she has made an odd number of mistakes. This is because each mistake changes her direction by 180 degrees relative to the direction she should be in.

So she is facing north if she has made 0, 2, or 4 mistakes, and since the number of mistakes is a binomial random variable with p = 0.1 and n = 4, the probability of this is (4 choose 0)(0.9)^4(0.1)^0 + (4 choose 2)(0.9)^2(0.1)^2 + (4 choose 4)(0.9)^0(0.1)^4 = 0.6561 + 0.0486 + 0.0001 = 0.7048. She cannot be facing east or west, so she is facing south with probability 1 - 0.7048 = 0.2952.
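The same numbers fall out of pushing a probability distribution through the chain one command at a time. The sketch below assumes the states are indexed N, E, S, W as 0 to 3, so a right turn adds 1 modulo 4. The function name `step` is illustrative.

```python
def step(dist, command, p_correct=0.9):
    """Apply one command to a distribution over (N, E, S, W)."""
    turn = 1 if command == "R" else -1     # intended turn: right = +1, left = -1
    new = [0.0] * 4
    for state, prob in enumerate(dist):
        new[(state + turn) % 4] += prob * p_correct         # correct turn
        new[(state - turn) % 4] += prob * (1 - p_correct)   # opposite turn
    return new

dist = [1.0, 0.0, 0.0, 0.0]   # Donna starts facing north
for command in "LLLL":
    dist = step(dist, command)

north, east, south, west = dist
print(dist)
```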

• (d,10) Estimate the probability that Donna is in each state after 100 commands, 50 of them L and 50 of them R. Justify your answer. Would the distribution be different for a different sequence of 100 commands?

We know that after an even number of commands, she must be facing north or south. It seems reasonable that after a large number of commands, she would be about equally likely to have made an odd or an even number of errors, so we would expect her to be facing north or south with equal probability. But this is not a proof -- we need to show that (1/2, 0, 1/2, 0) is an attracting steady state, not for the one-command chain, but for the four Markov chains corresponding to the pairs of commands LL, LR, RL, and RR. Either of the single commands takes the distribution (1/2, 0, 1/2, 0) to (0, 1/2, 0, 1/2), and either command takes this second distribution back to (1/2, 0, 1/2, 0). Among distributions of the form (p, 0, 1-p, 0), each pair of commands brings the distribution closer to the steady state.

If we had, say, 51 L's and 49 R's, the "correct" direction would be south, but for similar reasons Donna would be about equally likely to be in the correct direction (south) or 180 degrees from the correct direction (north).
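A numerical sketch can support this estimate, though it is not a proof. It pushes the starting distribution through three different 100-command sequences, assuming states N, E, S, W are indexed 0 to 3 and a right turn adds 1 modulo 4. The names `step` and `run` are illustrative.

```python
def step(dist, command, p_correct=0.9):
    """Apply one command to a distribution over (N, E, S, W)."""
    turn = 1 if command == "R" else -1
    new = [0.0] * 4
    for state, prob in enumerate(dist):
        new[(state + turn) % 4] += prob * p_correct
        new[(state - turn) % 4] += prob * (1 - p_correct)
    return new

def run(commands):
    """Distribution after a command string, starting from north."""
    dist = [1.0, 0.0, 0.0, 0.0]
    for command in commands:
        dist = step(dist, command)
    return dist

alternating = run("LR" * 50)            # 50 L's and 50 R's, interleaved
blocked = run("L" * 50 + "R" * 50)      # 50 L's followed by 50 R's
uneven = run("L" * 51 + "R" * 49)       # 51 L's and 49 R's
print(alternating, blocked, uneven)
```

In each case north and south come out within about 10^-10 of 1/2, since the gap from 1/2 shrinks by a factor of 0.8 with every command.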

• (e,10) Assume now that after a day of this practice, Donna's performance has improved so that the next day she turns correctly for an L command 95% of the time and for an R command 98% of the time. (She still turns in the opposite direction whenever she does not turn correctly.) For this new MDP, define a reward function for the instructor, so that he receives one point if Donna is facing in the direction she should be facing if she had responded correctly to all the commands she has been given, and no points if she is facing in some other direction. What policy should the instructor use to maximize his long-term expected reward? Justify your answer, but note that you need not (yet) calculate his expected reward for this policy.

We can simplify the MDP by only considering whether Donna is in the correct direction or 180 degrees opposite. In this two-state MDP, on L she stays in the same state with probability 0.95 and switches with probability 0.05, and on R she stays with probability 0.98 and switches with probability 0.02. The optimal policy for the instructor, who wants her in the "correct" state, is to give command R when she is facing correctly, maximizing the chance that she stays correct, and L when she is facing incorrectly, maximizing the chance that she switches.

• (f,5XC) Calculate the average reward per turn to the instructor in the steady state of the Markov chain arising from the optimal policy you found in part (e).

In the steady state, Donna is correct with some probability p and incorrect with probability 1-p. We can compute that p = (0.98)(p) + (0.05)(1-p), from which follows 0.07p = 0.05, or p = 5/7. In the steady state, then, the expected reward is (5/7)(1) + (2/7)(0) = 5/7.
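The steady-state value can be compared across all four stationary policies in the two-state reduction from part (e). The sketch below solves p = stay*p + recover*(1-p) exactly for each policy. The names `ACCURACY` and `steady_state_correct` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Probability that Donna turns as commanded, per part (e).
ACCURACY = {"L": Fraction(95, 100), "R": Fraction(98, 100)}

def steady_state_correct(cmd_when_correct, cmd_when_incorrect):
    """Long-run probability of the 'correct' state under a stationary policy."""
    stay = ACCURACY[cmd_when_correct]             # correct -> correct
    recover = 1 - ACCURACY[cmd_when_incorrect]    # incorrect -> correct
    # Solve p = stay * p + recover * (1 - p) for p.
    return recover / (1 - stay + recover)

# Keys are (command when correct, command when incorrect).
rewards = {policy: steady_state_correct(*policy)
           for policy in product("LR", repeat=2)}
best_policy = max(rewards, key=rewards.get)
print(rewards, best_policy)
```

Under these assumptions the policy "R when correct, L when incorrect" gives 5/7, the two constant policies give 1/2, and the reversed policy gives 2/7.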

• Question 8 (20): The police believe that a wanted suspect may be in a particular building, and they station detectives in four different locations to watch for him. They estimate that if the suspect is there (event S), each detective has a 40% chance of seeing movement inside the building. If the suspect is not there (event ¬S), each has a 10% chance of seeing movement. We first assume that the reports of the four detectives are conditionally independent with respect to S.

• (a,5) If R is the event that a particular single detective reports movement, calculate the likelihood ratios L(R | S) and L(¬R | S).

L(R|S) = Pr(R|S)/Pr(R|¬S) = 0.4/0.1 = 4.

L(¬R|S) = Pr(¬R|S)/Pr(¬R|¬S) = 0.6/0.9 = 2/3.

• (b,5) If one detective reports movement and the other three do not, does this make it more or less likely that the suspect is there?

With one positive and three negative reports, we multiply the prior odds by 4*(2/3)*(2/3)*(2/3) = 32/27. Since this factor is greater than one, multiplying the prior odds by it will increase our estimate of the probability that S is true.

• (c,5) Suppose that the initial estimate of the probability of the suspect's presence is 1%, and that the police only want to enter the building if it is more likely than not that he is there. How many of the four reports must be positive before this is true?

The prior odds for a probability of 0.01 are (0.01)/(1 - 0.01) = 1/99. With four positive reports, our posterior odds are (1/99)*4*4*4*4 = 256/99, which is greater than one, so we believe it is more likely than not that S is true. With three positive and one negative report, our posterior odds are (1/99)*4*4*4*(2/3) = 128/297, which is less than one. So four positive reports are necessary and sufficient to justify entering the building.
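The odds calculation above generalizes to any number of positive reports. The sketch below tabulates the posterior odds for 0 through 4 positive reports with exact fractions. The names `LR_POS`, `LR_NEG`, and `posterior_odds` are illustrative.

```python
from fractions import Fraction

LR_POS = Fraction(4, 10) / Fraction(1, 10)   # L(R|S) = 4
LR_NEG = Fraction(6, 10) / Fraction(9, 10)   # L(not R|S) = 2/3

def posterior_odds(prior_prob, positives, total=4):
    """Posterior odds of S after `positives` of `total` independent reports."""
    prior_odds = prior_prob / (1 - prior_prob)
    return prior_odds * LR_POS**positives * LR_NEG**(total - positives)

prior = Fraction(1, 100)
odds_by_count = {k: posterior_odds(prior, k) for k in range(5)}

# Smallest number of positive reports that makes S more likely than not.
needed = min(k for k, odds in odds_by_count.items() if odds > 1)
print(odds_by_count, needed)
```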

• (d,5) Explain what it means for the reports of the four detectives to be conditionally independent. Give an example of a situation where the probabilities given are correct, but the conditional independence assumption is not valid.

Formally, conditional independence of the four events R1, R2, R3, and R4 means that for any i and j with i ≠ j, Pr(Ri ∩ Rj|S) = Pr(Ri|S) * Pr(Rj|S) and Pr(Ri ∩ Rj|¬S) = Pr(Ri|¬S) * Pr(Rj|¬S). This means that once we know whether S is true, the four events are independent of each other. The probability of one detective seeing movement is the same whatever the results of the other reports, depending only on whether the suspect is there.

Conditional independence would fail if a movement visible to one detective were more or less likely to be visible to a particular second detective than any other movement. For an extreme case, suppose that the suspect's movement was visible to all four detectives with probability 0.4, and to none of them with probability 0.6. This would meet the conditions of the problem except for the conditional independence.
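The extreme case can be written out as an explicit joint distribution given S, as a sketch. Each outcome is a tuple of four 0/1 flags, one per detective, and the representation is an assumption made for illustration. Each detective's marginal is still 0.4, but the joint probability for two detectives is not the product of the marginals.

```python
from fractions import Fraction

# Given S: all four detectives see movement together, or none does.
joint_given_S = {
    (1, 1, 1, 1): Fraction(4, 10),
    (0, 0, 0, 0): Fraction(6, 10),
}

def pr(event):
    """Probability, given S, of an event on the four report flags."""
    return sum(p for outcome, p in joint_given_S.items() if event(outcome))

p_r1 = pr(lambda o: o[0] == 1)
p_r2 = pr(lambda o: o[1] == 1)
p_r1_and_r2 = pr(lambda o: o[0] == 1 and o[1] == 1)
print(p_r1, p_r2, p_r1_and_r2, p_r1 * p_r2)
```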