Question 9 solution corrected 14 May 2009.
Q1: 10 points Q2: 10 points Q3: 10 points Q4: 10 points Q5: 10 points Q6: 10 points Q7: 10 points Q8: 20 points Q9: 30 points Total: 120 points
Question text is in black, solutions in blue.
TRUE. For E, move 2 dominates move 1 as a strategy, because whatever O does E improves her position by moving 2. Similarly, move 1 dominates move 2 for O, since she does better with 1 given either move by E. So under optimal play, E will put out two fingers, O will put out 1, the product will be 2, and E will win 2 points.
FALSE. The airman has a 0.95 probability of surviving one mission, and hence a probability of (0.95)^20 = 0.358 of surviving 20 missions in a row. (There's no need to calculate (0.95)^20 to answer the question as it is clearly positive. You can estimate it as being close to 1/e, since 0.95 is close to e^-0.05.) I was disappointed that half of you said "true" -- if the airman survives his first 19 missions, you are saying that the Germans are obligated to shoot him down on the 20th, but how do they know which plane he is in, and how does this square with the dangers on different missions being independent?
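As a quick numerical check (not part of the exam solution), the survival probability can be computed directly:

```python
# Probability of surviving 20 independent missions, each survived w.p. 0.95
p_survive = 0.95 ** 20
print(round(p_survive, 3))  # 0.358, indeed close to 1/e ≈ 0.368
```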
TRUE. There are (9 choose 4) different sets of four positions. For any of these sets, there is a 10^-4 chance that all four positions will get values of 0. So the event in question is the union of (9 choose 4) events of probability 10^-4 each, so by the Union Bound its probability can be no greater than (9 choose 4) times 10^-4. And in fact it must be less than that because the (9 choose 4) events are not disjoint.
FALSE. The probability of the first number being 0 and the rest nonzero, for example, would be (0.1)(0.9)^8. But the probability of exactly one 0 in the number is nine times this figure, because there are nine possible positions for the single 0. From the binomial distribution, the correct probability is (9 choose 1)(0.1)(0.9)^8.
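This binomial probability can be verified with a one-line computation (a sketch using Python's math.comb):

```python
from math import comb

# Pr(exactly one digit is 0 among nine independent digits, each 0 w.p. 0.1)
p_one_zero = comb(9, 1) * 0.1 * 0.9 ** 8
print(round(p_one_zero, 4))  # 0.3874
```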
FALSE. Such a hand contains the three given cards and any two of the other 49 cards in the deck. There are (49 choose 2) = 49*48/(1*2) < 50*50/2 = 1250 ways to pick these two cards. (1176, but you don't need the exact number to answer the question.)
TRUE. To get a flush or straight flush we must select the last two cards from the 10 remaining hearts, and there are (10 choose 2) = 10*9/(1*2) = 45 ways to do this.
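Both counts from these two answers can be confirmed with math.comb (a quick check, not part of the solution):

```python
from math import comb

print(comb(49, 2))  # 1176 hands containing the three given cards
print(comb(10, 2))  # 45 ways to complete the heart flush
```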
Let F be the event that the new dog is friendly and G be the event that it
growls. We are given Pr(F) = 0.75, Pr(G|F) = 0.1, and Pr(G|¬F) = 0.7.
Using the odds-likelihood method, we can calculate O(F) = 0.75/(1-0.75) = 3,
L(G|F) = 0.10/0.70 = 1/7, and L(¬G|F) = (1-0.1)/(1-0.7) = 3. Thus O(F|G)
= O(F)L(G|F) = 3/7, making Pr(F|G) = (3/7)/(1 + 3/7) = 30%. Similarly
O(F|¬G) = O(F)L(¬G|F) = 9, making Pr(F|¬G) = 9/(1+9) = 90%.
Another way to do this is to look at the four possible cases for the two
variables. A random dog is a friendly growler with probability 7.5%, a friendly
nongrowler with probability 67.5%, an unfriendly growler with probability
17.5%, and an unfriendly nongrowler with probability 7.5%. If the dog is
growling, we can rule out the two nongrowling cases, making the probability of
friendliness 7.5/(7.5+17.5) = 0.30. Similarly, if we know we are in one of the
two nongrowling cases, the probability of friendliness is 67.5/(67.5+7.5) = 0.90.
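The four-case calculation can be sketched in Python (the variable names are mine, not from the problem):

```python
pF, pG_given_F, pG_given_notF = 0.75, 0.10, 0.70

# Joint probabilities of the four cases
fg  = pF * pG_given_F                 # friendly growler:      0.075
fng = pF * (1 - pG_given_F)           # friendly nongrowler:   0.675
ug  = (1 - pF) * pG_given_notF        # unfriendly growler:    0.175
ung = (1 - pF) * (1 - pG_given_notF)  # unfriendly nongrowler: 0.075

print(round(fg / (fg + ug), 2))     # Pr(F | growl)    = 0.3
print(round(fng / (fng + ung), 2))  # Pr(F | no growl) = 0.9
```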
If I observe exactly three of the four Norwegian Blue characteristics in my parrot, what is my new estimate of the probability that it is a Norwegian Blue?
Let N be the event that a parrot is Norwegian and let F1,
F2, F3, and F4 be the four characteristics.
We are told Pr(N) = 0.10, Pr(Fi|N) = 0.60 for each i, and
Pr(Fi|¬N) = 0.20 for each i. Thus L(Fi) = 0.6/0.2 =
3 for each i, and L(¬Fi) = (1-0.6)/(1-0.2) = 1/2 for each i.
Originally O(N) = Pr(N)/Pr(¬N) = 0.1/0.9 = 1/9. If, say, F1,
F2, and F3 are true and F4 is false in our
evidence e, we
compute O(N|e) = O(N)*3*3*3*(1/2) = 3/2, giving a probability of (3/2)/(1 + 3/2)
= 60% that the parrot is Norwegian.
If we observed none of the four features our odds would be 1/144 and our
probability less than 1%. For one feature our odds would be 1/24 and our
probability 4%. For two features we would have 1/4 and 20%, for three features
3/2 and 60% as above, and for all four features odds of 9 and probability of
90%.
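The whole table of posteriors can be generated by a short function of the number k of observed features (a sketch; the function name is mine):

```python
def norwegian_prob(k):
    """Posterior Pr(Norwegian) after observing k of the 4 features."""
    # Prior odds 1/9; each present feature multiplies odds by 3,
    # each absent feature by 1/2.
    odds = (0.1 / 0.9) * 3 ** k * 0.5 ** (4 - k)
    return odds / (1 + odds)

for k in range(5):
    print(k, round(norwegian_prob(k), 3))  # 0.007, 0.04, 0.2, 0.6, 0.9
```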
In A, choosing S gives certainty of reward 3 while J gives expected reward (2+0)/2 = 1. In B, choosing S gives certainty of 2 while J gives expected reward (3+0)/2 = 1.5. In C, choosing S gives certainty of 0 while J gives expected reward (3+2)/2 = 2.5. So the correct policy is to choose S in states A or B and choose J in state C. The expected one-turn reward is 3 from A, 2 from B, and 2.5 from C.
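The one-turn comparison can be written out as follows (assuming, as in the solution, that J moves to each of the other two states with probability 1/2; the dictionary names are mine):

```python
stay = {'A': 3, 'B': 2, 'C': 0}  # certain reward for action S
jump = {'A': (2 + 0) / 2, 'B': (3 + 0) / 2, 'C': (3 + 2) / 2}  # expected for J
one_turn = {s: max(stay[s], jump[s]) for s in 'ABC'}
print(one_turn)  # {'A': 3, 'B': 2, 'C': 2.5}
```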
The matrix looks like this:

1.0 0.0 0.0
0.5 0.0 0.5
0.5 0.5 0.0

The Markov chain does reach a steady-state distribution, which is state A
with probability 1. Qualitatively, we can see that once we reach A we never
leave it, and in each other state we have a 0.5 chance of moving to A, so the
chance of our being in A after t turns is at least 1 - 2^-t. Even if
we don't observe this, we can solve for a steady-state distribution by setting
a vector [p q r] times the matrix above equal to [p q r]. Then the first
component of this vector equality, p = p + q/2 + r/2, tells us that q and r must
be 0, since neither can be negative, and thus that p must be 1.
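Iterating the chain numerically confirms the steady state (a quick sketch, not part of the graded solution):

```python
# Transition matrix from the solution: rows are the current state A, B, C.
P = [[1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

dist = [0.0, 1.0, 0.0]  # start in state B
for _ in range(50):
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

print([round(x, 6) for x in dist])  # converges to [1.0, 0.0, 0.0]
```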
First note that the total expected payoff on the first and second moves depends on the state after the first move -- we have a reward vector of [3 2 0] from the first-move reward, and an expected reward vector of [3 2 2.5] for the second-move reward, as computed in part (a). So we must choose a first move to maximize the expected reward using the single reward vector [3 2 0] + [3 2 2.5] = [6 4 2.5]. From state A, S on the first move has expected reward 6 and J has (4 + 2.5)/2 = 3.25. From state B, S has expected reward 4 and J has (6 + 2.5)/2 = 4.25. From state C, S has expected reward 2.5 and J has (6 + 4)/2 = 5. So the optimal policy for the first of the two moves uses S from state A and J from state B or C. The expected reward vector is [6 4.25 5].
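The two-move lookahead can be checked the same way (again assuming J moves uniformly to the other two states; a sketch, with my own variable names):

```python
# Combined reward vector for the state reached by the first move:
# first-move reward [3, 2, 0] plus one-turn value [3, 2, 2.5] from part (a).
total = {'A': 3 + 3, 'B': 2 + 2, 'C': 0 + 2.5}
others = {'A': ('B', 'C'), 'B': ('A', 'C'), 'C': ('A', 'B')}

# Choose the better of S (keep current state) and J (jump to one of the
# other two states with probability 1/2 each).
value = {s: max(total[s], sum(total[t] for t in others[s]) / 2) for s in 'ABC'}
print(value)  # {'A': 6, 'B': 4.25, 'C': 5.0}
```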
Last modified 14 May 2009