CMPSCI 240: Reasoning About Uncertainty

Solutions to Third Midterm Exam

David Mix Barrington

18 November 2009

Question text is in black, solutions in blue.

Directions:

Answer the problems on the exam pages.
There are six problems for 100 total points. Actual scale was A=90, C=60.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first four questions are true/false, with five points for the correct boolean answer and up to five for a correct justification.
When the answer to a question is a number, you may give your answer in the form of an expression using arithmetic operations, powers, falling powers, or the factorial function. If you give your answer using the "choose" notation, also give it using only operations on this list. In addition, if your answer is a non-negative integer less than or equal to 100, you must compute the number for full credit.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 30 points
  Q6: 30 points
  Total: 100 points

The first two true-false questions involve the following method of generating random bits. I take a standard 52-card deck, with 26 red cards and 26 black cards, shuffle it so that every ordering of the cards is equally likely, then deal five cards in order without replacement. My first random bit is 0 if the first card is red and 1 if it is black, the second bit is 0 if the second card is red and 1 if it is black, and the third, fourth, and fifth bits depend in the same way on the third, fourth, and fifth cards.
Question 1 (10): True or false with justification: Each of these five random bits has probability 0.5 of equaling 0.
TRUE. For each of the first five positions, each card has an equal chance to be in that position after the shuffling, and so there is a 26/52 = 1/2 chance that the card in that position is red. It is true that the conditional probability of the second card being red is 25/51 if the first card is red and 26/51 if it is black. But the total probability of the second card being red, for example, is Pr(C1 = red)*Pr(C2 = red | C1 = red) + Pr(C1 = black)*Pr(C2 = red | C1 = black) = (1/2)(25/51) + (1/2)(26/51) = 1/2.
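The total-probability argument above can be checked with exact arithmetic. The following is a minimal sketch, not part of the original exam; the helper name `prob_red_at` is hypothetical. It conditions on the color of the first card and recurses on the smaller deck.

```python
from fractions import Fraction

def prob_red_at(position, red=26, black=26):
    """Probability that the card at `position` (1-based) is red when dealing
    without replacement from a deck of `red` red and `black` black cards."""
    total = red + black
    if position == 1:
        return Fraction(red, total)
    p = Fraction(0)
    if red > 0:
        # First card is red: one fewer red card remains.
        p += Fraction(red, total) * prob_red_at(position - 1, red - 1, black)
    if black > 0:
        # First card is black: one fewer black card remains.
        p += Fraction(black, total) * prob_red_at(position - 1, red, black - 1)
    return p

for k in range(1, 6):
    print(k, prob_red_at(k))
```

Each of the five positions should come out to exactly 1/2, matching the symmetry argument.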
Question 2 (10): True or false with justification: The probability that all five bits are 0 is strictly less than 1/32.
TRUE. Here the probability can be determined as a product of conditional probabilities, Pr(0) * Pr(00 | 0) * Pr(000 | 00) * Pr(0000 | 000) * Pr(00000 | 0000) = (26/52)*(25/51)*(24/50)*(23/49)*(22/48). Since the first factor in this product is equal to 1/2 and the other four are each less than 1/2, the product is strictly less than (1/2)⁵ = 1/32.
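As a quick check of the product of conditional probabilities above, here is a small sketch using exact fractions. The function name `prob_all_red` is an illustrative assumption, not something from the exam.

```python
from fractions import Fraction

def prob_all_red(num_cards=5, red=26, total=52):
    """Probability that the first `num_cards` cards dealt are all red,
    computed as a product of conditional probabilities."""
    p = Fraction(1)
    for i in range(num_cards):
        # After i red cards are dealt, red - i of the total - i cards are red.
        p *= Fraction(red - i, total - i)
    return p

p = prob_all_red()
print(p, float(p))
print(p < Fraction(1, 32))
```

The comparison with 1/32 is done on exact fractions, so no rounding is involved.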
Question 3 (10): True or false with justification: Suppose I choose a four-digit decimal number at random, with every number from 0000 through 9999 being equally likely. I then square this number and look at the low-order digit (the "units digit", which is the one furthest to the right). Then this digit is equally likely to be any of the digits from 0 through 9.
FALSE. The low-order digit of the square depends only on the low-order digit of the original number, which is equally likely to be any of the digits from 0 through 9. The low-order digit of the square is thus equally likely to be the low-order digit of 0², 1², 2²,..., 9², and thus has a 20% chance of being each of 1, 4, 6, or 9, a 10% chance of being 0 or 5, and a 0% chance of being 2, 3, 7, or 8.
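The distribution above can be confirmed by brute force over all 10,000 four-digit strings. This is only an illustrative sketch, not part of the exam solution.

```python
from collections import Counter

# Count the units digit of n squared for every n from 0000 through 9999.
counts = Counter((n * n) % 10 for n in range(10000))

for digit in range(10):
    print(digit, counts[digit] / 10000)
```

Digits 2, 3, 7, and 8 never appear, so the units digit of the square is not uniformly distributed.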
Question 4 (10): True or false with justification: Suppose we train the Naive Bayes Classifier of Programming Project #3 (the original one, with single-letter features) on any 10 American and any 10 Russian cities. Then if we ask it to classify those same 20 cities as test data, it will classify all of them correctly.
FALSE. There are several ways to see this. The simplest is to note that the same city name, such as St. Petersburg (FL) or Moscow (ID), might be on both lists of training data. The NBC would give the same answer in both cases, and one of these answers would be wrong. (The same phenomenon would occur if some American city and some Russian city contained the same set of letters.)
More generally, the NBC only looks at the training instances in the aggregate, not individually. If a particular letter occurs in exactly one American city and all ten Russian cities, the NBC will take the evidence of that letter as a strong indication for that American city being Russian, and this could easily overwhelm any other evidence and lead to a wrong answer.
Question 5 (30): A fingerprint analyst has been asked whether any fingerprints from a crime scene match those of a particular suspect. She identifies six features of the suspect's prints, and determines that (a) a print from the suspect will have each of these features with probability 70%, independently of each other, and (b) a print from a randomly chosen person other than the suspect will have each of the features with probability 10%, independently of each other.
- (a,10) The first print has all six of the features. Without the fingerprint evidence, the police estimate a 1% chance that this print belongs to the suspect. They ask the analyst whether she can now conclude, based on the six features, that there is a greater than 99% chance that it belongs to the suspect. Given the independence assumptions, can she do so? Justify your answer.
  Let S be the event that the sample print comes from the suspect. For each of these features F_i, we have Pr(F_i | S) = 0.7 and Pr(F_i | ¬S) = 0.1, so the likelihood ratio is L(F_i | S) = 0.7/0.1 = 7. We take the original estimate Pr(S) = 0.01, compute the prior odds O(S) = 0.01/0.99 = 1/99, and multiply these odds by the six likelihood ratios to get O(S|e) = 7⁶/99 = 343²/99 = 117649/99, which is about 1188. A probability greater than 0.99 corresponds to odds greater than 0.99/0.01 = 99. Since the posterior odds exceed 99, the posterior probability is greater than 0.99 and the analyst can make the requested conclusion.
- (b,10) It is clear that if a print has all six of the features, this makes it more likely to be from the suspect, and if it has none, this makes it less likely. How many features must she observe on a given print before the fingerprint evidence makes it more likely to be from the suspect than it was before the fingerprint evidence was considered? (She looks for all six, so a print that has four of them also does not have the other two, for example.)
  We multiply the prior odds by 7 for each feature that occurs, and by L(¬F_i|S) = Pr(¬F_i|S)/Pr(¬F_i|¬S) = 0.3/0.9 = 1/3 for each of the six features that does not occur.
  If none of the features occur, we multiply by (1/3)⁶ < 1.
  If exactly one feature occurs, we multiply by 7*(1/3)⁵ = 7/243 < 1.
  If exactly two features occur, we multiply by 7²*(1/3)⁴ = 49/81 < 1.
  If exactly three features occur, we multiply by 7³*(1/3)³ = 343/27 > 1.
  So three or more of the six features are sufficient to increase the odds of S.
- (c,10) Now suppose that the analyst chose her six features from a set of 50 possible features, and that her sample print from the suspect did not have any of the other 44. In the analysis above, she did not take account of whether the crime scene prints had any of those other 44 features. Describe how she could do so, assuming that these features are independent of the first six and of each other. What information about these features would she need?
  She would need to multiply the odds computed by her first NBC by the likelihood ratios for the other 44 features: by L(F_i|S) for every F_i that occurs in the crime scene print, and by L(¬F_i|S) for every F_i that does not occur. To compute these likelihood ratios, she needs, for each new feature F_i, an estimate of the probability Pr(F_i|S) that a print from the suspect has this feature, and of the probability Pr(F_i|¬S) that a random person's print has this feature.
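The odds calculations in parts (a) and (b) can be reproduced with a short script. This is a minimal sketch under the problem's independence assumptions; the names `odds_multiplier` and `posterior_probability` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

P_SUSPECT = Fraction(7, 10)  # Pr(F_i | S), from the problem statement
P_OTHER = Fraction(1, 10)    # Pr(F_i | not S), from the problem statement

def odds_multiplier(present, total=6):
    """Factor applied to the prior odds of S when `present` of the
    `total` features are observed and the rest are absent."""
    l_present = P_SUSPECT / P_OTHER             # likelihood ratio 7
    l_absent = (1 - P_SUSPECT) / (1 - P_OTHER)  # likelihood ratio 1/3
    return l_present ** present * l_absent ** (total - present)

def posterior_probability(prior, present, total=6):
    """Convert the prior to odds, apply the evidence, convert back."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * odds_multiplier(present, total)
    return post_odds / (1 + post_odds)

# Part (a): all six features present, prior probability 1%.
post = posterior_probability(Fraction(1, 100), 6)
print(float(post), post > Fraction(99, 100))

# Part (b): smallest number of features that raises the odds of S.
threshold = min(k for k in range(7) if odds_multiplier(k) > 1)
print(threshold)
```

Extending this to the other 44 features in part (c) would require a separate pair of probabilities for each feature, which the problem does not supply.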
Question 6 (30): In last Sunday's football game, the Patriots led the Colts 34-28 with just over two minutes to go and had the ball on their own 28-yard line with fourth down and two to go. (If you are not familiar with American football, the following description of the situation should still be clear -- make sure to ask questions during the exam if it is not.) Patriot coach Bill Belichick had two choices -- he could punt or go for a first down. If he punted, the Colts would get the ball in their own territory and have some probability p of scoring a touchdown and winning the game. (Because the Patriots led by six, if the Colts did not score a touchdown then the Patriots would win. Throughout, we are ignoring a number of low-probability events that could have happened.) If he went for a first down, this would succeed with some probability q. If it succeeded, the Patriots would definitely win the game. If it failed, the Colts would get the ball in Patriots territory and have a larger probability r of then scoring a touchdown and winning the game. In the actual game, Belichick chose to go for a first down and failed -- the Colts then scored and won the game 35-34.
- (a,10) Draw an event tree for the possible outcomes if he punted, and another event tree for the possible outcomes if he went for the first down. In each case, calculate the probability that the Patriots win the game, as a function of the probabilities p, q, and r.
  The event tree for the punt has a root node labeled "Colts score?", with a left child labeled "yes, Colts win" with probability p and a right child labeled "no, Patriots win" with probability 1-p. The probability that the Patriots win is the probability that we reach the leaf in which they win, which is 1-p.
  The event tree for going for the first down has a root labeled "First down?". Its left child is a leaf, labeled "yes, Patriots win", with probability q. The right child of the root is labeled "no, Colts score?" and has two children of its own. The probability of the right child of the root is 1-q. The left child of the right child is a leaf labeled "yes, Colts win" and has probability r if the right child of the root is reached, or total probability (1-q)r. The right child of the right child is labeled "no, Patriots win" and has probability 1-r if the right child of the root is reached, or total probability (1-q)(1-r). The total probability that the Patriots win is the sum of the total probabilities for the leaves where they do so, or q + (1-q)(1-r).
- (b,10) One estimate for the three probabilities, based on historical statistics, is p = 0.3, q = 0.6, and r = 0.5. If these were the correct probabilities, was Belichick's decision the correct one to maximize the probability that his team would win? Justify your answer.
  Yes, with those assumptions going for the first down was correct. The probability of winning given a punt was 1-p = 0.7, while the probability given the first down attempt was q + (1-q)(1-r) = 0.6 + (0.4)(0.5) = 0.8.
- (c,10) Some writers argued that these estimates of p and r were too small, because the Colts' quarterback has a long history of last-second heroics. Still assuming that q = 0.6, for what values of p and r is punting the better choice, and for what values is going for the first down the better choice? (Hint: Find an equation involving p and r that makes the two probabilities equal, then convert this into inequalities to find values for p and r that make each choice better.)
  With q = 0.6, the two winning probabilities are equal if 1 - p = 0.6 + 0.4(1-r) = 0.6 + 0.4 - 0.4r = 1 - 0.4r, or if p = 0.4r. If p is less than 0.4r, then punting is the better choice, and if p is greater than 0.4r, going for the first down is the better choice. Note that if Peyton Manning is so much better than an ordinary quarterback that p increases from 0.3 to 0.4, going for the first down is at least as good as punting even if the Colts were certain to score from Patriots territory (if r = 1), and strictly better for any r less than 1.
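The two winning probabilities from the event trees can be compared directly in code. The sketch below is illustrative only; the function names are assumptions, and exact fractions are used so that the break-even case p = 0.4r is detected without rounding error.

```python
from fractions import Fraction

def win_if_punt(p):
    """Patriots win unless the Colts score after the punt."""
    return 1 - p

def win_if_go(q, r):
    """Patriots win on a first down, or if the attempt fails
    and the Colts then fail to score."""
    return q + (1 - q) * (1 - r)

def better_choice(p, q, r):
    punt, go = win_if_punt(p), win_if_go(q, r)
    if punt > go:
        return "punt"
    if go > punt:
        return "go for it"
    return "tie"

q = Fraction(3, 5)
# Part (b): p = 0.3, r = 0.5.
print(win_if_punt(Fraction(3, 10)), win_if_go(q, Fraction(1, 2)))
print(better_choice(Fraction(3, 10), q, Fraction(1, 2)))
# Part (c): the boundary case p = 0.4, r = 1.
print(better_choice(Fraction(2, 5), q, Fraction(1)))
```

With q fixed at 0.6, the function returns "punt" exactly when p is less than 0.4r, in line with the inequality derived above.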

Last modified 21 November 2009