CMPSCI 240: Reasoning About Uncertainty

Solutions to Second Midterm Exam

David Mix Barrington

21 October 2009

Directions:

There are six problems for 100 total points. Actual scale was A=85, C=55.
Question text is in black, answers are in blue.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 30 points
  Q6: 30 points
  Total: 100 points

The first two true-false questions involve a random variable X that is equal to 0 with probability 1/2, equal to 1 with probability 1/4, and equal to 2 with probability 1/4. We could produce a value from X by flipping a fair coin, flipping it a second time if and only if the first flip is heads, and returning the total number of times we flip heads.
Question 1 (10): True or false with justification: The variance of X is exactly three times the expected value of X.
FALSE. E(X) = 0(1/2) + 1(1/4) + 2(1/4) = 3/4. E(X²) = 0(1/2) + 1(1/4) + 4(1/4) = 5/4. Var(X) = E(X²) - E(X)² = (5/4) - (3/4)² = 11/16. It is not true that 11/16 is exactly three times 3/4.
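The arithmetic above can be checked with exact fractions. This is only a minimal sketch; the names `pmf`, `mean`, `second_moment`, and `variance` are chosen here for illustration and are not part of the exam.

```python
from fractions import Fraction

# Distribution of X as given in the problem statement.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x * x * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean ** 2                    # Var(X) = E(X^2) - E(X)^2

print(mean, second_moment, variance)  # 3/4 5/4 11/16
print(variance == 3 * mean)           # False
```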
Question 2 (10): True or false with justification: If we take n values from X independently and add them together, the resulting random variable has a standard deviation of 3n/2.
FALSE. The variance of the sum of n independent random variables is the sum of the n individual variances, which in this case is 11n/16. Thus the standard deviation of the sum is the square root of 11n/16. Even if we didn't know the exact value of the variance of X, we could say that the given statement is false because the standard deviation must be of the form c*sqrt(n) for some number c, and this can't possibly equal 3n/2 for all values of n.
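One way to confirm that the variances add is to build the exact distribution of the sum by repeated convolution and compute its variance directly. The sketch below assumes nothing beyond the distribution of X; the helper names `convolve`, `variance`, and `variance_of_sum` are hypothetical and introduced only for this check.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def convolve(p, q):
    """Distribution of the sum of two independent variables with pmfs p and q."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

def variance(p):
    """Var = E(X^2) - E(X)^2 for a pmf given as a dict."""
    mean = sum(x * w for x, w in p.items())
    return sum(x * x * w for x, w in p.items()) - mean ** 2

def variance_of_sum(n):
    """Exact variance of the sum of n independent copies of X."""
    total = {0: Fraction(1)}
    for _ in range(n):
        total = convolve(total, pmf)
    return variance(total)

for n in (1, 2, 5, 10):
    # The two printed fractions should agree: Var = 11n/16.
    print(n, variance_of_sum(n), Fraction(11 * n, 16))
```

The standard deviation is then the square root of 11n/16, which grows like sqrt(n) rather than like n.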
Question 3 (10): True or false with justification: If I deal three cards from a standard 52-card deck, with every set of three cards being equally likely, then the probability that the three cards have three different suits is less than or equal to 3/8.
FALSE. The most straightforward way to compute this probability is to count the sets of three cards of three different suits and divide by (52 choose 3). To count the former sets we pick three suits in (4 choose 3) = 4 ways and then pick one card of each suit in 13³ = 2197 ways, for 8788 total. Since (52 choose 3) = (52*51*50)/(1*2*3) = 26*17*50 = 1300*17 = 22100, the probability is 8788/22100 and without a calculator we can see that this is very close to 0.4 and thus greater than 3/8. (The fraction reduces to 169/425, and 3/8 of 425 would be 3*53.125 = 159.375, less than 169.)
It's possible to show that this probability is greater than 3/8 with much less calculation. Consider the cards as a sequence (a permutation) rather than a set. The first card is guaranteed to be of a new suit. The second card has a probability of 39/51 of being of a new suit, slightly more than 39/52 = 3/4. The third card, if the second was of a new suit, has a probability of 26/50 of being of a third suit, slightly more than 1/2. So the probability of all three events happening in sequence is 1 * (more than 3/4) * (more than 1/2) = more than 3/8.
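Both arguments for Question 3 can be checked exactly. The following sketch computes the probability once by counting sets and once by the sequential argument; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

# Counting sets: choose 3 of the 4 suits, then one of 13 cards in each suit.
favorable = comb(4, 3) * 13 ** 3
total = comb(52, 3)
p_sets = Fraction(favorable, total)

# Sequential argument: each later card must land in a suit not yet seen.
p_sequence = Fraction(52, 52) * Fraction(39, 51) * Fraction(26, 50)

print(favorable, total)         # 8788 22100
print(p_sets, p_sequence)       # 169/425 169/425
print(p_sets > Fraction(3, 8))  # True
```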
Question 4 (10): True or false with justification: If I deal three cards from a standard 52-card deck, with every set of three cards being equally likely, then the probability that the three cards contain at least one spade is greater than or equal to 3/4.
FALSE. First the straightforward way to calculate the probability: the number of sets of cards with no spades is (39 choose 3), so the probability we want is 1 - (39 choose 3)/(52 choose 3) = 1 - (39*38*37)/(52*51*50) because the 1*2*3 denominators of the choose functions cancel. We can reduce this last fraction a bit by cancellation to (3*38*37)/(4*51*50) = (38*111)/(51*200) = 4218/10200. Since this is clearly greater than 1/4, the probability we want is less than 3/4.
Here's another method that illustrates the use of expected value in solving probability problems. One reason you might think that 3/4 is the right answer is that each of the three cards has a 1/4 chance to be a spade. The correct conclusion to draw from this is that the expected number of spades in the hand is 3/4, which we know is also equal to 0*Pr(S=0) + 1*Pr(S=1) + 2*Pr(S=2) + 3*Pr(S=3). This is bigger than Pr(S=1) + Pr(S=2) + Pr(S=3), which is the probability we want, because Pr(S=2) and Pr(S=3) are both positive. So since our probability is less than something that is equal to 3/4, the statement is false.
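The two arguments for Question 4 can be checked together by tabulating the distribution of S, the number of spades in the hand. This is a sketch under the problem's own assumptions (13 spades, 39 other cards, all three-card sets equally likely); the names `pr`, `at_least_one`, and `expected_spades` are introduced here for illustration.

```python
from fractions import Fraction
from math import comb

total = comb(52, 3)

# Pr(S = k): choose k of the 13 spades and 3 - k of the 39 non-spades.
pr = {k: Fraction(comb(13, k) * comb(39, 3 - k), total) for k in range(4)}

at_least_one = 1 - pr[0]
expected_spades = sum(k * p for k, p in pr.items())

print(pr[0])            # 703/1700
print(at_least_one)     # 997/1700
print(expected_spades)  # 3/4
```

The expected value is exactly 3/4, while the probability of at least one spade is strictly smaller, as the argument above predicts.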
By the way, it's my usual practice to select booleans for my true/false questions randomly and independently, with "true" and "false" being equally likely. So it's a coincidence that all four answers were "false".
Question 5 (30): Hans is driving several passengers from Dagstuhl to the Frankfurt airport in his van (a Volkswagen, natürlich). He is picking them up at 6:00 a.m., and it is very important that they reach the airport by 8:30. He could take the autobahn, which has an average travel time of 110 minutes and a variance of 400 min², or he could take back roads, which have an average travel time of 130 minutes and a variance of only 25 min².
- (a,10) Suppose first that each travel time comes from a normal distribution with the given mean and variance. What is the approximate probability that Hans will fail to meet the 150-minute deadline if he takes the autobahn? Is he more or less likely to meet the deadline by taking back roads? Justify your answer.
  The autobahn travel time T_a, now assumed to be a normal random variable, has mean 110 and standard deviation sqrt(400) = 20. For Hans to miss the deadline, this variable must be more than two standard deviations above its mean, which happens with probability about 0.025. (We know that there is about a 95% probability that it is within 2σ of the mean, and for a normal variable the "left tail" and "right tail" have the same size. It's the right tail that has him missing the deadline.)
  The back-road travel time T_b is also a normal variable, with mean 130 and standard deviation sqrt(25) = 5 minutes. For Hans to miss the deadline on the back roads, this variable must be more than four standard deviations above its mean, which is much less likely than for T_a to be two standard deviations above its own mean.
- (b,10) Now we no longer assume that the travel times are normally distributed. Use the Markov Inequality to compute a bound on the probability that Hans fails to meet his deadline in each case, using the given average times and the assumption that the travel time cannot be negative. That is, compute probabilities p_a and p_b such that the Markov Inequality tells you that the chance of being late on the autobahn is at most p_a and that the chance of being late on the back roads is at most p_b.
  The Markov Inequality says that if X is never negative and has mean μ, the probability that X is greater than or equal to cμ is at most 1/c. This is true for any positive constant c, so we just have to find the relevant c in each case.
  To miss the deadline on the autobahn, T_a must be greater than 150 = μ_a*(150/110), so we can take c to be 15/11 and apply the Markov Inequality to say that the probability of missing the deadline is at most p_a = 11/15. If we imagine a T_a that is slightly greater than 150 with probability slightly less than 11/15 and 0 otherwise, and thus has mean 110, we see that this bound could be close to the actual case.
  To miss the deadline on the back roads, T_b must be greater than 150 = μ_b*(150/130), so in this case c is 15/13 and the bound on the probability is p_b = 13/15. We don't get as good a bound in this case (and neither bound gives Hans much comfort) because we are not using the information about the variance. It is that information that tells us that the back-road route is more reliable.
- (c,10) Again, we no longer assume that the travel times are normally distributed. Now use the Chebyshev Inequality to get bounds q_a and q_b on the probability of missing the deadline on the autobahn and on the back roads respectively. Remember that this result uses the mean and variance of the given distribution, and no other assumptions.
  The Chebyshev Inequality says that for any random variable X with mean μ and standard deviation σ, the probability that |X - μ| is greater than or equal to cσ is at most 1/c², where c is any positive number. (I really think the best way to memorize this result is to remember how it is proved using the Markov Inequality on the random variable (X - μ)².)
  We already found the correct c's in part (a). For T_a, Chebyshev tells us that Pr(|T_a - 110| ≥ 40) ≤ q_a = 1/4. For T_b, Chebyshev tells us that Pr(|T_b - 130| ≥ 20) ≤ q_b = 1/16. Unlike part (a), this case does not let us conclude that the two "tails" are of equal size, so these are the best estimates we have for Pr(T_a ≥ 150) and Pr(T_b ≥ 150).
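The three estimates in Question 5 can be compared side by side. The sketch below is not part of the exam solution: it assumes Python 3.8 or later for `statistics.NormalDist`, and the names `DEADLINE`, `routes`, and `bounds` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction
from statistics import NormalDist

DEADLINE = 150  # minutes from 6:00 to 8:30
routes = {"autobahn": (110, 400), "back roads": (130, 25)}  # (mean, variance)

def bounds(mean, variance):
    """Return (normal right tail, Markov bound, Chebyshev bound) for one route."""
    sd = variance ** 0.5
    # (a) Normal assumption: right-tail probability Pr(T > DEADLINE).
    normal_tail = 1 - NormalDist(mean, sd).cdf(DEADLINE)
    # (b) Markov: Pr(T >= d) <= mean / d for a nonnegative T.
    markov = Fraction(mean, DEADLINE)
    # (c) Chebyshev: Pr(|T - mean| >= d - mean) <= variance / (d - mean)^2.
    chebyshev = Fraction(variance, (DEADLINE - mean) ** 2)
    return normal_tail, markov, chebyshev

for name, (mean, variance) in routes.items():
    normal_tail, markov, chebyshev = bounds(mean, variance)
    print(f"{name}: normal {normal_tail:.5f}, Markov {markov}, Chebyshev {chebyshev}")
```

Under the normal assumption the autobahn tail comes out near 0.023, consistent with the "about 0.025" estimate from the 95% rule. The Markov and Chebyshev values reproduce 11/15, 13/15, 1/4, and 1/16.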
Question 6 (30): This multipart question deals with the game of poker dice. A player throws five fair, independent six-sided dice. After the first throw of all five dice, she may pick some or all of the dice to roll a second time, in the hope of making a better combination.
- (a,5) What is the probability that the player rolls five sixes (the best possible hand) on her first roll?
  We can either use the Product Rule for probabilities to get (1/6)⁵ or use the Probability Rule to get 1 (the number of ways to get five sixes) over 6⁵ (the number of total ways to throw five six-sided dice). Multiplying out either expression gives 1/7776.
- (b,5) What is the probability that she gets three of a kind on her first roll? This means that exactly three dice show one number, and that the other two are different from each other and from the three. (For example, 4-3-4-4-6 is a "three of a kind" combination but 4-4-5-4-4 and 4-2-4-4-2 are not.)
  Once again the denominator is 6⁵ = 7776. For the numerator, we must count sequences with three of one number, one of a second number, and one of a third. There are six choices for the first number, five for the second, and four for the third, for 6*5*4 = 120 choices of numbers. Given the three numbers, there are (5 choose 3) = 10 ways to pick three of the five positions for the number that occurs three times. Then we can put the second number left of the third number -- if we use both orders we count every sequence twice instead of once. So we have 1200/7776 or about 15%.
- (c,5) Now assume that the player has rolled three of a kind, specifically 4-3-4-4-6, and that she has chosen to reroll the two dice that are not 4's. What is the chance that on her second roll she improves her hand to a full house? (A full house is three dice with one number and two dice with a different number, such as 4-2-4-4-2.)
  The two dice we throw can come out as 1-1, 2-2, 3-3, 5-5, or 6-6. Of course if they came out 4-4 we would have five of a kind, not a full house. The probability is 5/6² = 5/36.
- (d,5) In the situation of part (c) what is the probability that the player improves her hand on the second roll to four of a kind? What is the probability that she improves it to five of a kind?
  To get four of a kind, she needs one four and one non-four, and they can be in either order: 4-1, 4-2, 4-3, 4-5, 4-6, 1-4, 2-4, 3-4, 5-4, or 6-4. Ten four-of-a-kinds out of 6² total throws gives a probability of 10/36 = 5/18.
  To get five of a kind, she needs two fours, which is exactly one of 6² possible throws, a probability of 1/36.
- (e,10) Now assume that the player has three total rolls (as in the commercial game Yahtzee™). That is, she first throws five dice, then rerolls some of those dice, then (if she chooses) re-rerolls some of the dice that she rerolled. Again assume that her first roll is 4-3-4-4-6. She will use her second roll, and if necessary her third roll, to maximize her chance of getting five 4's at the end. What is her total probability of succeeding in doing this? (Again, starting with the situation after her first roll.)
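  There are three ways she could finish with five fours:
  - She rolls two 4's on her second roll, with probability 1/36.
  - She rolls exactly one 4 on her second roll (probability 10/36) and then rolls a 4 with the one remaining die on her third roll (probability 1/6).
  - She rolls no 4's on her second roll (probability 25/36) and then rolls two 4's on her third roll (probability 1/36).
  These three events are pairwise disjoint so we add the probabilities: 36/(36*36) + (10*6)/(36*36) + 25/(36*36) = (36+60+25)/(36*36) = 121/1296 or a bit less than 10%.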
  As we discussed in lecture on 23 October, this is an example of the Law of Total Probability. The most common error was to add in (1/36) instead of (25/36)*(1/36) for the third case, forgetting that the third case only happens if the second throw gets no fours.
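The counts in Question 6 are small enough to verify by brute-force enumeration. The sketch below is an illustration rather than part of the exam solution; the helper `pattern` and the variable names are hypothetical. It enumerates all 6⁵ first rolls for parts (a) and (b), all 36 rerolls for parts (c) and (d), and then adds the three disjoint cases for part (e).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

def pattern(dice):
    """Sorted multiplicities, e.g. (3, 1, 1) for three of a kind."""
    return tuple(sorted(Counter(dice).values(), reverse=True))

# Parts (a) and (b): enumerate all 6^5 first rolls.
first_rolls = list(product(FACES, repeat=5))
five_sixes = sum(1 for r in first_rolls if r == (6, 6, 6, 6, 6))
three_kind = sum(1 for r in first_rolls if pattern(r) == (3, 1, 1))

# Parts (c) and (d): keep 4-4-4 and enumerate the 36 rerolls of two dice.
rerolls = Counter(pattern((4, 4, 4) + r) for r in product(FACES, repeat=2))

# Part (e): Law of Total Probability over the number of 4's on the second roll.
two_fours = Fraction(1, 36)
one_four = Fraction(10, 36) * Fraction(1, 6)
no_fours = Fraction(25, 36) * Fraction(1, 36)
total = two_fours + one_four + no_fours

print(five_sixes, three_kind, len(first_rolls))         # 1 1200 7776
print(rerolls[(3, 2)], rerolls[(4, 1)], rerolls[(5,)])  # 5 10 1
print(total)                                            # 121/1296
```

As a cross-check that is not in the original solution: each of the two rerolled dice independently gets up to two chances to show a 4, so each succeeds with probability 1 - (5/6)² = 11/36, and (11/36)² = 121/1296 agrees with the total above.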

Last modified 23 October 2009