CMPSCI 240: Reasoning About Uncertainty

Final Exam

David Mix Barrington

18 December 2009

Directions:

Answer the problems on the exam pages.
There are seven problems for 125 total points. Actual scale is A = 105, C = 70.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first five questions are true/false, with five points for the correct boolean answer and up to five for a correct justification.
When the answer to a question is a number, you may give your answer in the form of an expression using arithmetic operations, powers, falling powers, or the factorial function. Probabilities may be given as either fractions or decimals.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 45 points
  Q7: 30 points
  Total: 125 points

Question 1 (10): True or false with justification: At Zane's Noodle Bowl, a customer may choose from a wide variety of possible soups. They may choose egg or rice noodles, one of five types of protein (beef, chicken, fish balls, tofu, or none) and any subset of the thirteen kinds of vegetables. Given these options, there are over a million possible ways to order soup.
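One way to check a count like this is the product rule: multiply the number of options for each independent choice, counting 2^13 subsets of the vegetables. The following is a minimal sketch of that arithmetic; the variable names are illustrative and not part of the exam.

```python
# Product rule: the numbers of options for independent choices multiply.
noodle_choices = 2            # egg or rice
protein_choices = 5           # beef, chicken, fish balls, tofu, or none
vegetable_subsets = 2 ** 13   # any subset of the thirteen vegetables

total_orders = noodle_choices * protein_choices * vegetable_subsets
exceeds_million = total_orders > 1_000_000

print(total_orders, exceeds_million)
```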
Question 2 (10): True or false with justification: Suppose that the word "money" occurs in 6% of my spam emails and in 2% of my non-spam emails. Suppose also that I am using a Naive Bayes Classifier spam filter, with word occurrences as features. Then, other things being equal, my filter will multiply its estimated odds of spamness by three for emails containing the word "money", and slightly reduce those odds for emails not containing the word "money".
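The odds update in a Naive Bayes classifier is a likelihood ratio: Pr(feature | spam) divided by Pr(feature | non-spam). The sketch below computes that ratio for both the presence and the absence of the word, assuming the filter treats each word as a present/absent feature. The variable names are illustrative.

```python
from fractions import Fraction

# Figures stated in the question.
p_word_given_spam = Fraction(6, 100)
p_word_given_nonspam = Fraction(2, 100)

# Factor applied to the spam odds when the word is present.
ratio_present = p_word_given_spam / p_word_given_nonspam

# Factor applied to the spam odds when the word is absent.
ratio_absent = (1 - p_word_given_spam) / (1 - p_word_given_nonspam)

print(ratio_present, float(ratio_absent))
```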
Question 3 (10): True or false with justification: Consider any two-player simultaneous-move zero-sum game where Players A and B each have a choice of k options and there is a k × k matrix giving the payoff for A in each of the k² possible situations. Suppose that A is going to play a mixed strategy, where the probability that he will take each option is known to B. Then B has a pure strategy that will do at least as well for her as any mixed strategy.
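The key observation is that B's expected payoff under a mixed strategy is a weighted average of her expected payoffs under her pure strategies, so it cannot exceed the best of them. The sketch below illustrates this on a hypothetical 3 × 3 payoff matrix and a hypothetical mixed strategy for A; neither comes from the exam.

```python
# Hypothetical payoff matrix for A (rows = A's options, columns = B's options).
payoff_to_A = [
    [2, -1, 0],
    [-3, 4, 1],
    [1, 0, -2],
]
# Hypothetical mixed strategy for A, assumed known to B.
a_mix = [0.5, 0.25, 0.25]

def b_pure_payoffs(matrix, a_probs):
    """Expected payoff to B for each pure option (zero-sum: B gets -A's payoff)."""
    k = len(matrix)
    return [-sum(a_probs[i] * matrix[i][j] for i in range(k)) for j in range(k)]

def b_mixed_payoff(matrix, a_probs, b_probs):
    """Expected payoff to B under a mixed strategy: an average of the pure payoffs."""
    return sum(q * v for q, v in zip(b_probs, b_pure_payoffs(matrix, a_probs)))

pure = b_pure_payoffs(payoff_to_A, a_mix)
best_pure = max(pure)
sample_mix = [0.2, 0.3, 0.5]
print(pure, best_pure, b_mixed_payoff(payoff_to_A, a_mix, sample_mix))
```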
Questions 4 and 5 involve a situation where we choose two three-letter words (over the 26-letter alphabet {A, B,..., Z}, not necessarily English words) at random. Each word is equally likely to be any of the possible three-letter strings, and the choice of the two words is independent.
Question 4 (10): True or false with justification: Let E be the event that at least one letter occurs in the same position of the two words -- that is, that the two words have the same first letter, have the same second letter, or have the same third letter (or that more than one of these things happen). Then Pr(E) ≥ 3/26.
Question 5 (10): True or false with justification: Let F be the event that some letter occurs in both words, regardless of position. Then Pr(F) ≥ 1 - (23/26)³.
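Both probabilities in Questions 4 and 5 can be computed exactly and compared with the stated bounds. The sketch below uses the complement rule for E, and for F it conditions on the number of distinct letters in the first word. It is one possible check, not the intended written justification, and its names are illustrative.

```python
from fractions import Fraction
from itertools import product
import string

ALPHABET = string.ascii_uppercase
N = len(ALPHABET)  # 26

# E: some position matches. Complement: all three positions differ.
pr_E = 1 - Fraction(N - 1, N) ** 3

# F: some letter occurs in both words. Complement: the second word avoids
# every letter of the first word. If the first word has k distinct letters,
# each letter of the second word avoids them with probability (N - k) / N.
pr_not_F = Fraction(0)
for first in product(ALPHABET, repeat=3):
    k = len(set(first))
    pr_not_F += Fraction(1, N ** 3) * Fraction(N - k, N) ** 3
pr_F = 1 - pr_not_F

bound_E = Fraction(3, N)
bound_F = 1 - Fraction(N - 3, N) ** 3
print(float(pr_E), float(bound_E), float(pr_F), float(bound_F))
```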
Question 6 (45): Professor Kyle is conducting an experiment on animal behavior. She observes a cat for n successive five-minute periods, and characterizes the cat's behavior in each period as "Active", "Quiet", or "Sleeping". For each time period t in the set {1, 2,..., n}, the behavior b(t) is thus either A, Q, or S.
She wants to know whether this sequence of behaviors can be well modeled by a Markov chain. Examining her data, she finds that when b(t) = A, b(t+1) = A 20% of the time and b(t+1) = Q 80% of the time. When b(t) = Q, b(t+1) = A 20% of the time, b(t+1) = Q 20% of the time, and b(t+1) = S 60% of the time. Finally, when b(t) = S, she finds that b(t+1) = Q 20% of the time and b(t+1) = S 80% of the time.
- (a,5) Draw a diagram and write a transition matrix for a Markov chain that has three states and the given transition probabilities. (For your matrix, order the rows A, Q, S.)
- (b,10) Determine the steady-state probability distribution of this Markov chain.
Looking more closely at her data, Professor Kyle discovers that for the 100 values of t where b(t) = A, b(t+2) = A only four times.
- (c,5) Determine the probabilities of each of the three states of the Markov chain at time t+2, given that the state at time t is A.
- (d,10) Based on your answer to part (c) and the Normal Approximation to the Binomial, determine how unusual it would be to have b(t+2) = A only four times in 100 situations with b(t) = A. Should she reject the hypothesis that the cat is behaving according to this Markov chain, using a 95% confidence level?
- (e,5) How might the Markov Hypothesis be failing in this situation? Suggest a way in which she might refine her model to be more accurate.
- (f,5) Suppose now that the cat's behavior, as characterized by these three states, is modeled by a Markov Decision Process, where the possible actions are to give the cat some catnip (C) or not (N). What observations would she need to make to completely describe the MDP?
- (g,5) Once she has all the numbers necessary to describe the MDP from part (f), how could she determine a policy that would maximize the percentage of the time that the cat is active, on average in the steady state? (You don't need to do the arithmetic because I'm not giving you the numbers. But describe a correct algorithm that she could use -- don't worry if it's not the most efficient one.)
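The numerical parts of Question 6 can be checked with a short sketch. It builds the transition matrix with rows ordered A, Q, S, approximates the steady state by repeatedly applying the matrix (this chain is irreducible and has self-loops, so the iteration converges), computes the two-step distribution starting from A, and forms the z-score for observing four A's in 100 trials. Power iteration is a convenience chosen here; solving the balance equations by hand works equally well. All function and variable names are illustrative.

```python
import math

# Transition matrix from the question, rows and columns ordered A, Q, S.
P = [
    [0.2, 0.8, 0.0],
    [0.2, 0.2, 0.6],
    [0.0, 0.2, 0.8],
]

def step(dist, matrix):
    """Advance a row-vector distribution by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

def steady_state(matrix, iterations=200):
    """Approximate the steady state by power iteration from the uniform distribution."""
    dist = [1.0 / len(matrix)] * len(matrix)
    for _ in range(iterations):
        dist = step(dist, matrix)
    return dist

pi = steady_state(P)

# Distribution at time t+2 given state A at time t.
two_step_from_A = step(step([1.0, 0.0, 0.0], P), P)

# Normal approximation to the binomial for the count of b(t+2) = A.
n_obs = 100
p_active = two_step_from_A[0]
mean = n_obs * p_active
std_dev = math.sqrt(n_obs * p_active * (1 - p_active))
z_score = (4 - mean) / std_dev

print(pi, two_step_from_A, z_score)
```

A z-score beyond roughly ±1.96 lies outside the usual two-sided 95% range.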
Question 7 (30): In this problem we will construct a variable-length code to transmit messages where each bit is independently and randomly generated, with Pr(1) = 0.2 and Pr(0) = 0.8.
- (a,10) List the eight possible blocks of three bits that might be generated by this source, and the probability that each block is generated.
- (b,10) Using Huffman's Algorithm (constructing a Huffman Tree), design a variable-length binary code that will use the smallest possible expected number of bits to transmit each three-bit block.
- (c,5) Determine the expected number of bits to transmit a three-bit string using your code.
- (d,5) The correct answer to part (c) is strictly less than three. Thus a message of 3n bits from this source is transmitted, on average, using fewer than 3n bits. How is this possible, given that we need to be able to transmit 2³ⁿ different messages?
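Parts (a) through (c) can be checked with a short sketch of Huffman's Algorithm. It repeatedly merges the two least probable subtrees, prepending a 0 to the codewords on one side and a 1 to those on the other. Exact fractions avoid rounding, and ties are broken by insertion order, which is an arbitrary choice: a different tie-break gives different codewords but the same expected length. All names are illustrative.

```python
import heapq
from fractions import Fraction
from itertools import product

P_ONE = Fraction(1, 5)   # Pr(1) = 0.2
P_ZERO = Fraction(4, 5)  # Pr(0) = 0.8

def block_probabilities(block_len=3):
    """Probability of each block of independent bits from the source."""
    probs = {}
    for bits in product("01", repeat=block_len):
        block = "".join(bits)
        p = Fraction(1)
        for b in block:
            p *= P_ONE if b == "1" else P_ZERO
        probs[block] = p
    return probs

def huffman_code(probs):
    """Return a dict mapping each symbol to its Huffman codeword."""
    # Heap entries: (probability, unique tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

probs = block_probabilities()
code = huffman_code(probs)
expected_length = sum(probs[s] * len(code[s]) for s in probs)

for block in sorted(probs):
    print(block, float(probs[block]), code[block])
print(float(expected_length))
```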

Last modified 3 January 2010