CMPSCI 251: Mathematics of Computation

Practice Exam Solutions for Third Midterm

David Mix Barrington

18 April 2007

Directions:

Answer the problems on the exam pages.
There are six problems on pages 2-7, for 100 total points. Probable scale is A=93, C=63.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first four questions are true/false, with five points for the correct boolean answer and up to five for a correct justification.
Questions 5 and 6 have numerical answers -- remember that logarithms are base 2.

Question text is in black, solutions are in blue

Solution to 5(e) corrected 19 April 2007.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 30 points
  Q6: 30 points
 Total: 100 points

Question 1 (10): True or false with justification: Let X be a discrete random source. If we construct a variable-length binary code for X using Huffman's algorithm, the average number of bits we need to transmit a letter of X will always be equal to the entropy of X.
FALSE. The average length of a code word could be greater, and in fact it will always be greater when any probability in X is not a power of 1/2. For example, if X has three letters of equal probability, the average length of a code word is 5/3 (length 2 with 2/3 chance, length 1 with 1/3 chance) while the entropy of the source is log(3) or about 1.59, smaller than 5/3.
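
The counterexample can be checked numerically. The following is a minimal Python sketch, not part of the original exam; the helper name `huffman_lengths` is an illustrative assumption. It builds Huffman code word lengths for three equally likely letters and compares the average length with the entropy.

```python
import heapq
from fractions import Fraction
from math import log2

def huffman_lengths(probs):
    """Return the code word length Huffman's algorithm gives each symbol."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [Fraction(1, 3)] * 3
lengths = huffman_lengths(probs)
average_length = sum(p * n for p, n in zip(probs, lengths))
entropy = -sum(float(p) * log2(float(p)) for p in probs)

print(sorted(lengths))      # code word lengths
print(average_length)       # exact average length in bits
print(round(entropy, 4))    # entropy in bits
```

With these inputs the average length comes out to 5/3 bits, strictly above the entropy log(3).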
Question 2 (10): True or false with justification: A string is in the intersection of the regular languages (ab)^* and (ba)^* if and only if it is the empty string.
TRUE. The first language is strings consisting of zero or more ab's -- these strings begin with a except for the empty string. The second is strings of zero or more ba's, and these begin with b except for the empty string. Thus the empty string is the only candidate to be in both, and it is in both because it is the concatenation of zero ab's or of zero ba's.
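
As a sanity check (not a proof, since it only examines finitely many strings), the claim can be tested by brute force. This sketch assumes Python's `re` module and an arbitrary length bound `MAX_LEN`; both names below are illustrative.

```python
import re
from itertools import product

AB_STAR = re.compile(r"(ab)*")
BA_STAR = re.compile(r"(ba)*")

def in_both(s):
    """True if s belongs to both (ab)* and (ba)*."""
    return AB_STAR.fullmatch(s) is not None and BA_STAR.fullmatch(s) is not None

MAX_LEN = 8  # arbitrary bound for the exhaustive search
common = [
    "".join(letters)
    for n in range(MAX_LEN + 1)
    for letters in product("ab", repeat=n)
    if in_both("".join(letters))
]
print(common)  # strings over {a,b} of length <= MAX_LEN that lie in both languages
```

The only string the search finds is the empty string, matching the argument above.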
Question 3 (10): True or false with justification: The Cherokee writing system contains 85 symbols, one for each syllable that occurs in the language. With an appropriate coding system, we could transmit any sequence of n syllables (whether it made sense in Cherokee or not) using at most 6n bits.
FALSE. There are (85)ⁿ such sequences of n Cherokee syllables, but only 2⁶ⁿ = (64)ⁿ sequences of 6n bits. So some pair of distinct syllable sequences would have to map to the same bit string.
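
The counting argument can be illustrated with exact integer arithmetic. This is a small sketch in Python; the function name `min_bits` is an assumption made for illustration. It compares the number of syllable sequences with the number of available bit strings, and computes the fewest bits a fixed-length code would actually need.

```python
SYMBOLS = 85  # size of the Cherokee syllabary, as stated in the question

def min_bits(n):
    """Smallest b with 2**b >= 85**n, so b bits can label every sequence."""
    return (SYMBOLS ** n - 1).bit_length()

for n in (1, 2, 5, 10):
    sequences = SYMBOLS ** n
    bit_strings = 2 ** (6 * n)
    # Too many sequences for 6n bits whenever sequences > bit_strings.
    print(n, sequences > bit_strings, min_bits(n), 6 * n)
```

For every n tried, 85ⁿ exceeds 2⁶ⁿ, and the bits required exceed 6n.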
Question 4 (10): True or false with justification: Let w be a string of six letters from {a,b,c} where each letter is chosen independently with probability 1/3 of each letter. Then the probability that w has exactly two occurrences of each letter is less than 5%.
FALSE. The number of such strings is 6!/(2!2!2!) = 720/8 = 90, and the total number of six-letter strings over this alphabet is 3⁶ = 729. So the probability is 90/729, more than 10%.
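
The count of 90 can be confirmed both from the multinomial formula and by enumerating all 729 strings. A minimal Python sketch, with variable names chosen only for illustration:

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Multinomial coefficient 6!/(2!2!2!).
favorable_formula = factorial(6) // (factorial(2) ** 3)

# Brute force: count six-letter strings with exactly two of each letter.
favorable_brute = sum(
    1
    for w in product("abc", repeat=6)
    if all(w.count(letter) == 2 for letter in "abc")
)

probability = Fraction(favorable_formula, 3 ** 6)
print(favorable_formula, favorable_brute, probability, float(probability))
```

Both counts agree, and the probability 90/729 reduces to 10/81, a little over 12%.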
Question 5 (30): Suppose that we send bits over a symmetric binary channel using a (4,3) parity-check code. This means that after every three message bits b₁, b₂, and b₃, we send a parity bit p that is equal to b₁+b₂+b₃, using addition modulo 2.
- (a,5) Recall that a code word is a sequence of four bits that could be a possible valid message. How many code words are there? List them.
  There are eight, one for each combination of the three message bits. They are 0000, 0011, 0101, 0110, 1001, 1010, 1100, and 1111.
- (b,5) What is the minimum Hamming weight of a nonzero word in this code? What does this imply about the error detection capacity of the code?
  There are nonzero code words of weight 2, but not of weight 1, so 2 is the minimum weight. This means that this code can detect up to, but not including, 2 errors, so its maximum error detection capacity is 1.
- (c,5) What is the error correction capacity of this code? That is, what is the maximum number of errors that could occur and still allow the receiver to determine the most likely message to have been sent?
  The maximum number is zero because this code cannot correct single errors. A received word 0001, for example, might come from the code words 1001, 0101, 0011, or 0000 by a single error -- we have no way to tell which.
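- (d,10) Let the error probability of the channel be 1/3, so that a bit is transmitted correctly with probability 2/3 and incorrectly with probability 1/3. For each i in the set {0,1,2,3,4}, compute the probability of exactly i errors among the four bits. (You may express your answers as fractions.)
  The number of errors is binomially distributed, so the probability of exactly i errors is (4 choose i)(1/3)ⁱ(2/3)⁴⁻ⁱ. This gives 16/81 for i = 0, 32/81 for i = 1, 24/81 for i = 2, 8/81 for i = 3, and 1/81 for i = 4.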
- (e,5) Again assume an error probability of 1/3. Let A be the event that no error is detected (for a single packet of four bits) and B be the event that no error occurred. Compute Pr(B|A) (to within an additive error of +/- 10 percent). Is the coding scheme effective for this channel? Explain your answer.
  We detect an error if either one or three bits are sent incorrectly, which happens with probability 32/81 + 8/81 = 40/81, so Pr(A) = 1 - 40/81 = 41/81. Pr(B) is the chance of no bits being sent incorrectly, which is 16/81. Since B implies A, this is also Pr(A∩B). Pr(B|A) is thus (16/81)/(41/81) = 16/41 = 0.3902. This coding scheme is not very effective because about 61% of the packets we accept as error-free are not the packets that were sent.
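
The parts of Question 5 can be reproduced with a short computation. The following is a sketch in Python using exact fractions; the variable and function names are illustrative assumptions, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb

# (a) Code words: three message bits followed by their parity modulo 2.
code_words = ["".join(map(str, bits + (sum(bits) % 2,)))
              for bits in product((0, 1), repeat=3)]

# (b) Minimum Hamming weight over the nonzero code words.
min_weight = min(w.count("1") for w in code_words if "1" in w)

# (c) Code words at Hamming distance 1 from the received word 0001.
def distance(u, v):
    return sum(a != b for a, b in zip(u, v))

neighbors = [w for w in code_words if distance(w, "0001") == 1]

# (d) Binomial distribution of the number of errors among four bits.
p = Fraction(1, 3)
errors = {i: comb(4, i) * p**i * (1 - p)**(4 - i) for i in range(5)}

# (e) An even number of errors leaves the parity check satisfied (event A);
# event B is the case of zero errors.
pr_a = errors[0] + errors[2] + errors[4]
pr_b_given_a = errors[0] / pr_a

print(code_words)
print(min_weight, neighbors)
print(errors)
print(pr_a, pr_b_given_a, float(pr_b_given_a))
```

Note that `Fraction` prints values in lowest terms, so 24/81 appears as 8/27.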
Question 6 (30): This problem concerns a channel C that has input alphabet X = {0,1} and output alphabet Y = {0,1}. C always transmits 0's correctly, but transmits 1's as 0's half the time and 1's half the time.
- (a,10) Suppose we define a discrete random source X such that 0's and 1's are equally likely, and let the random variable Y be the output from sending bits from X through C. Compute the entropies H(X) and H(Y). (You may estimate the base-two log of 3 as 1.6.)
  H(X) = 1 because for any bit b, Pr(X=b) = 1/2 and -log(1/2) = 1. Y equals 0 with probability 3/4 (always when X = 0, and half the time when X = 1) and equals 1 with probability 1/4, so H(Y) = (3/4)log(4/3) + (1/4)log(4) = (3/4)(2 - log(3)) + 1/2 = 2 - (3/4)log(3) or about 2 - (3/4)(1.6) = 0.8.
- (b,10) Compute the joint entropy H(X,Y), the mutual information I(X,Y), and the equivocation H(X|Y) for these two variables. Again assume that 0's and 1's from X are equally likely.
  There are three possible values for (X,Y): (0,0) with probability 1/2, (1,0) with probability 1/4, and (1,1) with probability 1/4. H(X,Y) is the expected value of log(1/Pr(x,y)), which is 1 half the time and 2 the other half, so H(X,Y) = 3/2. This makes I(X,Y) = H(X) + H(Y) - H(X,Y) about 1.0 + 0.8 - 1.5 = 0.3, and H(X|Y) = H(X) - I(X,Y) = 1.0 - 0.3 = 0.7.
- (c,10) Assume now that we code bits using triple repetition and send them over this channel. (That is, we send 0's as 000 and 1's as 111, with 000 and 111 being equally likely.) Describe a sensible error-correction scheme for the receiver to use and find the probability (for a random input) that the three-bit packet received is interpreted correctly.
  Since 0's are reliable, if we receive anything other than 000 we know that the message sent was 111. So we correct anything other than 000 to 111. If 000 is sent, we always receive it correctly. If 111 is sent, we interpret it correctly unless all three bits are sent incorrectly, which happens with probability 1/8. So our chance of getting the correct bit that was sent is (1/2)(1) + (1/2)(7/8) = 15/16.
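
The quantities in Question 6 can be recomputed from the joint distribution. This is a minimal Python sketch with illustrative names (`entropy`, `marginal`), not the author's method. It uses the exact value of log(3) rather than the estimate 1.6, so the entropies differ slightly from the rounded figures above.

```python
from fractions import Fraction
from math import log2

half = Fraction(1, 2)
# Joint distribution of (X, Y): a 0 always arrives as 0;
# a 1 arrives as 0 or as 1 with equal probability.
joint = {(0, 0): half, (1, 0): half * half, (1, 1): half * half}

def entropy(dist):
    """Entropy in bits of a distribution given as {outcome: probability}."""
    return sum(float(p) * log2(1 / float(p)) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of coordinate `axis` (0 for X, 1 for Y)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0) + p
    return out

h_x = entropy(marginal(joint, 0))
h_y = entropy(marginal(joint, 1))
h_xy = entropy(joint)
mutual_information = h_x + h_y - h_xy
equivocation = h_xy - h_y

# (c) Triple repetition with the rule "decode 000 as 0, anything else as 1".
# 000 always arrives intact; 111 is misread only if all three 1's flip.
p_correct = half * 1 + half * (1 - half ** 3)

print(h_x, round(h_y, 3), h_xy)
print(round(mutual_information, 3), round(equivocation, 3))
print(p_correct)
```

With the exact logarithm, H(Y) is about 0.811, I(X,Y) about 0.311, and H(X|Y) about 0.689, consistent with the rounded values 0.8, 0.3, and 0.7.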

Last modified 19 April 2007