Solution to 5(e) corrected 19 April 2007.
Q1: 10 points; Q2: 10 points; Q3: 10 points; Q4: 10 points; Q5: 30 points; Q6: 30 points. Total: 100 points.
FALSE. The average length of a code word can be greater, and in fact it is always greater when some probability in X is not a power of 1/2. For example, if X has three letters of equal probability, the average length of a code word is 5/3 (length 2 with probability 2/3, length 1 with probability 1/3), while the entropy of the source is log 3, about 1.585, which is smaller than 5/3.
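As a numerical sanity check (an illustrative sketch, not part of the original solution), the Python below compares the average Huffman code-word length with the source entropy for three equally likely letters:

```python
import math

# Three equally likely letters; a Huffman code assigns word lengths 1, 2, 2.
probs = [1/3, 1/3, 1/3]
lengths = [1, 2, 2]

avg_length = sum(p * l for p, l in zip(probs, lengths))   # 5/3, about 1.667
entropy = -sum(p * math.log2(p) for p in probs)           # log2(3), about 1.585

print(avg_length, entropy, avg_length > entropy)          # 1.666... 1.584... True
```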
TRUE. The first language is strings consisting of zero or more ab's -- these strings begin with a except for the empty string. The second is strings of zero or more ba's, and these begin with b except for the empty string. Thus the empty string is the only candidate to be in both, and it is in both because it is the concatenation of zero ab's or of zero ba's.
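A brute-force check (a sketch; the two languages are just the Kleene stars named in the solution) confirms that only the empty string lies in both:

```python
# Enumerate (ab)* and (ba)* up to ten letters and intersect them.
lang_ab = {"ab" * k for k in range(6)}   # "", "ab", "abab", ...
lang_ba = {"ba" * k for k in range(6)}   # "", "ba", "baba", ...

print(lang_ab & lang_ba)   # {''} -- only the empty string is in both
```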
FALSE. There are 85^n such sequences of n Cherokee syllables, but only 2^(6n) = 64^n sequences of 6n bits. Since 85^n > 64^n for every n >= 1, some pair of distinct syllable sequences would have to map to the same bit string.
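The pigeonhole arithmetic can be checked directly (an illustrative sketch):

```python
# Compare the number of syllable sequences with the number of 6n-bit strings.
for n in range(1, 5):
    print(n, 85 ** n, 2 ** (6 * n), 85 ** n > 2 ** (6 * n))   # always True
```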
FALSE. The number of such strings is 6!/(2!2!2!) = 720/8 = 90, and the total number of six-letter strings over this alphabet is 3^6 = 729. So the probability is 90/729, about 12.3%, which is more than 10%.
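The count can be reproduced as follows (a sketch, assuming, as the figures in the solution indicate, a three-letter alphabet with each letter used exactly twice):

```python
from math import factorial

# Length-6 strings over a 3-letter alphabet using each letter exactly twice.
favourable = factorial(6) // (factorial(2) ** 3)   # 6!/(2!2!2!) = 90
total = 3 ** 6                                     # 729

print(favourable, total, favourable / total)       # 90 729 0.1234...
```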
There are eight, one for each combination of the three message bits. They are 0000, 0011, 0101, 0110, 1001, 1010, 1100, and 1111.
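These eight words are exactly the 4-bit strings of even weight: three message bits followed by an even-parity check bit. A short enumeration (an illustrative sketch) reproduces the list:

```python
from itertools import product

# All 4-bit words whose bits sum to an even number.
code = ["".join(map(str, w)) for w in product([0, 1], repeat=4)
        if sum(w) % 2 == 0]

print(code)
# ['0000', '0011', '0101', '0110', '1001', '1010', '1100', '1111']
```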
There are nonzero code words of weight 2 but none of weight 1, so the minimum weight is 2. The code can therefore detect any single error but cannot detect every two-bit error, so its maximum error detection capacity is 1.
The answer is zero, because this code cannot correct even a single error. A received word of 0001, for example, might come from the code word 1001, 0101, 0011, or 0000 by a single error -- we have no way to tell which.
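A short check (an illustrative sketch, reusing the code-word list from above) verifies both the minimum weight and the decoding ambiguity:

```python
code = ["0000", "0011", "0101", "0110", "1001", "1010", "1100", "1111"]

def dist(u, v):   # Hamming distance between two words
    return sum(a != b for a, b in zip(u, v))

# Minimum weight of a nonzero code word: 2, so detection capacity is 1.
print(min(w.count("1") for w in code if w != "0000"))   # 2

# Four code words lie within a single error of the received word 0001,
# so a single error cannot be corrected.
print([w for w in code if dist(w, "0001") == 1])
# ['0000', '0011', '0101', '1001']
```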
We detect an error exactly when one or three bits are sent incorrectly, which happens with probability 32/81 + 8/81 = 40/81, so Pr(A) = 1 - 40/81 = 41/81. Pr(B) is the chance that no bit is sent incorrectly, which is 16/81; this is also Pr(A∩B). Pr(B|A) is thus 16/41, about 0.390. This coding scheme is not very effective: about 61% of the time that we accept a packet as error-free, it is not the packet that was sent.
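The arithmetic can be reproduced exactly; the 4-bit packet and the per-bit error probability of 1/3 are assumptions reconstructed from the 81sts in the solution:

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 3)   # assumed per-bit error probability
q = 1 - p

def exactly(k):      # probability that exactly k of the 4 bits are flipped
    return comb(4, k) * p**k * q**(4 - k)

detect = exactly(1) + exactly(3)   # an odd number of errors flips the parity
pA = 1 - detect                    # A: no error detected
pB = exactly(0)                    # B: the packet arrived intact
print(detect, pA, pB, pB / pA)     # 40/81 41/81 16/81 16/41
```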
H(X) = 1 because for either bit b, Pr(X=b) = 1/2 and -log(1/2) = 1. Y equals 0 with probability 3/4 and 1 with probability 1/4, so H(Y) = (3/4)log(4/3) + (1/4)log(4) = (3/4)(2 - log 3) + 1/2 = 2 - (3/4)log 3, or about 2 - (3/4)(1.585) = 0.81.
There are three possible values for (X,Y): (0,0) with probability 1/2, (1,0) with probability 1/4, and (1,1) with probability 1/4. H(X,Y) is the expected value of log(1/Pr(x,y)), which is 1 half the time and 2 the other half, so H(X,Y) = 3/2. This makes I(X,Y) = H(X) + H(Y) - H(X,Y), about 1.0 + 0.81 - 1.5 = 0.31, and H(X|Y) = H(X) - I(X,Y), about 1.0 - 0.31 = 0.69.
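All of these quantities can be computed directly from the joint distribution given in the solution (an illustrative sketch):

```python
from math import log2

def H(probs):   # entropy in bits
    return -sum(p * log2(p) for p in probs if p > 0)

joint = {(0, 0): 1/2, (1, 0): 1/4, (1, 1): 1/4}

HX = H([1/2, 1/2])          # 1.0
HY = H([3/4, 1/4])          # 0.8112...
HXY = H(joint.values())     # 1.5

I = HX + HY - HXY           # mutual information, 0.3112...
print(HX, HY, HXY, I, HX - I)   # HX - I = H(X|Y) = 0.6887...
```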
Since 0's are sent reliably, if we receive anything other than 000 we know that the message sent was 111, so we decode anything other than 000 as 111. If 000 is sent, we always receive it correctly. If 111 is sent, we decode it correctly unless all three bits are flipped, which happens with probability (1/2)^3 = 1/8. So our chance of recovering the bit that was sent is (1/2)(1) + (1/2)(7/8) = 15/16.
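A sketch of the same computation, assuming (as the 1/8 and the 1/2 weights in the solution indicate) that each transmitted 1 flips with probability 1/2 and the two messages are equally likely:

```python
from fractions import Fraction

half = Fraction(1, 2)

# 0s always arrive intact; each transmitted 1 flips to 0 with probability 1/2.
# Decoding rule: read 000 as 0, anything else as 1.
p_ok_given_0 = Fraction(1)       # 000 can only be received as 000
p_ok_given_1 = 1 - half ** 3     # wrong only when all three 1s flip

print(half * p_ok_given_0 + half * p_ok_given_1)   # 15/16
```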