Third Midterm Exam

Directions:

• Answer the problems on the exam pages.
• There are six problems on pages 2-7, for 100 total points. The actual scale was A=92, C=57.
• If you need extra space use the back of a page.
• No books, notes, calculators, or collaboration.
• The first four questions are true/false, with five points for the correct true/false answer and up to five more for a correct justification.
• Questions 5 and 6 have numerical answers -- remember that logarithms are base 2.

```
Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 30 points
Q6: 30 points
Total: 100 points
```

• Question 1 (10): True or false with justification: Let X be the input alphabet for a memoryless channel and let Y be the output alphabet. Then the mutual information I(X,Y) is the same for any discrete random source with alphabet X.

• Question 2 (10): True or false with justification: If a linear code is able to correct up to t bit errors per block, then it must be able to detect up to 2t errors per block.

• Question 3 (10): True or false with justification: If X and Y are two discrete random variables, the mutual information I(X,Y) can never be greater than the joint entropy H(X,Y).

• Question 4 (10): True or false with justification: Because at least four bits are required in the worst case to send one digit from {0,1,...,9}, at least 4n bits are required in the worst case to send n digits from that alphabet.
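(Not part of the exam: the counting behind Question 4 can be checked numerically. The sketch below, a hypothetical `bits_needed` helper, compares the minimum number of bits needed to index all 10^n digit strings against 4 bits per digit.)

```python
import math

def bits_needed(n_digits: int) -> int:
    """Minimum bits to index all 10**n_digits possible digit strings."""
    return math.ceil(n_digits * math.log2(10))

# Compare against 4 bits per digit for a few block sizes.
for n in (1, 3, 10):
    print(n, bits_needed(n), 4 * n)
```

For a single digit the two counts agree (4 bits), so the comparison only becomes interesting once digits are encoded in blocks.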

• Question 5 (30): Suppose we have a set of DNA sequences, strings over the alphabet {A,C,G,T}, and that these come from a distribution where each letter is chosen independently: the probability of A is 3/8, the probability of C is 1/8, the probability of G is 1/8, and the probability of T is 3/8.

• (a,5) If I have a sequence of n letters, how many bits do I need to specify it using a fixed-length binary code?

• (b,10) Using Huffman's algorithm, design a variable-length binary code that has the minimum possible average length for this distribution of letters. What is the expected number of bits you need to send an n-letter sequence using this code?

• (c,10) Compute the entropy of this distribution, giving a numerical answer accurate to within 0.2. You may estimate log 3 as 1.6, which allows you to compute other base-two logs. (For example, log 12 = log 4 + log 3 = 3.6, and log 4/3 = log 4 - log 3 = 0.4.)

• (d,5) Suppose we sent an n-letter sequence by grouping the letters into k-letter blocks, designing a Huffman variable-length code for k-letter blocks, and using that code. As n and k increase, how many bits do we need to send an n-letter sequence?
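(Not part of the exam: a sketch for checking answers to parts (b) and (c). It computes the entropy of the stated distribution directly, and gets the expected Huffman code length by the standard merge procedure, using the fact that the expected length equals the sum of the merged-node weights.)

```python
import heapq
import math

probs = {"A": 3/8, "C": 1/8, "G": 1/8, "T": 3/8}

# Entropy: H = -sum p log2 p.
entropy = -sum(p * math.log2(p) for p in probs.values())

# Huffman: repeatedly merge the two least-probable nodes; the
# expected code length is the sum of all merged-node weights.
heap = sorted(probs.values())
heapq.heapify(heap)
avg_len = 0.0
while len(heap) > 1:
    a = heapq.heappop(heap)
    b = heapq.heappop(heap)
    avg_len += a + b
    heapq.heappush(heap, a + b)

print(round(entropy, 3), avg_len)
```

Under the exam's estimate log 3 = 1.6, the hand computation should land within 0.2 of the value this prints.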

• Question 6 (30): This problem concerns several languages (sets of strings) over the alphabet {0,1}, given by regular expressions. Let R be the regular expression 0* + 1*, S be the regular expression 0*1*, and T be the regular expression (00+11)*.

• (a,10) How many binary strings have length 4? How many of these are in the languages of each of the three regular expressions? List the strings (of length 4) in each of these languages.

• (b,10) How many strings of length n are in each of these languages, as a function of the positive integer n? (In one case there are separate answers for odd n and for even n.) In each of the three cases, suppose you had a string of length n in the language and you needed to tell someone which string it was. How many bits would you need, assuming that the recipient knows the language, knows n, and has agreed on a coding method with you?

• (c,10) What does it mean for a set of strings (all of the same length) to be a linear code? (This is also called being a subspace in the text.) For a fixed positive n, consider the sets of length-n strings in each of the three languages. Which of these sets of strings are linear codes, if any?
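(Not part of the exam: part (a) of Question 6 can be checked by brute force. The sketch below enumerates all 2^4 = 16 binary strings of length 4 and tests membership in each language by full-match against the corresponding regular expression, written here in Python's `re` syntax with `|` for union.)

```python
import re
from itertools import product

patterns = {"R": "0*|1*", "S": "0*1*", "T": "(00|11)*"}

# All 2**4 = 16 binary strings of length 4.
strings = ["".join(bits) for bits in product("01", repeat=4)]

# Count how many length-4 strings each language contains.
counts = {name: sum(re.fullmatch(pat, s) is not None for s in strings)
          for name, pat in patterns.items()}
print(counts)
```

Changing `repeat=4` to other lengths gives data for conjecturing the general counts asked for in part (b).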