CMPSCI 251: Mathematics of Computation
Third Midterm Exam
David Mix Barrington
20 April 2007
Directions:
- Answer the problems on the exam pages.
- There are six problems on pages 2-7, for 100 total points.
(The actual scale was A=92, C=57.)
- If you need extra space use the back of a page.
- No books, notes, calculators, or collaboration.
- The first four questions are true/false, with five points for the correct
boolean answer and up to five for a correct justification.
- Questions 5 and 6 have numerical answers -- remember that logarithms are
base 2.
Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 30 points
Q6: 30 points
Total: 100 points
- Question 1 (10):
True or false with justification:
Let X be the input alphabet for a memoryless channel and let Y be the output
alphabet. Then the mutual information I(X,Y) is the same for any discrete
random source with alphabet X.
- Question 2 (10):
True or false with justification:
If a linear code is able to correct up to t bit errors per block, then it
must be able to detect up to 2t errors per block.
- Question 3 (10):
True or false with justification:
If X and Y are two discrete random variables, the mutual information I(X,Y)
can never be greater than the joint entropy H(X,Y).
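(Not part of the exam, but useful when checking a justification: the identity I(X;Y) = H(X) + H(Y) - H(X,Y) can be tested numerically on any small joint distribution. The joint table below is an illustrative assumption, not taken from the problem.)

```python
import math

# Illustrative joint distribution p(x, y) for two binary variables
# (an assumed example, not from the exam).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy in bits of a probability dictionary."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginal distributions p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

H_xy = H(joint)               # joint entropy H(X,Y)
I = H(px) + H(py) - H_xy      # mutual information I(X;Y)

print(I, H_xy, I <= H_xy)
```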
- Question 4 (10):
True or false with justification:
Because at least four bits are required in the worst case to send one digit
from {0,1,...,9}, at least 4n bits are required in the worst case to send n
digits from that alphabet.
- Question 5 (30):
Suppose we have a set of DNA sequences, strings over the alphabet
{A,C,G,T}, and that these come from a distribution in which each letter is
chosen independently, with probability 3/8 for A, 1/8 for C, 1/8 for G, and
3/8 for T.
- (a,5) If I have a sequence of n letters, how many bits do I need to
specify it using a fixed-length binary code?
- (b,10) Using Huffman's algorithm, design a variable-length binary code
that has the minimum possible average length for this distribution of letters.
What is the expected number of bits you need to send an n-letter sequence using
this code?
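(A hand-built Huffman tree for part (b) can be checked mechanically. The sketch below runs the standard greedy construction with a heap, using exact fractions, and reports the optimal code lengths and the average bits per letter; individual lengths may differ with tie-breaking, but the average cannot.)

```python
import heapq
from fractions import Fraction

# Letter probabilities from the problem statement.
probs = {'A': Fraction(3, 8), 'C': Fraction(1, 8),
         'G': Fraction(1, 8), 'T': Fraction(3, 8)}

# Huffman's algorithm: repeatedly merge the two least-probable nodes.
# Each heap entry is (probability, tiebreak, {letter: code length so far}).
heap = [(p, i, {c: 0}) for i, (c, p) in enumerate(probs.items())]
heapq.heapify(heap)
counter = len(heap)
while len(heap) > 1:
    p1, _, d1 = heapq.heappop(heap)
    p2, _, d2 = heapq.heappop(heap)
    merged = {c: n + 1 for c, n in {**d1, **d2}.items()}
    heapq.heappush(heap, (p1 + p2, counter, merged))
    counter += 1

lengths = heap[0][2]
avg = sum(probs[c] * lengths[c] for c in probs)
print(lengths, avg)   # average code length in bits per letter
```

An exam answer should of course exhibit the tree itself; the program is only a cross-check of the resulting average length.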
- (c,10) Compute the entropy of this distribution. Get a numerical
answer accurate to within 0.2 at worst. You may estimate log 3 as 1.6, which
allows you to compute other base-two logs. (For example, log 12 = log 4 +
log 3 = 3.6, and log 4/3 = log 4 - log 3 = 0.4.)
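(For checking the arithmetic afterwards: the few lines of Python below compute the exact entropy of this distribution and the estimate obtained from log 3 = 1.6, confirming they agree well within the stated 0.2 tolerance.)

```python
import math

probs = [3/8, 1/8, 1/8, 3/8]   # A, C, G, T

# Exact entropy: H = -sum of p * log2(p).
exact = -sum(p * math.log2(p) for p in probs)

# Estimate using log 3 ~ 1.6 as the problem suggests:
# H = (3/4)(3 - log 3) + (1/4)(3).
estimate = (3/4) * (3 - 1.6) + (1/4) * 3

print(exact, estimate)   # roughly 1.81 vs 1.80
```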
- (d,5) Suppose we sent an n-letter sequence by grouping the letters into
k-letter blocks, designing a Huffman variable-length code for k-letter blocks,
and using that code. As n and k increase, how many bits do we need to send an
n-letter sequence?
- Question 6 (30):
This problem concerns several languages (sets of strings) over the alphabet
{0,1}, given by regular expressions. Let R be the regular expression
0* + 1*, S be the regular expression
0*1*, and T be the regular expression (00+11)*.
- (a,10) How many binary strings have length 4? How many of these are
in the languages of each of the three regular expressions? List the strings
(of length 4) in each of these languages.
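(Part (a) can be verified by brute force. The sketch below enumerates all length-4 binary strings and tests each against the three expressions, rewritten in Python regex syntax, where the exam's + for union becomes |.)

```python
import itertools
import re

# The exam's three regular expressions in Python regex syntax.
patterns = {'R': '0*|1*', 'S': '0*1*', 'T': '(00|11)*'}

strings = [''.join(bits) for bits in itertools.product('01', repeat=4)]
print(len(strings))   # 16 binary strings of length 4

for name, pat in patterns.items():
    members = [s for s in strings if re.fullmatch(pat, s)]
    print(name, len(members), members)
```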
- (b,10) How many strings of length n are in each of these languages,
as a function of the positive integer n? (In one case there are separate
answers for odd n and for even n.) In each of the three cases, suppose you had
a string of length n in the language and you needed to tell someone which
string it was. How many bits would you need, assuming that the recipient knows
the language, knows n, and has agreed on a coding method with you?
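(A conjectured closed form for part (b) can be cross-checked by enumeration for small n; the sketch below prints the counts, and comparing them against a formula is left to the reader.)

```python
import itertools
import re

# The exam's three regular expressions in Python regex syntax.
patterns = {'R': '0*|1*', 'S': '0*1*', 'T': '(00|11)*'}

def count(pat, n):
    """Brute-force count of length-n binary strings matching pat."""
    return sum(1 for bits in itertools.product('01', repeat=n)
               if re.fullmatch(pat, ''.join(bits)))

for n in range(1, 9):
    print(n, {name: count(pat, n) for name, pat in patterns.items()})
```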
- (c,10) What does it mean for a set of strings (all of the same length)
to be a linear code? (This is also called being a subspace
in the text.) For a fixed positive n, consider the sets of length-n strings
in each of the three languages. Which of these sets of strings are linear
codes, if any?
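(For part (c), the property to check is closure under componentwise XOR, which in particular forces the code to contain the all-zero word, since u XOR u = 00...0. A brute-force check for a fixed n, here n = 4, might look like this.)

```python
import itertools
import re

# The exam's three regular expressions in Python regex syntax.
patterns = {'R': '0*|1*', 'S': '0*1*', 'T': '(00|11)*'}

def language(pat, n):
    """All length-n binary strings matching the pattern."""
    return {''.join(b) for b in itertools.product('01', repeat=n)
            if re.fullmatch(pat, ''.join(b))}

def xor(u, v):
    """Componentwise XOR of two equal-length bit strings."""
    return ''.join('1' if a != b else '0' for a, b in zip(u, v))

def is_linear(code):
    """A binary code is linear iff it is closed under componentwise XOR."""
    return all(xor(u, v) in code for u in code for v in code)

n = 4
for name, pat in patterns.items():
    print(name, is_linear(language(pat, n)))
```

A full answer still needs an argument for general n, not just one value.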
Last modified 23 April 2007