# Solutions to Practice Exam for Final Exam

### Directions:

• Answer the problems on the exam pages.
• There are 9 problems for 150 total points. Probable scale is A=140, C=80.
• If you need extra space use the back of a page.
• No books, notes, calculators, or collaboration.
• The first six questions are true/false, with five points for the correct boolean answer and up to five for a correct justification.
• Parts of Questions 7-9 have numerical answers -- remember that logarithms are base 2.

Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 10 points
Q6: 10 points
Q7: 40 points
Q8: 20 points
Q9: 30 points
Total: 150 points

• Question 1 (10): True or false with justification: Let X be a random source that produces a digit from the set {0,1,2,3,4,5,6,7,8,9}, where each digit has probability 1/10. Let p be the probability that four digits, taken independently from X, are all different. Then p is greater than 1/2.

TRUE. The probability is the number of no-repeat sequences divided by the total number of sequences, which is 10*9*8*7 divided by 10^4, or 5040/10000 = 0.504, greater than 1/2.
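The count can be double-checked by brute force. The following is a minimal sketch (the variable names are illustrative, not from the exam) that compares the formula with a direct enumeration of all 10,000 sequences.

```python
from fractions import Fraction
from itertools import product

# Formula: 10 * 9 * 8 * 7 no-repeat sequences out of 10**4 sequences in all.
p_formula = Fraction(10 * 9 * 8 * 7, 10**4)

# Brute force: enumerate every four-digit sequence and keep the no-repeat ones.
distinct = sum(1 for seq in product(range(10), repeat=4) if len(set(seq)) == 4)
p_brute = Fraction(distinct, 10**4)

print(p_formula == p_brute, float(p_formula))
```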

• Question 2 (10): True or false with justification: It is possible to design a Turing machine that inputs a string w over the alphabet {a,b,...,z} and finds a variable-length binary code that minimizes the length of the encoding of w.

TRUE. We know that the Huffman algorithm, run on the input distribution given by the frequency of letters in the input string, will produce a tree that has the best possible expected output length for that distribution. But the expected output length on this distribution is exactly the output length of that tree on this input. So the Huffman tree is the one we want, and the algorithm (since it is well-specified and deterministic) can in principle be implemented on a Turing machine.
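To make the argument concrete, here is a hedged sketch of the Huffman algorithm run on the letter frequencies of an input string. It is ordinary Python rather than a Turing machine, and the names `huffman_code` and `encoded_length` are hypothetical helpers introduced only for illustration.

```python
import heapq
from collections import Counter

def huffman_code(w):
    """Return a dict mapping each letter of w to a binary codeword."""
    freq = Counter(w)
    if not freq:
        return {}                       # empty input: nothing to encode
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # one distinct letter: one-bit codeword
    # Heap entries are (weight, tie_breaker, {letter: partial codeword}).
    heap = [(n, i, {ch: ""}) for i, (ch, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)  # the two lightest subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

def encoded_length(w, code):
    """Total number of bits used to encode w with the given code."""
    return sum(len(code[ch]) for ch in w)

code = huffman_code("abracadabra")
print(encoded_length("abracadabra", code))
```

Because the weights are the letter counts of w itself, the quantity the algorithm minimizes is exactly the encoded length of w.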

• Question 3 (10): True or false with justification: Let X be a discrete random source and Y the output of a memoryless channel when X is the input to it, where the values of X and Y are both always integers. Suppose that X and Y always satisfy the rule X + Y = 6. Then the equivocation of Y with respect to X is 0.

TRUE. The equivocation H(Y|X) is the entropy of Y when X is known. But if we know X, Y is necessarily 6 - X and has the constant distribution and an entropy of zero.

• Question 4 (10): True or false with justification: Let Q be the language over the alphabet {a,b,c} consisting of all strings where the number of a's equals the number of c's. Then Q is a regular language.

FALSE. If i and j are two different natural numbers, the strings u = a^i and v = a^j are Q-distinguishable, because if we let w = c^i, then uw is in Q but vw is not. Thus there are infinitely many Myhill-Nerode classes for Q, and we can conclude that Q has no DFA and thus (by Kleene) no regular expression.

• Question 5 (10): True or false with justification: The language Q of Question 4 is Turing decidable.

TRUE. We could build a Turing machine to count the a's, count the c's, compare the numbers, print Y if they are equal, and print N otherwise. If we actually had to build the Turing machine, it would be easier to have it repeatedly scan the input string, looking for the first a and the first c and changing both to b's if it finds them. It would print Y if the string ever became all b's, and print N if it ever finds an a but no c, or a c but no a, on a single scan.
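The repeated-scan procedure can be sketched in Python, with a list standing in for the tape. This is only an illustration of the idea, not a Turing machine description, and the function name `in_Q` is an assumption.

```python
def in_Q(s):
    """Decide whether s over {a,b,c} has as many a's as c's
    by repeatedly overwriting one a and one c with b's."""
    tape = list(s)
    while True:
        i = tape.index("a") if "a" in tape else None  # first remaining a
        j = tape.index("c") if "c" in tape else None  # first remaining c
        if i is None and j is None:
            return True    # tape is all b's (or empty): print Y
        if i is None or j is None:
            return False   # an a with no c, or a c with no a: print N
        tape[i] = tape[j] = "b"

print(in_Q("cbacab"), in_Q("aac"))
```

Each pass removes one a and one c, so the loop runs at most len(s)/2 + 1 times and always halts.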

• Question 6 (10): True or false with justification: Let R be the set {M: M is the description of a Turing machine and L(M) is a Turing recognizable language}. Then R itself is Turing recognizable but is not Turing decidable.

FALSE. If M is any Turing machine at all, the language L(M) = {w: M halts on input w} is a Turing recognizable language. Thus R is just the set of strings that are valid Turing machine descriptions. When we specify a system for encoding Turing machines as strings, we do so in a way that there is an algorithm (and thus a Turing machine) to decide whether a string is a valid description.

• Question 7 (40): Let N be a λ-NFA with state set {1,2,3}, start state 1, only final state 3, and four transitions: (1,λ,2), (2,a,2), (2,b,2), and (2,b,3).
• (a,10) Using our given construction, create an ordinary NFA N' with the same state set and the same language as N.

We get the same state set, the three former letter-moves, and three new letter-moves: (1,a,2), (1,b,2), and (1,b,3). (These result because each of the three old letter-moves starts at 2, and state 2 is reachable from state 1 by a λ-move, so each may start at 1 or 2 in the new NFA.) The final state set does not change because there is no λ-path in N from the start state to a final state.
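The course's exact construction is not reproduced here, so the following sketch assumes one standard version: for every letter-move (q,x,r), add (p,x,s) whenever p reaches q and r reaches s by zero or more λ-moves. The function name `lambda_closure` is illustrative.

```python
states = {1, 2, 3}
lambda_moves = {(1, 2)}
letter_moves = {(2, "a", 2), (2, "b", 2), (2, "b", 3)}

def lambda_closure(lam, states):
    """reach[p] = set of states reachable from p by zero or more λ-moves."""
    reach = {p: {p} for p in states}
    changed = True
    while changed:
        changed = False
        for p in states:
            for (q, r) in lam:
                if q in reach[p] and r not in reach[p]:
                    reach[p].add(r)
                    changed = True
    return reach

reach = lambda_closure(lambda_moves, states)

# Letter-moves of N': a λ-path, then one letter-move of N, then a λ-path.
new_moves = {(p, x, s)
             for p in states
             for (q, x, r) in letter_moves if q in reach[p]
             for s in reach[r]}

# The start state becomes final only if it λ-reaches a final state.
start_becomes_final = bool(reach[1] & {3})
print(sorted(new_moves), start_becomes_final)
```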

• (b,10) Using the subset construction, find a DFA D with the same language as N'.

The start state is {1} which is non-final. On a, {1} goes to {2}, which is non-final. On b, {1} goes to {2,3} which is final. On a, {2} goes to itself. On b, {2} goes to {2,3}. On a, {2,3} goes to {2}. On b, {2,3} goes to itself. We have a completed DFA with three states.
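The same table can be generated mechanically. Below is a minimal sketch of the subset construction applied to N', exploring only the sets reachable from {1}; the function and variable names are assumptions made for illustration.

```python
from collections import deque

nfa_moves = {(1, "a", 2), (1, "b", 2), (1, "b", 3),
             (2, "a", 2), (2, "b", 2), (2, "b", 3)}
start, finals, alphabet = 1, {3}, "ab"

def subset_construction(moves, start, finals, alphabet):
    """Build the reachable part of the subset DFA for an ordinary NFA."""
    start_set = frozenset({start})
    delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        S = queue.popleft()
        for x in alphabet:
            # All NFA states reachable from some state of S on letter x.
            T = frozenset(r for (q, y, r) in moves if q in S and y == x)
            delta[(S, x)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    accepting = {S for S in seen if S & finals}
    return seen, delta, start_set, accepting

dfa_states, dfa_delta, dfa_start, dfa_finals = subset_construction(
    nfa_moves, start, finals, alphabet)
for (S, x), T in sorted(dfa_delta.items(), key=lambda kv: (sorted(kv[0][0]), kv[0][1])):
    print(sorted(S), x, "->", sorted(T))
```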

• (c,10) Using the state minimization construction, find the minimal DFA D' for D.

We start with classes N and F -- since F has only one state we are done with it. N has two states {1} and {2}. Each goes to N on a and to F on b. So this two-class partition is the final partition and we have a minimal DFA with two states N and F, start state N, only final state F, and transition function δ(N,a) = N, δ(N,b) = F, δ(F,a) = N, and δ(F,b) = F.
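As a check, the refinement can be run as code. This sketch assumes the usual partition-refinement form of minimization (split a class whenever two of its states go to different classes on some letter); the state labels "1", "2", "23" and the name `minimize` are illustrative.

```python
def minimize(states, alphabet, delta, finals):
    """Return the coarsest partition of states compatible with delta and finals."""
    partition = [blk for blk in (set(states) - set(finals), set(finals)) if blk]
    while True:
        def block_of(q):
            # Index of the current class containing state q.
            return next(i for i, blk in enumerate(partition) if q in blk)
        refined = []
        for blk in partition:
            groups = {}
            for q in blk:
                # States stay together only if they agree on every letter.
                signature = tuple(block_of(delta[(q, x)]) for x in alphabet)
                groups.setdefault(signature, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):
            return refined          # no class was split: partition is final
        partition = refined

# The DFA D from part (b), with "23" standing for the state {2,3}.
D_states = {"1", "2", "23"}
D_delta = {("1", "a"): "2", ("1", "b"): "23",
           ("2", "a"): "2", ("2", "b"): "23",
           ("23", "a"): "2", ("23", "b"): "23"}
classes = minimize(D_states, "ab", D_delta, {"23"})
print(sorted(sorted(c) for c in classes))
```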

• (d,10) Using the construction from lecture, find a regular expression for the language of D'.

We add a start state I and a final state Z to D', make state F non-final, and add λ-moves from I to N and from F to Z. We first eliminate F, which has one edge into it and two out of it. The two new edges are (N, bb*a, N) (which merges with the existing N-loop to make (N, a + bb*a, N)) and (N, bb*, Z). Finally we eliminate N to get an r.e.-NFA with one transition, labeled "(λ)(a + bb*a)*bb*". The expression (a + bb*a)* denotes Σ*a + λ, the language of all strings ending in a, together with λ. Thus the entire language is all strings that end in an a or are λ, followed by one or more b's. This is just the set of all strings ending in b, or Σ*b -- this is also easy to see from the DFA D'.
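The equivalence can be tested exhaustively on short strings. The sketch below translates the expression into Python `re` syntax (with `|` for + and a non-capturing group) and compares it with D' and with "ends in b" on every string over {a,b} of length at most 8. This is a sanity check, not a proof.

```python
import re
from itertools import product

# (a + bb*a)*bb* written in Python regex syntax.
pattern = re.compile(r"(?:a|bb*a)*bb*")

def dfa_accepts(s):
    """Run D': every a leads to N and every b leads to F."""
    state = "N"
    for ch in s:
        state = "F" if ch == "b" else "N"
    return state == "F"

all_agree = all(
    (pattern.fullmatch(s) is not None) == dfa_accepts(s) == s.endswith("b")
    for n in range(9)
    for s in ("".join(t) for t in product("ab", repeat=n))
)
print(all_agree)
```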

• Question 8 (20): Let f(n) be the probability that a uniformly-chosen string of length n from the alphabet {a,b,c} is in the language Q from Question 4. Compute f(0), f(1), f(2), f(3), and f(4). What is the limit, as n goes to infinity, of f(n)?

• f(0) is the number of strings of length 0 in Q (1) divided by the total number of strings of length 0 (also 1), or 1.
• f(1) is the number of Q-strings of length 1 (1, just "b") divided by the total number of strings of length 1 (3), or 1/3.
• f(2) is 3 (for "ac", "bb", and "ca") divided by 3^2 = 9, or 1/3 again.
• f(3) is 7 (for "bbb" plus the 3! permutations of "abc") divided by 3^3 = 27, or 7/27.
• f(4) is the number of length-4 Q-strings over 3^4 = 81. These Q-strings separate into 1 with four b's ("bbbb"), 12 with two b's (permutations of "abbc", there are four places to put the a and then three to put the c), and 6 with no b's (permutations of "aacc" -- there are (4 choose 2) = 6 locations for the two a's), for a total of 19. Thus f(4) = 19/81.
• The limit as n goes to infinity of f(n) is 0, because the probability is eventually smaller than any positive number ε. To see this, define a random variable X equal to the number of a's minus the number of c's. For each letter, X is +1, 0, or -1, each with probability 1/3, so the variance of X for a one-letter string is E(X^2) - E(X)^2 = 2/3 - 0 = 2/3. For a string of n letters, X is the sum of n independent random variables with variance 2/3 each, so its variance is 2n/3 and its standard deviation is (2n/3)^(1/2) = Θ(n^(1/2)). As n increases, by the Central Limit Theorem X approaches a normal random variable with mean 0 and standard deviation Θ(n^(1/2)). The chance of such a variable having value between 1 and -1 is Θ(n^(-1/2)), which is o(1), eventually smaller than any positive ε.
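The five values above can be confirmed by enumeration, and the same count has a closed form: choose k positions for the a's, then k of the remaining positions for the c's. The sketch below uses hypothetical helper names `f_brute` and `f_exact`.

```python
from fractions import Fraction
from itertools import product
from math import comb

def f_brute(n):
    """Enumerate all 3**n strings and count those with #a == #c."""
    hits = sum(1 for s in product("abc", repeat=n)
               if s.count("a") == s.count("c"))
    return Fraction(hits, 3**n)

def f_exact(n):
    """Count strings with k a's and k c's, summed over k."""
    hits = sum(comb(n, k) * comb(n - k, k) for k in range(n // 2 + 1))
    return Fraction(hits, 3**n)

for n in range(5):
    print(n, f_brute(n), f_exact(n))

# Larger n, where enumeration is impractical, shows the slow decay toward 0.
for n in (10, 100, 1000):
    print(n, float(f_exact(n)))
```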

• Question 9 (30): Suppose we take n successive bits from a source Z that is not memoryless. The first bit b_1 is equally likely to be 0 or 1, but each succeeding bit b_(i+1) is equal to b_i with probability 3/4 and different from it with probability 1/4, the events "b_(i+1) is different from b_i" for different i being independent.
• (a,10) What is the entropy of each bit b_i if all previous bits are known? (The answer may not be the same for each i.) What is the joint entropy of the first n bits from this source, for arbitrary n?

The first bit is equally likely to be 0 or 1, so its entropy is 1. Each later bit, given the previous bits, has probability 3/4 of one value and 1/4 of the other, so its entropy is (1/4)(-log 1/4) + (3/4)(-log 3/4) = (1/4)(2) + (3/4)(2 - log 3) = 2 - 3(log 3)/4 = (about) 2 - (3/4)(1.6) = 0.8. By the chain rule, the joint entropy of the first n bits is the sum of these conditional entropies (equivalently, the sum of the entropies of the independent variables b_1 and the n-1 "different from the previous bit" events), which is 1 + (0.8)(n-1) or 0.8n + 0.2.
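For reference, the same quantities can be computed without rounding log 3 to 1.6. This is a small sketch; `H` and `joint_entropy` are names chosen here for illustration.

```python
from math import log2

def H(probs):
    """Entropy in bits of a finite probability distribution."""
    return sum(-p * log2(p) for p in probs if p > 0)

h_first = H([0.5, 0.5])     # entropy of the first bit: exactly 1
h_later = H([0.25, 0.75])   # about 0.811; the exam's log 3 = 1.6 rounds it to 0.8

def joint_entropy(n):
    """Entropy of the first n bits: first bit plus n - 1 conditional entropies."""
    return h_first + (n - 1) * h_later

print(h_first, round(h_later, 3), round(joint_entropy(10), 3))
```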

• (b,10) Suppose we view 2n bits from Z as n letters from the alphabet {00,01,10,11}. What is the probability of the i'th of these letters taking on each of these four values, without any assumption about the previous letters? (In this case the answer does not depend on i.) What is the entropy of this distribution, assuming that log 3 = 1.6?

The first of the two bits is equally likely to be 0 or 1, and the second is equal to the first with 3/4 probability. So 00 and 11 have probability 3/8 each, and 01 and 10 have probability 1/8. The entropy is (3/8)(-log 3/8) + (3/8)(-log 3/8) + (1/8)(-log 1/8) + (1/8)(-log 1/8) or (3/4)(3 - log 3) + (1/4)(3) = 3 - 3(log 3)/4 or about 1.8.
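The claim that the answer does not depend on i rests on every bit having marginal distribution (1/2, 1/2). A hedged sketch, using exact fractions and illustrative variable names, checks that one step of the source preserves the uniform marginal and then computes the letter distribution and its entropy.

```python
from fractions import Fraction
from math import log2

stay = Fraction(3, 4)                       # probability the next bit repeats
marginal = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def step(a, b):
    """Probability that the next bit is b given that the current bit is a."""
    return stay if a == b else 1 - stay

# One step of the source maps the uniform marginal to itself.
next_marginal = {b: sum(marginal[a] * step(a, b) for a in (0, 1)) for b in (0, 1)}

# Distribution of a two-bit letter whose first bit has the uniform marginal.
pair = {(a, b): marginal[a] * step(a, b) for a in (0, 1) for b in (0, 1)}
entropy = sum(-float(p) * log2(p) for p in pair.values())

print(next_marginal == marginal, pair, round(entropy, 3))
```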

• (c,10) If we send the two-bit letters using a variable-length binary code optimized for the distribution computed in (b), what is the expected number of bits we will need to send n letters?

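If the code achieves the entropy of the distribution from (b), the n two-bit letters will require an expected 1.8 bits each, for 1.8n total bits. (A Huffman code on single letters uses codeword lengths 1, 2, 3, and 3, for an expected 15/8 = 1.875 bits per letter; the entropy 1.8 is the lower bound that such codes approach.) This contrasts with the 2n bits needed to send them literally. The joint entropy of the 2n bits is only about 1.6n by (a), so there ought to be a code to send them more efficiently.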
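As a cross-check on the per-letter cost, the sketch below computes the expected codeword length of a Huffman code built on the four letters one at a time, using the fact that this length equals the sum of the merged weights. It comes out slightly above the 1.8-bit entropy, which is a lower bound for any such code. The function name is hypothetical.

```python
import heapq
from fractions import Fraction

def huffman_expected_length(probs):
    """Expected codeword length of a Huffman code: the sum of all merged weights."""
    heap = list(probs)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)  # two lightest nodes
        total += merged      # every merge adds one bit to each leaf beneath it
        heapq.heappush(heap, merged)
    return total

# Probabilities of 00, 11, 01, 10 from part (b).
probs = [Fraction(3, 8), Fraction(3, 8), Fraction(1, 8), Fraction(1, 8)]
per_letter = huffman_expected_length(probs)
print(per_letter, float(per_letter))
```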