CMPSCI 401: Theory of Computation

Solutions to First Midterm Exam, Spring 2009

David Mix Barrington

3 March 2009

Directions:

Answer the problems on the exam pages.
There are seven problems for 120 total points. Actual scale was A=96, C=66.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first four questions are true/false, with five points for the correct boolean answer and up to five for a correct justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. -- note that there is no reason not to guess if you don't know.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 20 points
  Q6: 30 points
  Q7: 30 points

  Total: 120 points

Question text is in black, solutions in blue.

Question 1 (10): True or false with justification: If N is any NFA, then there exists an NFA N' with three or fewer states such that L(N) = L(N'). (Recall that NFA's are defined to read at most one letter per transition.)
FALSE. By the Subset Construction, N' would be simulated by a DFA with at most 2³ = 8 states. But some regular languages, such as (a⁹)^*, have no DFA with eight or fewer states and so could not be equal to L(N').
Many people gave examples of languages with four-state NFA's, such as {aaa}, and asserted that they had no three-state NFA's. Most of these arguments were invalid because they relied on distinguishability of strings, which only gives a lower bound on the size of DFA's. Here is a correct version of the argument for the language {aaa}. Let N' be an NFA with L(N') = {aaa}. There must be a path of three a-edges (plus possibly some ε-edges) from the start state to some final state. Call the start state 0, the state following the first a-edge 1, the state following the second a-edge 2, and the final state 3. States 0, 1, and 2 must be non-final because otherwise ε, a, or aa could be accepted and they are not in L(N'). No two states among 0, 1, and 2 could be equal, because otherwise the machine could accept a or aa. So N' has at least one final state and at least three non-final states and thus has at least four states.
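The four-state lower bound for {aaa} can also be checked by brute force. The sketch below is not part of the exam solution, and its function names are hypothetical. It assumes ε-moves have already been eliminated (the standard ε-closure construction does this without adding states), so it enumerates only ε-free NFA's on states 0, 1, 2 with start state 0 over the one-letter alphabet {a}. Comparing accepted lengths up to 16 is sound for refutation, since a disagreement with {aaa} on any one length already shows the languages differ.

```python
def accepted_lengths(delta, finals, max_len):
    """Lengths n <= max_len such that the unary NFA accepts a^n (start state 0)."""
    current, lengths = {0}, set()
    for n in range(max_len + 1):
        if current & finals:
            lengths.add(n)
        # Follow one a-edge from every state we could currently be in.
        current = {t for s in current for t in delta[s]}
    return lengths


def count_three_state_nfas_for_aaa(max_len=16):
    """Count epsilon-free 3-state unary NFA's whose accepted lengths are exactly {3}."""
    states = range(3)
    pairs = [(s, t) for s in states for t in states]
    hits = 0
    for edge_mask in range(2 ** len(pairs)):      # every set of a-edges
        delta = {s: set() for s in states}
        for k, (s, t) in enumerate(pairs):
            if edge_mask >> k & 1:
                delta[s].add(t)
        for final_mask in range(2 ** 3):          # every set of final states
            finals = {s for s in states if final_mask >> s & 1}
            if accepted_lengths(delta, finals, max_len) == {3}:
                hits += 1
    return hits
```

If the argument above is right, count_three_state_nfas_for_aaa() finds no candidate at all, while the obvious four-state chain 0 → 1 → 2 → 3 with final state 3 accepts exactly length 3.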
Question 2 (10): True or false with justification: The book defines PDA's so they can push at most one, pop at most one, and read at most one character per transition. Suppose that we redefine PDA's to allow them to push or pop at most two characters per transition, and still read at most one. Then if M is any PDA, there exists a PDA M' with at most three states such that L(M) = L(M').
TRUE. Find a Chomsky Normal Form grammar G for L(M), which is possible because L(M) is a context-free language. (Our theorem gets a CFG from ordinary PDA's, but if M is a two-character PDA it can be converted to an ordinary PDA by adding states.) Construct the top-down parser for G, as in the proof in Sipser that every CFG has an equivalent PDA. This PDA has three states, pops or reads at most one character per transition, and pushes at most two characters per transition as the right-hand side of a rule in G has at most two characters.
We have no experience in showing that a PDA requires a certain number of states, so the arguments purporting to do so were largely nonsense.
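To make the three-state construction concrete, here is a minimal sketch that simulates the top-down parser PDA for a Chomsky Normal Form grammar. It is an illustration under stated assumptions, not the book's code: the grammar has no ε-rule, "$" is a bottom-of-stack marker that is not a grammar symbol, non-terminals are the keys of the rules dictionary, and the example grammar for {aⁿbⁿ: n ≥ 1} and all names are hypothetical. Only the loop state needs explicit search, since the start state just pushes S and $ and the accept state is entered by popping $.

```python
def pda_accepts(rules, start_symbol, word):
    """Simulate the 3-state top-down parser PDA built from a CNF grammar."""
    # Start state: one transition pushes "$" and then the start symbol (two characters).
    first = (0, (start_symbol, "$"))          # (input position, stack with top first)
    frontier, seen = [first], {first}
    while frontier:                           # every configuration here is in the loop state
        pos, stack = frontier.pop()
        top, rest = stack[0], stack[1:]
        if top == "$":
            if pos == len(word):              # pop "$" and move to the accept state
                return True
            continue
        successors = []
        if top in rules:
            # Pop a non-terminal and push a right-hand side (at most two characters).
            for rhs in rules[top]:
                successors.append((pos, tuple(rhs) + rest))
        elif pos < len(word) and word[pos] == top:
            # Pop a terminal while reading the same letter from the input.
            successors.append((pos + 1, rest))
        for config in successors:
            # Without epsilon-rules each stack symbol yields at least one letter,
            # so a stack longer than the remaining input can never succeed.
            if len(config[1]) - 1 <= len(word) - config[0] and config not in seen:
                seen.add(config)
                frontier.append(config)
    return False


# Hypothetical CNF grammar for {a^n b^n : n >= 1}.
EXAMPLE_RULES = {"S": ["AB", "AT"], "T": ["SB"], "A": ["a"], "B": ["b"]}
```

For instance, pda_accepts(EXAMPLE_RULES, "S", "aabb") follows the derivation S ⇒ AT ⇒ aT ⇒ aSB ⇒ aABB ⇒ aabb on the stack, pushing at most two characters per move.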
Question 3 (10): True or false with justification: Define a left-regular grammar to be a context-free grammar with the property that there is at most one non-terminal on the right-hand side of any rule, and that this non-terminal must be the first character of the right-hand side. Then a language has a left-regular grammar if and only if it is regular.
TRUE. In the in-class writing exercise we proved that a language has a right-regular grammar if and only if it is regular. A language has a left-regular grammar if and only if its reversal has a right-regular grammar, because if G is left-regular or right-regular you can get a grammar for L(G)^R (defined to be {w: w^R ∈ L(G)}) by reversing the right-hand side of every rule in G. And a language is regular if and only if its reversal is regular (just reverse the regular expression for L to get a regular expression for L^R, and vice versa).
Many students answering this question didn't seem to realize that left-regular and right-regular weren't the same thing. Others translated a left-regular G into any NFA for L(G)^R, but claimed to have an NFA for L(G).
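The reversal step can be tried on a small example. The sketch below assumes a hypothetical left-regular grammar for {aᵐbⁿ: m ≥ 0, n ≥ 1}, written as (left-hand side, right-hand side) pairs with upper-case letters as non-terminals and lower-case letters as terminals. It reverses every right-hand side and compares the two languages on all strings up to a length bound, which is evidence for the claim on this one grammar, not a proof.

```python
def reverse_grammar(rules):
    """Reverse the right-hand side of every rule."""
    return [(lhs, rhs[::-1]) for lhs, rhs in rules]


def language_up_to(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    words, seen, frontier = set(), {start}, [start]
    while frontier:
        form = frontier.pop()
        # Position of the first non-terminal, if any remains.
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            words.add(form)
            continue
        for lhs, rhs in rules:
            if lhs == form[idx]:
                new_form = form[:idx] + rhs + form[idx + 1:]
                # Terminals are never erased, so forms with too many can be dropped.
                if sum(c.islower() for c in new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    frontier.append(new_form)
    return words


# Hypothetical left-regular grammar: the non-terminal is always first on the right.
LEFT_REGULAR = [("S", "Sb"), ("S", "Ab"), ("A", "Aa"), ("A", "")]
RIGHT_REGULAR = reverse_grammar(LEFT_REGULAR)   # S -> bS | bA, A -> aA | epsilon
```

Here language_up_to(LEFT_REGULAR, "S", 4) contains aab, and language_up_to(RIGHT_REGULAR, "S", 4) contains its reversal baa. Note that the reversed grammar generates L(G)^R, not L(G), which is exactly the point the second group of students missed.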
Question 4 (10): True or false with justification: Let L be the language {aⁱb^jc^k: j = 2i + k}. Then L is not a context-free language.
FALSE. L is generated by the grammar with rules S → XY, X → aXbb, X → ε, Y → bYc, and Y → ε. Equivalently, a PDA for L could push two a's on the stack for each a read, pop an a from the stack for each b read until the stack is empty, push a b on the stack for each remaining b read, pop a b for each c read, and accept if the stack is empty at the end of the input.
Many people tried to use the CFL Pumping Lemma to show that L is not context-free. But there's no reason string v in the CFL PL could not be "a", and string y be "bb", so that pumping would maintain the relationship j = 2i + k.
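As a sanity check on the grammar given for Question 4, the following sketch enumerates everything it derives up to a length bound and compares the result with the definition of L. The helper names are hypothetical, upper-case letters stand for non-terminals, and agreement up to length 8 is supporting evidence and not a proof.

```python
import re
from itertools import product

# The grammar from the solution: S -> XY, X -> aXbb | epsilon, Y -> bYc | epsilon.
RULES = [("S", "XY"), ("X", "aXbb"), ("X", ""), ("Y", "bYc"), ("Y", "")]


def generated_up_to(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    words, seen, frontier = set(), {start}, [start]
    while frontier:
        form = frontier.pop()
        idx = next((i for i, c in enumerate(form) if c.isupper()), None)
        if idx is None:
            words.add(form)
            continue
        for lhs, rhs in rules:
            if lhs == form[idx]:             # expand the leftmost non-terminal
                new_form = form[:idx] + rhs + form[idx + 1:]
                if sum(c.islower() for c in new_form) <= max_len and new_form not in seen:
                    seen.add(new_form)
                    frontier.append(new_form)
    return words


def in_language(w):
    """Membership in {a^i b^j c^k : j = 2i + k}, straight from the definition."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(2)) == 2 * len(m.group(1)) + len(m.group(3))


def all_strings(alphabet, max_len):
    """Every string over alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)
```

Comparing generated_up_to(RULES, "S", 8) with the strings from all_strings("abc", 8) that satisfy in_language should give the same set.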
Question 5 (20): A binary string is defined to be an ASCII string if (1) its length is divisible by 8, and (2) when it is partitioned into substrings of eight bits each, every such substring has an even number of ones.
- (a,5) Prove (by any valid method) that the set of ASCII strings is a regular language.
  Of course building a valid DFA, as in part (b), is enough. I think the easiest proof is to define A to be the set of eight-bit strings with an even number of 1's, and note that the set of ASCII strings is A^* and so is regular because A, a finite set, must be regular.
  It is true that the set of strings whose length is divisible by 8 is regular, and that the set of strings with an even number of 1's is regular, but the set of ASCII strings is not the intersection of these two sets. It is contained in this intersection, but 0⁷110⁷, for example, is in the intersection but is not an ASCII string because it fails condition (2).
- (b,15) Find the number of states in the minimal DFA for the language of ASCII strings. (Equivalently, find the number of equivalence classes for the Myhill-Nerode equivalence relation for this language.) Prove your answer, by giving the minimal DFA and showing it to be minimal, or otherwise.
  The minimal DFA has sixteen states, which we may call 0a, 1a, 2a,..., 7a, 0b, 1b,..., 7b. The start state is 0a, which is also the only final state. The 0-arrow from ia (for any number i) goes to (i+1)a, and the 1-arrow from ia goes to (i+1)b -- in both cases the addition is done modulo 8. The 0-arrow from ib goes to (i+1)b and the 1-arrow from ib goes to (i+1)a, except that both arrows from 0b go to itself.
  A string takes the DFA to ia if and only if its length is congruent to i mod 8, the last i characters have an even number of 1's, and the string obtained by deleting the last i characters is an ASCII string. (Thus it takes the DFA to 0a if and only if it is itself an ASCII string.) A string takes the DFA to ib, for positive i, if and only if its length is congruent to i modulo 8, the last i characters have an odd number of ones, and the string obtained by deleting the last i characters is an ASCII string. A string takes the DFA to 0b if and only if it contains a "bad byte", an eight-bit substring with an odd number of 1's that prevents the first 8k characters of the string, for some k, from being an ASCII string. These conditions are preserved by every new letter read, showing that this 16-state DFA is correct.
  This DFA is minimal because the strings ε, 0, 00,..., 0⁷, 1, 01, 001,..., 0⁷1 are pairwise ASCII-distinguishable. First note that each of these strings takes the DFA to a different state. The string z = 0^(8-i) takes 0ⁱ to an ASCII string, but takes none of the other 15 strings to one. For i ≤ 6, the string 0^(6-i)1 takes 0ⁱ1 to an ASCII string, but takes none of the other 15 strings to one. And no string at all takes 0⁷1 to an ASCII string. So for any two of these 16 strings, we can find a z that distinguishes them.
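Both parts of Question 5 can be checked mechanically. The sketch below encodes state ia as the pair (i, "a") and ib as (i, "b"), compares the sixteen-state DFA with the definition of an ASCII string, and searches for a distinguishing suffix for each pair of the sixteen witness strings. The function names and the encoding are choices made for this illustration, and an exhaustive comparison up to a length bound is a check and not a proof.

```python
from itertools import combinations, product


def is_ascii_string(w):
    """Definition: length divisible by 8 and every 8-bit block has evenly many 1's."""
    return len(w) % 8 == 0 and all(
        w[i:i + 8].count("1") % 2 == 0 for i in range(0, len(w), 8)
    )


def step(state, bit):
    """Transition function of the sixteen-state DFA."""
    if state == (0, "b"):                     # death state: both arrows loop
        return state
    i, parity = state
    if bit == "1":                            # a 1 flips the parity of the current byte
        parity = "b" if parity == "a" else "a"
    return ((i + 1) % 8, parity)


def run(w):
    """State reached from the start state 0a after reading w."""
    state = (0, "a")
    for bit in w:
        state = step(state, bit)
    return state


def dfa_accepts(w):
    return run(w) == (0, "a")                 # 0a is the only final state


def binary_strings(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def distinguishing_suffix(x, y, max_len=8):
    """Some z with exactly one of xz, yz an ASCII string, or None if none is found."""
    for z in binary_strings(max_len):
        if is_ascii_string(x + z) != is_ascii_string(y + z):
            return z
    return None


# The sixteen strings from the minimality argument: 0^i and 0^i 1 for i = 0..7.
WITNESSES = ["0" * i for i in range(8)] + ["0" * i + "1" for i in range(8)]
```

The string "0" * 7 + "11" + "0" * 7 from part (a) has length 16 and two 1's, yet is_ascii_string rejects it and run sends it to the death state (0, "b").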
Question 6 (30): Let P be the PDA with state set {p,q,r,s,f}, start state p, final state set {f}, input alphabet {a,b}, stack alphabet {a,b}, and the following six transitions: (p, ε, ε; q, b), (q, a, ε; q, a), (q, b, ε; r, b), (r, b, b; s, ε), (s, a, a; s, ε), and (s, ε, b; f, ε). (Recall that a transition (s, c, d; t, z), for example, goes from state s to state t while reading c, popping d, and pushing z.)
- (a,5) Describe the language L(P) in English.
  L(P) is the set of strings with two b's and an even number of a's, where the b's come together and divide the a's into two equal blocks, that is, L(P) = {aⁿbbaⁿ: n ≥ 0}. This is because an accepting run of P must push a b, read and push some number of a's, read and push a b, read and pop a b, read and pop a number of a's equal to the number pushed, and pop a b.
- (b,5) Using any valid method, give a context-free grammar whose language is L(P).
  From the English description, the grammar with one non-terminal S and two rules, S → aSa and S → bb, generates exactly L(P).
- (c,5) Is L(P) a regular language? Prove your answer.
  L(P) is not regular. Myhill-Nerode proof: The infinite set of strings {aⁿbb: n ≥ 0} are pairwise distinguishable for L(P) -- if i ≠ j, take z to be aⁱ and z distinguishes aⁱbb from a^jbb. Pumping Lemma proof: Let p be the pumping length and let w be the string a^pbba^p. Because |xy| ≤ p and |y| > 0, y must be a non-empty string of a's and pumping y up or down yields a string with more or fewer a's on the left and thus not in L(P).
- (d,5) Explain why P is already in the normal form used in the proof that any PDA has an equivalent context-free grammar.
  This normal form had three conditions: (1) the PDA has exactly one final state, (2) it can accept only with an empty stack, and (3) every transition either pushes or pops one character, but not both. P obviously satisfies (1) and (3). For (2) note that the first transition must push a b onto the stack, and this b can only be popped by the only transition into state f. So we can enter f only by popping this b and thus leaving an empty stack.
  Many of you mixed up this normal form for PDA's, which was defined only for this particular proof, with other normal forms for grammars or NFA's.
- (e,5) Describe the set of non-terminals in the grammar constructed from P in that proof.
  The construction created a non-terminal A_xy for every pair of elements x and y in the state set {p,q,r,s,f}, with x = y possible. There are thus 25 non-terminals in the grammar, though as we will see most of them are useless.
- (f,5) Of the non-terminals listed in part (e), only some can possibly generate a string of terminals in the constructed grammar. Which ones, and why?
  Any of the non-terminals A_pp, A_qq, A_rr, A_ss, or A_ff can derive the empty string ε, which is a string of (zero) terminals. Each of the others, say A_xy, can derive a string only if P can read that string while going from state x and empty stack to state y and empty stack. From p and empty stack, P cannot empty the stack until it is in f, and from q and empty stack it cannot empty the stack until it is in s. So A_pf and A_qs are the only other useful non-terminals.
  Although you weren't asked for it, the grammar constructed (ignoring the useless non-terminals) has rules A_pf → A_qs, A_qs → aA_qsa, A_qs → bA_rrb, and A_rr → ε. There are also all the rules of the form A_xy → A_xzA_zy, but they are either trivial or involve useless non-terminals in this case.
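The description of L(P) in part (a) can be checked by simulating P directly. The sketch below is a straightforward nondeterministic search over configurations. The empty string stands for ε, the stack is a Python string with its top at the right end, acceptance is by final state with the input consumed, and the function names are hypothetical.

```python
from itertools import product

# (from state, read, pop, to state, push), with "" standing for epsilon.
TRANSITIONS = [
    ("p", "", "", "q", "b"),
    ("q", "a", "", "q", "a"),
    ("q", "b", "", "r", "b"),
    ("r", "b", "b", "s", ""),
    ("s", "a", "a", "s", ""),
    ("s", "", "b", "f", ""),
]


def p_accepts(word):
    """True if some run of P ends in state f with all of word read."""
    start = ("p", 0, "")                      # (state, input position, stack)
    frontier, seen = [start], {start}
    while frontier:
        state, pos, stack = frontier.pop()
        if state == "f" and pos == len(word):
            return True
        for src, read, pop, dst, push in TRANSITIONS:
            if src != state:
                continue
            if read and (pos >= len(word) or word[pos] != read):
                continue
            if pop and not stack.endswith(pop):
                continue
            new_stack = stack[:len(stack) - len(pop)] + push
            config = (dst, pos + len(read), new_stack)
            if config not in seen:
                seen.add(config)
                frontier.append(config)
    return False


def in_claimed_language(word):
    """Membership in {a^n bb a^n : n >= 0}."""
    n, remainder = divmod(len(word) - 2, 2)
    return len(word) >= 2 and remainder == 0 and word == "a" * n + "bb" + "a" * n


def strings_over_ab(max_len):
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            yield "".join(letters)
```

Running p_accepts on every string from strings_over_ab(10) and comparing with in_claimed_language exercises both the accepting runs and the ways a run can get stuck, such as trying to pop an a in state s when the bottom b is on top of the stack.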
Question 7 (30): Let D be the DFA with state set {1, 2, 3, 4, 5}, start state 1, final state set {2, 3, 4}, and the following transition function δ: δ(1, a) = 1, δ(1, b) = 2, δ(2, a) = 3, δ(2, b) = 2, δ(3, a) = 4, δ(3, b) = 2, δ(4, a) = 5, δ(4, b) = 2, δ(5, a) = 5, and δ(5, b) = 2.
- (a, 5) Describe the language L(D) in English.
  L(D) is the set of all strings containing at least one b and having at most two a's after the last b. Note that after the first b, every b takes D to state 2, whereupon states 2, 3, 4, and 5 represent 0, 1, 2, and more than 2 a's since the last b respectively.
- (b, 10) Give a regular expression denoting the language L(D). If you do not use the standard construction, give some idea why you believe your construction to be correct.
  From the English description, which I have justified above, the regular expression Σ^*b(ε ∪ a ∪ aa) is correct. For the standard construction, add new start state 0 and final state 6. Killing 1 gives the transition (0, a^*b, 2). Killing 5 gives (4, aa^*b, 2), which merges with (4, b, 2) to make (4, b ∪ aa^*b, 2). Killing 4 gives (3, a, 6), which merges into (3, ε ∪ a, 6), and (3, a(b ∪ aa^*b), 2), which merges into (3, b ∪ a(b ∪ aa^*b), 2). Now killing 3 gives (2, a(ε ∪ a), 6), which merges into (2, ε ∪ a ∪ aa, 6), and (2, a(b ∪ a(b ∪ aa^*b)), 2), which merges with the existing loop (2, b, 2) into (2, b ∪ a(b ∪ a(b ∪ aa^*b)), 2). Finally killing 2 gives (0, a^*b(b ∪ a(b ∪ a(b ∪ aa^*b)))^*(ε ∪ a ∪ aa), 6), from which we can read the final regular expression. This simplifies to a^*b(a^*b)^*(ε ∪ a ∪ aa).
- (c, 5) A death state of a DFA is defined to be a state s that (1) is reachable from the start state, (2) is non-final, and (3) has δ(s, a) = s for every input letter a. Could any valid DFA for L(D) have a death state? Why or why not?
  No valid DFA for L(D) could have a death state, because any string followed by a b is in L(D), and thus any reachable state of any valid DFA for L(D) must have a b-arrow to a final state and could thus not be a death state.
- (d, 5) Prove that no minimal DFA for any language can have more than one death state.
  If d₁ and d₂ were both death states, the minimization algorithm would never separate them, so they would be merged at the end of the algorithm, making the DFA smaller and showing that the original DFA was not minimal. More directly, you can just merge d₁ and d₂ into a single non-final state d, keeping the language exactly the same as no possible path to any final state has changed.
- (e, 5) Give necessary and sufficient conditions for a regular language to have a valid DFA with a death state.
  A regular language has a valid DFA with a death state if and only if there exists a string x such that for any string y, xy is not in the language. Clearly any string that takes the DFA to a death state has this property. And if such a string x exists, it is Myhill-Nerode equivalent to xz for every possible z, since xz has the same property. So in the minimal DFA for the language, the state reached by x is a death state.
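The answers to parts (a) through (c) of Question 7 can be cross-checked by running D against regular expressions for L(D). The sketch below translates the expressions into Python's re syntax, writing | for ∪ and (a|aa)? for (ε ∪ a ∪ aa). The pattern labelled "eliminated" is the unsimplified expression obtained by killing states 1, 5, 4, 3, 2 in that order and keeping the b-loop at state 2. The dictionary keys and function names are hypothetical.

```python
import re
from itertools import product

# Transition function of D, with start state 1 and final states {2, 3, 4}.
DELTA = {
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 3, (2, "b"): 2,
    (3, "a"): 4, (3, "b"): 2,
    (4, "a"): 5, (4, "b"): 2,
    (5, "a"): 5, (5, "b"): 2,
}
FINAL = {2, 3, 4}

PATTERNS = {
    "direct": r"[ab]*b(a|aa)?",
    "eliminated": r"a*b(b|a(b|a(b|aa*b)))*(a|aa)?",
    "simplified": r"a*b(a*b)*(a|aa)?",
}


def d_accepts(word):
    """Run D on word and report whether it ends in a final state."""
    state = 1
    for letter in word:
        state = DELTA[(state, letter)]
    return state in FINAL


def disagreements(pattern, max_len):
    """Strings of length <= max_len on which D and the pattern differ."""
    found = []
    for length in range(max_len + 1):
        for letters in product("ab", repeat=length):
            word = "".join(letters)
            if d_accepts(word) != (re.fullmatch(pattern, word) is not None):
                found.append(word)
    return found


def has_death_state():
    """Is there a non-final state whose arrows all loop back to itself?"""
    states = {s for s, _ in DELTA}            # every state of D is reachable from 1
    return any(
        s not in FINAL and all(DELTA[(s, c)] == s for c in "ab") for s in states
    )
```

An empty list from disagreements(pattern, 10) for each of the three patterns supports the claim that they all denote L(D), and has_death_state() reflects the observation in part (c) that every b-arrow of D leads to the final state 2.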

Last modified 5 March 2009