CMPSCI 501: Theory of Computation

Solutions to First Midterm Exam, Spring 2015

David Mix Barrington

18 February 2015

Question text is in black, solutions in blue.

Directions:

Answer the problems on the exam pages.
There are twelve problems (some with multiple parts) for 120 total points plus 10 extra credit. Actual scale was A = 100, C = 60.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first five questions are statements -- in each case say whether the statement is true or false and give a convincing justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. You get five points for the correct boolean answer (so there is no reason not to guess if you don't know) and up to five for the justification.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 15 points
  Q7: 10 points
  Q8: 10 points
  Q9: 15 points
  Q10: 10 points
  Q11: 10 points
  Q12: +10 points
 Total: 120+10 points

The language X over the alphabet {a, b, c} is the set {aⁱb^jc^k: i + j = k}.

The language Y over the alphabet {a, b} is the set {aⁿbⁿaⁿ: n ≥ 0}.

The language Z over the alphabet {a, b, c, d} is the language of the following NFA N. N has state set {0, 1, 2, 3}, start state and only final state 0, and transitions (0, a, 1), (0, a, 2), (1, b, 2), (1, b, 3), (2, c, 3), (2, c, 0), (3, d, 0), and (3, d, 1).

The PDA M has state set {i, p, f} with start state i and only final state f. Its input alphabet is {a, b} and its stack alphabet is {a, c}. Its transitions are (i, a, ε; p, c), (p, a, ε; p, a), (p, b, a; p, ε), and (p, b, c; f, ε). Recall that the transition (q, x, y; r, z) means that the PDA can go from state q to state r while reading x, popping y, and pushing z.

The grammar G has rules S → aTb, T → TT, T → aTb, and T → ε.

Question 1 (10): True or false with justification: The language X, defined above, is regular.
FALSE. Proof 1: The intersection of X with the regular language a^*c^* is {aⁿcⁿ: n ≥ 0}, which is isomorphic to {aⁿbⁿ: n ≥ 0}, a language we proved not to be regular.
Proof 2: For any naturals i and j with i ≠ j, the strings aⁱ and a^j are X-distinguishable, using the string cⁱ. Hence there are infinitely many X-equivalence classes and no DFA for X can exist. (Along the same lines you could quote a correct answer for Question 12 here.)
Proof 3: Use the Regular Language Pumping Lemma with k the alleged pumping length and w = a^kc^k. If X were regular with pumping length k, w could be written as xyz with |xy| ≤ k and |y| > 0, so that for any i the string xyⁱz would be in X. But xz would not be in X, since it would have fewer a's than c's. Hence X is not regular.
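The distinguishability claim in Proof 2 can be spot-checked mechanically. The following is a minimal sketch, not part of the exam: the helper names in_X and distinguished are hypothetical, and the check only covers exponents below 8.

```python
import re

def in_X(w: str) -> bool:
    """Membership in X = {a^i b^j c^k : i + j = k}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i + j == k

def distinguished(i: int, j: int) -> bool:
    """True if the suffix c^i separates a^i from a^j with respect to X."""
    return in_X("a" * i + "c" * i) and not in_X("a" * j + "c" * i)

# Every pair of distinct exponents below 8 is separated by c^i.
all_separated = all(
    distinguished(i, j) for i in range(8) for j in range(8) if i != j
)
```

A finite check like this is only evidence, of course; the proof itself is what covers all i and j.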
Question 2 (10): True or false with justification: The complement of the language X, defined above, is context-free.
TRUE. The complement of X is the union of the complement of a^*b^*c^* and the language {aⁱb^jc^k: i + j ≠ k}. The former language is the complement of a regular language, and is thus both regular and context-free. It suffices to prove that the latter language is context-free.
Proof 1: The language in question is the union of two languages, one containing the strings with too few c's and one the strings with too many. X itself has the grammar S → aSc, S → T, T → bTc, T → ε. The language of strings with too many c's is the concatenation of X with the regular language cc^* and is thus context-free. The language of strings with too few c's has the grammar S → aSc, S → aT, S → bU, T → aT, T → U, U → bUc, U → bU, U → ε.
Proof 2: We can easily build a PDA for X that first pushes a $ onto the stack, then reads a's and pushes them onto the stack, then reads b's and pushes an a onto the stack for each one, then reads c's and pops an a from the stack for each one, then pops the $ and accepts if it is at the end of the string. Some of you wanted to make a PDA for the complement by switching final and non-final states of the PDA for X, which does not work. But we can make a PDA for the complement that is similar to the one for X, except that (1) if it sees an a after the first b, or an a or b after the first c, or a c when the top of the stack is the $ rather than an a, it goes to a state where it reads the rest of the input and then accepts, (2) all the former nonfinal states are now final, and (3) the former final state is now nonfinal.
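The grammar for "too few c's" in Proof 1 can be compared against the definition on short strings. This is a sketch under stated assumptions: language_up_to and too_few_cs are hypothetical helpers, uppercase letters are taken as nonterminals, and the comparison stops at length 7. It computes, by fixed-point iteration, every terminal string of bounded length that each nonterminal derives.

```python
RULES = {
    "S": ["aSc", "aT", "bU"],
    "T": ["aT", "U"],
    "U": ["bUc", "bU", ""],   # "" stands for the rule U -> epsilon
}

def language_up_to(rules, start, n):
    """All terminal strings of length <= n derivable from `start`."""
    lang = {A: set() for A in rules}
    changed = True
    while changed:
        changed = False
        for A, bodies in rules.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    # A nonterminal contributes what it derives so far;
                    # a terminal contributes itself.
                    choices = lang[sym] if sym in rules else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= n}
                if not partial <= lang[A]:
                    lang[A] |= partial
                    changed = True
    return lang[start]

def too_few_cs(n):
    """Strings a^i b^j c^k of length <= n with i + j > k."""
    return {"a" * i + "b" * j + "c" * k
            for i in range(n + 1) for j in range(n + 1) for k in range(n + 1)
            if i + j + k <= n and i + j > k}

generated = language_up_to(RULES, "S", 7)
```

Bounding the length is safe here because no rule shrinks a string, so every substring derived along the way is no longer than the final string.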
Question 3 (10): True or false with justification: The language Y, defined above, is context-free.
FALSE. Proof 1: Without using the CFL Pumping Lemma, we can see that this language is similar to our given non-CFL {aⁿbⁿcⁿ: n ≥ 0}. If we had a PDA for Y, we could alter it so that after it has seen a b, it interprets any c's it sees as a's and rejects if it sees any real a's. Thus it interprets the input string as being in Y, and thus accepts it, if and only if it is in the given non-CFL.
Proof 2: We can use the CFLPL almost identically to the case of the given non-CFL. Let k be the alleged pumping length and choose w to be the string a^kb^ka^k, which is in Y. If Y were a CFL with pumping length k, w could be written as uvxyz with |vxy| ≤ k, |vy| > 0, and uvⁱxyⁱz in Y for all i. But the string uxz (with i = 0) cannot have its three single-letter blocks all of the same length, since deleting v and y must shorten at least one of them but cannot shorten all three. Hence uxz is not in Y, and Y is not a CFL.
Question 4 (10): True or false with justification: There is a DFA that accepts every string in X. (That is, ∃D: ∀w: (w ∈ X) → (w ∈ L(D)).) (The language X is defined above.)
TRUE. The simplest such DFA has a single state which is final, with all transitions going from that state to itself. That DFA accepts all strings and hence accepts all strings in X. Someone gave a DFA for the language a^*b^*c^*, which also accepts all strings in X and is thus correct. Of course by Question 1 it is impossible to have a DFA that accepts exactly the strings in X, but we are not asked for such a DFA.
Question 5 (10): True or false with justification: Let Z' be the set of strings in Z that never have the same letter twice in succession. (The language Z is defined above.) Then there is a regular expression whose language is Z'.
TRUE. Since N has no ε-moves, we can easily inspect it and see that it has no two-step paths where both edges have the same label. Hence the language Z' is equal to the language Z, and since Z has an NFA it has a regular expression by Kleene's Theorem.
Several people did not notice this fact about Z and gave the following valid proof, which works for an arbitrary regular language Z. The set of all strings with no double letters is the complement of the regular language Σ^*(aa ∪ bb ∪ cc ∪ dd)Σ^* and is thus regular. (It's also easy to design a DFA for this language, which remembers the last letter it has seen and goes to a death state if it is repeated.) Therefore Z' is the intersection of two regular languages, which we have shown to be regular, and by Kleene's Theorem it has a regular expression.
Question 6 (15): Construct a DFA whose language is Z. Also find such a DFA with a minimal number of states, either by using the minimization algorithm on your first DFA or by arguing directly that your DFA is minimal.
Start state 0 (final) has a-arrow to state 12 (nonfinal) and other arrows to the death state d (nonfinal). State 12 has b-arrow to state 23 (nonfinal), c-arrow to state 03 (final), and other arrows to d. State 23 has c-arrow to 03, d-arrow to state 01 (final), and other arrows to d. State 03 has a-arrow to 12, d-arrow to 01, and other arrows to d. State 01 has a-arrow to 12, b-arrow to 23, and other arrows to d. State d has all four arrows to itself. We have completed the construction with six states of the possible 16.
This DFA is minimal, which is easiest to see by proving the three final states, and the three nonfinal states, to be Z-distinguishable. The string bc separates state 0 from state 01, and the string d separates both states 0 and 01 from state 03. The string d also separates 12 from 23, and the string c separates both 12 and 23 from d.
We could also just run the minimization algorithm on the DFA. The initial partition has classes N = {12, 23, d} and F = {0, 01, 03}. If we describe behavior of each state by the classes to which the letters a, b, c, and d go from that state, we get that 12 has NNFN, 23 has NNFF, d has NNNN, 0 has NNNN, 01 has NNNN, and 03 has NNNF. Thus class N is split into three singleton classes and class F is split into two classes, the non-singleton one being {0, 01}. But since b sends 0 to d and sends 01 to 23, and d and 23 are now separate, we get all singleton classes at the next (and last) stage of the algorithm.
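Both the subset construction and the minimality claim can be reproduced in a few lines. The sketch below is illustrative only: subset_construction and count_classes are hypothetical names, DFA states are represented as frozensets of NFA states (with the empty set playing the role of the death state), and the refinement loop is a simple Moore-style partition refinement rather than any particular textbook's presentation.

```python
NFA = {(0, "a"): {1, 2}, (1, "b"): {2, 3}, (2, "c"): {0, 3}, (3, "d"): {0, 1}}
SIGMA = "abcd"

def subset_construction():
    """Return the start state and transition table of the reachable DFA."""
    start = frozenset({0})
    trans, todo = {}, [start]
    while todo:
        S = todo.pop()
        if S in trans:
            continue
        trans[S] = {}
        for ch in SIGMA:
            T = frozenset(q for p in S for q in NFA.get((p, ch), ()))
            trans[S][ch] = T
            todo.append(T)
    return start, trans

def count_classes(trans):
    """Number of classes after refining {final, nonfinal} to a fixed point."""
    block = {S: (0 in S) for S in trans}          # NFA state 0 is final
    while True:
        # A state's new label is its old label plus the labels of its targets.
        sig = {S: (block[S], tuple(block[trans[S][ch]] for ch in SIGMA))
               for S in trans}
        if len(set(sig.values())) == len(set(block.values())):
            return len(set(block.values()))
        block = sig

start, trans = subset_construction()
num_states = len(trans)
num_classes = count_classes(trans)
```

If num_states and num_classes agree, no two reachable states are equivalent, which is the minimality claim made above.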
Question 7 (10): Construct a regular expression whose language is Z.
There are lots of ways to do this, depending on whether we start from N or from the DFA in Question 6, and on what order we remove states. I started by adding a new start and final state to N. Removing state 3 then gives us transitions (i, ε, 0), (0, ε, f), (0, a, 1), (0, a, 2), (1, bd, 0), (1, bd, 1), (1, b, 2), (2, c ∪ cd, 0), and (2, cd, 1).
Removing state 2 then gives (i, ε, 0), (0, ε, f), (0, a(c ∪ cd), 0), (0, a ∪ acd, 1), (1, bd ∪ b(c ∪ cd), 0), (1, bd ∪ bcd, 1).
Removing state 1 then gives (i, ε, 0), (0, ε, f), (0, a(c ∪ cd) ∪ (a ∪ acd)(bd ∪ bcd)^* (bd ∪ b(c ∪ cd)), 0).
The final regular expression is thus [ac ∪ acd ∪ (a ∪ acd)(bd ∪ bcd)^* (bd ∪ bc ∪ bcd)]^*.
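State elimination is easy to get wrong by hand, so it is worth comparing the final expression with N on all short strings. This is a rough sketch: the names nfa_accepts and disagreements are hypothetical, the expression is transcribed into Python's re syntax with ∪ written as |, and only strings up to a chosen length are tested.

```python
import re
from itertools import product

DELTA = {(0, "a"): {1, 2}, (1, "b"): {2, 3}, (2, "c"): {0, 3}, (3, "d"): {0, 1}}

# [ac ∪ acd ∪ (a ∪ acd)(bd ∪ bcd)^*(bd ∪ bc ∪ bcd)]^*
REGEX = re.compile(r"(?:ac|acd|(?:a|acd)(?:bd|bcd)*(?:bd|bc|bcd))*")

def nfa_accepts(w: str) -> bool:
    """Simulate N by tracking the set of states reachable after each letter."""
    current = {0}
    for ch in w:
        current = {r for q in current for r in DELTA.get((q, ch), ())}
    return 0 in current

def disagreements(max_len: int) -> list:
    """Strings of length <= max_len on which N and the expression differ."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("abcd", repeat=n):
            w = "".join(letters)
            if nfa_accepts(w) != (REGEX.fullmatch(w) is not None):
                bad.append(w)
    return bad
```

Calling disagreements(6) examines every string of length at most 6 over {a, b, c, d}; an empty result means the expression and N agree on all of them.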
Question 8 (10): Construct a PDA equivalent to G (the grammar given above), using the construction given in Sipser and in lecture.
The PDA given by the construction has three states, plus more used solely to implement multiple-letter pushes. For convenience we will describe it with these multiple-letter pushes. The state set is {i, p, f}, the start state is i, the only final state is f, the input alphabet is {a, b}, the stack alphabet is {$, a, b, S, T}, and the transitions are (i, ε, ε; p, S$), (p, a, a; p, ε), (p, b, b; p, ε), (p, ε, S; p, aTb), (p, ε, T; p, TT), (p, ε, T; p, aTb), (p, ε, T; p, ε), and (p, ε, $; f, ε).
Question 9 (15): This question concerns the construction of a CFG equivalent to the PDA M (given above), using the construction given in Sipser and in lecture.
- (a, 5) Explain why M is already in the normal form required by the construction.
  The PDA M (1) has exactly one final state which is not the start state, (2) can accept only with an empty stack, and (3) either pushes or pops one letter on each transition, but never does both. Most people forgot condition (2).
- (b, 10) Describe the CFG resulting from the construction, identifying the relatively few rules that are needed to actually derive strings in the language, but including all the rules and nonterminals in your description.
  There are nine nonterminals A_ii, A_ip, A_if (the start symbol), A_pi, A_pp, A_pf, A_fi, A_fp, and A_ff. It turns out that only A_if and A_pp are needed to derive all possible strings in the language.
  There are three rules taking A_ii, A_pp, and A_ff each to ε. Of these we will need only A_pp → ε.
  There are 27 rules of the form A_xy → A_xzA_zy for each choice of states x, y, and z. Of these we will need only A_pp → A_ppA_pp.
  Finally there are two rules arising from matched pairs of transitions pushing and popping the same letter. These are A_if → aA_ppb from pushing and popping a c, and A_pp → aA_ppb from pushing and popping an a.
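The rule counts in part (b) can be reproduced by generating the rules from M's transition list. The following is a sketch only: build_rules is a hypothetical helper, a rule is represented as a pair (head, list of body symbols), and the empty string stands for ε.

```python
from itertools import product

STATES = "ipf"
# (from state, letter read, symbol popped, to state, symbol pushed)
TRANSITIONS = [
    ("i", "a", "", "p", "c"),
    ("p", "a", "", "p", "a"),
    ("p", "b", "a", "p", ""),
    ("p", "b", "c", "f", ""),
]

def build_rules():
    rules = []
    # A_qq -> epsilon for each state q.
    for q in STATES:
        rules.append((f"A_{q}{q}", []))
    # A_xy -> A_xz A_zy for each triple of states.
    for x, y, z in product(STATES, repeat=3):
        rules.append((f"A_{x}{y}", [f"A_{x}{z}", f"A_{z}{y}"]))
    # A_pq -> a A_rs b for each push of t (p to r) matched with a pop of t (s to q).
    for p, a, popped, r, t in TRANSITIONS:
        if popped != "" or t == "":
            continue                      # not a pure push
        for s, b, u, q, pushed in TRANSITIONS:
            if u == t and pushed == "":   # pure pop of the same symbol
                rules.append((f"A_{p}{q}", [a, f"A_{r}{s}", b]))
    return rules

rules = build_rules()
```

The list comes out as 3 + 27 + 2 rules, matching the counts given in the solution.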
Question 10 (10): Argue informally but convincingly that L(M) = L(G), where M and G are given above (at the start of the exam).
In Question 9b we showed that L(M) has a grammar which has a copy of G within it, if we identify A_if with S and A_pp with T. So L(G) ⊆ L(M), though we cannot immediately rule out the possibility that L(M) contains strings not in L(G).
To show that L(M) ⊆ L(G), we must examine all possible accepting computations of M. Any such computation begins by pushing a c and reading an a, and ends by popping that c and reading a b. So L(M) = aQb, where Q is the set of strings that can be read while going from state p with only the c on the stack back to state p with only the c on the stack. This language is the balanced-paren language, which has a grammar with start symbol T and rules T → TT, T → aTb, and T → ε. Any derivation in this language can thus be mimicked in G.
We might also characterize L(M) and L(G) each as the set of strings of a's and b's that start with an a, end with a b, have an equal number of a's and b's, and always have more a's than b's in any nonempty proper prefix. Then we can argue separately that L(M) and L(G) are each equal to this language.
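The equality L(M) = L(G) can also be checked empirically on short strings. This is a minimal sketch with hypothetical helper names (m_accepts, g_language); it relies on the observation that M's moves are determined by its state, the input letter, and the top of the stack, so a direct simulation suffices.

```python
from itertools import product

def m_accepts(w: str) -> bool:
    """Simulate the PDA M on w."""
    state, stack = "i", []
    for ch in w:
        if state == "i" and ch == "a":
            state = "p"
            stack.append("c")             # (i, a, eps; p, c)
        elif state == "p" and ch == "a":
            stack.append("a")             # (p, a, eps; p, a)
        elif state == "p" and ch == "b" and stack[-1] == "a":
            stack.pop()                   # (p, b, a; p, eps)
        elif state == "p" and ch == "b" and stack[-1] == "c":
            stack.pop()                   # (p, b, c; f, eps)
            state = "f"
        else:
            return False                  # no transition applies
    return state == "f"

def g_language(n: int) -> set:
    """Strings of length <= n derivable in G, via a bounded fixed point for T."""
    T = {""}
    while True:
        new = {"a" + t + "b" for t in T if len(t) + 2 <= n}
        new |= {s + t for s in T for t in T if len(s) + len(t) <= n}
        if new <= T:
            break
        T |= new
    return {"a" + t + "b" for t in T if len(t) + 2 <= n}

def m_language(n: int) -> set:
    return {"".join(t) for k in range(n + 1)
            for t in product("ab", repeat=k) if m_accepts("".join(t))}
```

Comparing m_language(10) with g_language(10) covers all 2047 strings of length at most 10; agreement there supports, but does not replace, the argument above.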
Question 11 (10): Prove that if u is any string in the language Z (given above), then there is a string v such that the string uv is in Z and the length of uv is divisible by 3.
Any string u in Z has a path in N from state 0 to itself. Thus if any string v is also in Z, we know that uv is in Z by concatenating the two paths. (I took off two points for not explaining that Z is closed under concatenation. One person misquoted the true fact "The class of regular languages is closed under concatenation" as the false assertion "Every regular language is closed under concatenation.")
The length of u must be congruent to either 0, 1, or 2 modulo 3. If it is congruent to 0, we may take v to be the empty string. If it is congruent to 1, we may take v to be the string ac, and then |uv| is congruent to 0. If |u| is congruent to 2, we may take v = abcd and then |uv| is congruent to 0.
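The case analysis translates directly into a padding function, which can then be tested against N. This is a sketch with hypothetical names (in_Z, pad_to_multiple_of_3, check), and it only examines strings u up to a chosen length.

```python
from itertools import product

DELTA = {(0, "a"): {1, 2}, (1, "b"): {2, 3}, (2, "c"): {0, 3}, (3, "d"): {0, 1}}

# The string v chosen in the solution, indexed by |u| mod 3.
PAD = {0: "", 1: "ac", 2: "abcd"}

def in_Z(w: str) -> bool:
    """Simulate the NFA N, whose start and only final state is 0."""
    current = {0}
    for ch in w:
        current = {r for q in current for r in DELTA.get((q, ch), ())}
    return 0 in current

def pad_to_multiple_of_3(u: str) -> str:
    return u + PAD[len(u) % 3]

def check(max_len: int) -> bool:
    """Verify the claim for every u in Z of length <= max_len."""
    for n in range(max_len + 1):
        for letters in product("abcd", repeat=n):
            u = "".join(letters)
            if in_Z(u):
                uv = pad_to_multiple_of_3(u)
                if not in_Z(uv) or len(uv) % 3 != 0:
                    return False
    return True
```

For example, u = abcd has length 4, so the function appends ac and returns abcdac, a string of length 6 that N accepts.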
Question 12 (10XC): Describe each of the Myhill-Nerode equivalence classes for the language X.
For any natural i, let A_i be the set {aⁱ}.
For any positive natural j, let B_j be the set {aⁱb^{j-i}: 0 ≤ i < j}.
For any natural k, let C_k be the set {aⁱb^jc^{i+j-k}: i and j are naturals with i + j > k}.
Let D be the set of strings that are either not in a^*b^*c^* or are of the form aⁱb^jc^k with i + j < k.
Strings in A_i may be followed by any string of the form a^{i'}b^jc^k with i + i' + j = k.
Strings in B_j may be followed by any string of the form b^{j'}c^k with j + j' = k.
Strings in C_k may be followed only by the string c^k.
Strings in D cannot be followed by any strings.
This describes the infinite set of Myhill-Nerode classes for X. We can think of these classes as forming the states of an infinite "minimal automaton" for X, with A₀ as the start state and with A₀ and C₀ as the only final states, since the empty string is in X. The class of a string wa, wb, or wc depends only on the class of w.
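The class description can be tested on short strings by comparing it with the sets of suffixes that actually complete each string to a member of X. The sketch below is illustrative only: classify, signature, and words are hypothetical helpers, prefixes are limited to length 3, and suffixes to length 5, which happens to be enough to separate the classes that occur among those prefixes.

```python
import re
from itertools import product

def in_X(w: str) -> bool:
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i + j == k

def classify(w: str):
    """Name of the class of w according to the description above."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    if m is None:
        return ("D",)
    i, j, k = (len(g) for g in m.groups())
    if k == 0:
        return ("A", i) if j == 0 else ("B", i + j)
    # At least one c: the index is the number of c's still owed.
    return ("C", i + j - k) if i + j >= k else ("D",)

def words(alphabet: str, max_len: int) -> list:
    return ["".join(t) for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)]

SUFFIXES = words("abc", 5)

def signature(w: str) -> frozenset:
    """The suffixes z (up to length 5) with wz in X."""
    return frozenset(z for z in SUFFIXES if in_X(w + z))

# Collect, for each named class, the distinct signatures of its members.
by_class = {}
for w in words("abc", 3):
    by_class.setdefault(classify(w), set()).add(signature(w))
```

If the description is right, each entry of by_class holds exactly one signature, and different classes hold different signatures.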

Last modified 22 February 2015