CMPSCI 401: Theory of Computation

First Midterm Exam Solutions, Spring 2010

David Mix Barrington

22 February 2010

Question text is in black, solutions in blue.

Directions:

Answer the problems on the exam pages.
There are seven problems for 125 total points plus 10 extra credit. Actual scale was A=115, C=75.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first four questions are true/false, with five points for the correct boolean answer and up to five for a correct justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. -- note that there is no reason not to guess if you don't know.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 35+10 points
  Q6: 20 points (note the exam paper said "30" here)
  Q7: 30 points

  Total: 125+10 points

Several questions refer to the following PDA called P. P has state set {q₀, q, f}, start state q₀, final state set {f}, input alphabet {a, b}, and stack alphabet {$, c}. It has six transitions: (q₀, ε, ε; q, $), (q, a, ε; q, c), (q, b, ε; q, c), (q, a, c; q, ε), (q, b, c; q, ε), and (q, ε, $; f, ε). (Remember that a transition (p, a, b; q, c) means that the machine may go from state p to state q reading a from the input, popping b, and then pushing c.)

The grammar G is given by the rule set S --> SS | aSa | aSb | bSa | bSb | ε. It was defined on the test paper but was left off of this page until I discovered the error in February 2011. (DAMB)

Question 1 (10): True or false with justification: There exists a string of odd length (with an odd number of letters) in L(P).
FALSE. Every string in L(P) has even length. Because of the $ symbol, P can only accept with an empty stack: the only way into the final state f is the ε-move that pops $, and that move is possible only after every c has been popped. Every other move from state q either reads and pushes or reads and pops, in each case reading exactly one letter. The number of pushes must equal the number of pops to clear the c's off the stack, so the number of letters read is twice this number, which is even.
Question 2 (10): True or false with justification: Every string of even length is in L(P).
TRUE. There are several ways to accept an arbitrary even-length string with P. If the length of w is 2k, you could read the first k letters pushing a c each time, then read the last k letters popping a c each time. Or you could push while reading every odd-numbered letter and pop while reading every even-numbered letter. Either way only $ is left on the stack when the input runs out, and the ε-move that pops $ takes P to the final state f.
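To check the answers to Questions 1 and 2 on short strings, here is a minimal Python sketch that simulates P by breadth-first search over its configurations. The encoding is an assumption made for illustration: each transition is a tuple (state, letter read, symbol popped, next state, symbol pushed), the empty string stands for ε, and the stack is a string whose last character is the top. It only tests strings up to a fixed length, so it illustrates the two answers rather than proving them.

```python
from itertools import product

# The six transitions of P as (state, read, pop, next_state, push); "" means ε.
TRANSITIONS = [
    ("q0", "", "", "q", "$"),
    ("q", "a", "", "q", "c"),
    ("q", "b", "", "q", "c"),
    ("q", "a", "c", "q", ""),
    ("q", "b", "c", "q", ""),
    ("q", "", "$", "f", ""),
]


def accepts(w):
    """Return True if some computation of P reads all of w and ends in state f.

    Configurations are (state, input position, stack), with the top of the
    stack at the right end of the string.  A configuration holding more c's
    than there are unread letters is discarded, since each c must be removed
    by its own letter-reading pop before P can pop $; this keeps the search finite.
    """
    start = ("q0", 0, "")
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for state, pos, stack in frontier:
            if state == "f" and pos == len(w):
                return True
            for src, read, pop, dst, push in TRANSITIONS:
                if src != state:
                    continue
                if read and (pos == len(w) or w[pos] != read):
                    continue
                if pop and not stack.endswith(pop):
                    continue
                new_stack = stack[: len(stack) - len(pop)] + push
                new_pos = pos + len(read)
                if new_stack.count("c") > len(w) - new_pos:
                    continue
                config = (dst, new_pos, new_stack)
                if config not in seen:
                    seen.add(config)
                    next_frontier.append(config)
        frontier = next_frontier
    return False


# Every string over {a, b} of length at most 6 is accepted iff its length is even.
for n in range(7):
    for letters in product("ab", repeat=n):
        assert accepts("".join(letters)) == (n % 2 == 0)
```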
Question 3 (10): True or false with justification: Let L be a regular language, the language of some DFA with k states. Then if w is any string in L with length at least k, there exist strings x, y, and z such that w = xyz and such that every string in L is of the form xyⁱz for some number i. (Clarification during test: The last phrase means ∀s:(s ∈ L) → ∃i: s = xyⁱz rather than ∃i:∀s: (s ∈ L) → s = xyⁱz.)
FALSE. This statement is somewhat similar to the Regular Language Pumping Lemma but actually says something different and quite ridiculous. The RLPL says that every pumped string xyⁱz is in L, but this says that every string in L is a pumped string. If you read the statement carefully, it is easy to find a counterexample. Let L be Σ^*, with Σ = {a,b}, and let w be the string a^k. However the strings x, y, and z are chosen, all the pumped strings will be in the language a^*, and thus b, for example, could not be a pumped string although it is in L.
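The counterexample can be checked mechanically for one fixed value of k. The Python sketch below takes k = 4 (an arbitrary choice), lists every way of writing w = a^k as xyz, without even imposing the Pumping Lemma's length conditions, and confirms that the string b, which is in Σ^*, is never of the form xyⁱz. The helper names are hypothetical, and a finite check like this only illustrates the argument above; it does not replace it.

```python
def decompositions(w):
    """Yield every triple (x, y, z) of strings with x + y + z == w."""
    for i in range(len(w) + 1):
        for j in range(i, len(w) + 1):
            yield w[:i], w[i:j], w[j:]


def is_pumped(s, x, y, z):
    """Return True if s == x + y * i + z for some integer i >= 0."""
    if len(s) < len(x) + len(z) or not (s.startswith(x) and s.endswith(z)):
        return False
    middle = s[len(x): len(s) - len(z)]
    if not y:
        return middle == ""
    return len(middle) % len(y) == 0 and middle == y * (len(middle) // len(y))


k = 4
w = "a" * k
# b is in Σ^* but is not a pumped string for any choice of x, y, z.
assert not any(is_pumped("b", x, y, z) for x, y, z in decompositions(w))
```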
Question 4 (10): True or false with justification: The languages L(P) and L(G) are equal.
TRUE. The answers to Questions 1 and 2 imply that L(P) is exactly the set of even-length strings, so we just need to prove (not just assert) that L(G) is this same set. Every string produced by G must have even length because every step in a derivation that produces terminals produces exactly two terminals. Conversely, every even-length string can be generated with G -- here is a proof by induction on k where 2k is the length of the string. If k = 0 we use S --> ε. Assume we can generate any string of length 2k from S, and consider any string of length 2k+2 -- write this string as xyu where x and y are letters and u is a string of length 2k. For any x and y, S --> xSy is a rule. So we can derive xyu by the sequence S --> SS --> xSyS --> xyS --> xyu, using the IH to go from S to u.
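The equality L(G) = {even-length strings} can also be confirmed for short lengths by computing, length by length, which strings S derives. The Python sketch below is a small dynamic program offered as an illustration (the function names are hypothetical, and this is not a technique from the exam): a parse tree for a nonempty string starts with either S --> xSy or S --> SS, and for S --> SS it is enough to split into two nonempty parts, since a child deriving ε just reproduces a string already counted at the same length.

```python
from itertools import product


def strings_of_length(n):
    """All strings over {a, b} of length n."""
    return {"".join(t) for t in product("ab", repeat=n)}


def lang_G(max_len):
    """Map each n <= max_len to the set of length-n strings derivable from S in G."""
    L = {0: {""}}  # S --> ε
    for n in range(1, max_len + 1):
        words = set()
        if n >= 2:
            # S --> xSy for each of the four letter pairs x, y.
            for x in "ab":
                for y in "ab":
                    words |= {x + u + y for u in L[n - 2]}
        # S --> SS with both children deriving nonempty strings.
        for i in range(1, n):
            words |= {u + v for u in L[i] for v in L[n - i]}
        L[n] = words
    return L


L = lang_G(8)
for n in range(9):
    expected = strings_of_length(n) if n % 2 == 0 else set()
    assert L[n] == expected
```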
Question 5 (35+10): The grammar G is defined before Question 1 above.
- (a,5) Give a parse tree showing that the string aabbab is in L(G).
  There are several derivations, the easiest of which is probably S --> aSb --> aaSab --> aabSbab --> aabbab. The parse tree has S at the root with children a, S, and b. The child S has children a, S, and a. The grandchild S has children b, S, and b, and the great-grandchild S has an only child ε. The leaves, read left to right, form the string aabbab.
  Another derivation, matching my argument for Question 4, would be S --> SS --> aSaS --> aaS --> aaSS --> aabSbS --> aabbS --> aabbaSb --> aabbab.
- (b,5) Show that for any non-negative integer n, the string aⁿbⁿaⁿbⁿ is in L(G).
  From S, apply the rule S --> aSb n times to get aⁿSbⁿ. Then apply the rule S --> bSa n times to get aⁿbⁿSaⁿbⁿ, and finally the rule S --> ε once to get the desired string aⁿbⁿaⁿbⁿ.
- (c,10) Is the language {aⁿbⁿaⁿbⁿ: n ≥ 0} a context-free language? Prove your answer.
  It is not a CFL. (I was happy that most of you recognized that although Question 5b shows this language to be a subset of the CFL L(G), this has nothing to do with its own CFL-ness.) Call this language L. To show L is not a CFL we use the CFL Pumping Lemma. If it were a CFL it would obey the CFLPL for some pumping length p. Let w be the string a^pb^pa^pb^p. Since |w| ≥ p, the CFLPL says that w can be written as uvxyz with |vxy| ≤ p, |vy| > 0, and such that for all i, uvⁱxyⁱz is in L. We will show that this is impossible. Since at least one of v and y is nonempty, changing i from 1 to 2 adds some a's or b's to the string. Since |vxy| ≤ p, however, this can add letters to at most two of the four blocks of identical letters that make up w. Thus uvvxyyz cannot be in L: even if it is in the language a^*b^*a^*b^*, at least two of its blocks still have length p while at least one block has grown, so the lengths of the four blocks cannot all be equal.
  A few of you misused the CFLPL (or the RLPL) by naming the strings u, v, x, y, and z yourself. You just get to name w -- then you must show that any division of w into five strings, meeting the conditions of the CFLPL, must lead to at least one pumped string that is not in the language.
- (d,10) Describe either a top-down parser or a bottom-up parser for G (and indicate which you are doing).
  Both the TDP and BUP have three states, q₀, q, and f, with start state q₀ and only final state f. The TDP has rules (q₀, ε, ε; q, S$), (q, a, a; q, ε), (q, b, b; q, ε), (q, ε, S; q, ε), (q, ε, S; q, SS), (q, ε, S; q, aSa), (q, ε, S; q, aSb), (q, ε, S; q, bSa), (q, ε, S; q, bSb), and (q, ε, $; f, ε).
  The BUP has rules (q₀, ε, ε; q, $), (q, a, ε; q, a), (q, b, ε; q, b), (q, ε, ε; q, S), (q, ε, SS; q, S), (q, ε, aSa; q, S), (q, ε, bSa; q, S), (q, ε, aSb; q, S), (q, ε, bSb; q, S), and (q, ε, S$; f, ε).
- (e,5) Is L(G) a regular language? Prove your answer.
  It is regular -- we showed in Question 4 that it is exactly the set of even-length strings, which has the regular expression ((a ∪ b)(a ∪ b))^* or a two-state DFA where the initial state is the only final state and every letter takes every state to the other state.
- (f,10XC) Let G' be the grammar obtained from G by deleting the two rules S → aSb and S → bSa. Is L(G') a regular language? Prove your answer.
  It is not. Many of you said that L(G') is the set of even-length palindromes, since this is what you generate if you use the rules S --> aSa and S --> bSb any number of times followed by S --> ε. But since S --> SS is still a rule, L(G') is actually the star of the set of even-length palindromes. Since L(G') includes (aa ∪ bb)^* as a subset, it is difficult to construct a Pumping Lemma proof that it is not regular -- pumping an even-length string of a's to a longer even-length string of a's keeps you in L(G').
  My proof is as follows -- I claim that if i > j, the strings x = (ab)ⁱ and y = (ab)^j are not L(G')-equivalent. Let z = (ba)^j. It is easy to see that yz is in L(G') because it is an even-length palindrome -- I claim that xz is not in L(G'). If it were in L(G'), I could divide it into nonempty substrings, each of which is an even-length palindrome. A nonempty even-length palindrome must have aa or bb as its central two characters, and xz contains at most one double letter (the bb where x meets z if j > 0, none if j = 0). So at most one piece of any such division can be an even-length palindrome. Since i > j, xz is not itself an even-length palindrome (any bb it contains is off center), so the division would need at least two pieces and cannot consist entirely of even-length palindromes. Thus the strings ε, ab, abab, ... are pairwise L(G')-inequivalent, so L(G') has infinitely many equivalence classes and by the Myhill-Nerode Theorem it is not regular.
  Students of group theory may recognize this language as the set of words in the generators a and b that equal the identity element of the group given by the relations a² = 1 and b² = 1. Or not.
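For Question 5c, the pumping argument can be checked by brute force for small pumping lengths. The Python sketch below (function names are hypothetical) tries every split w = uvxyz of w = a^pb^pa^pb^p with |vxy| ≤ p and |vy| > 0, and looks for one where uvⁱxyⁱz stays in {aⁿbⁿaⁿbⁿ: n ≥ 0} for i = 0, 1, 2, 3. It finds none for p = 1, ..., 5. Since the lemma is about every p, this is only a sanity check of the proof, not a substitute for it.

```python
def in_L(s):
    """Membership in {a^n b^n a^n b^n : n >= 0}."""
    n, r = divmod(len(s), 4)
    return r == 0 and s == "a" * n + "b" * n + "a" * n + "b" * n


def pumpable(w, p):
    """Return a split (u, v, x, y, z) of w with |vxy| <= p and |vy| > 0 such that
    u v^i x y^i z is in the language for i = 0..3, or None if there is none."""
    m = len(w)
    for start in range(m + 1):
        for lv in range(p + 1):
            for lx in range(p + 1 - lv):
                for ly in range(p + 1 - lv - lx):
                    if lv + ly == 0 or start + lv + lx + ly > m:
                        continue
                    u = w[:start]
                    v = w[start:start + lv]
                    x = w[start + lv:start + lv + lx]
                    y = w[start + lv + lx:start + lv + lx + ly]
                    z = w[start + lv + lx + ly:]
                    if all(in_L(u + v * i + x + y * i + z) for i in range(4)):
                        return u, v, x, y, z
    return None


for p in range(1, 6):
    w = "a" * p + "b" * p + "a" * p + "b" * p
    assert in_L(w) and pumpable(w, p) is None
```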
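For Question 5f, membership in L(G'), taken as in the answer above to be the star of the set of even-length palindromes, can be decided by a word-break style dynamic program. The Python sketch below (the function names are hypothetical) uses it to check the distinguishing suffix for small i > j: with z = (ba)^j, the string (ab)^j z is in L(G') while (ab)^i z is not. It checks finitely many pairs only, so it supports the Myhill-Nerode argument rather than proving it.

```python
def is_even_palindrome(s):
    return len(s) % 2 == 0 and s == s[::-1]


def in_L_G_prime(s):
    """Decide whether s splits into even-length palindromes.

    ok[k] records whether the prefix s[:k] splits that way.  Every piece has
    even length, so only even prefix lengths can succeed.
    """
    ok = [True] + [False] * len(s)
    for k in range(2, len(s) + 1, 2):
        ok[k] = any(ok[j] and is_even_palindrome(s[j:k]) for j in range(0, k, 2))
    return ok[len(s)]


# The suffix z = (ba)^j separates (ab)^i from (ab)^j whenever i > j.
for i in range(1, 7):
    for j in range(i):
        x, y, z = "ab" * i, "ab" * j, "ba" * j
        assert in_L_G_prime(y + z) and not in_L_G_prime(x + z)
```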
Question 6 (20): Let N₁ and N₂ be any two NFA's with k states each. Remember not to make any assumptions other than that they are NFA's with k states.
- (a,10) Describe an NFA N₃ such that L(N₃) = L(N₁) ∪ L(N₂). How many states does your N₃ have?
  Let N₃ consist of all the states and transitions of N₁ and N₂ (renaming states if necessary so that the two state sets are disjoint), plus a new start state and ε-transitions from the new start state to the start states of N₁ and N₂. The final states of N₃ are the final states of N₁ and N₂. A word can be accepted by N₃ if and only if it can be accepted by either N₁ or N₂ or both, since the paths in N₃ from the new start state to a final state correspond either to paths in N₁ or to paths in N₂. So N₃ has k + k + 1 = 2k + 1 states.
- (b,10) Describe an NFA N₄ such that L(N₄) = L(N₁) ∩ L(N₂). How many states does your N₄ have?
  In lecture we presented the "product construction" for DFA's, taking two DFA's D₁ and D₂ and making a new DFA whose states are pairs (q₁, q₂) where each q_i is a state of D_i. The transition function δ of the product machine is defined by the rule δ((q₁, q₂), a) = (δ₁(q₁, a), δ₂(q₂, a)). The final states of the product machine are those states where both components of the pair are final for their respective machines. The language of this machine is L(D₁) ∩ L(D₂).
  But here we have NFA's, so we must either convert to DFA's or argue that we can adapt the construction. Given N₁ and N₂ with k states each, we can construct D₁ and D₂, with 2^k states each, using the subset construction to ensure that L(D₁) = L(N₁) and L(D₂) = L(N₂). Then we can use the product construction to get a DFA (which of course is also an NFA) with the desired language and 2^k · 2^k = 2^{2k} states.
  To adapt the product construction to NFA's, we need to say what we are going to do with ε-moves. If we add an ε-move from each state to itself, and follow Sipser in defining δ(q,a) for an NFA to be the set of states r such that there is a transition from q to r on letter a (where a could also be ε), we can define the set δ((q₁, q₂), a) to be the direct product of the sets δ₁(q₁, a) and δ₂(q₂, a). The ε-self-loops let one component make an ε-move while the other stays put. Then, writing q₀₁ and q₀₂ for the start states of N₁ and N₂, it is possible for our product NFA to read w and go from (q₀₁, q₀₂) to (r₁, r₂) if and only if it is possible to go from q₀₁ to r₁ reading w and it is possible to go from q₀₂ to r₂ reading w. Thus if we make the final state set of our product machine the direct product F₁ × F₂, the language of our product machine is L(N₁) ∩ L(N₂) as desired, and this N₄ has k · k = k² states. I gave only partial credit for answers that applied the product construction directly to NFA's without these details.
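The adapted product construction from Question 6b can be sketched in Python as follows. Everything here is an illustrative assumption rather than notation from lecture: the NFA class, its dictionary encoding of δ (missing entries mean ∅, and the empty string labels ε-moves), and the two small test machines. The sketch adds the ε-self-loops, takes direct products of the component moves, uses F₁ × F₂ as the final states, and then compares the product's language with the intersection on all short strings.

```python
from itertools import product

EPS = ""  # label used for ε-transitions in this sketch


class NFA:
    """A small NFA; delta maps (state, symbol) to a set of states, symbol EPS meaning ε."""

    def __init__(self, states, start, finals, delta):
        self.states = set(states)
        self.start = start
        self.finals = set(finals)
        self.delta = delta

    def step(self, q, symbol):
        return self.delta.get((q, symbol), set())

    def eclose(self, states):
        """All states reachable from the given ones by ε-moves alone."""
        closure, stack = set(states), list(states)
        while stack:
            for r in self.step(stack.pop(), EPS):
                if r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure

    def accepts(self, w):
        current = self.eclose({self.start})
        for c in w:
            current = self.eclose({r for q in current for r in self.step(q, c)})
        return bool(current & self.finals)


def product_nfa(n1, n2, alphabet):
    """Pair-state NFA for L(n1) ∩ L(n2): add an ε-self-loop at every state of each
    machine, then let the move on each symbol (ε included) be the direct product
    of the two component moves.  Final states are F1 × F2."""
    delta = {}
    for q1, q2 in product(n1.states, n2.states):
        for s in list(alphabet) + [EPS]:
            r1, r2 = n1.step(q1, s), n2.step(q2, s)
            if s == EPS:
                r1, r2 = r1 | {q1}, r2 | {q2}
            delta[(q1, q2), s] = set(product(r1, r2))
    return NFA(product(n1.states, n2.states), (n1.start, n2.start),
               product(n1.finals, n2.finals), delta)


# Hypothetical two-state examples: N1 uses an ε-move to accept a*b*,
# N2 accepts the strings of even length.
N1 = NFA({0, 1}, 0, {1}, {(0, "a"): {0}, (0, EPS): {1}, (1, "b"): {1}})
N2 = NFA({0, 1}, 0, {0}, {(0, "a"): {1}, (0, "b"): {1}, (1, "a"): {0}, (1, "b"): {0}})
N4 = product_nfa(N1, N2, "ab")
assert len(N4.states) == len(N1.states) * len(N2.states)
for n in range(7):
    for letters in product("ab", repeat=n):
        w = "".join(letters)
        assert N4.accepts(w) == (N1.accepts(w) and N2.accepts(w))
```

With k-state inputs the product has exactly k² states, matching the count in the answer above.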
Question 7 (30): Let M be an NFA with state set {1, 2, 3, 4}, start state 1, final state set {4}, and the transitions (1,b,2), (1,a,3), (1,a,4), (2,b,3), (2,b,4), and (3,b,4). (Remember that the transition (p,a,q), for example, means that the NFA can go from state p to state q while reading an a.) In Sipser's notation for NFA's, we have that δ(1,a) = {3,4}, δ(1,b) = {2}, δ(2,b) = {3,4}, δ(3,b) = {4}, and for all other states p and letters c, δ(p,c) = ∅.
- (a,10) Give a regular expression denoting the language L(M). If you use the GNFA construction, note that M already satisfies the conditions to be a GNFA.
  Applying the GNFA construction, we can kill state 2, creating the new transitions (1,bb,3) and (1,bb,4), which merge with existing transitions to become (1, a ∪ bb, 3) and (1, a ∪ bb, 4). Then we can kill state 3, creating a single merged transition (1, a ∪ bb ∪ (a ∪ bb)b, 4). The final regular expression can also be written "a ∪ ab ∪ bb ∪ bbb" (typo corrected 25 Feb) or "(a ∪ bb)(ε ∪ b)", where we understand "ε" to be an abbreviation for "∅^*". Since the original NFA has no loops and none are created during the construction, we don't generate any new stars in the regular expression.
- (b,10) Using the subset construction or otherwise, find a DFA D such that L(D) = L(M).
  The start state is {1}. We can compute δ({1}, a) = {3,4}, δ({1}, b) = {2}, δ({3,4}, a) = ∅, δ({3,4}, b) = {4}, δ({2}, a) = ∅, δ({2}, b) = {3,4}, and δ({4}, a) = δ({4}, b) = δ(∅, a) = δ(∅, b) = ∅. Once both transitions out of every state we have reached lead to states we have already reached, we can stop, in this case with a five-state DFA. Renaming {1} = 1, {2} = 2, {3,4} = 3, {4} = 4, and ∅ = 5, we have start state 1 and final state set {3,4}, the two subsets that contain M's final state 4.
  Several of you constructed the following valid DFA by creating a state for every prefix of a string in L(M), so that the seven states can be called ε (the start state), a, b, ab, bb, bbb, and "death". Here the final state set is {a, ab, bb, bbb}, and the transition from state named w on letter x goes to the state named wx if there is one and to "death" if there is not. This is a natural construction of a DFA for any finite language.
- (c,10) Show that your DFA D is minimal, or otherwise find a DFA for L(M) that has the minimum number of states needed to decide that language.
  My five-state DFA from the first paragraph of my answer to Question 7b is minimal. If we apply the minimization construction to it, we begin with classes N = {1,2,5} and F = {3,4}. The behavior of state 1 with respect to this partition is (F,N), the behavior of state 2 is (N,F), and the behavior of state 5 is (N,N). (Here "behavior (F,N)" means the state goes to a state in F on an a and to a state in N on a b.) Thus the three states of N must go into three separate classes in the next partition. Similarly, the behavior of state 3 is (N,F) and the behavior of state 4 is (N,N), so these two states must also go into separate classes in the next partition. Our second partition has five classes with one state each, proving that the original DFA was minimal.
  If we apply the minimization construction to the second DFA in my answer to Question 7b, we find the partition {{ε}, {a,bb}, {b}, {ab,bbb}, {death}} to be stable and so get a five-state minimal DFA isomorphic to that above.
  If we want to prove my first DFA to be minimal without the construction, we can note that every final state is separated from every non-final state by the empty string. Among the non-final states, 1 and 2 are separated by a, 1 and 5 are separated by a, and 2 and 5 are separated by b. Among the final states, 3 and 4 are separated by b. Since every state is reachable from the start state and no two states can be merged, the DFA is minimal.
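The subset construction for Question 7b can be sketched in Python as follows. The dictionary encoding of M and the function names are assumptions for illustration; as in the answer above, only subsets reachable from {1} are built, and frozensets stand for the DFA's states.

```python
# M from Question 7 in Sipser's notation; pairs (p, c) not listed have δ(p, c) = ∅.
DELTA_M = {(1, "a"): {3, 4}, (1, "b"): {2}, (2, "b"): {3, 4}, (3, "b"): {4}}
START_M, FINALS_M = 1, {4}


def subset_construction(delta, start, finals, alphabet):
    """Build the reachable part of the subset-construction DFA.

    M has no ε-moves, so no ε-closure step is needed here.
    """
    start_set = frozenset({start})
    trans, seen, todo = {}, {start_set}, [start_set]
    while todo:
        S = todo.pop()
        for c in alphabet:
            T = frozenset(r for q in S for r in delta.get((q, c), ()))
            trans[S, c] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    accepting = {S for S in seen if S & finals}
    return seen, start_set, accepting, trans


def dfa_accepts(start, accepting, trans, w):
    state = start
    for c in w:
        state = trans[state, c]
    return state in accepting


states, start, accepting, trans = subset_construction(DELTA_M, START_M, FINALS_M, "ab")
print(sorted(sorted(S) for S in states))
# → [[], [1], [2], [3, 4], [4]]
```

The five reachable subsets and the two accepting ones, {3,4} and {4}, are the ones in the answer above.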
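The minimization construction for Question 7c is a partition refinement, sketched below in Python for the renamed five-state DFA. The dictionary encoding and the function name are illustrative assumptions, not notation from lecture. Each round labels every state with its current class plus the classes its a- and b-transitions reach (the "behavior" in the answer above) and splits classes accordingly; here the first round already separates all five states.

```python
# The five-state DFA from the answer to Question 7b, with states renamed 1-5.
DELTA_D = {
    (1, "a"): 3, (1, "b"): 2,
    (2, "a"): 5, (2, "b"): 3,
    (3, "a"): 5, (3, "b"): 4,
    (4, "a"): 5, (4, "b"): 5,
    (5, "a"): 5, (5, "b"): 5,
}


def refine(states, finals, delta, alphabet):
    """Partition refinement starting from the classes F and N.

    Each round records, for every state, its current class together with the
    classes its transitions reach (its "behavior"), and splits classes whose
    members behave differently.  It stops when a round creates no new class.
    Returns a dict mapping each state to a class number.
    """
    block = {q: int(q in finals) for q in states}
    while True:
        behavior = {q: (block[q],) + tuple(block[delta[q, c]] for c in alphabet)
                    for q in states}
        ids = {}
        for q in states:
            ids.setdefault(behavior[q], len(ids))
        new_block = {q: ids[behavior[q]] for q in states}
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


classes = refine([1, 2, 3, 4, 5], {3, 4}, DELTA_D, "ab")
print(len(set(classes.values())))
# → 5
```

Applied to the seven-state prefix DFA, the same function merges a with bb and ab with bbb, giving the five classes listed above.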

Last modified 13 February 2011