# Solutions to First Midterm Exam

#### 1 March 2008

Question text is in black, solutions in blue.

```
Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 30 points
Q6: 50 points
Total: 120 points
```

• Question 1 (10): (True/False with justification) Let M be a PDA with the property that it never, when reading any string, can put more than three total characters on its stack. Then L(M) is a regular language.

TRUE. We can simulate this PDA with an NFA. There are only a finite number of possible stack configurations: if the size of the stack alphabet Γ is k, there are $k^3 + k^2 + k + 1$ of them (stacks of height 3, 2, 1, and 0). So our NFA can have a state for each pair consisting of a state of M and a stack configuration of M. Since L(M) is the language of an NFA, it is regular.
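The count of stack configurations can be checked by brute force. The following is a minimal sketch, assuming a hypothetical stack alphabet of size k = 4 with made-up symbol names; it simply enumerates every stack content of height at most three.

```python
from itertools import product

def stack_configurations(gamma, max_height=3):
    """Return every stack content of height 0..max_height over the alphabet gamma."""
    return [cfg for h in range(max_height + 1) for cfg in product(gamma, repeat=h)]

k = 4                                   # hypothetical stack-alphabet size
gamma = [f"g{i}" for i in range(k)]     # hypothetical stack symbols g0..g3
configs = stack_configurations(gamma)

# The enumeration and the closed formula should agree.
print(len(configs), k**3 + k**2 + k + 1)
```

Each NFA state in the simulation would then be a pair (state of M, one of these configurations).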

• Question 2 (10): (True/False with justification) Let X be a CFL whose complement X-bar is also a CFL. Then X must be a regular language.

FALSE: Let X be the language $\{a^n b^n : n \ge 0\}$, which we know to be a CFL that is not regular. The complement of X is a CFL, though we must justify this claim. X-bar is the union of three languages: (1) the complement of the regular language a*b*, which is a CFL because it is regular, (2) $\{a^i b^j : i > j\}$, which is a CFL -- we can design a grammar with rules S → aS, S → aSb, and S → a, for example, and (3) $\{a^i b^j : i < j\}$, which is also a CFL by a very similar argument to (2). Since X-bar is the union of three CFL's, it is a CFL, and X satisfies the conditions of the statement but is not regular.
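Both claims in this argument can be sanity-checked on short strings. The sketch below is only a finite check up to length 8, not a proof; the function names are illustrative. It tests that the three pieces together are exactly the complement of X, and that the grammar S → aS, S → aSb, S → a produces exactly the strings of a*b* with more a's than b's.

```python
import re
from itertools import product

def in_X(w):
    """True if w = a^n b^n for some n >= 0."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def in_union(w):
    """True if w lies in one of the three pieces (1), (2), (3)."""
    if re.fullmatch(r"a*b*", w) is None:
        return True                       # piece (1): not of the form a*b*
    i, j = w.count("a"), w.count("b")
    return i > j or i < j                 # pieces (2) and (3)

def from_grammar(w):
    """True if S =>* w with the rules S -> aS | aSb | a."""
    if w == "a":
        return True
    if w[:1] == "a" and from_grammar(w[1:]):
        return True
    return len(w) >= 2 and w[0] == "a" and w[-1] == "b" and from_grammar(w[1:-1])

words = ["".join(t) for n in range(9) for t in product("ab", repeat=n)]
union_ok = all(in_union(w) == (not in_X(w)) for w in words)
grammar_ok = all(
    from_grammar(w) == (re.fullmatch(r"a*b*", w) is not None and w.count("a") > w.count("b"))
    for w in words
)
print(union_ok, grammar_ok)
```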

• Question 3 (10): (True/False with justification) Let N be an NFA with k states that has no ε-moves and never has more than one move from any state with the same letter label. (That is, N can be obtained from some DFA solely by deleting moves.) Then there exists a DFA for L(N) with exactly k+1 states.

TRUE. Apply the Subset Construction to N. The start state will be {q0}, where q0 is the start state of N. Each additional DFA state generated in the subset construction can have no more than one element, since N has no multiple choices or ε-moves. We might or might not generate the "death state" for the empty set, and every other DFA state must be a singleton set ({s}, where s is a state of N), so there are only k+1 possible states. If the construction gives us fewer than k+1 states, we may add unreachable states to get exactly k+1.

Another, perhaps better way to put this: Define D to have a state for each state of N, plus one more state called d, the "death state". D's transition function follows the arrows of N wherever possible, and otherwise goes to d. Of course d's moves are all to itself. It is obvious that L(D) = L(N), and D has exactly k+1 states.
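The death-state construction is short enough to write out. This is a sketch under assumptions: the two-state machine N used here is a hypothetical example, and the dictionary encoding of transitions is just one convenient choice.

```python
def complete_with_death_state(states, alphabet, delta, dead="d"):
    """Turn the partial transition map of N into a total one by routing missing moves to `dead`."""
    total = {}
    for q in states:
        for x in alphabet:
            total[(q, x)] = delta.get((q, x), dead)
    for x in alphabet:
        total[(dead, x)] = dead           # the death state loops on every letter
    return list(states) + [dead], total

def accepts(total, start, finals, w):
    """Run the (now deterministic and total) machine on w."""
    q = start
    for x in w:
        q = total[(q, x)]
    return q in finals

# Hypothetical 2-state N over {0,1}: p --0--> q, q --1--> p, start p, final q.
states, alphabet = ["p", "q"], "01"
delta = {("p", "0"): "q", ("q", "1"): "p"}
d_states, d_delta = complete_with_death_state(states, alphabet, delta)
print(len(d_states), accepts(d_delta, "p", {"q"}, "010"), accepts(d_delta, "p", {"q"}, "00"))
```

With k = 2 states in N, the resulting D has k+1 = 3 states, and its final states are exactly those of N.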

• Question 4 (10): (True/False with justification) Let Σ = {0,1} and let f be the function from Σ* to Σ* that erases all 0's, so that for example f(01001) = 11. Let Y be any regular language over Σ. Let Z be the set of all strings that can be made by applying f to some string in Y -- formally, Z = {z: ∃y: (y ∈ Y) ∧ f(y) = z}. Then Z must be a regular language.

TRUE. The argument I like best is to take a regular expression for Y (which must exist because Y is regular) and replace all 0's in it with ε's (technically, with ∅*'s). It's obvious to me that this new regular expression generates Z.

If we start with a DFA or NFA for Y, we can create an NFA for Z by replacing any 0-moves in the Y-machine with ε-moves. I don't think it's obvious that this gives a machine for Z, but it's easy to justify this claim. If a string w is accepted by the alleged Z-machine, there is a path of 1-moves and ε-moves in the Z-machine from the start state to a final state. The corresponding path of 1-moves and 0-moves in the Y-machine accepts some string u in Y, and clearly f(u) = w, so w is in Z. Conversely, if u is any string in Y, we can take its accepting path in the Y-machine and map it directly to a path in the new machine that accepts f(u), so f(u) is accepted.

Many people incorrectly claimed that Z must be regular because it must be the language 1*. Z must be contained in 1* because every letter in every string of Z is a 1, but the exact nature of Z depends on Y -- there is no reason that every string in 1* is the image under f of some string in Y. For example, let Y be (110)* -- then Z is (11)*, not 1*.
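The counterexample Y = (110)* can be checked directly. The sketch below applies f to every string of Y of length at most 9 (an arbitrary cutoff chosen for illustration) and collects the images.

```python
import re
from itertools import product

def f(w):
    """Erase every 0 from w."""
    return w.replace("0", "")

Y = re.compile(r"(110)*")

# All binary strings of length 0..9, filtered down to those in Y.
words = ("".join(t) for n in range(10) for t in product("01", repeat=n))
images = {f(y) for y in words if Y.fullmatch(y)}
print(sorted(images, key=len))
```

Every image has even length, so a string such as 1 is in 1* but never appears as f(y) for y in Y.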

• Question 5 (30): Let N be the NFA with state set {1,2,3}, alphabet {a,b,c}, start state 1, final state set {2}, and the following transitions: (1,b,2), (1,b,3), (2,a,1), (2,a,3), (3,c,1), and (3,c,2). (In Sipser's notation, δ(1,b) = {2,3}, δ(2,a) = {1,3}, δ(3,c) = {1,2}, and all other values of δ are ∅. The diagram has six arrows: b-moves from 1 to 2 and 1 to 3, etc.)

• (a,15) Give a regular expression for the language L(N).

We carry out the standard state-elimination construction. We first form a GNFA by adding a new start state 0 and a new final state 4 to states 1, 2, and 3, adding ε-moves from 0 to 1 and from 2 to 4.

We next eliminate state 3, which creates four new moves because 3 has two moves coming into it and two going out of it. We add a bc-loop at 1 and an ac-loop at 2. The move from 1 to 2 is now labeled b+bc, and the move from 2 to 1 is now labeled a+ac.

We next eliminate state 2, which creates two new moves because 2 has one move in and two moves out. The new three-state GNFA has an ε-move from 0 to 1, a loop at 1 labeled bc + (b+bc)(ac)*(a+ac), and a move from 1 to 4 labeled (b+bc)(ac)*.

When we finally eliminate state 1, the single transition in the new GNFA goes from 0 to 4 and is labeled [bc + (b+bc)(ac)*(a+ac)]*(b+bc)(ac)*. There are other possible equivalent regular expressions, of course, and eliminating the states in a different order will yield one of them if done correctly.
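A quick way to gain confidence in a state-elimination result is to compare it with a direct simulation of N on all short strings. This sketch does so for strings up to length 7 (an arbitrary bound), with the expression transcribed into Python's `re` syntax, where union is written `|` rather than `+`.

```python
import re
from itertools import product

# Transitions of N exactly as given in the question.
DELTA = {(1, "b"): {2, 3}, (2, "a"): {1, 3}, (3, "c"): {1, 2}}

def nfa_accepts(w, start=1, finals=frozenset({2})):
    """Track the set of states N could be in after each letter."""
    current = {start}
    for x in w:
        current = {r for q in current for r in DELTA.get((q, x), ())}
    return bool(current & finals)

# The expression from the text, with + written as | for Python's re module.
REGEX = re.compile(r"(bc|(b|bc)(ac)*(a|ac))*(b|bc)(ac)*")

words = ["".join(t) for n in range(8) for t in product("abc", repeat=n)]
mismatches = [w for w in words if nfa_accepts(w) != (REGEX.fullmatch(w) is not None)]
print(len(words), mismatches)
```

An empty list of mismatches does not prove equivalence, but an error in one of the elimination steps would very likely show up on strings this short.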

• (b,15) By the subset construction or otherwise, find a DFA D such that L(D) = L(N). Then either run the state-minimization algorithm on D, or otherwise find a DFA for L(N) with the smallest possible number of states. (If you do not use the state-minimization algorithm, you must explain why your final DFA is minimal.)

The start state of the DFA is {1}, which I will call "1". On a or c from 1 we move to the state for the empty set (the nonfinal "death state") which I will call "0". On a b from 1 we go to {2,3}, which I will call "23". Of course 0 has moves on a, b, and c to itself.

State 23 (a final state) has an a-move to 13 (a nonfinal state), a b-move to 0, and a c-move to 12 (a final state). State 13 has an a-move to 0, a b-move to 23, and a c-move to 12. State 12 has an a-move to 13, a b-move to 23, and a c-move to 0.
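The same table can be produced mechanically. Below is a minimal sketch of the subset construction that generates only reachable subsets; the helper `name` is an illustrative convenience that mimics the state names used here ("23" for {2,3}, "0" for the empty set).

```python
from collections import deque

# Transitions of N exactly as given in the question.
DELTA = {(1, "b"): {2, 3}, (2, "a"): {1, 3}, (3, "c"): {1, 2}}

def subset_construction(delta, alphabet, start):
    """Build only the reachable part of the subset-construction DFA."""
    start_set = frozenset({start})
    table, queue = {}, deque([start_set])
    while queue:
        subset = queue.popleft()
        if subset in table:
            continue                      # already expanded
        table[subset] = {}
        for x in alphabet:
            target = frozenset(r for q in subset for r in delta.get((q, x), ()))
            table[subset][x] = target
            queue.append(target)
    return table

def name(subset):
    """Name a subset as in the text: {2,3} -> '23', the empty set -> '0'."""
    return "".join(str(q) for q in sorted(subset)) or "0"

dfa = subset_construction(DELTA, "abc", 1)
for subset, moves in dfa.items():
    print(name(subset), {x: name(t) for x, t in moves.items()}, "final" if 2 in subset else "")
```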

We thus close the process and have a complete DFA with five states, the nonfinal start state 1, final states 12 and 23, and nonfinal states 0 and 13. It turns out that this DFA is minimal for its language. We can prove this by running the minimization algorithm as follows.

We first look at a partition with class F = {12, 23} and N = {0, 1, 13}. State 12 goes to N on a, F on b, and N on c, while state 23 goes to N on a, N on b, and F on c. So we must separate the two states of F at the next stage. Turning to the three states of N, we see that 0 goes to N on all three letters, 1 goes to N on a, F on b, and N on c, and 13 goes to N on a, F on b, and F on c. The three states have three distinct behaviors, so they must be put in three different classes at the next stage. Since the next stage has each state in its own class, it is the last stage and we have shown the DFA to be minimal.

We could make this argument more succinctly by noting that the input b separates 12 from 23, the input b separates 0 from both 1 and 13, and the input c separates 1 from 13. So no two final states can be merged, and no two nonfinal states can be merged, and therefore the DFA is minimal.
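The refinement can also be run mechanically on the five-state table. This is a sketch of a simple partition-refinement loop (in the style of Moore's algorithm, which may differ in detail from the version presented in lecture); the DFA table is transcribed from the construction above.

```python
# The five-state DFA from the subset construction, with the text's state names.
DFA = {
    "0":  {"a": "0",  "b": "0",  "c": "0"},
    "1":  {"a": "0",  "b": "23", "c": "0"},
    "23": {"a": "13", "b": "0",  "c": "12"},
    "13": {"a": "0",  "b": "23", "c": "12"},
    "12": {"a": "13", "b": "23", "c": "0"},
}
FINALS = {"12", "23"}

def refine(dfa, finals):
    """Split classes by (own class, classes reached on each letter) until nothing changes."""
    label = {q: int(q in finals) for q in dfa}          # stage 0: nonfinal vs final
    while True:
        signature = {
            q: (label[q],) + tuple(label[dfa[q][x]] for x in sorted(dfa[q]))
            for q in dfa
        }
        ids = {sig: n for n, sig in enumerate(sorted(set(signature.values())))}
        new_label = {q: ids[signature[q]] for q in dfa}
        if len(ids) == len(set(label.values())):
            return new_label                            # no class was split
        label = new_label

label = refine(DFA, FINALS)
print(len(set(label.values())), "classes for", len(DFA), "states")
```

Since every state is reachable and each ends up in its own class, no two states can be merged.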

• Question 6 (50): These questions all deal with the grammar G that has non-terminal set {S}, terminal set (alphabet) {a,b,c,d}, start symbol S, and rules S → SS, S → aSb, S → cSd, and S → ε.

• (a,10) List all the four-letter strings in L(G) -- there are exactly eight of them. Then give derivations for at least two of these strings showing that they are in L(G).

We can derive abab by the sequence of moves S → SS → aSbS → abS → abaSb → abab. By very similar derivations we can make abcd, cdab, or cdcd.

We can derive aabb by the sequence of moves S → aSb → aaSbb → aabb. By very similar derivations we can make acdb, cabd, and ccdd.
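The claim that there are exactly eight such strings can be confirmed by testing all 256 four-letter strings over {a,b,c,d}. The membership test below is a sketch that follows the rules of G directly: a nonempty string is derivable if it is aXb or cXd with X derivable, or if it splits into two nonempty derivable pieces.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derivable(w):
    """True if S =>* w using S -> SS | aSb | cSd | ε."""
    if w == "":
        return True                                        # S -> ε
    if len(w) >= 2 and (w[0], w[-1]) in {("a", "b"), ("c", "d")} and derivable(w[1:-1]):
        return True                                        # S -> aSb or S -> cSd
    # S -> SS, with both halves nonempty so the recursion gets shorter
    return any(derivable(w[:i]) and derivable(w[i:]) for i in range(1, len(w)))

four = sorted("".join(t) for t in product("abcd", repeat=4) if derivable("".join(t)))
print(len(four), four)
```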

• (b,10) State the Pumping Lemma for context-free languages. If p is the constant in the Pumping Lemma for the language L(G), show that the string $a^p b^p c^p d^p$ satisfies the conclusion of the Lemma. (Note: This problem makes no reference to any language not being a CFL. Of course L(G) is a CFL, because it is the language of a context-free grammar, G.)

The Context-Free Pumping Lemma says that if X is any context-free language, there exists a positive integer p such that for any string w in X with |w| ≥ p, w can be written as the concatenation of five strings u, v, x, y, z such that |vxy| ≤ p, |vy| > 0, and for all non-negative integers i, the string $uv^i x y^i z$ is in X.

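If w is $a^p b^p c^p d^p$, we can choose $u = a^{p-1}$, v = a, x = ε, y = b, and $z = b^{p-1} c^p d^p$. Here |vy| = 2 > 0, and |vxy| = 2 ≤ p provided p ≥ 2, which we may assume because any larger constant also satisfies the Lemma. Then for any i, $uv^i x y^i z$ is $a^{p-1+i} b^{p-1+i} c^p d^p$. This is in L(G) because we can change S to SS, derive the a's and b's from the first S, and derive the c's and d's from the second S.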
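The derivation just described can be carried out step by step on a sentential form. In this sketch p = 3 is a hypothetical value of the pumping constant chosen only for illustration, and `derive` always rewrites the leftmost S.

```python
def pumped(p, i):
    """u v^i x y^i z for the decomposition chosen in the text."""
    u, v, x, y = "a" * (p - 1), "a", "", "b"
    z = "b" * (p - 1) + "c" * p + "d" * p
    return u + v * i + x + y * i + z

def derive(n, m):
    """Derive a^n b^n c^m d^m from S, always rewriting the leftmost S."""
    form = "S".replace("S", "SS", 1)             # S -> SS
    for _ in range(n):
        form = form.replace("S", "aSb", 1)       # S -> aSb on the first S
    form = form.replace("S", "", 1)              # S -> ε finishes the first S
    for _ in range(m):
        form = form.replace("S", "cSd", 1)       # S -> cSd on the second S
    return form.replace("S", "", 1)              # S -> ε finishes the second S

p = 3                                            # hypothetical pumping constant
for i in range(4):
    print(i, pumped(p, i), pumped(p, i) == derive(p - 1 + i, p))
```

Each pumped string coincides with a string produced by an explicit derivation in G, which is exactly what the conclusion of the Lemma requires.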

• (c,15) Prove that L(G) is not a regular language. You may use either the Pumping Lemma for regular languages, or the method based on the Myhill-Nerode Theorem (finding an infinite set of pairwise L(G)-distinguishable strings). Please do not use the closure properties of regular languages -- I would like to see you use one of these two methods.

Pumping Lemma proof: If L(G) were regular, the conclusion of the Regular Language Pumping Lemma would hold for some constant p. Let w be the string $a^p b^p$. The Lemma tells us that w can be written as xyz, with |xy| ≤ p and |y| > 0, so we know that y consists only of one or more a's. We are told that $xy^iz$ is in L(G) for every i ≥ 0, but taking i=0 gives us a string xz which is in a*b* but not in L(G) because it has fewer a's than b's. So the conclusion of the Lemma fails for any valid choice of x, y, and z, and thus the supposition that L(G) was regular must be false. (Typo corrected 3 March 2009.)

Myhill-Nerode Proof: We claim that the infinite set of strings $\{a^i : i \ge 0\}$ is a pairwise L(G)-distinguishable set. Let $x = a^i$ and $y = a^j$, with i ≠ j, be any two distinct members of this set. These two strings are L(G)-distinguishable because if we take z to be the string $b^i$, we find that xz is in L(G) and yz is not. Since there are infinitely many distinct L(G)-equivalence classes, L(G) cannot be regular.

Proof using closure properties, which I ruled out because it was too easy: If L(G) were regular, its intersection with any regular language would be regular. But its intersection with a*b* is the language $\{a^n b^n : n \ge 0\}$, because we can see that any string in L(G) has an equal number of a's and b's, and we know that this language is not regular.

• (d,15) Describe a PDA whose language is L(G), by constructing the top-down or bottom-up parser or otherwise. Your description may be informal, as long as it is clear what your PDA may or may not do in any circumstance. If you use a standard construction from G, you need not prove that your PDA's language is L(G).

The easiest solution is to construct the top-down parser for G, because you then do not need to prove correctness. This PDA has states s, q, and f, start state s, only final state f, and transitions (s,ε, ε;q,S\$), (q,ε,S;q,ε), (q,ε,S;q,SS), (q,ε,S;q,aSb), (q,ε,S;q,cSd), (q,a,a;q,ε), (q,b,b;q,ε), (q,c,c;q,ε), (q,d,d;q,ε), and (q,ε,\$;f,ε).

A simpler PDA also has L(G) as its language, although it takes an argument to show that this is so. The second PDA M also has states s, q, and f, with start state s and only final state f. It may push a \$ going from s to q and pop the \$ going from q to f, and all its other transitions are from q to q. They are (q,a,ε;q,a), (q,b,a;q,ε), (q,c,ε;q,c), and (q,d,c;q,ε).

How do we show that L(M) = L(G)? First we show, by induction on derivations, that every string derivable from S in G can be read during a run of M that starts in state q with empty stack and finishes in state q with empty stack. These strings, which constitute exactly L(G), are thus all in L(M) because we can push the \$, carry out this run, and pop the \$.

For the other direction, we observe that any accepting run of M must contain such a run from state q and empty stack to state q and empty stack when we ignore its first and last moves. And we can show that the string read during any such run is derivable from S in G, by induction as in the proof that our PDA constructed from an arbitrary CFG is correct. Any such run either empties its stack in the middle, in which case it is the concatenation of two shorter such runs, or it pushes a stack character on its first move and pops the same character on its last move. In this latter case we can derive its string using the rule S → aSb or S → cSd, together with the derivation of the string read between the first and last moves.
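The claim L(M) = L(G) can be spot-checked by simulating M and comparing it with strings generated from the rules of G. This is a sketch, not a replacement for the induction above: the comparison only covers strings of length at most 6, and the `MOVES` table is an illustrative encoding of the four q-to-q transitions.

```python
from itertools import product

# Input letter -> action of M in state q, transcribed from the four q-to-q moves.
MOVES = {"a": ("push", "a"), "b": ("pop", "a"), "c": ("push", "c"), "d": ("pop", "c")}

def pda_accepts(w):
    """Simulate M; each letter allows at most one move, so a single run suffices."""
    stack = ["$"]                          # the move from s to q pushes $
    for x in w:
        action, sym = MOVES[x]
        if action == "push":
            stack.append(sym)
        elif stack[-1] == sym:
            stack.pop()
        else:
            return False                   # no move is available, so the run dies
    return stack == ["$"]                  # the move from q to f pops $

def language_up_to(max_len):
    """Strings of L(G) of length <= max_len, by closing {ε} under the rules of G."""
    lang = {""}
    while True:
        new = {l + w + r for w in lang for l, r in (("a", "b"), ("c", "d"))}
        new |= {u + v for u in lang for v in lang}
        new = {w for w in new if len(w) <= max_len} | lang
        if new == lang:
            return lang
        lang = new

LG = language_up_to(6)
words = ["".join(t) for n in range(7) for t in product("abcd", repeat=n)]
disagreements = [w for w in words if pda_accepts(w) != (w in LG)]
print(len(LG), disagreements)
```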