# Solutions to First Midterm Exam, Spring 2012

### Directions:

• Answer the problems on the exam pages.
• There are thirteen problems for 120 total points plus 10 extra credit. Actual scale was A=112, C=70.
• If you need extra space use the back of a page.
• No books, notes, calculators, or collaboration.
• The first six questions are true/false, with five points for the correct boolean answer and up to five for a correct justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. -- note that there is no reason not to guess if you don't know.

Question text is in black, solutions in blue.

```
Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 10 points
Q6: 10 points
Q7: 10 points
Q8: 10 points
Q9: 10 points
Q10: +10 points
Q11: 10 points
Q12: 10 points
Q13: 10 points

Total: 120+10 points
```

The regular language L1 is over the alphabet Σ = {a, b, c} and is defined by the regular expression Σ*(aa ∪ bb ∪ cc)Σ*.

The language L2 is the complement of L1, that is, Σ* ∖ L1.

The context-free language L3 over the alphabet {a, b} is defined by the grammar with start symbol S and rules S → bSS and S → a.

• Question 1 (10): True or false with justification: The class of context-free languages is closed under reversal. That is, if L is any context-free language, then the language $L^R = \{w^R : w \in L\}$ is context-free.

TRUE. Let M be a PDA as in the proof of the PDA-to-CFG Theorem. M has exactly one final state (not its start state), accepts only with an empty stack, and either pushes or pops one letter (not both) on each transition. Let M' be a PDA with the same state set, switched start and final states, and each transition reversed: (p, a, c; q, ε) becomes (q, a, ε; p, c) and (p, a, ε; q, c) becomes (q, a, c; p, ε). Each path in M from q0 to f, accepting with empty stack, corresponds to a path in M' from f to q0 that also accepts with empty stack. The string that M' reads is exactly the reversal of the string that M reads.

Another proof is to take a grammar for L and reverse the right-hand sides of every rule. I didn't think that it was obvious that this construction is correct, so I took a point off for assuming that it is. Someone gave a convincing proof that it is -- consider a parse tree showing that w is in L(G), and hold that tree up to a mirror. The result is a parse tree for $w^R$, and the rules it uses are exactly those built by this construction.
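The rule-reversal construction can be spot-checked mechanically on a small grammar. The sketch below applies it to the grammar for L3 and compares the two languages up to length 7. The helper names `reverse_grammar` and `strings_up_to` are illustrative, not from the course, and the enumerator prunes by length, which is sound only because no rule of this grammar shortens a sentential form. A finite check like this is evidence, not a proof.

```python
def reverse_grammar(rules):
    """Reverse the right-hand side of every rule."""
    return {head: [body[::-1] for body in bodies] for head, bodies in rules.items()}


def strings_up_to(rules, start, max_len):
    """Terminal strings of length <= max_len, found by leftmost derivations.

    Assumes single-character symbols and no rule that shortens a form.
    """
    seen, work, found = {start}, [start], set()
    while work:
        form = work.pop()
        # position of the leftmost nonterminal, if any
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:              # no nonterminal left: a terminal string
            found.add(form)
            continue
        for body in rules[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                work.append(new)
    return found


G = {"S": ["bSS", "a"]}              # grammar for L3
GR = reverse_grammar(G)              # S -> SSb | a
forward = strings_up_to(G, "S", 7)
backward = strings_up_to(GR, "S", 7)
print(backward == {w[::-1] for w in forward})
```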

• Question 2 (10): True or false with justification: The language L4 = $\{a^i b^j c^k : (i \ge (i + j + k)/117) \wedge (j \ge (i + j + k)/117) \wedge (k \ge (i + j + k)/117)\}$ is a context-free language.

FALSE. Use the CFL Pumping Lemma -- let p be the pumping length, let $w = a^p b^p c^p$, and let $w = uvxyz$. Because |vxy| ≤ p, the strings v and y together include only one or two types of letters, not all three. Because vy ≠ ε, v and y together do include at least one type of letter. Let i = 117p and look at the string $uv^ixy^iz$. Either it is no longer of the form a*b*c* (if v or y contains two different letters), or there are still only p of the letter (or letters) that was not affected, while the total length is at least 3p + (117p - 1) = 120p - 1 > 117p, so p is less than 1/117 of the total. Either way the pumped string is not in L4. Since the language cannot satisfy the CFLPL, it is not context-free.
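The pumping argument can be illustrated by brute force for a small stand-in value of p. The sketch below tries every split of $a^p b^p c^p$ allowed by the lemma and reports any split whose pumped string is still in L4. The value p = 2 is only an illustration (the real pumping length is unknown), and the function names are hypothetical.

```python
import re


def in_L4(w):
    """Membership in L4: form a*b*c* and each letter count >= |w|/117."""
    if not re.fullmatch(r"a*b*c*", w):
        return False
    return all(117 * w.count(ch) >= len(w) for ch in "abc")


def pumped_survivors(p, i):
    """Splits w = uvxyz (|vxy| <= p, vy nonempty) whose pumped string stays in L4."""
    w = "a" * p + "b" * p + "c" * p
    n = len(w)
    survivors = []
    for s1 in range(n + 1):
        for s2 in range(s1, n + 1):
            for s3 in range(s2, n + 1):
                # s4 - s1 = |vxy| must not exceed p
                for s4 in range(s3, min(s1 + p, n) + 1):
                    u, v, x, y, z = w[:s1], w[s1:s2], w[s2:s3], w[s3:s4], w[s4:]
                    if v + y and in_L4(u + v * i + x + y * i + z):
                        survivors.append((u, v, x, y, z))
    return survivors


p = 2
print(in_L4("a" * p + "b" * p + "c" * p))   # the unpumped string is in L4
print(pumped_survivors(p, 117 * p))         # no split survives pumping
```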

• Question 3 (10): True or false with justification: Let u and v be any two nonempty strings. Then u and v are L1-equivalent (in the sense of the Myhill-Nerode Theorem) if and only if u and v have the same last letter.

FALSE: Let u = aa, v = bb. Given any string z, both uz and vz are in L1, so u and v are L1-equivalent. But they have different last letters. If neither u nor v is in L1, the statement is true.

• Question 4 (10): True or false with justification: The language L3 has exactly four strings of length 7.

FALSE: There are exactly five such strings: bababaa, babbaaa, bbaabaa, bbabaaa, and bbbaaaa, as is easy to check by exhaustive search of derivations. This language is similar to the balanced parenthesis language $\text{Dyck}_1$, for which the number of strings of length 2k is the Catalan number $C_k$. Correspondingly, L3 has $C_k$ strings of length 2k+1, which gives $C_3 = 5$ here.
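The count can also be confirmed by testing all 128 strings of length 7 rather than searching derivations. The recognizer below is a shortcut that is not part of the solution above: it tracks how many S symbols are still waiting to be expanded, since reading a b turns one pending S into two and reading an a finishes one. The name `in_L3` is illustrative.

```python
from itertools import product


def in_L3(w):
    """Recognize L3 (S -> bSS | a) over {a, b} with a single counter."""
    pending = 1                      # S symbols still to be expanded
    for ch in w:
        if pending == 0:             # derivation finished but input remains
            return False
        pending += 1 if ch == "b" else -1
    return pending == 0


length7 = sorted(
    "".join(t) for t in product("ab", repeat=7) if in_L3("".join(t))
)
print(length7)
# ['bababaa', 'babbaaa', 'bbaabaa', 'bbabaaa', 'bbbaaaa']
```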

• Question 5 (10): True or false with justification: Let u, v, x, y, and z be any five strings over the same (nonempty) alphabet. Then the language $\{uv^ixy^iz : i \ge 0\}$ is a context-free language.

TRUE: It is the language of the grammar S → uTz, T → vTy, T → x. Any derivation in this grammar must use the first rule, then the second rule i times, then the last rule, generating $uv^ixy^iz$.

• Question 6 (10): True or false with justification: Let N be an NFA, with no ε-moves, all of whose states are reachable from the start state. Let D be the equivalent DFA constructed from N by the Subset Construction, containing only states reachable from its start state. Then D must have at least as many states as N.

FALSE: My four-state NFA example has nonfinal start state 1, with a-transitions to final states 2, 3, and 4, and no other transitions. The equivalent DFA has three states, {1}, {2,3,4}, and ∅.

Questions 7-10 use the languages L1 and L2 defined above.

• Question 7 (10): Build an NFA whose language is L1. Justify your answer. (Note: The construction from Sipser gave 22 states (by my count) for this regular expression. You are welcome to use it, or my simpler construction, but you may be able to create and justify a simpler NFA.)

The simplest NFA has five states -- nonfinal start state 1 with a Σ loop, three nonfinal intermediate states 2, 3, and 4 each with a letter-arrow in from the start and an arrow with the same letter to the final state, and a final state 5 with a Σ-loop. This is correct because an accepting path may read any string staying at the start, then read aa, bb, or cc, then read any string staying at the final state.
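As a sanity check, the five-state NFA can be simulated by tracking the set of states it could be in, and compared against the direct definition of L1 (some letter occurs twice in a row). The state numbers follow the description above; the table layout and function names are assumptions made for this sketch.

```python
from itertools import product

SIGMA = "abc"

# (state, letter) -> set of next states, for the five-state NFA
DELTA = {}
for ch, mid in zip(SIGMA, (2, 3, 4)):
    DELTA[(1, ch)] = {1, mid}        # stay at the start, or guess a double letter
    DELTA[(mid, ch)] = {5}           # the same letter again reaches the final state
    DELTA[(5, ch)] = {5}             # Sigma-loop on the final state


def nfa_accepts(w):
    current = {1}
    for ch in w:
        current = set().union(*(DELTA.get((q, ch), set()) for q in current))
    return 5 in current


def has_double_letter(w):
    return any(x == y for x, y in zip(w, w[1:]))


words = ["".join(t) for n in range(7) for t in product(SIGMA, repeat=n)]
print(all(nfa_accepts(w) == has_double_letter(w) for w in words))
```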

My construction, followed slavishly, would give 11 states because the start and final states above would each be replaced by four-state machines for Σ*. The book's construction has 22 because of all the extra ε-moves.

You could also build and justify the DFA as an answer to this question, since DFA's are also NFA's.

• Question 8 (10): Build a DFA for the language L2 (not for L1) by any valid method.

A DFA for L1 is pretty easy to build and generate directly, but let's carry out the Subset Construction. Start state {1} has an a-arrow to {1, 2}, a b-arrow to {1, 3}, and a c-arrow to {1, 4}. State {1, 2} has an a-arrow to {1, 2, 5}, a b-arrow to {1, 3}, and a c-arrow to {1, 4}. State {1, 3} has an a-arrow to {1, 2}, a b-arrow to {1, 3, 5}, and a c-arrow to {1, 4}. State {1, 4} has an a-arrow to {1, 2}, a b-arrow to {1, 3}, and a c-arrow to {1, 4, 5}. The three final states each have a-arrows to {1, 2, 5}, b-arrows to {1, 3, 5}, and c-arrows to {1, 4, 5}. This DFA has seven states -- one start, three intermediate, and three final. It remembers whether it has seen a double letter yet, and what the last letter (if any) was.

Of course we get a DFA for L2 by switching the final and nonfinal states of this DFA.
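The Subset Construction carried out by hand above can be reproduced in a few lines. This is a minimal sketch that assumes the five-state NFA from Question 7 with its states numbered 1 to 5. The function name `subset_construction` and the table layout are illustrative.

```python
SIGMA = "abc"

# Five-state NFA for L1: (state, letter) -> set of next states
DELTA = {}
for ch, mid in zip(SIGMA, (2, 3, 4)):
    DELTA[(1, ch)] = {1, mid}
    DELTA[(mid, ch)] = {5}
    DELTA[(5, ch)] = {5}


def subset_construction(delta, start, alphabet):
    """Build only the DFA states reachable from {start}."""
    start_set = frozenset([start])
    states, work, trans = {start_set}, [start_set], {}
    while work:
        subset = work.pop()
        for ch in alphabet:
            target = frozenset(q for s in subset for q in delta.get((s, ch), ()))
            trans[(subset, ch)] = target
            if target not in states:
                states.add(target)
                work.append(target)
    return states, trans


states, trans = subset_construction(DELTA, 1, SIGMA)
final_L1 = {s for s in states if 5 in s}     # subsets containing NFA state 5
final_L2 = states - final_L1                 # complement: swap final and nonfinal
print(len(states), len(final_L1), len(final_L2))   # 7 3 4
```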

• Question 9 (10): Give a DFA for L2 with the minimum possible number of states, and prove that it is minimal. (You could do this by running the minimization algorithm, or by showing that each pair of states is distinguishable.)

We divide the DFA from Question 8 (for L2) into classes F and N. All three states in N go to N on each letter, so there is no reason to separate them. The four states in F each have a different behavior on inputs (a, b, c) -- {1} has (F, F, F), {1, 2} has (N, F, F), {1, 3} has (F, N, F), and {1, 4} has (F, F, N). So F must be split into four classes, and we have a final DFA with five states (with the three nonfinal states merged to one). Since we used the minimization algorithm, this DFA is minimal.

It's also straightforward to give this five-state DFA and show that each pair of final states is distinguishable.
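The partition-refinement step can be sketched as follows. For brevity the seven DFA states are encoded as pairs (last letter read, double letter seen yet) instead of as subsets. That encoding is an assumption of this sketch, though it matches what the Question 8 DFA is said to remember. States are grouped by their own class and the classes of their three successors until the number of classes stops growing.

```python
SIGMA = "abc"
START = ("", False)                  # (last letter read, double letter seen?)


def step(state, ch):
    last, seen = state
    return (ch, seen or ch == last)


def reachable(start):
    states, work = {start}, [start]
    while work:
        s = work.pop()
        for ch in SIGMA:
            t = step(s, ch)
            if t not in states:
                states.add(t)
                work.append(t)
    return states


def refine(states, accepting):
    """Split classes until no letter separates two states of the same class."""
    block = {s: s in accepting for s in states}
    while True:
        signature = {
            s: (block[s],) + tuple(block[step(s, ch)] for ch in SIGMA)
            for s in states
        }
        ids = {sig: n for n, sig in enumerate(set(signature.values()))}
        new_block = {s: ids[signature[s]] for s in states}
        # each round refines the last, so an equal count means nothing split
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


STATES = reachable(START)
ACCEPTING = {s for s in STATES if not s[1]}  # L2: no double letter seen
classes = refine(STATES, ACCEPTING)
print(len(STATES), len(set(classes.values())))   # 7 5
```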

• Question 10 (10 extra credit): Give a regular expression for the language L2. (Note -- this is extra credit because it is likely to take some time, so use your best judgement as to whether to try this before attempting the other problems. You could use the book's algorithm to get the regular expression from the DFA, or you could reason directly. Here's a hint if you try the latter -- if w is a string in L2, what can happen between the occurrences of the letter a in w?)

The clever method first -- the strings in L2 with no a's are ε ∪ b(cb)*(c ∪ ε) ∪ c(bc)*(b ∪ ε). This is because a string of b's and c's with no double letter must alternate b's and c's.

Define X to be the regular expression b(cb)*(c ∪ ε) ∪ c(bc)*(b ∪ ε). These are the nonempty strings in L2 that have no a's. Then since every two a's must have a string in L(X) between them, we get (ε ∪ X)(aX)*(a ∪ ε). Substituting for X gives the whole regular expression.
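The expression (ε ∪ X)(aX)*(a ∪ ε) translates directly into Python's `re` syntax, which allows a brute-force comparison with the definition of L2 (no letter occurs twice in a row). The translation below writes (c ∪ ε) as `c?` and so on; the variable names are illustrative. Agreement on all strings up to length 7 is a check, not a proof.

```python
import re
from itertools import product

# X: nonempty strings over {b, c} with no double letter
X = r"(?:b(?:cb)*c?|c(?:bc)*b?)"
L2_REGEX = re.compile(rf"(?:{X})?(?:a{X})*a?")


def in_L2(w):
    """Direct definition: no two adjacent letters are equal."""
    return all(x != y for x, y in zip(w, w[1:]))


mismatches = [
    "".join(t)
    for n in range(8)
    for t in product("abc", repeat=n)
    if bool(L2_REGEX.fullmatch("".join(t))) != in_L2("".join(t))
]
print(mismatches)   # []
```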

I worked out an answer by state elimination as well and got ε ∪ c ∪ Y ∪ (a ∪ ca ∪ Y)(ca ∪ Y)*(ε ∪ c ∪ Y), where Y is the regular expression (b ∪ ab)(cb)*(a ∪ ca).

Questions 11-13 deal with the context-free language L3 defined above, with the grammar rules S → bSS and S → a.

• Question 11 (10): Give a pushdown automaton whose language is L3, by any valid method.

Using the top-down parser, there are three states q0, p, and f. There is a transition from q0 to p that reads nothing and pushes S\$ so that the \$ is at the bottom of the stack with the S on top of it. (I didn't insist that you break this into two transitions, each pushing a letter.) There are four loops on state p, one reading and popping an a, one reading and popping a b, one popping S and pushing a, and one popping S and pushing bSS. (Officially this last one should be broken into three transitions each pushing a letter.) Then there is one transition from p to f that reads nothing and pops \$. Since this PDA is made by a known construction from the CFG, we know that it is correct for the language of the CFG.
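The behavior of this PDA can be traced with an explicit stack. The PDA itself is nondeterministic when S is on top, but for this grammar the next input letter determines which S-rule can possibly succeed, so the sketch below resolves the choice by peeking at the input. That lookahead is a simplification made here, not part of the construction. The function name is hypothetical and the input is assumed to be over {a, b}.

```python
def pda_accepts(w):
    """Trace the top-down parser PDA for S -> bSS | a on input w."""
    stack = ["$", "S"]               # q0 -> p: bottom marker, then S on top
    pos = 0
    while True:
        top = stack.pop()
        if top == "$":               # p -> f: accept only if all input was read
            return pos == len(w)
        if pos == len(w):            # a stack symbol still needs input
            return False
        if top == "S":
            # pick the only rule whose first letter matches the input
            body = "bSS" if w[pos] == "b" else "a"
            stack.extend(reversed(body))   # leftmost symbol ends up on top
        elif top == w[pos]:
            pos += 1                 # read and pop a matching terminal
        else:
            return False


for word in ["a", "baa", "bbaabaa", "", "ab", "bab"]:
    print(repr(word), pda_accepts(word))
```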

• Question 12 (10): Is L3 a regular language? Prove your answer. (Hint: Consider the intersection of L3 with the regular language b*a*.)

L3 is NOT a regular language. Its intersection with b*a* is the language $\{b^i a^{i+1} : i \ge 0\}$, but we have to justify this claim. (We can generate any string of this type by using the S → bSS rule i times and using the S → a rule on the remaining S's. To be in this regular language, a string derived in the grammar must use the S → a rule on every S to the right of any a, so the only decision is how many times to use the other rule to the left of any a's, and if this number is i then we generate exactly the string $b^i a^{i+1}$.)

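This language is easily seen to be non-regular by either the Myhill-Nerode method or the Regular Language Pumping Lemma. For the former, note that $\{b^i : i \ge 0\}$ is an infinite set of pairwise distinguishable strings because $b^i$ and $b^j$ with i ≠ j are distinguished by $a^{i+1}$. Since the regular languages are closed under intersection, L3 itself cannot be regular.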

• Question 13 (10): Prove carefully that every string in L3 has odd length. You should use mathematical induction. One way is to use induction on the number of steps in the derivation of a string in L3. Another is to use the grammar as an inductive definition of the language and use induction on that.

By induction on k, the number of steps in the derivation -- the base case is k = 1 and the only complete derivation of one step is for the string "a", which has odd length.

For the inductive step, assume that every derivation of k or fewer steps produces a string of odd length, and consider any derivation of length k+1. The first step must take S to bSS, assuming k > 0, and the rest of the derivation produces strings u and v from the two S's. Since these derivations use k or fewer steps, they produce odd-length strings by the IH. So the string buv that we derive has length (1 + |u| + |v|) which is the sum of three odd numbers and therefore is odd.

I gave full credit for a coherent invariant argument, arguing that every string of a's, b's, and S's in a derivation has odd length because the rules either keep the length the same or increase it by 2, and the original length of "S" is 1.