# Solutions to First Midterm Exam, Spring 2013

### Directions:

• Answer the problems on the exam pages.
• There are nine problems (some with multiple parts) for 125 total points plus 10 extra credit. Actual scale was A = 115, C = 75.
• If you need extra space use the back of a page.
• No books, notes, calculators, or collaboration.
• The first five questions are statements -- in each case say whether the statement is true or false and give a convincing justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. You get five points for the correct boolean answer (so there is no reason not to guess if you don't know) and up to five for the justification.

Question text is in black, solutions in blue.

```
  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 10 points
  Q7: 10 points
  Q8: 15 points
  Q9: 40+10 points
Total: 125+10 points
```

The language ABC over the alphabet Σ = {a, b, c} is defined as the set of all strings that contain at least one a, at least one b, and at least one c.

If X and Y are any two languages over the same alphabet, the symmetric difference X Δ Y is defined to be the set of strings that are in either X or Y, but not in both.

• Question 1 (10): True or false with justification: Let L be a language over some nonempty alphabet Σ such that there exists a positive integer k such that for every string w in L, w has at most k letters. (In symbols, ∃k: ∀w:(w ∈ L) → (|w| ≤ k).) Then L must be a regular language.

TRUE. There can be at most finitely many strings of length ≤ k -- the exact number is the sum for i from 0 to k of $|\Sigma|^i$. Since L is a subset of these strings, L is finite. We can write a regular expression for the singleton set containing each string in L, then union these together to get a regular expression for L.
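As a small sanity check of the counting claim and the union-of-singletons construction, here is a minimal Python sketch. The alphabet {a, b}, the bound k = 3, and the sample language are arbitrary choices for illustration, and Python's `re` syntax stands in for textbook regular expressions.

```python
import re
from itertools import product

def strings_up_to(alphabet, k):
    """All strings over `alphabet` of length at most k."""
    return ["".join(t) for i in range(k + 1) for t in product(alphabet, repeat=i)]

def finite_language_regex(language):
    """Union of one literal pattern per string; an empty alternative matches ε."""
    if not language:
        return r"(?!)"  # a pattern that never matches, for the empty language
    return "|".join(re.escape(w) for w in sorted(language))

alphabet, k = "ab", 3
universe = strings_up_to(alphabet, k)
# The count agrees with the sum of |Σ|^i for i from 0 to k.
expected_count = sum(len(alphabet) ** i for i in range(k + 1))

# Hypothetical language in which every string has length at most 3.
L = {"", "ab", "bba"}
pattern = re.compile(finite_language_regex(L))
matched = {w for w in universe if pattern.fullmatch(w)}
```

Here `matched` should coincide with `L`, which is all the argument needs: one alternative per string, joined by union.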

• Question 2 (10): True or false with justification: Let L be a language over some nonempty alphabet Σ such that for every string w in L, there exists a positive integer k such that w has at most k letters. (In symbols, ∀w: (w ∈ L) → ( ∃k: |w| ≤ k).) Then L must be a regular language.

FALSE. Any L has this property, since any string has a finite length. There exist non-regular languages, so there exist non-regular languages with this property.

• Question 3 (10): True or false with justification: The language ABC (defined above) is the language of some DFA with exactly six states.

FALSE. Let S be the set {ε, a, b, c, ab, ac, bc, abc}. If u and v are any two distinct strings in S, there exists a letter in u but not in v or vice versa. WLOG let there exist a letter x in u but not in v. Let z be a string containing exactly those letters not in u. Then uz is in ABC, but vz is not in ABC because it has no x. Since u and v were arbitrary, we see that S is a set of pairwise ABC-distinguishable strings. By Myhill-Nerode, any DFA for ABC must have at least eight states, so none can have exactly six.
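The distinguishability argument can be checked mechanically. The sketch below builds, for each of the 28 pairs from S, the suffix z described in the proof and confirms that it separates the pair. The function names are my own, chosen for illustration.

```python
from itertools import combinations

ALPHABET = set("abc")
S = ["", "a", "b", "c", "ab", "ac", "bc", "abc"]

def in_abc(w):
    """Membership in ABC: at least one a, one b, and one c."""
    return ALPHABET <= set(w)

def distinguishing_suffix(u, v):
    """Suffix z from the proof: the letters missing from whichever of the
    two strings contains a letter that the other lacks."""
    if not set(u) - set(v):
        u, v = v, u              # WLOG u has a letter that v lacks
    return "".join(sorted(ALPHABET - set(u)))

pairs = list(combinations(S, 2))
all_distinguished = all(
    in_abc(u + distinguishing_suffix(u, v)) != in_abc(v + distinguishing_suffix(u, v))
    for u, v in pairs
)
```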

• Question 4 (10): True or false with justification: The language {w: ∃k: ∃y: (k ≥ 1) ∧ (w = $1^k0y$) ∧ (y has at most k ones)} is context-free. (Here the alphabet is {0, 1}.)

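TRUE. First consider the grammar S --> S0 | 1S1 | 0 | 1S, which generates exactly the strings $1^k0y$ with k ≥ 0 in which y has at most k ones. Given any such string, I can make it by first using the rules S --> S0 and S --> 1S1 to make y to the right of the S, making a 1 to the left for every 1 in y. Then if I need more 1's to the left, I make them with the rule S --> 1S. Finally I make the 0.

I must also show that every string made by this grammar has that form. I must finish by using S --> 0, after using the other three rules some number of times. So I have a 0, and I made some 1's to the left of it and a string of 0's and 1's to the right of it. If m is the number of times I used the rule S --> 1S1, the number of 1's to the left of the first 0 in the final string is at least m, and the number of 1's to the right of it is at most m.

This grammar also makes strings with k = 0, such as 0 itself, which the condition k ≥ 1 excludes. To rule them out, I add a new start symbol S' with rules S' --> S'0 | 1S1 | 1S. A derivation from S' makes some trailing 0's and then must put a 1 at the far left before continuing with S, so the strings made from S' are exactly the strings made from S that begin with a 1, that is, those with k ≥ 1.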
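A brute-force check of a grammar like this one is easy to script. The sketch below enumerates every string of length at most 10 derivable from S --> S0 | 1S1 | 0 | 1S and compares the result with the strings $1^k0y$ having at most k ones in y, for k ≥ 0. It then does the same for a start symbol S' with rules S' --> S'0 | 1S1 | 1S against the k ≥ 1 version. The helper names and the length cutoff are assumptions of the sketch, and agreement on short strings is evidence rather than proof.

```python
from itertools import product

def close(base, ops, n):
    """Least set containing `base` and closed under `ops`, cut off at length n."""
    found = {w for w in base if len(w) <= n}
    frontier = list(found)
    while frontier:
        w = frontier.pop()
        for op in ops:
            x = op(w)
            if len(x) <= n and x not in found:
                found.add(x)
                frontier.append(x)
    return found

def in_language(w, min_k):
    """w = 1^k 0 y with k >= min_k and at most k ones in y."""
    k = w.find("0")              # index of the first 0 = number of leading 1's
    return k >= min_k and w[k + 1:].count("1") <= k

n = 10
# Each rule for S keeps exactly one S, so L(S) is a closure of {"0"}.
from_S = close({"0"},
               [lambda w: w + "0", lambda w: "1" + w + "1", lambda w: "1" + w], n)
# S' --> S'0 | 1S1 | 1S: wrap a string from S, then append 0's.
from_S_prime = close({"1" + w + s for w in from_S for s in ("1", "")},
                     [lambda w: w + "0"], n)

all_strings = ["".join(t) for i in range(n + 1) for t in product("01", repeat=i)]
target_k0 = {w for w in all_strings if in_language(w, 0)}
target_k1 = {w for w in all_strings if in_language(w, 1)}
```

Comparing `from_S` with `target_k0` and `from_S_prime` with `target_k1` covers all 2047 binary strings of length at most 10.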

• Question 5 (10): True or false with justification: If X and Y are regular languages, then X Δ Y is also regular. (The symmetric difference operator is defined above.)

TRUE. Here are two proofs:

(1) X Δ Y equals C(X ∪ C(Y)) ∪ C(C(X) ∪ Y), where C is the complement operator, and we know that the regular languages are closed under union and complement.

(2) Let X = L(M) and Y = L(N) where M and N are DFA's. Construct a DFA O whose states are pairs (m, n) with m a state of M and n a state of N. The start state of O is the pair of start states, and the transition function uses the function of M on the first component and the function of N on the second. The final states are pairs (m, n) where exactly one of m and n is final in its respective machine. This DFA O, when it reads a string w, goes to a state (m, n) where m and n are the states of M and N respectively on reading w. O accepts w, therefore, exactly when one of M and N accepts w and the other doesn't.
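The product construction in proof (2) translates almost directly into code. The following is a minimal sketch, assuming DFAs are stored as dictionaries with a total transition table. The two example machines (even number of a's, and ends in b) are hypothetical and chosen only to exercise the construction.

```python
from itertools import product

def run(dfa, w):
    """Run a DFA on w and report whether it ends in a final state."""
    state = dfa["start"]
    for ch in w:
        state = dfa["delta"][state, ch]
    return state in dfa["final"]

def symmetric_difference_dfa(M, N, alphabet):
    """Product DFA O that accepts exactly L(M) Δ L(N)."""
    states = list(product(M["states"], N["states"]))
    return {
        "states": states,
        "start": (M["start"], N["start"]),
        "delta": {((m, n), ch): (M["delta"][m, ch], N["delta"][n, ch])
                  for (m, n) in states for ch in alphabet},
        # final exactly when one component is final and the other is not
        "final": {(m, n) for (m, n) in states
                  if (m in M["final"]) != (n in N["final"])},
    }

# Hypothetical example machines over {a, b}.
M = {"states": [0, 1], "start": 0, "final": {0},          # even number of a's
     "delta": {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}}
N = {"states": ["x", "y"], "start": "x", "final": {"y"},  # ends in b
     "delta": {("x", "a"): "x", ("x", "b"): "y",
               ("y", "a"): "x", ("y", "b"): "y"}}

O = symmetric_difference_dfa(M, N, "ab")
words = ["".join(t) for i in range(7) for t in product("ab", repeat=i)]
agree = all(run(O, w) == (run(M, w) != run(N, w)) for w in words)
```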

• Question 6 (10): We define a queue machine to be analogous to a PDA. It is nondeterministic and has a finite state set, a start state, a final state set, an input alphabet, and a queue alphabet. But its transitions enqueue and dequeue single characters from a queue rather than push and pop characters from a stack. Describe (in English) a queue machine whose language is not context-free.

The following queue machine's language is the non-CFL $\{a^n b^n c^n : n \geq 0\}$.

1. Read a's from the input, putting an a on the queue for each a read.

2. Put a \$ on the queue.

3. Read b's from the input, dequeueing an a and enqueueing a b for each one read. If this is not possible, reject.

4. Dequeue and enqueue the \$. If this is not possible, reject.

5. Read c's from the input, dequeueing a b and enqueueing a c for each one read. If this is not possible, reject.

6. Dequeue the next character. If it is \$ and the input is all read, accept. Otherwise reject.
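The six steps above can be simulated with a double-ended queue. This sketch is a deterministic simplification: instead of guessing when to move from one step to the next, it greedily reads each block of letters, which accepts the same strings for this particular machine. The function name is my own.

```python
from collections import deque

def queue_machine_accepts(w):
    """Simulate the six-step queue machine on input w."""
    q = deque()
    i = 0
    # Step 1: enqueue an a for each a read.
    while i < len(w) and w[i] == "a":
        q.append("a")
        i += 1
    # Step 2: enqueue the marker.
    q.append("$")
    # Step 3: for each b read, dequeue an a and enqueue a b.
    while i < len(w) and w[i] == "b":
        if q[0] != "a":
            return False
        q.popleft()
        q.append("b")
        i += 1
    # Step 4: the marker must now be at the front; move it to the back.
    if q[0] != "$":
        return False
    q.popleft()
    q.append("$")
    # Step 5: for each c read, dequeue a b and enqueue a c.
    while i < len(w) and w[i] == "c":
        if q[0] != "b":
            return False
        q.popleft()
        q.append("c")
        i += 1
    # Step 6: accept if the marker is at the front and the input is used up.
    return q[0] == "$" and i == len(w)
```

The queue always contains the marker, so looking at the front element is safe at every step.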

• Question 7 (10): Find a regular expression either for the language ABC defined above, or for its complement. (Your choice, no extra credit for both. If you attempt both, tell me which one you want me to grade.)

An easy regular expression for the complement of ABC is (a ∪ b)* ∪ (a ∪ c)* ∪ (b ∪ c)*, because any string not in ABC must contain at most two different letters.

The simplest regular expression I can think of for ABC itself is the union of six terms, one for each possible ordering of the first a, first b, and first c in the string. The first term, for the ordering a, b, c, is aa*b(a ∪ b)*cΣ*.

It's possible, but tedious, to construct an expression for either language from its eight-state DFA by state elimination.
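Both expressions can be tested against the definition of ABC by brute force. In the sketch below, the complement pattern is the one given above in Python's `re` syntax. The six-term pattern for ABC uses one term of the shape x x* y (x ∪ y)* z Σ* per ordering (x, y, z) of the first occurrences, which is my reading of the six-term union and should be treated as an assumption.

```python
import re
from itertools import permutations, product

complement_re = re.compile(r"(?:a|b)*|(?:a|c)*|(?:b|c)*")

# One term per ordering (x, y, z): x x* y (x|y)* z (a|b|c)*.
abc_re = re.compile("|".join(
    f"{x}{x}*{y}(?:{x}|{y})*{z}(?:a|b|c)*"
    for x, y, z in permutations("abc")))

def in_abc(w):
    """Definition of ABC: at least one a, one b, and one c."""
    return set(w) == set("abc")

# Every string over {a, b, c} of length at most 7.
words = ["".join(t) for i in range(8) for t in product("abc", repeat=i)]
complement_ok = all(bool(complement_re.fullmatch(w)) == (not in_abc(w)) for w in words)
abc_ok = all(bool(abc_re.fullmatch(w)) == in_abc(w) for w in words)
```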

• Question 8 (15): Let N be the NFA with state set {1, 2, 3, 4}, start state 1, final state set {2, 3}, alphabet {a, b}, and the following six transitions (arrows): (1, a, 2), (1, ε, 3), (2, b, 2), (2, ε, 3), (3, b, 4), and (4, a, 2).

• (a, 5) Circle exactly the strings on this list that are in the language L(N): ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb.

The strings in L(N) are ε, a, ab, ba, aba, abb, and bab.

• (b, 10) Construct an ordinary DFA D such that L(D) = L(N). (Any correct DFA gets full credit, but showing your reasoning is important to get partial credit for a wrong answer.)

Following the version of the construction in Sipser, the start state is {1, 3}, with a-arrow to {2, 3} and b-arrow to {4}. The state {2, 3} has a-arrow to the death state and b-arrow to {2, 3, 4}. The state {4} has a-arrow to {2, 3} and b-arrow to the death state. The state {2, 3, 4} has a-arrow to {2, 3} and b-arrow to itself. The death state, of course, has both arrows to itself. We are done -- only five of the 16 potential states are reachable. The final states are those containing 2 or 3, namely {1, 3}, {2, 3}, and {2, 3, 4}.
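Both parts of this question can be confirmed by running the subset construction on N. The sketch below encodes the six transitions from the problem statement, builds the reachable part of the DFA, and runs it on the fifteen candidate strings from part (a). Function names are my own, and ε is represented by the empty string.

```python
from itertools import product

EPS = ""
TRANSITIONS = [(1, "a", 2), (1, EPS, 3), (2, "b", 2),
               (2, EPS, 3), (3, "b", 4), (4, "a", 2)]
START, FINAL = 1, {2, 3}

def eps_closure(states):
    """All states reachable from `states` by ε-moves alone."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (q, x, r) in TRANSITIONS:
            if q == p and x == EPS and r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def step(states, ch):
    """One DFA move: follow ch-arrows, then take the ε-closure."""
    return eps_closure({r for (q, x, r) in TRANSITIONS if q in states and x == ch})

def subset_construction():
    start = eps_closure({START})
    delta, todo, seen = {}, [start], {start}
    while todo:
        s = todo.pop()
        for ch in "ab":
            t = step(s, ch)
            delta[s, ch] = t
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return start, delta, seen

start, delta, dfa_states = subset_construction()

def accepts(w):
    state = start
    for ch in w:
        state = delta[state, ch]
    return bool(state & FINAL)   # final iff the subset contains 2 or 3

candidates = ["".join(t) for i in range(4) for t in product("ab", repeat=i)]
accepted = [w for w in candidates if accepts(w)]
```

The empty frozenset plays the role of the death state.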

• Question 9 (40+10): Let G be the grammar with nonterminals S and T, terminals a, b, and c, start symbol S, and rules S → TaT, T → bS, and T → c.

• (a, 10): Give a PDA M such that L(M) = L(G). (The simplest thing is probably to give the top-down parser, but any correct PDA gets full credit. Explaining your reasoning may help for partial credit.)

The top-down parser has state set {s, p, f}, start state s, only final state f, two transitions (s, ε, ε; p, S\$) and (p, ε, \$; f, ε), and loops on state p with labels (a, a; ε), (b, b; ε), (c, c; ε), (ε, S; TaT), (ε, T; bS), and (ε, T; c).
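To see the top-down parser at work, here is a small search over its configurations while it sits in state p. It is a sketch under a few assumptions of mine: the stack is a Python string with its top at index 0, the move from s to p is folded into the initial configuration, and branches are pruned when the stack holds more grammar symbols than there are input letters left (every symbol of G yields at least one letter).

```python
RULES = {"S": ["TaT"], "T": ["bS", "c"]}

def pda_accepts(w):
    """Search the configurations (input position, stack) of the parser."""
    start = (0, "S$")            # after the move from s to p
    seen, todo = {start}, [start]
    while todo:
        i, stack = todo.pop()
        if stack == "$":
            if i == len(w):      # pop the marker, move to f, input used up
                return True
            continue
        top, rest = stack[0], stack[1:]
        if top in RULES:         # loops (ε, S; TaT), (ε, T; bS), (ε, T; c)
            successors = [(i, rhs + rest) for rhs in RULES[top]]
        elif i < len(w) and w[i] == top:
            successors = [(i + 1, rest)]   # loops (a, a; ε), (b, b; ε), (c, c; ε)
        else:
            successors = []
        for conf in successors:
            j, st = conf
            # Prune: each non-marker stack symbol needs at least one input letter.
            if len(st) - 1 <= len(w) - j and conf not in seen:
                seen.add(conf)
                todo.append(conf)
    return False
```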

• (b, 10): Give a PDA M' such that L(M') = L(G) and such that M' meets the three conditions for the proof that any PDA may be simulated by a context-free grammar.

We already meet the conditions about the start and final state, so we only have to ensure that each transition either pushes or pops a single character but not both. We break the transition from s to p into two, the first pushing \$ and the second pushing S. The three loops that expand a nonterminal each pop and push, so they must be broken up using new intermediate states: (ε, S; TaT) into pop-S, push-T, push-a, and push-T, (ε, T; bS) into pop-T, push-S, and push-b, and (ε, T; c) into pop-T and push-c.

• (c, 10): Prove that for any positive integer k, there is a string in L(G) with exactly 3k letters.

The base case is k = 1, for which we can make cac, of length 3(1), by the derivation S --> TaT --> caT --> cac.

For the inductive case, assume that there is some derivation of a string w of length 3k from S. Then we can make a string of length 3(k+1) by S --> TaT --> bSaT --> bSac --> bwac, using the inductive hypothesis for the last step.
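The induction is constructive, so the witnesses can be generated and checked. The sketch below builds the string for each k by the step w ↦ bwac starting from cac, and tests membership with a small recursive-descent recognizer for G. The recognizer relies on the observation (mine, not stated in the problem) that the two T rules start with different terminals, so one letter of lookahead decides which rule to use.

```python
def in_LG(w):
    """Recursive-descent recognizer for S -> TaT, T -> bS | c."""
    def parse_S(i):
        j = parse_T(i)
        if j is None or j >= len(w) or w[j] != "a":
            return None
        return parse_T(j + 1)

    def parse_T(i):
        if i < len(w) and w[i] == "b":
            return parse_S(i + 1)
        if i < len(w) and w[i] == "c":
            return i + 1
        return None

    # Each parse function returns the index just past what it consumed.
    return parse_S(0) == len(w)

def witness(k):
    """String of length 3k from the induction: cac, then b + w + ac."""
    w = "cac"
    for _ in range(k - 1):
        w = "b" + w + "ac"
    return w

checks = [len(witness(k)) == 3 * k and in_LG(witness(k)) for k in range(1, 21)]
```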

• (d, 10): Prove that every string in L(G) has a number of letters that is divisible by 3. (Hint: One way to do this is to simultaneously prove a fact about the lengths of strings that are derivable from the nonterminal T in the grammar G.)

Let f be the function from {S, T, a, b, c}* to the natural numbers that is the homomorphism taking S to 0 and all other letters to 1. The original string S of the derivation has f(S) = 0. Each of the rules preserves f modulo 3: the rule S --> TaT adds 3 and the other two rules preserve f exactly. So the f-value of any string appearing in a derivation from S must be divisible by 3. Any word in L(G) is such a string, and must have length equal to its f-value, which is divisible by 3.
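The invariant can be observed directly by enumerating sentential forms. The sketch below generates every string over {S, T, a, b, c} of length at most 9 that is derivable from S, and evaluates f on each. The cutoff of 9 is an arbitrary choice, and it loses nothing because no rule of G shortens a string.

```python
RULES = {"S": ["TaT"], "T": ["bS", "c"]}

def f(form):
    """Homomorphism from the proof: S counts 0, every other symbol counts 1."""
    return len(form) - form.count("S")

def sentential_forms(max_len):
    """All strings derivable from S whose length is at most max_len."""
    seen, todo = {"S"}, ["S"]
    while todo:
        form = todo.pop()
        for i, sym in enumerate(form):
            for rhs in RULES.get(sym, []):
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    todo.append(new)
    return seen

forms = sentential_forms(9)
invariant_holds = all(f(x) % 3 == 0 for x in forms)
words = {x for x in forms if x.islower()}      # no nonterminals left
word_lengths = {len(w) for w in words}
```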

• (e, 10XC): Prove that L(G) is not a regular language. (Hint: Work with the intersection of L(G) and the regular language b*(a ∪ c)* -- if the intersection is not regular then L(G) is not regular.)

Using the hint, I will show that L(G) ∩ R is non-regular, where R is the regular language b*(a ∪ c)*. I claim that L(G) ∩ R is equal to X = $\{b^k c(ac)^{k+1} : k \geq 0\}$. (L(G) ∩ R ⊆ X because to make a string in R, I may only use the rule T --> bS at the beginning of the string. Thus every T that is not the first one in the string must go to c, and in effect I have the rule S --> Tac, which I may use only for S --> bSac or S --> cac. Thus any valid derivation uses the first of these rules k times and then the second once, getting a string in X. And of course X ⊆ L(G) by this derivation, and obviously X ⊆ R.)

To see that X is non-regular, I can use Myhill-Nerode by observing that the set $\{b^i : i \geq 0\}$ is pairwise X-distinguishable, with $b^i$ and $b^j$ distinguished by $c(ac)^{i+1}$. Or I could use the Regular Language Pumping Lemma -- if p is the alleged pumping length I take w = $b^p c(ac)^{p+1}$. The pumped string must occur within the initial b's, and pumping either up or down takes us out of X.
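The claimed description of L(G) ∩ R and the distinguishing suffixes can both be spot-checked. This sketch enumerates all strings over {a, b, c} of length at most 9, using a recursive-descent recognizer for G and Python's `re` for R. The names `in_LG` and `in_X` are my own, and the length bound means only k = 0, 1, 2 are covered by the first check.

```python
import re
from itertools import product

def in_LG(w):
    """Recursive-descent recognizer for S -> TaT, T -> bS | c."""
    def parse_S(i):
        j = parse_T(i)
        if j is None or j >= len(w) or w[j] != "a":
            return None
        return parse_T(j + 1)

    def parse_T(i):
        if i < len(w) and w[i] == "b":
            return parse_S(i + 1)
        if i < len(w) and w[i] == "c":
            return i + 1
        return None

    return parse_S(0) == len(w)

def in_X(w):
    """Membership in X: b^k c (ac)^(k+1), with k read off the leading b's."""
    k = len(w) - len(w.lstrip("b"))
    return w == "b" * k + "c" + "ac" * (k + 1)

R = re.compile(r"b*(?:a|c)*")
words = ("".join(t) for i in range(10) for t in product("abc", repeat=i))
intersection = {w for w in words if R.fullmatch(w) and in_LG(w)}
X_small = {"b" * k + "c" + "ac" * (k + 1) for k in range(3)}

# b^i and b^j (i != j) are separated by the suffix c(ac)^(i+1).
distinguished = all(
    in_X("b" * i + "c" + "ac" * (i + 1)) and not in_X("b" * j + "c" + "ac" * (i + 1))
    for i in range(6) for j in range(6) if i != j
)
```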