# Solutions to First Midterm Exam, Spring 2017

### Directions:

• Answer the problems on the exam pages.
• There are nine problems (some with multiple parts) for 125 total points. Actual scale was A = 100, C = 64.
• If you need extra space use the back of a page.
• No books, notes, calculators, or collaboration.
• The first six questions are statements -- in each case say whether the statement is true or false and give a convincing justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. You get five points for the correct boolean answer (so there is no reason not to guess if you don't know) and up to five for the justification.

Q1: 10 points
Q2: 10 points
Q3: 10 points
Q4: 10 points
Q5: 10 points
Q6: 10 points
Q7: 20 points
Q8: 15 points
Q9: 30 points
Total: 125 points

Question text is in black, solutions in blue.

The language X over the alphabet {a, b, c} is defined by the grammar with start symbol S and rules S → aS, S → T, T → bbT, T → bc, T → Tcc, and T → ε.

The language Y over the alphabet {a, b, c} is the set {a^i b^j c^k : (j = k) ∨ (|j - k| = 2)}.

The language Z over the alphabet {a, b, c} is the set {a^i b^j c^k : i > j > k ≥ 0}.

A language is called Turing recognizable if and only if it is equal to the language L(M) for some (standard, deterministic, one-tape) Turing machine M. It is called Turing decidable if it is the language L(M) for some Turing machine that halts (accepts or rejects) on every input.

A string-cell Turing machine (SCTM) has a state set Q, including start state q0, accepting state qa, and rejecting state qr, an input alphabet Σ, a tape alphabet Γ with Σ ⊆ Γ, and a tape that is a sequence of cells c1, c2, c3,... At any time, the content of a tape cell is a string in Γ*. (The empty string ε plays the role of the blank symbol.) The transition function δ takes input in Q × (Γ ∪ {ε}) and has output in Q × (Γ ∪ {d, ε}) × {L, R, S}. Depending on the current state and the leftmost character in the string in the current cell (or ε if the current cell has the empty string), the machine can either append a new character to the left of that string, delete the leftmost character (d), or leave the string unchanged (ε), and then either move left, move right, or stay put. The machine runs on an input string w ∈ Σ* by starting in state q0, looking at cell c1 which contains w, with all other cells containing ε. As with Sipser's TM's, if it is supposed to move left from c1 it stays put instead.

A restricted string-cell Turing machine (RSCTM) is a string-cell Turing machine that has only two cells (any attempt to move right from c2 results in staying put) and can only delete characters from c1, not add them.

• Question 1 (10): True or false with justification: The languages X and Y, defined above, are equal.

FALSE. The derivation S → T → bbT → bbbbT → bbbb puts the string bbbb in X, but bbbb (with j = 4 and k = 0) is not in Y because 4 ≠ 0 and |4 - 0| is not 2.
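The counterexample can be confirmed mechanically. The following is a minimal sketch, not part of the original solution: the helper names `generate_X` and `in_Y` are our own, and the search relies on the fact that every sentential form of this grammar contains at most one nonterminal.

```python
import re
from collections import deque

# Grammar for X, written with single-character nonterminals S and T.
RULES = {"S": ["aS", "T"], "T": ["bbT", "bc", "Tcc", ""]}


def in_Y(s):
    """Membership in Y = {a^i b^j c^k : j = k or |j - k| = 2}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    j, k = len(m.group(2)), len(m.group(3))
    return j == k or abs(j - k) == 2


def generate_X(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    seen, found = {"S"}, set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Position of the (unique) nonterminal, if any.
        pos = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if pos is None:
            found.add(form)
            continue
        for rhs in RULES[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            terminals = sum(ch not in RULES for ch in new)
            # No rule removes terminals, so pruning by length is safe.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


# Strings that are in X but not in Y, up to length 4.
witnesses = sorted(w for w in generate_X(4) if not in_Y(w))
```

Any element of `witnesses` separates the two languages; the string bbbb used above is one of them.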

• Question 2 (10): True or false with justification: The language Z, defined above, is context-free.

FALSE. This should be pretty clear intuitively but is a little tricky to prove. Assume Z is context-free and let p be the constant for it in the CFL Pumping Lemma. Let w = a^{p+2} b^{p+1} c^p. (Why do we choose this w? We need a string in Z, where the interesting places are separated by at least p letters, and where the string is only just barely in Z, such that small changes will lead to strings not in Z.)

If w = uvxyz, with |vxy| ≤ p and vy nonempty, then either (1) vxy contains a's and/or b's but not c's, so that uxz either fails to have more than p+1 a's or fails to have more than p b's, while it still has p c's, and so is not in Z, or (2) vxy contains c's but not a's, in which case pumping up enough times leads to a string that either is not in a*b*c* or has at least as many b's or c's as it has a's, which again is not in Z.
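For small values of p the case analysis can be checked exhaustively. The sketch below is our own illustration (the function names are hypothetical): it tries every split w = uvxyz with |vxy| ≤ p and vy nonempty, and confirms that some pumped string u v^t x y^t z falls outside Z.

```python
import re


def in_Z(s):
    """Membership in Z = {a^i b^j c^k : i > j > k >= 0}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    if m is None:
        return False
    i, j, k = (len(g) for g in m.groups())
    return i > j > k


def every_split_fails(p, max_pump=4):
    """True if no split of w = a^(p+2) b^(p+1) c^p survives pumping."""
    w = "a" * (p + 2) + "b" * (p + 1) + "c" * p
    n = len(w)
    for start in range(n):
        # vxy = w[start:end], with 1 <= |vxy| <= p
        for end in range(start + 1, min(start + p, n) + 1):
            for v_end in range(start, end + 1):
                for y_start in range(v_end, end + 1):
                    u, v = w[:start], w[start:v_end]
                    x, y, z = w[v_end:y_start], w[y_start:end], w[end:]
                    if not v and not y:
                        continue  # the lemma requires vy to be nonempty
                    pumped = (u + v * t + x + y * t + z
                              for t in range(max_pump + 1))
                    if all(in_Z(s) for s in pumped):
                        return False  # this split could be pumped safely
    return True
```

A result of `True` for a given p only confirms the argument for that p; the proof above is what covers all p.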

• Question 3 (10): True or false with justification: For any non-negative integer n, let z(n) be the number of strings of length n in the language Z defined above. Then there exists a positive real number c and an integer n0 such that for all integers n with n > n0, z(n) ≥ cn^2.

TRUE. Consider all strings a^i b^j c^k with 0 ≤ k ≤ 0.1n, 0.2n ≤ j ≤ 0.3n, and i = n - j - k. All such strings are in Z (since i ≥ 0.6n > j > k), and up to rounding there are 0.01n^2 of them, so we can take c to be any number less than 0.01.

In fact if we look at all strings of length n in a*b*c*, of which there are C(n+2, 2) = (n+2)(n+1)/2, all but O(n) of them have i, j, and k distinct. Of this latter set, exactly 1/6 are in Z, so asymptotically z(n) is n^2/12.
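Both the lower bound and the asymptotic estimate are easy to check numerically. This is a small sketch of our own that counts the triples (i, j, k) with i + j + k = n and i > j > k ≥ 0 directly.

```python
def z(n):
    """Number of strings of length n in Z = {a^i b^j c^k : i > j > k >= 0}."""
    # Choose k and j with j > k; then i = n - j - k is forced.
    return sum(
        1
        for k in range(n + 1)
        for j in range(k + 1, n + 1)
        if n - j - k > j
    )


# Ratio z(n) / n^2 for a few values of n, to compare against 1/12.
ratios = {n: z(n) / n**2 for n in (100, 300, 600)}
```

For example, z(3) = 1, counting only the string aab.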

• Question 4 (10): True or false with justification: Let G be any context-free grammar in Chomsky normal form, and let M be any ordinary Turing machine that halts on all inputs. Then it is possible that L(G) ∩ L(M) is not a Turing decidable language.

FALSE. Build M' so that it runs M on w, rejecting if w ∉ L(M), then tests whether w ∈ L(G) and accepts if and only if it is. Then L(M') = L(M) ∩ L(G) and M' always halts, so the language is TD. We didn't do the construction to decide an arbitrary CFL in lecture, but asserted several times that every CFL is TD. The simplest, though not the best, construction is to test all derivations of exactly 2n - 1 steps from S in G, and see whether any yield w, because any derivation of a string of n ≥ 1 terminals in a Chomsky normal form CFG takes exactly that many steps.
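The machine M' can be sketched in code. Note that this sketch swaps in the CYK dynamic-programming algorithm for the membership test rather than the derivation enumeration described above; either one always halts. The grammar encoding, the example grammar for {a^n b^n : n ≥ 1}, and the stand-in decider `length_divisible_by_4` are all assumptions made for illustration.

```python
def cyk_accepts(w, start, unit_rules, pair_rules, start_derives_empty=False):
    """Decide w in L(G) for a CNF grammar G.

    unit_rules: nonterminal -> set of terminals (rules A -> a)
    pair_rules: nonterminal -> set of pairs (B, C) (rules A -> BC)
    """
    n = len(w)
    if n == 0:
        return start_derives_empty
    # table[i][l] holds the nonterminals that derive w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {A for A, ts in unit_rules.items() if ch in ts}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for A, pairs in pair_rules.items():
                    if any(B in left and C in right for B, C in pairs):
                        table[i][length].add(A)
    return start in table[0][n]


def decide_intersection(w, decides_M, grammar):
    """Model of M': reject unless M accepts, then test membership in L(G)."""
    return decides_M(w) and cyk_accepts(w, **grammar)


# Hypothetical example: G generates {a^n b^n : n >= 1}.
ANBN = {
    "start": "S",
    "unit_rules": {"A": {"a"}, "B": {"b"}},
    "pair_rules": {"S": {("A", "B"), ("A", "X")}, "X": {("S", "B")}},
}


def length_divisible_by_4(w):
    """Stand-in for a Turing machine M that halts on every input."""
    return len(w) % 4 == 0
```

With these choices `decide_intersection` decides {a^n b^n : n even, n ≥ 2}, and it terminates on every input because both component tests do.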

• Question 5 (10): True or false with justification: Let USQ be the language {a^{n^2} : n ≥ 0}. Then there exists a restricted string-cell Turing machine R (as defined above) such that L(R) = USQ.

FALSE. An RSCTM acts in essence as a deterministic PDA, with c1 holding the unread input and c2 acting as a stack. Like a PDA, the RSCTM can read an input character or not, and push or pop from its stack, in one step. (We need two RSCTM steps for a PDA transition that both pushes and pops.) But USQ is not a CFL and thus not the language of even a nondeterministic PDA. We can quote a homework problem and note that USQ is clearly not eventually periodic, or just use the CFL Pumping Lemma with w = a^{(2p)^2}, where if |vy| = k, with 1 ≤ k ≤ p, pumping down gives a string of length (2p)^2 - k, which lies strictly between (2p - 1)^2 and (2p)^2 and so is not a perfect square.
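The arithmetic behind the pumping step can be spot-checked. The sketch below assumes the pumped string has length (2p)^2, which is our reading of the choice of w, and verifies that removing k letters with 1 ≤ k ≤ p never leaves a perfect-square length.

```python
from math import isqrt


def is_square(m):
    """True if m is a perfect square (0, 1, 4, 9, ...)."""
    return m >= 0 and isqrt(m) ** 2 == m


def pumping_down_escapes(p):
    """True if (2p)^2 - k is a non-square for every k with 1 <= k <= p."""
    return all(not is_square((2 * p) ** 2 - k) for k in range(1, p + 1))
```

The check succeeds because (2p - 1)^2 = (2p)^2 - 4p + 1 is already smaller than (2p)^2 - p whenever p ≥ 1.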

• Question 6 (10): True or false with justification: There exist two languages U and V such that neither U nor V is context-free, but U ∪ V is context-free.

TRUE. Neither USQ nor its complement is a CFL (as neither is eventually periodic -- in fact the complement of a unary language is a CFL if and only if the language itself is a CFL, since the unary CFL's are exactly the unary regular languages). But the union of USQ and its complement is the regular language a*.

• Question 7 (20): Consider the language X defined above. Is it regular? If it is not, prove that it is not by giving an infinite set of pairwise X-distinguishable strings. If it is, give all of the following: (a) a DFA for X, (b) a regular expression for X, and (c) the index of X, i.e., the number of equivalence classes for its Myhill-Nerode relation. You may use standard constructions to convert one of these things to another, or produce each one directly from the definition of the language.

X is a regular language, the set {a^i b^j c^k : j ≡ k (mod 2)}. To see this, note that any completed derivation in G uses the rule T → bbT m times, and the rule T → Tcc n times, so either j = 2m and k = 2n (if the last rule was T → ε) or j = 2m + 1 and k = 2n + 1 (if the last rule was T → bc). And any pair (j, k) with j ≡ k (mod 2) is in this form.

A DFA for this language has state set {1, 2, 3, 4, 5, 6} with transitions (1, a, 1), (1, b, 2), (1, c, 3), (2, a, 6), (2, b, 4), (2, c, 5), (3, a, 6), (3, b, 6), (3, c, 5), (4, a, 6), (4, b, 2), (4, c, 3), (5, a, 6), (5, b, 6), (5, c, 3), and (6, x, 6) for all letters x. The start state is 1 and the final states are 1, 4 and 5. This is correct because strings go to 1 if they are all a's (and thus in X), to 2 if they are a's followed by an odd number of b's, to 3 if they are in a*b*c*, have c's, and are not in X, to 4 if they are a's followed by a positive even number of b's (and so are in X), to 5 if they are in X and have c's, and to 6 if they are not in a*b*c*.
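The DFA can be tested against the characterization j ≡ k (mod 2) by brute force. In this sketch (our own, with hypothetical helper names) the table includes the transition (2, c, 5), which the description of states 2 and 5 requires: an odd number of b's followed by one c is in X.

```python
import re
from itertools import product

# Transition table for the six-state DFA; state 6 is the dead state.
DELTA = {
    (1, "a"): 1, (1, "b"): 2, (1, "c"): 3,
    (2, "a"): 6, (2, "b"): 4, (2, "c"): 5,
    (3, "a"): 6, (3, "b"): 6, (3, "c"): 5,
    (4, "a"): 6, (4, "b"): 2, (4, "c"): 3,
    (5, "a"): 6, (5, "b"): 6, (5, "c"): 3,
    (6, "a"): 6, (6, "b"): 6, (6, "c"): 6,
}
FINAL = {1, 4, 5}


def dfa_accepts(s):
    """Run the DFA from start state 1 and report acceptance."""
    state = 1
    for ch in s:
        state = DELTA[state, ch]
    return state in FINAL


def in_X(s):
    """Membership in {a^i b^j c^k : j and k have the same parity}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and (len(m.group(2)) - len(m.group(3))) % 2 == 0


def disagreements(max_len):
    """Strings of length <= max_len on which the DFA and in_X differ."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("abc", repeat=n))
    return [w for w in words if dfa_accepts(w) != in_X(w)]
```

An empty result from `disagreements` is evidence for, not a proof of, the DFA's correctness on longer strings.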

A regular expression for this language is a*(bb)*(bc ∪ ∅*)(cc)*.
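The expression translates directly into Python's `re` syntax, where the factor (bc ∪ ∅*) becomes the optional group `(bc)?` because ∅* denotes {ε}. The following sketch, with names of our own choosing, compares it with the parity characterization of X on all short strings.

```python
import re
from itertools import product

# a*(bb)*(bc U empty-string)(cc)* in Python regex syntax.
X_PATTERN = re.compile(r"a*(bb)*(bc)?(cc)*")


def regex_accepts(s):
    """True if the whole string s matches the expression for X."""
    return X_PATTERN.fullmatch(s) is not None


def in_X(s):
    """Membership in {a^i b^j c^k : j and k have the same parity}."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and (len(m.group(2)) - len(m.group(3))) % 2 == 0


def mismatches(max_len):
    """Strings of length <= max_len on which the two tests differ."""
    words = ("".join(t) for n in range(max_len + 1)
             for t in product("abc", repeat=n))
    return [w for w in words if regex_accepts(w) != in_X(w)]
```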

The index is 6 because the DFA above is minimal. We can distinguish the three final states by using a to distinguish 1 from 4 or 5, and bb to distinguish 4 and 5 from one another. We can distinguish the three non-final states by using b to distinguish 2 from 3 or 6, and c to distinguish 3 from 6.
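Minimality can also be confirmed by partition refinement (Moore's algorithm), which repeatedly splits states that behave differently on some letter. This is a sketch under the assumption that the DFA includes the transition (2, c, 5); the function name `count_classes` is ours. Every state of this DFA is reachable from state 1, so the number of classes found equals the index.

```python
STATES = [1, 2, 3, 4, 5, 6]
ALPHABET = "abc"
FINAL = {1, 4, 5}
DELTA = {
    (1, "a"): 1, (1, "b"): 2, (1, "c"): 3,
    (2, "a"): 6, (2, "b"): 4, (2, "c"): 5,
    (3, "a"): 6, (3, "b"): 6, (3, "c"): 5,
    (4, "a"): 6, (4, "b"): 2, (4, "c"): 3,
    (5, "a"): 6, (5, "b"): 6, (5, "c"): 3,
    (6, "a"): 6, (6, "b"): 6, (6, "c"): 6,
}


def count_classes(states, alphabet, delta, finals):
    """Number of pairwise distinguishable groups of states."""
    # Start from the final / non-final split.
    block = {q: (q in finals) for q in states}
    while True:
        # A state's signature is its own block plus its successors' blocks.
        signature = {
            q: (block[q],) + tuple(block[delta[q, x]] for x in alphabet)
            for q in states
        }
        if len(set(signature.values())) == len(set(block.values())):
            return len(set(block.values()))  # no block was split
        block = signature


index_of_X = count_classes(STATES, ALPHABET, DELTA, FINAL)
```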

• Question 8 (15): Suppose that I have an ordinary NFA N with no moves into its start state, a single final state different from the start state, and no moves out of its final state. I can transform N into a PDA P, in the normal form for the PDA-to-CFG construction, by taking each transition (p, a, q), adding a new state r dedicated to that transition, and replacing the transition with two: (p, a, ε; r, c) and (r, ε, c; q, ε). (Note: I gave you the choice of using the same new stack letter c for all the pairs of transitions, or using a different letter for each pair.)

What happens when I then carry out the PDA-to-CFG construction? Carry it out to get a grammar, starting with the following N: state set {i, p, q, f}, start state i, only final state f, alphabet {a, b}, and transitions (i, ε, p), (p, a, q), (q, b, p), and (p, ε, f). Feel free to omit nonterminals and rules that cannot lead to generating any string of terminals.

Describe in English what happens when this construction is applied to a general NFA.

The PDA we build has eight states: let's call the new states r between i and p, s between p and q, t between q and p, and u between p and f. Following the hint we'll use four different stack letters to push and pop in the four different transition pairs. We have eight total transitions.

In our grammar, the start symbol is A_{if}, we have 64 nonterminals in all, and our rules are of the form A_{xx} → ε, of the form A_{xy} → A_{xz}A_{zy}, and one special rule for each push-pop pair. These are A_{ip} → A_{rr}, A_{pq} → aA_{ss}, A_{qp} → bA_{tt}, and A_{pf} → A_{uu}. These last four can be simplified to A_{ip} → ε, A_{pq} → a, A_{qp} → b, and A_{pf} → ε. Letting S be A_{if}, T be A_{pp}, and simplifying a number of rules involving ε, we can get a simple grammar S → T, T → TT, T → ab, and T → ε, generating the language (ab)*.

In general, the push-pop pairs resolve to give a rule A_{pq} → a for every NFA transition (p, a, q), where a is either an input letter or ε. The only way to get a derivation in the grammar for this PDA is to use the transitivity rules to break A_{if} into a sequence of nonterminals, one for each edge in a path through the NFA from i to f. Then each of the edge nonterminals can be changed to the letter (or ε) read by the NFA when traversing that edge.
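This correspondence between derivations and paths can be illustrated for the example NFA. The sketch below is our own simplification: it works with the resolved edge rules directly rather than the full 64-nonterminal grammar, and the function name `derive` and the bound `max_edges` are assumptions made for illustration.

```python
# Transitions (p, label, q) of the example NFA; "" stands for epsilon.
# Each one yields a simplified grammar rule A_{pq} -> label.
EDGES = [("i", "", "p"), ("p", "a", "q"), ("q", "b", "p"), ("p", "", "f")]


def derive(src, dst, max_edges, edges=EDGES):
    """Strings derivable from A_{src,dst} using at most max_edges edge rules.

    Splitting A_{src,dst} by transitivity into edge nonterminals is the
    same as choosing a path from src to dst, so we enumerate paths.
    """
    results = {""} if src == dst else set()  # rule A_{xx} -> epsilon
    frontier = {(src, "")}
    for _ in range(max_edges):
        frontier = {
            (q, s + label)
            for state, s in frontier
            for (p, label, q) in edges
            if p == state
        }
        results |= {s for state, s in frontier if state == dst}
    return results


short_strings = derive("i", "f", 8)
```

With at most 8 edges, the paths from i to f go around the p, q loop at most three times, so every string found is a short power of ab.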

• Question 9 (30): The Church-Turing thesis says that any "reasonable" model of general computation will lead to the same classes of recognizable and decidable languages. Above we have defined the new model of string-cell Turing machines. We explore here whether it is in fact a "reasonable" model.

• (a, 15) Let S be an arbitrary SCTM. Give an implementation-level description (not a state table, but enough to allow someone with enough time on their hands to design the state table) of an ordinary (deterministic, one-tape) Turing machine M such that L(M) = L(S).

The general idea is for M to simulate S by storing the contents of each of S's cells on its tape, using a # symbol to separate each pair of adjacent cells. For the initial setup, we need to put a # before and after the w on the tape, and a third # after the second to represent the first empty cell c2 of S. We mark the # to the left of the current cell at any given time. To execute a step of S, we read the character to the right of the marked #, decide what to do, then do it by adjusting the tape. Note that if the character to the right of the marked # is another #, we know that the current cell is empty and act accordingly. Doing what S does might involve moving the entire contents of the tape, to the right of where we are, one space right to make room for a new letter. If we move to a previously untouched cell, we leave another # to allow the tape to represent an empty cell. We may want a special mark on the # before c1, to help us model the special behavior if S moves left from there. Overall we can simulate S much as we simulated a multitape TM with an ordinary TM in lecture, continuing the simulation until or unless S halts.
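The tape layout can be made concrete with a short sketch. Everything here is an illustrative assumption rather than part of the solution: we write the marked separator as `@`, assume neither `#` nor `@` is in Γ, represent M's tape as a Python list, and leave out left moves and the special mark before c1.

```python
def encode(cells, head):
    """Flatten S's cells into M's tape; '@' marks the current cell."""
    tape = []
    for idx, content in enumerate(cells):
        tape.append("@" if idx == head else "#")
        tape.extend(content)
    tape.append("#")  # closes the last represented cell
    return tape


def current_symbol(tape):
    """Leftmost character of the current cell, or '' if it is empty."""
    nxt = tape[tape.index("@") + 1]
    return "" if nxt == "#" else nxt


def prepend(tape, ch):
    """Insert ch at the left of the current cell, shifting the rest right."""
    pos = tape.index("@")
    tape.append("")
    for i in range(len(tape) - 1, pos + 1, -1):
        tape[i] = tape[i - 1]
    tape[pos + 1] = ch


def delete_leftmost(tape):
    """Delete the leftmost character of the current cell, shifting left."""
    pos = tape.index("@")
    if tape[pos + 1] == "#":
        return  # the cell is already empty
    for i in range(pos + 1, len(tape) - 1):
        tape[i] = tape[i + 1]
    tape.pop()


def move_right(tape):
    """Move the mark to the next separator, opening a new cell if needed."""
    pos = tape.index("@")
    tape[pos] = "#"
    nxt = tape.index("#", pos + 1)
    tape[nxt] = "@"
    if nxt == len(tape) - 1:
        tape.append("#")  # leave a # so the new empty cell is represented
```

For input w = ab, `encode(["ab", ""], 0)` gives the initial layout described above: a marked separator, then w, then two more separators delimiting the empty cell c2.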

• (b, 15) Is every Turing recognizable language equal to L(S) for some SCTM S? Prove your answer. If your answer involves the construction of a machine, give an implementation-level description rather than a state table.

Yes, any TM can be simulated by an SCTM. Let M be an arbitrary ordinary TM. The basic idea is to use one SCTM cell to represent each cell of M's tape, keeping either a single letter to represent a non-blank letter or the empty string to represent a blank.

The most complicated part of the simulation is that S begins with w in the cell c1, while the simulation needs it to have one letter in each of the first n cells. So we have an initial phase where we read the leftmost letter of c1, delete it, and copy it to the correct cell, until c1 is exhausted.

To simulate one step of M with a step of S is straightforward, except that S cannot both insert and delete a letter in one step. So any step of M that changes the character in the current cell must be simulated by two steps of S, one to remove the old letter and one to insert the new one.