Q1: 10 points Q2: 10 points Q3: 10 points Q4: 10 points Q5: 10 points Q6: 10 points Q7: 20 points Q8: 15 points Q9: 30 points Total: 125 points
Question text is in black, solutions in blue.
The language X over the alphabet {a, b, c} is defined by the grammar with start symbol S and rules S → aS, S → T, T → bbT, T → bc, T → Tcc, and T → ε.
The language Y over the alphabet {a, b, c} is the set {a^i b^j c^k: (j = k) ∨ (|j - k| = 2)}.
The language Z over the alphabet {a, b, c} is the set {a^i b^j c^k: i > j > k ≥ 0}.
A language is called Turing recognizable if and only if it is equal to the language L(M) for some (standard, deterministic, one-tape) Turing machine M. It is called Turing decidable if it is the language L(M) for some Turing machine that halts (accepts or rejects) on every input.
A string-cell Turing machine (SCTM) has a state set Q, including start state q0, accepting state qa, and rejecting state qr, an input alphabet Σ, a tape alphabet Γ with Σ ⊆ Γ, and a tape that is a sequence of cells c1, c2, c3,... At any time, the content of a tape cell is a string in Γ*. (The empty string ε plays the role of the blank symbol.) The transition function δ takes input in Q × (Γ ∪ {ε}) and has output in Q × (Γ ∪ {d, ε}) × {L, R, S}. Depending on the current state and the leftmost character in the string in the current cell (or ε if the current cell has the empty string), the machine can either append a new character to the left of that string, delete the leftmost character (d), or leave the string unchanged (ε), and then either move left, move right, or stay put. The machine runs on an input string w ∈ Σ* by starting in state q0, looking at cell c1 which contains w, with all other cells containing ε. As with Sipser's TM's, if it is supposed to move left from c1 it stays put instead.
A restricted string-cell Turing machine (RSCTM) is a string-cell Turing machine that has only two cells (any attempt to move right from c2 results in staying put) and can only delete characters from c1, not add them.
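The definitions above can be made concrete with a small simulator. This is only a sketch under stated assumptions: the function name `run_sctm`, the dictionary encoding of δ, the step limit, and the example machine `REVERSE_TO_C2` are all hypothetical and not part of the exam. It also assumes that the letter d is not itself in Γ and that δ is defined for every configuration the machine reaches.

```python
def run_sctm(delta, w, start="q0", accept="qa", reject="qr", max_steps=10_000):
    """Run a string-cell TM; return (halting state, list of touched cells).

    delta maps (state, head) to (new_state, action, move), where head is the
    leftmost character of the current cell ("" if the cell is empty), action
    is a character to add on the left, "d" to delete, or "" for no change,
    and move is "L", "R", or "S".  Assumes "d" is not a tape letter.
    """
    cells = [w]          # c1 holds the input; every other cell is empty
    pos, state, steps = 0, start, 0
    while state not in (accept, reject):
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        head = cells[pos][:1]
        state, action, move = delta[(state, head)]
        if action == "d":
            cells[pos] = cells[pos][1:]          # delete leftmost character
        elif action != "":
            cells[pos] = action + cells[pos]     # add a character on the left
        if move == "R":
            pos += 1
            if pos == len(cells):
                cells.append("")                 # a fresh cell holds epsilon
        elif move == "L":
            pos = max(pos - 1, 0)                # moving left from c1 stays put
        steps += 1
    return state, cells


# Hypothetical example machine over {a, b}: it moves the input from c1 to c2
# one letter at a time, so c2 ends up holding the reversal of w.
REVERSE_TO_C2 = {("q0", ""): ("qa", "", "S")}
for ch in "ab":
    REVERSE_TO_C2[("q0", ch)] = ("move_" + ch, "d", "R")
    for head in ("", "a", "b"):
        REVERSE_TO_C2[("move_" + ch, head)] = ("q0", ch, "L")
```

The example machine only deletes from c1 and never leaves the first two cells, so it also behaves like an RSCTM that uses c2 as a stack.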
FALSE. The derivation S → T → bbT → bbbbT → bbbb puts the string bbbb in X, but bbbb is not in Y because |4 - 0| is not 2.
FALSE. This should be pretty clear intuitively but is a little tricky
to prove. Assume Z is context-free and let p be the constant for it
in the CFL Pumping Lemma. Let w = a^(p+2) b^(p+1) c^p.
(Why do we choose this w? We need a string in Z, where the
interesting places are separated by at least p letters, and where
the string is only just barely in Z, such that small changes
will lead to strings not in Z.)
If w = uvxyz, with |vxy| ≤ p and vy nonempty, then either (1) vxy contains
a's and/or b's but not c's, so that uxz either has at most p+1 a's
along with p+1 b's, or has at most p b's along with p
c's, and so is not in Z, or (2) vxy contains c's but not a's, in
which case pumping up leads to a string that either is not in a*b*c*
or has at least as many b's or c's as a's,
which again is not in Z.
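The case analysis can be checked by brute force for small values of p. The sketch below is an illustration and not part of the proof: the helper names `in_Z` and `every_split_fails` are hypothetical, and it only tries the exponents 0 (pumping down) and 2 (pumping up once), which is enough for this particular w.

```python
def in_Z(s):
    """Membership in Z = {a^i b^j c^k : i > j > k >= 0}."""
    i = len(s) - len(s.lstrip("a"))        # number of leading a's
    rest = s[i:]
    j = len(rest) - len(rest.lstrip("b"))  # number of b's that follow
    rest = rest[j:]
    k = len(rest)
    return rest == "c" * k and i > j > k


def every_split_fails(p):
    """True if every split w = uvxyz with |vxy| <= p and |vy| >= 1
    gives a string outside Z for exponent 0 or exponent 2."""
    w = "a" * (p + 2) + "b" * (p + 1) + "c" * p
    n = len(w)
    for start in range(n):
        for end in range(start + 1, min(start + p, n) + 1):
            for m1 in range(start, end + 1):
                for m2 in range(m1, end + 1):
                    u, v, x = w[:start], w[start:m1], w[m1:m2]
                    y, z = w[m2:end], w[end:]
                    if not v and not y:
                        continue  # the lemma requires vy to be nonempty
                    if all(in_Z(u + v * e + x + y * e + z) for e in (0, 2)):
                        return False  # this split survives both pumpings
    return True
```

Calling `every_split_fails(p)` for p from 1 to 5 is a quick sanity check that no split survives.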
TRUE. Consider all strings a^i b^j c^k
with 0 ≤ k ≤ 0.1n, 0.2n ≤ j ≤ 0.3n, and i = n - j - k.
All such strings are in Z, and up to rounding there are
0.01n^2 of them, so we can take c to be any number less
than 0.01.
In fact if we look at all strings of length n in
a*b*c*, of which there are C(n+2, 2) = (n+2)(n+1)/2,
all but O(n) of them have i, j, and k distinct. Of
this latter set, exactly 1/6 are in Z, so asymptotically z(n) is n^2/12.
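The counting claim is easy to test numerically. The following sketch assumes, as the solution does, that z(n) denotes the number of strings of length n in Z; the function name `z` is chosen here to match.

```python
def z(n):
    """Count triples (i, j, k) with i + j + k = n and i > j > k >= 0."""
    return sum(
        1
        for k in range(n + 1)
        for j in range(k + 1, n + 1)
        if n - j - k > j          # i = n - j - k must exceed j
    )


if __name__ == "__main__":
    for n in (30, 100, 300):
        # Compare the exact count with the estimate n^2/12 and the bound 0.01 n^2.
        print(n, z(n), n * n / 12, 0.01 * n * n)
```

For moderately large n the ratio z(n)/n^2 should sit close to 1/12, comfortably above the constant 0.01 used in the first argument.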
FALSE. Build M' so that it runs M on w, rejecting if w ∉ L(M), then tests whether w ∈ L(G) and accepts if and only if it is. Then L(M') = L(M) ∩ L(G) and M' always halts, so the language is TD. We didn't do the construction to decide an arbitrary CFL in lecture, but asserted several times that every CFL is TD. The simplest, though not the best, construction is to test all derivations of exactly 2n - 1 steps from S in G, and see whether any yield w, because any derivation of a string of n terminals in a Chomsky normal form CFG takes exactly that many steps.
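The "test all derivations" idea can be sketched as a bounded search over leftmost derivations. This is an illustration under assumptions: the function name `cnf_derives`, the dictionary encoding of the grammar, and the example grammar for {a^n b^n : n ≥ 1} are hypothetical, and the grammar is assumed to be in Chomsky normal form with no rule S → ε.

```python
def cnf_derives(rules, start, w):
    """Decide w in L(G) for a CNF grammar by trying leftmost derivations
    of exactly 2n - 1 steps.  rules maps each nonterminal to a list of
    right-hand sides (tuples); any symbol that is not a key is a terminal."""
    n = len(w)
    if n == 0:
        return False  # assumption: the grammar has no S -> epsilon rule

    def search(form, steps_left):
        if len(form) > n:
            return False  # CNF forms never shrink, so this branch is dead
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:  # no nonterminals left
            return steps_left == 0 and "".join(form) == w
        if steps_left == 0 or "".join(form[:idx]) != w[:idx]:
            return False  # out of steps, or terminal prefix already wrong
        return any(
            search(form[:idx] + rhs + form[idx + 1:], steps_left - 1)
            for rhs in rules[form[idx]]
        )

    return search((start,), 2 * n - 1)


# Hypothetical CNF grammar for {a^n b^n : n >= 1}.
ANBN = {
    "S": [("A", "B"), ("A", "C")],
    "C": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}
```

The search always terminates because the step budget is finite, which is the point of the argument: bounded derivation length gives a decider, even if an inefficient one.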
FALSE. An RSCTM acts in essence as a deterministic PDA, with c1 holding the unread input and c2 acting as a stack. Like a PDA, the RSCTM can read an input character or not, and push or pop from its stack, in one step. (We need two RSCTM steps for a PDA transition that both pushes and pops.) But USQ is not a CFL and thus not the language of even a nondeterministic PDA. We can quote a homework problem and note that USQ is clearly not eventually periodic, or just use the CFL Pumping Lemma with w = a2p2, where if |vy| = k, with 1 ≤ k ≤ p, it is clear that 2p2 - k is not a perfect square.
TRUE. Neither USQ nor its complement is a CFL (as neither is eventually periodic -- in fact the complement of a unary language is a CFL if and only if the language itself is a CFL, since the unary CFL's are exactly the unary regular languages). But the union of USQ and its complement is the regular language a*.
X is a regular language, the set
{a^i b^j c^k: j ≡ k (mod 2)}. To
see this, note that any completed derivation in G uses the rule T
→ bbT m times, and the rule T → Tcc n times, so either j = 2m
and k = 2n (if the last rule was T → ε) or j = 2m + 1 and
k = 2n + 1 (if the last rule was T → bc). And any pair (j, k)
with j ≡ k (mod 2) is in this form.
A DFA for this language has state set {1, 2, 3, 4, 5, 6} with
transitions (1, a, 1), (1, b, 2), (1, c, 3), (2, a, 6), (2, b, 4),
(2, c, 5), (3, a, 6), (3, b, 6), (3, c, 5), (4, a, 6), (4, b, 2),
(4, c, 3), (5, a, 6), (5, b, 6), (5, c, 3), and (6, x, 6) for all
letters x. The start state is 1 and the final states are 1, 4 and 5.
This is correct because strings go to 1 if they are all a's (and thus
in X), to 2 if they are a's followed by an odd number of b's, to 3 if
they are in a*b*c*, contain at least one c, and are not in X, to 4 if
they are a's followed by a positive even number of b's (and so are in
X), to 5 if they are in X and have c's, and to 6 if they are not in
a*b*c*.
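The DFA can be checked against the parity description of X by exhaustive testing on short strings. This is a sketch: the names `DELTA`, `dfa_accepts`, and `in_X` are hypothetical. The table includes the transition (2, c, 5), which the state descriptions require (an odd number of b's followed by one c is in X).

```python
import re
from itertools import product

# Rows give the successor of each state on a, b, c respectively.
ROWS = {1: (1, 2, 3), 2: (6, 4, 5), 3: (6, 6, 5),
        4: (6, 2, 3), 5: (6, 6, 3), 6: (6, 6, 6)}
DELTA = {(q, x): r for q, row in ROWS.items() for x, r in zip("abc", row)}
FINAL = {1, 4, 5}


def dfa_accepts(s):
    """Run the six-state DFA from start state 1."""
    state = 1
    for ch in s:
        state = DELTA[(state, ch)]
    return state in FINAL


def in_X(s):
    """Membership via the characterization a^i b^j c^k with j = k (mod 2)."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(2)) % 2 == len(m.group(3)) % 2


def agree_up_to(max_len):
    """True if the DFA and the characterization agree on all short strings."""
    return all(
        dfa_accepts("".join(t)) == in_X("".join(t))
        for n in range(max_len + 1)
        for t in product("abc", repeat=n)
    )
```

Running `agree_up_to(7)` compares the two definitions on all 3280 strings of length at most 7.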
A regular expression for this language is
a*(bb)*(bc ∪ ∅*)(cc)*.
The index is 6 because the DFA above is minimal. We can
distinguish the three final states by using a to distinguish 1 from 4
or 5, and bb to distinguish 4 and 5 from one another.
We can distinguish the three non-final states by using b to
distinguish 2 from 3 or 6, and c to distinguish 3 from 6.
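The minimality argument can also be checked mechanically by searching for a shortest distinguishing string for every pair of states. In this sketch the names `run` and `shortest_distinguishing` are hypothetical, and the transition table is written out in full, including the transition from state 2 on c to state 5.

```python
from itertools import combinations, product

# Rows give the successor of each state on a, b, c respectively.
ROWS = {1: (1, 2, 3), 2: (6, 4, 5), 3: (6, 6, 5),
        4: (6, 2, 3), 5: (6, 6, 3), 6: (6, 6, 6)}
FINAL = {1, 4, 5}


def run(state, word):
    """State reached from `state` after reading `word`."""
    for ch in word:
        state = ROWS[state]["abc".index(ch)]
    return state


def shortest_distinguishing(p, q, max_len=3):
    """First string (by length, then alphabetical order) that leads exactly
    one of p, q to a final state, or None if there is none up to max_len."""
    for n in range(max_len + 1):
        for t in product("abc", repeat=n):
            word = "".join(t)
            if (run(p, word) in FINAL) != (run(q, word) in FINAL):
                return word
    return None


# Every one of the 15 pairs should get a distinguishing string.
WITNESSES = {(p, q): shortest_distinguishing(p, q)
             for p, q in combinations(range(1, 7), 2)}
```

If every entry of `WITNESSES` is a string rather than `None`, no two states are equivalent, which is what the claim that the index is 6 requires.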
What happens when I then carry out the PDA-to-CFG construction? Carry it out to get a grammar, starting with the following N: state set {i, p, q, f}, start state i, only final state f, alphabet {a, b}, and transitions (i, ε, p), (p, a, q), (q, b, p), and (p, ε, f). Feel free to omit nonterminals and rules that cannot lead to generating any string of terminals.
Describe in English what happens when this construction is applied to a general NFA.
The PDA we build has eight states: let's call the new states r between
i and p, s between p and q, t between q and p, and u between p and f.
Following the hint we'll use four different stack letters to push and
pop in the four different transition pairs. We have eight total
transitions.
In our grammar, the start symbol is A_if, we have 64
nonterminals in all, and our rules
are of the form A_xx → ε, of the form A_xy → A_xz A_zy,
and one special rule for each push-pop pair.
These are A_ip → A_rr, A_pq → a A_ss, A_qp → b A_tt, and
A_pf → A_uu. These last four can be
simplified to A_ip → ε, A_pq → a, A_qp → b, and A_pf → ε.
Letting S be A_if, T be A_pp, and simplifying a
number of rules involving ε, we can get a simple grammar S
→ T, T → TT, T → ab, and T → ε, generating
the language (ab)*.
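As a sanity check on the simplified grammar, one can compute every terminal string of bounded length derivable from T and compare with (ab)*. The sketch below is illustrative only; the function name `strings_from_T` is hypothetical. Because the only rule for S is S → T, the strings derivable from S are the same set.

```python
def strings_from_T(max_len):
    """All terminal strings of length <= max_len derivable from T using
    T -> TT, T -> ab, and T -> epsilon, computed as a least fixed point."""
    lang = {s for s in ("", "ab") if len(s) <= max_len}
    while True:
        # T -> TT: concatenate any two strings already known to be derivable.
        new = {x + y for x in lang for y in lang if len(x + y) <= max_len}
        if new <= lang:
            return lang
        lang |= new
```

For example, `strings_from_T(8)` can be compared with the set of strings (ab)^k for k from 0 to 4.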
In general, the push-pop pairs resolve to give a rule
A_pq → a for every NFA transition (p, a, q), where a
is either an input letter or ε. The only way to get a
derivation in the grammar for this PDA is to use the transitivity
rules to break A_if into a sequence of nonterminals, one
for each edge in a path through the NFA from i to f. Then each of
the edge nonterminals can be changed to the letter (or ε)
read by the NFA when traversing that edge.
The general idea is for M to simulate S by storing the contents of each of S's cells on its tape, using a # symbol to separate each pair of adjacent cells. For the initial setup, we need to put a # before and after the w on the tape, and a third # after the second to represent the first empty cell c2 of S. We mark the # to the left of the current cell at any given time. To execute a step of S, we read the character to the right of the marked #, decide what to do, then do it by adjusting the tape. Note that if the character to the right of the marked # is another #, we know that the current cell is empty and act accordingly. Doing what S does might involve moving the entire contents of the tape, to the right of where we are, one space right to make room for a new letter. If we move to a previously untouched cell, we leave another # to allow the tape to represent an empty cell. We may want a special mark on the # before c1, to help us model the special behavior if S moves left from there. Overall we can simulate S much as we simulated a multitape TM with an ordinary TM in lecture, continuing the simulation until or unless S halts.
Yes, any TM can be simulated by an SCTM. Let M be an arbitrary
ordinary TM. The basic idea is to use one SCTM cell to represent
each cell of M's tape, keeping either a single letter to represent
a non-blank letter or the empty string to represent a blank.
The most complicated part of the simulation is that S begins with w
in the cell c1, while the simulation needs it to have one
letter in each of the first n cells. So we have an initial phase
where we read the leftmost letter of c1, delete it, and copy
it to the correct cell, until c1 is exhausted.
To simulate one step of M with a step of S is straightforward,
except that S cannot both insert and delete a letter in one step. So
any step of M that changes the character in the current cell must be
simulated by two steps of S, one to remove the old letter and one to
insert the new one.
Last modified 19 March 2017