Question text is in black, solutions in blue.
Q1: 10 points Q2: 10 points Q3: 10 points Q4: 10 points Q5: 10 points Q6: 10+15 points Q7: 30 points Q8: 30 points Total: 120+15 points
The following formal languages are each used in one or more problems:
TRUE. A TM on input x can compute n, the length of x, then run f on the string w denoting n, and finally evaluate the circuit C = f(w) on the input x. This all takes finite time and definitely halts, and we can output whether x is in L.
TRUE. SD has a DFA, with a state for each of the four directions the band might
be facing. An F takes each state to itself, an A takes N to S, S to N, E to W,
and W to E, an L takes N to W, W to S, S to E, and E to N, and an R takes N to
E, E to S, S to W, and W to N. By Kleene's Theorem, since a DFA for the
language exists, a regular expression must also exist.
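The transition rules above can be written down as a table and simulated directly. This is only a sketch: the definition of SD is not repeated in this section, so the choice of N as both the start state and the only accepting state is an assumption, and the names `DELTA`, `run_dfa`, and `in_sd` are made up for illustration.

```python
# Transition table for the four-state DFA described above.
# States are the directions N, E, S, W; input letters are F, A, L, R.
DELTA = {
    "F": {"N": "N", "E": "E", "S": "S", "W": "W"},  # F leaves every state fixed
    "A": {"N": "S", "S": "N", "E": "W", "W": "E"},  # A reverses direction
    "L": {"N": "W", "W": "S", "S": "E", "E": "N"},  # L as given in the solution
    "R": {"N": "E", "E": "S", "S": "W", "W": "N"},  # R as given in the solution
}

START = "N"        # assumption: the band starts facing north
ACCEPTING = {"N"}  # assumption: SD requires ending in the start direction


def run_dfa(word: str) -> str:
    """Return the state reached from START after reading word."""
    state = START
    for letter in word:
        state = DELTA[letter][state]
    return state


def in_sd(word: str) -> bool:
    """Accept exactly when the run ends in an accepting state."""
    return run_dfa(word) in ACCEPTING
```

Under these assumptions, `in_sd("LR")` and `in_sd("AA")` hold while `in_sd("L")` does not.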
Though of course it is straightforward to compute that this regular
expression is
[F + LF*R
 + (R + LF*A)(F + AF*A)*(L + AF*R)
 + (R + LF*A)(F + AF*A)*(R + AF*L)(F + RF*L + (L + RF*A)(F + AF*A)*(R + AF*L))*(L + RF*A)(F + AF*A)*(L + AF*R)]*,
you were only asked to prove that the regular expression exists. I'm glad
no one spent the time during the exam to compute it.
FALSE. The four-state DFA given in the solution to Question 2 is minimal. You could prove this by running the state-minimization algorithm, or by observing that every pair of states is SD-distinguishable. The simplest way to prove the latter is to note that each of the strings A, F, L, and R takes one of the four states to a final state and takes each of the others to a non-final state, so there is a string for each pair of states that proves non-equivalence.
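The distinguishability claim can be checked mechanically. The sketch below repeats the transition table so that it stands alone, and it assumes (as the argument above suggests, but this section does not state) that N is the single final state. The helper name `distinguishing_letter` is hypothetical.

```python
from itertools import combinations

# Same transition table as in the solution to Question 2.
DELTA = {
    "F": {"N": "N", "E": "E", "S": "S", "W": "W"},
    "A": {"N": "S", "S": "N", "E": "W", "W": "E"},
    "L": {"N": "W", "W": "S", "S": "E", "E": "N"},
    "R": {"N": "E", "E": "S", "S": "W", "W": "N"},
}
ACCEPTING = {"N"}  # assumption: N is the only final state


def distinguishing_letter(p: str, q: str):
    """Return a one-letter string that accepts from exactly one of p and q."""
    for letter, step in DELTA.items():
        if (step[p] in ACCEPTING) != (step[q] in ACCEPTING):
            return letter
    return None


# One witness per unordered pair of distinct states.
witnesses = {
    (p, q): distinguishing_letter(p, q) for p, q in combinations("NESW", 2)
}
```

Every one of the six pairs gets a one-letter witness, which is what minimality of the four-state DFA needs.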
FALSE. This looks like the undecidable problem of taking two grammars and telling whether L(G) ⊆ L(G'), but it is different. Since there is a grammar that generates no strings (with the single rule "S → S", for example), the language ∅ is a CFL, and this language is a subset of any language at all. So C_CFL is just the set of all valid Turing machine descriptions, and this set is Turing decidable.
FALSE. In fact "RS ≤_p 3WS" is true whether P = NP or not. In Question 6 (b) it is proved that RS is in the class L, which means that it is also in the class P and thus also in the class NP. By Question 7 (b), 3WS is NP-complete, and so every language in NP reduces to it. We could also define the poly-time reduction to just determine whether the input is in RS, output (1, 1, 1) (which is in 3WS) if it is, and output (1, 1, 2) (which is not in 3WS) if it is not.
Using Myhill-Nerode, the strings N^i and N^j are
RS-distinguishable whenever i ≠ j, because then N^i S^i
is in RS and N^j S^i is not. Since there are infinitely
many classes, the language is not regular.
Using the Pumping Lemma, we can let p be the pumping length, consider the
string w = N^p S^p, and note that the pumping string y
must consist of one or more N's because |xy| ≤ p. So pumping up or down
yields a string not in RS.
Several of you tried to form a proof out of the fact that RS "contains" the
language {N^i S^i : i ≥ 0}, which is a relabeling of our
favorite non-regular language. But you have to be careful using this argument,
as you need a "reduction" that preserves regularity. It is not true, of course,
that if X is non-regular, and X ⊆ Y, then Y is non-regular (Y could be
Σ*, for example). It is true that if RS were regular,
its intersection with the regular language N*S* would
also be regular, and this is the language you know is non-regular.
Yes, RS is in L. A string x is in RS if and only if the number of N's in x equals the number of S's, and the number of E's equals the number of W's. We can have a five-tape Turing machine keep its read-only input on tape 1 and keep binary counters of the number of each letter seen on tapes 2 through 5. When it has read the entire input, it compares the counters to see whether to accept. Each counter contains a number that is at most n, so it needs only log n tape cells (maybe 1 + log n, I suppose) and the total read/write tape usage is O(log n).
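The counter idea can be sketched in a few lines. This is not a Turing machine, only an illustration of the same single pass with four counters; it assumes the alphabet of RS is {N, S, E, W}, and the function name `in_rs` is made up here.

```python
def in_rs(x: str) -> bool:
    """Decide RS with four counters, mirroring tapes 2 through 5."""
    counts = {"N": 0, "S": 0, "E": 0, "W": 0}
    for letter in x:  # one left-to-right pass over the read-only input
        counts[letter] += 1
    # Each counter is at most len(x), so it fits in about log2(len(x)) + 1 bits.
    return counts["N"] == counts["S"] and counts["E"] == counts["W"]
```

For example, `in_rs("NWSE")` holds and `in_rs("NNS")` does not.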
RS is not a CFL, though it is the intersection of two CFL's, one for
the language of strings x with |x|_N = |x|_S and one
for the analogous language with equal numbers of E's and W's. It is actually
easy to prove RS to be a non-CFL with the CFL Pumping Lemma. Let w be the
string N^p E^p S^p W^p, where p is the
pumping length. If w = uvxyz with |vy| > 0 and |vxy| ≤ p, pumping down
must either change the number of N's without changing the number of S's, change
the E's without the W's, change the S's without the N's, or change the W's
without the E's, and any of these four things will take the string out of RS.
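For a small pumping length the case analysis can be confirmed by brute force. The sketch below enumerates every split w = uvxyz with |vy| > 0 and |vxy| ≤ p and checks that pumping down leaves RS. The value p = 3 is an arbitrary choice for illustration, and a check at one value of p is a sanity test, not a proof.

```python
def in_rs(x: str) -> bool:
    """Membership in RS: equal N's and S's, and equal E's and W's."""
    return x.count("N") == x.count("S") and x.count("E") == x.count("W")


def pumped_down_strings(w: str, p: int):
    """Yield uxz for every split w = uvxyz with |vy| > 0 and |vxy| <= p."""
    n = len(w)
    for a in range(n + 1):                 # v = w[a:b]
        limit = min(a + p, n)              # vxy must end by position a + p
        for b in range(a, limit + 1):
            for c in range(b, limit + 1):  # x = w[b:c]
                for d in range(c, limit + 1):  # y = w[c:d]
                    if (b - a) + (d - c) > 0:
                        yield w[:a] + w[b:c] + w[d:]


p = 3  # hypothetical pumping length, chosen only to keep the search small
w = "N" * p + "E" * p + "S" * p + "W" * p
all_leave_rs = all(not in_rs(s) for s in pumped_down_strings(w, p))
```

Here `all_leave_rs` comes out true: no admissible split survives pumping down.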
Many of you constructed context-free grammars that generated lots of strings
that were all in RS, but this proof shows that it is impossible to get all
the strings in RS with a grammar without getting some bad strings as well.
We can have a nondeterministic TM guess a string w in {A, B, C}^k,
then deterministically add up all the s_i's such that character i
of w is an A, then add up all the s_i's such that character i of w
is a B, then similarly for C, then finally accept if and only if the three
sums are all the same. This is clearly poly-time and can accept if and only if
the input multiset is in 3WS.
Equivalently, we can define a certificate w for S to be a string w as above
that gives three equal sums, and observe that S is in 3WS if and only if a
certificate for S exists. Then it clearly takes only deterministic poly-time to
take input (S, w) and accept if and only if w is a certificate for S -- we just
have to compute the three sums and compare them.
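A certificate check along these lines might look as follows. It is a sketch that assumes the multiset is given as a Python list of integers and the certificate as a string over A, B, C of the same length; the name `verify_3ws` is hypothetical.

```python
def verify_3ws(s: list, w: str) -> bool:
    """Check whether w splits the multiset s into three parts of equal sum."""
    if len(w) != len(s) or any(label not in "ABC" for label in w):
        return False  # malformed certificate
    sums = {"A": 0, "B": 0, "C": 0}
    for value, label in zip(s, w):  # element i goes to the part named by w[i]
        sums[label] += value
    return sums["A"] == sums["B"] == sums["C"]
```

The work is one pass over the input plus two comparisons, which is polynomial in the input length.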
We must prove that SUBSET-SUM ≤_p 3WS -- since 3WS is in NP by
7 (a) and we are given that SUBSET-SUM is NP-complete, this suffices.
Given an instance (s_1, ..., s_k; t) for SUBSET-SUM, we
must construct an instance for 3WS such that either both of the instances, or
neither, are in their respective languages. Define z to be the sum of the
s_i's, and define f(s_1, ..., s_k; t) to be the
multiset M = (s_1, ..., s_k, 2z, 2z-t, t+z). The sum of the
elements of M is z + 2z + (2z-t) + (t+z) = 6z, so a successful three-way split
of M must have each set sum to 2z. So one partition must contain just 2z, and
the other two new elements must go in separate partitions because together they
add to 3z. To complete the three-way split, the s_i's must be divided
into the two partitions that still have room, with exactly t going into the
partition with the element 2z-t and exactly z-t going into the other. This of
course is possible if and only if there is a submultiset of the s_i's
that sums exactly to t.
(This is not the only way to carry out the construction, but it's the first
one I came up with during the exam. It's simpler, of course, to have
the three extra elements be of size z, z-t, and t but this doesn't work
because this multiset can always be partitioned to have A contain the old
elements, B contain just z, and C contain the other two new elements. At least
one person constructed a new element of size z-2t without checking first that
2t ≤ z -- it's quite possible for 2t > z to be true and in that case you
need another construction.)
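The construction with new elements 2z, 2z-t, and t+z can be tested on small instances by brute force. The sketch below assumes the s_i are positive integers and 0 ≤ t ≤ z. Both deciders are exponential-time and exist only to check the reduction, and all three function names are made up for illustration.

```python
from itertools import product


def reduce_subset_sum_to_3ws(s: list, t: int) -> list:
    """Map (s_1, ..., s_k; t) to M = (s_1, ..., s_k, 2z, 2z-t, t+z)."""
    z = sum(s)
    return list(s) + [2 * z, 2 * z - t, t + z]


def in_subset_sum(s: list, t: int) -> bool:
    """Brute force: does some submultiset of s sum to exactly t?"""
    return any(
        sum(v for v, keep in zip(s, mask) if keep) == t
        for mask in product([False, True], repeat=len(s))
    )


def in_3ws(m: list) -> bool:
    """Brute force: can m be split into three parts with equal sums?"""
    for w in product("ABC", repeat=len(m)):
        sums = {"A": 0, "B": 0, "C": 0}
        for value, label in zip(m, w):
            sums[label] += value
        if sums["A"] == sums["B"] == sums["C"]:
            return True
    return False
```

For instance, with s = (3, 5, 7) and t = 8 we get z = 15 and M = (3, 5, 7, 30, 22, 23), which splits as {30}, {22, 3, 5}, {23, 7}.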
The simplest way to decide A_NLBA with a Turing machine, without
any resource constraints, is to construct a graph containing all possible
configurations of M on input w and then, say, DFS this graph to determine
whether it has a path from the start configuration to an accepting
configuration. The number of configurations is large (see below) but finite
because the space available to M is bounded.
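The search itself is ordinary graph reachability. In the sketch below the configuration graph is supplied through a hypothetical `successors` function, and a toy dictionary stands in for the real graph of an NLBA; building actual configurations from M and w is not shown.

```python
def reachable(start, is_accepting, successors) -> bool:
    """Depth-first search from start; True if an accepting node is found."""
    stack, seen = [start], {start}
    while stack:
        config = stack.pop()
        if is_accepting(config):
            return True
        for nxt in successors(config):
            if nxt not in seen:  # finiteness of the graph guarantees halting
                seen.add(nxt)
                stack.append(nxt)
    return False


# Toy stand-in for a configuration graph (not derived from a real machine).
toy = {"c0": ["c1", "c2"], "c1": ["c0"], "c2": ["c3"], "c3": [], "c4": ["c3"]}
found = reachable("c0", lambda c: c == "c3", lambda c: toy[c])
missed = reachable("c1", lambda c: c == "c4", lambda c: toy[c])
```

The `seen` set is what makes this terminate even when the graph has cycles, as the toy graph does between c0 and c1.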
Several people made the following incorrect argument: We know that NDTM's
can be converted into equivalent DTM's, and that the language A_LBA
is Turing decidable. But when we convert a NLBA to a DTM using the
construction from Chapter 3, we don't produce an LBA -- the DTM needs a tape
to record the entire choice sequence of the NLBA and this could be as long
as O(n g^n) where n is the input length and g is the size of the NLBA's
tape alphabet.
Achieving this space bound (as far as we know) requires the middle-first
search algorithm of Savitch's Theorem. We first imagine the directed
graph of all configurations described above. Since a configuration of M is
defined by the state, the head position, and the tape contents, there are
q n g^n configurations where q is the number of states, n is the input
length, and g is the size of the tape alphabet. We want to use O(n^2)
space to determine whether there is a path from the start configuration to the
accepting configuration in this graph. (We can alter M to have a unique
accepting configuration, or just add one more node to the directed graph.)
Middle-first search answers questions of the form PATH(u, v, k) where u and
v are nodes of the graph and k is a positive integer -- actually we will
restrict ourselves to cases where k is a power of 2. The predicate
PATH(u, v, k) is true if and only if there is a path of length k or less from
u to v in the directed graph. The basic step of the algorithm for
PATH(u, v, k) is to check whether there exists a node w such that
PATH(u, w, k/2) and PATH(w, v, k/2) are both true. The algorithm uses O(n)
space to store the w it is checking, because log(q n g^n) = O(n).
The two recursive calls to the PATH algorithm for each w are made one at a
time, so the space usage is just O(n) plus the space for the recursive call.
The initial value of k is a power of two greater than q n g^n, so the
depth of the recursion is also log(q n g^n) = O(n) and thus the total
space usage is O(n^2). For the base case of the recursion, the
algorithm tests PATH(u, v, 1) by seeing whether either u = v or M can move from u to v in one step according to its rules.
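The recursion can be sketched as below. The graph is passed in as a node list and a hypothetical one-step predicate `step`, with a toy chain standing in for the configuration graph of M. Python's call stack plays the role of the O(n) frames, so this illustrates the structure of the algorithm, not its space bound.

```python
def path(u, v, k, nodes, step) -> bool:
    """True iff v is reachable from u in at most k steps (k a power of 2)."""
    if k == 1:
        return u == v or step(u, v)  # base case: zero or one step
    half = k // 2
    # Try each midpoint w in turn; the two recursive calls run one at a time
    # and the second is skipped when the first fails.
    return any(
        path(u, w, half, nodes, step) and path(w, v, half, nodes, step)
        for w in nodes
    )


# Toy graph: a chain 0 -> 1 -> 2 -> 3 -> 4 plus an isolated node 5.
nodes = list(range(6))
edges = {(0, 1), (1, 2), (2, 3), (3, 4)}


def step(a, b) -> bool:
    return (a, b) in edges
```

With k = 8, a power of two greater than the number of nodes, `path(0, 4, 8, nodes, step)` holds, while `path(0, 4, 2, nodes, step)` fails because the only route takes four steps.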
Last modified 25 May 2009