Question text is in black, solutions in blue.
Q1: 10 points, Q2: 10 points, Q3: 10 points, Q4: 10 points, Q5: 10 points, Q6: 10+15 points, Q7: 30 points, Q8: 30 points. Total: 120+15 points.
The following formal languages are each used in one or more problems:
TRUE. A TM on input x can compute n, the length of x, then run f on the string w denoting n, and finally evaluate the circuit C = f(w) on the input x. This all takes finite time and definitely halts, and we can output whether x is in L.
TRUE. SD has a DFA, with a state for each of the four directions the band might be facing. An F takes each state to itself, an A takes N to S, S to N, E to W, and W to E, an L takes N to W, W to S, S to E, and E to N, and an R takes N to E, E to S, S to W, and W to N. By Kleene's Theorem, since a DFA for the language exists, a regular expression must also exist.
Though of course it is straightforward to compute that this regular expression is [F + LF*R + (R + LF*A) (F + AF*A)*(L + AF*R) + (R + LF*A)(F + AF*A)*(R + AF*L) (F + RF*L + (L + RF*A)(F + AF*A)* (R + AF*L))* (L + RF*A)(F + AF*A)*(L + AF*R)]*, you were only asked to prove that the regular expression exists. I'm glad no one spent the time during the exam to compute it.
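The four-state DFA described above can be sketched in a few lines of Python. The transition table is copied from the solution. The start state and the accepting state are not restated here, so taking both to be N is an assumption (it matches the form of the regular expression, which is a star of loops back to one state), and the names `DELTA`, `final_direction`, and `in_sd` are made up for this illustration.

```python
# Transition table for the four-state DFA: the state is the direction faced.
# F leaves the direction alone, A reverses it, L and R are quarter turns.
DELTA = {
    "F": {"N": "N", "E": "E", "S": "S", "W": "W"},
    "A": {"N": "S", "S": "N", "E": "W", "W": "E"},
    "L": {"N": "W", "W": "S", "S": "E", "E": "N"},
    "R": {"N": "E", "E": "S", "S": "W", "W": "N"},
}

START = "N"        # assumption: the start state is N
ACCEPTING = {"N"}  # assumption: N is the only final state


def final_direction(commands, start=START):
    """Run the DFA on a string over {F, A, L, R} and return the last state."""
    state = start
    for c in commands:
        state = DELTA[c][state]
    return state


def in_sd(commands):
    """Accept exactly when the run ends in an accepting state."""
    return final_direction(commands) in ACCEPTING
```

For example, `in_sd("LR")` and `in_sd("AA")` are both true under these assumptions, while `in_sd("L")` is false.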
FALSE. The four-state DFA given in the solution to Question 2 is minimal. You could prove this by running the state-minimization algorithm, or by observing that every pair of states is SD-distinguishable. The simplest way to prove the latter is to note that each of the strings A, F, L, and R takes one of the four states to a final state and takes each of the others to a non-final state, so there is a string for each pair of states that proves non-equivalence.
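The pairwise distinguishability claim is small enough to check mechanically. The sketch below assumes, as above, that N is the single final state; `distinguishing_symbol` and `witnesses` are hypothetical names introduced only for this check.

```python
from itertools import combinations

# Same transition table as the DFA in the solution to Question 2.
DELTA = {
    "F": {"N": "N", "E": "E", "S": "S", "W": "W"},
    "A": {"N": "S", "S": "N", "E": "W", "W": "E"},
    "L": {"N": "W", "W": "S", "S": "E", "E": "N"},
    "R": {"N": "E", "E": "S", "S": "W", "W": "N"},
}
FINAL = "N"  # assumption: N is the only final state


def distinguishing_symbol(p, q):
    """Return a one-letter string that sends exactly one of p, q to FINAL."""
    for c in "AFLR":
        if (DELTA[c][p] == FINAL) != (DELTA[c][q] == FINAL):
            return c
    return None  # no single letter separates p and q


# One witness per unordered pair of states; six pairs in all.
witnesses = {
    (p, q): distinguishing_symbol(p, q) for p, q in combinations("NESW", 2)
}
```

Every one of the six pairs gets a witness, so no two states can be merged and the DFA is minimal.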
FALSE. This looks like the undecidable problem of taking two grammars and telling whether L(G) ⊆ L(G'), but it is different. Since there is a grammar that generates no strings (with the single rule "S → S", for example), the language ∅ is a CFL, and this language is a subset of any language at all. So CCFL is just the set of all valid Turing machine descriptions, and this set is Turing decidable.
FALSE. In fact "RS ≤p 3WS" is true whether P = NP or not. In Question 6 (b) it is proved that RS is in the class L, which means that it is also in the class P and thus also in the class NP. By Question 7 (b), 3WS is NP-complete, and so every language in NP reduces to it. We could also define the poly-time reduction to just determine whether the input is in RS, output (1, 1, 1) (which is in 3WS) if it is, and output (1, 1, 2) if it is not.
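The second, "cheating" reduction at the end of this answer looks like the following in Python. This is only a sketch: it uses the counting characterization of RS from Question 6 (b), and the function names are invented for the example.

```python
def in_rs(x):
    """Membership test for RS using the characterization from Question 6 (b):
    equal numbers of N's and S's, and equal numbers of E's and W's."""
    return x.count("N") == x.count("S") and x.count("E") == x.count("W")


def reduce_rs_to_3ws(x):
    """Map x to a fixed yes-instance of 3WS if x is in RS, else a fixed no-instance."""
    return (1, 1, 1) if in_rs(x) else (1, 1, 2)
```

The reduction runs in polynomial time only because deciding RS itself does, which is exactly what membership in P gives us.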
Using Myhill-Nerode, the strings N^i and N^j are RS-distinguishable whenever i ≠ j, because N^i S^i is in RS and N^j S^i is not. Since there are infinitely many pairwise distinguishable strings, and hence infinitely many classes, the language is not regular.
Using the Pumping Lemma, we can let p be the pumping length, consider the string w = N^p S^p, and note that the pumping string y must consist of one or more N's because |xy| ≤ p. So pumping up or down changes the number of N's without changing the number of S's, yielding a string not in RS.
Several of you tried to form a proof out of the fact that RS "contains" the language {N^i S^i : i ≥ 0}, which is a relabeling of our favorite non-regular language. But you have to be careful using this argument, as you need a "reduction" that preserves regularity. It is not true, of course, that if X is non-regular, and X ⊆ Y, then Y is non-regular (Y could be Σ*, for example). It is true that if RS were regular, its intersection with the regular language N*S* would also be regular, and this is the language you know is non-regular.
Yes, RS is in L. A string x is in RS if and only if the number of N's in x equals the number of S's, and the number of E's equals the number of W's. We can have a five-tape Turing machine keep its read-only input on tape 1 and keep binary counters of the number of each letter seen on tapes 2 through 5. When it has seen all the inputs, it compares the counters to see whether to accept. Each counter contains a number that is at most n, so it needs only log n tape cells (maybe 1 + log n, I suppose) and the total read/write tape usage is O(log n).
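The counting idea can be illustrated in ordinary Python. This is not a log-space Turing machine, of course, just a sketch of the same four counters, with `counter_bits` reporting how many binary digits the largest counter needs. The function names are assumptions, and the code assumes the input uses only the letters N, S, E, and W.

```python
def count_letters(x):
    """One left-to-right pass over the input, keeping four counters.
    Assumption: x contains only the letters N, S, E, W."""
    counts = {"N": 0, "S": 0, "E": 0, "W": 0}
    for ch in x:
        counts[ch] += 1
    return counts


def in_rs(x):
    """Compare the counters after the whole input has been read."""
    counts = count_letters(x)
    return counts["N"] == counts["S"] and counts["E"] == counts["W"]


def counter_bits(x):
    """Number of binary digits needed by the largest of the four counters."""
    return max(c.bit_length() for c in count_letters(x).values())
```

On the input N^8 S^8 (length 16), each nonzero counter holds 8, which takes 4 bits, within the 1 + log n bound mentioned above.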
RS is not a CFL, though it is the intersection of two CFL's, one for the language of strings x with |x|_N = |x|_S and one for the analogous language with equal numbers of E's and W's. It is actually easy to prove RS to be a non-CFL with the CFL Pumping Lemma. Let w be the string N^p E^p S^p W^p, where p is the pumping length. If w = uvxyz with |vy| > 0 and |vxy| ≤ p, then vxy touches at most two adjacent blocks of w, so it cannot contain both N's and S's or both E's and W's. Pumping down must therefore either change the number of N's without changing the number of S's, change the E's without the W's, change the S's without the N's, or change the W's without the E's, and any of these four things will take the string out of RS.
Many of you constructed context-free grammars that generated lots of strings that were all in RS, but this proof shows that it is impossible to get all the strings in RS with a grammar without getting some bad strings as well.
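For small values of p the pumping argument above can be checked by brute force. The sketch below tries every decomposition w = uvxyz of N^p E^p S^p W^p with |vy| > 0 and |vxy| ≤ p and confirms that pumping down (deleting v and y) always leaves RS. It is a sanity check on small cases, not a proof, and the function names are invented here.

```python
def in_rs(x):
    """Equal numbers of N's and S's, and equal numbers of E's and W's."""
    return x.count("N") == x.count("S") and x.count("E") == x.count("W")


def pumping_down_always_leaves_rs(p):
    """Check every split w = u v x y z with |vy| > 0 and |vxy| <= p."""
    w = "N" * p + "E" * p + "S" * p + "W" * p
    n = len(w)
    for start in range(n):                                # where vxy begins
        for end in range(start + 1, min(start + p, n) + 1):  # where vxy ends
            window = w[start:end]
            for a in range(len(window) + 1):              # v = window[:a]
                for b in range(a, len(window) + 1):       # y = window[b:]
                    v, x, y = window[:a], window[a:b], window[b:]
                    if not v and not y:
                        continue                          # need |vy| > 0
                    pumped_down = w[:start] + x + w[end:]
                    if in_rs(pumped_down):
                        return False                      # a counterexample
    return True
```

Calling `pumping_down_always_leaves_rs(p)` for p from 1 through 4 finds no decomposition that survives pumping down.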
We can have a nondeterministic TM guess a string w in {A, B, C}^k, where k is the number of elements in the input multiset, then deterministically add up all the s_i's such that character i of w is an A, then add up all the s_i's such that character i of w is a B, then similarly for C, then finally accept if and only if the three sums are all the same. This is clearly poly-time and can accept if and only if the input multiset is in 3WS.
Equivalently, we can define a certificate w for S to be a string w as above that gives three equal sums, and observe that S is in 3WS if and only if a certificate for S exists. Then it's clearly in deterministic poly-time to take input (S, w) and accept if and only if w is a certificate for S -- we just have to compute the three sums and compare them.
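The deterministic verifier is short enough to write out. In this sketch the multiset is a Python list and the certificate is a string over A, B, C of the same length. The name `verify_3ws` is hypothetical.

```python
def verify_3ws(s, w):
    """Accept iff w labels each element of s with A, B, or C
    and the three labelled groups have equal sums."""
    if len(w) != len(s) or any(label not in "ABC" for label in w):
        return False  # not a well-formed certificate
    sums = {"A": 0, "B": 0, "C": 0}
    for value, label in zip(s, w):
        sums[label] += value
    return sums["A"] == sums["B"] == sums["C"]
```

The verifier makes one pass over the input and does k additions, so it is comfortably polynomial in the input length.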
We must prove that SUBSET-SUM ≤p 3WS -- since 3WS is in NP by 7 (a) and we are given that SUBSET-SUM is NP-complete, this suffices.
Given an instance (s_1, ..., s_k; t) for SUBSET-SUM, we must construct an instance for 3WS such that either both of the instances, or neither, are in their respective languages. Define z to be the sum of the s_i's, and define f(s_1, ..., s_k; t) to be the multiset M = (s_1, ..., s_k, 2z, 2z-t, t+z). The sum of the elements of M is z + 2z + (2z-t) + (t+z) = 6z, so a successful three-way split of M must have each set sum to 2z. So one partition must contain just 2z, and the other two new elements must go in separate partitions because together they add to 3z. To complete the three-way split, the s_i's must be divided into the two partitions that still have room, with exactly t going into the partition with the element 2z-t and exactly z-t going into the other. This of course is possible if and only if there is a submultiset of the s_i's that sums exactly to t.
(This is not the only way to carry out the construction, but it's the first one I came up with during the exam. It's simpler, of course, to have the three extra elements be of size z, z-t, and t but this doesn't work because this multiset can always be partitioned to have A contain the old elements, B contain just z, and C contain the other two new elements. At least one person constructed a new element of size z-2t without checking first that 2t ≤ z -- it's quite possible for 2t > z to be true and in that case you need another construction.)
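The construction can be tested on small instances by brute force. The sketch below builds M = (s_1, ..., s_k, 2z, 2z-t, t+z) and compares an exhaustive SUBSET-SUM check against an exhaustive three-way-split check. Both checkers take exponential time and are there only to test the reduction. The function names are invented, and the instances are assumed to have positive s_i's and 0 ≤ t ≤ z.

```python
from itertools import combinations, product


def reduce_subset_sum_to_3ws(s, t):
    """The map f from the solution: append 2z, 2z - t, and t + z."""
    z = sum(s)
    return list(s) + [2 * z, 2 * z - t, t + z]


def subset_sum(s, t):
    """Exhaustive check: does some submultiset of s sum to t?"""
    return any(
        sum(choice) == t
        for r in range(len(s) + 1)
        for choice in combinations(s, r)
    )


def three_way_split(m):
    """Exhaustive check: try every labelling of m with A, B, C."""
    for labels in product("ABC", repeat=len(m)):
        sums = {"A": 0, "B": 0, "C": 0}
        for value, label in zip(m, labels):
            sums[label] += value
        if sums["A"] == sums["B"] == sums["C"]:
            return True
    return False
```

For s = (3, 5, 7) and t = 8 we get z = 15 and M = (3, 5, 7, 30, 22, 23), which splits as {30}, {22, 3, 5}, {23, 7}, each summing to 2z = 30.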
The simplest way to decide A_NLBA with a Turing machine, without any resource constraints, is to construct a graph containing all possible configurations of M on input w and then, say, DFS this graph to determine whether it has a path from the start configuration to an accepting configuration. The number of configurations is large (see below) but finite because the space available to M is bounded.
Several people made the following incorrect argument: We know that NDTM's can be converted into equivalent DTM's, and that the language A_LBA is Turing decidable. But when we convert a NLBA to a DTM using the construction from Chapter 3, we don't produce an LBA -- the DTM needs a tape to record the entire choice sequence of the NLBA and this could be as long as O(n·g^n) where n is the input length and g is the size of the NLBA's tape alphabet.
Achieving this space bound (as far as we know) requires the middle-first search algorithm of Savitch's Theorem. We first imagine the directed graph of all configurations described above. Since a configuration of M is defined by the state, the head position, and the tape contents, there are q·n·g^n configurations where q is the number of states, n is the input length, and g is the size of the tape alphabet. We want to use O(n^2) space to determine whether there is a path from the start configuration to the accepting configuration in this graph. (We can alter M to have a unique accepting configuration, or just add one more node to the directed graph.)
Middle-first search answers questions of the form PATH(u, v, k) where u and v are nodes of the graph and k is a positive integer -- actually we will restrict ourselves to cases where k is a power of 2. The predicate PATH(u, v, k) is true if and only if there is a path of length k or less from u to v in the directed graph. The basic step of the algorithm for PATH(u, v, k) is to check whether there exists a node w such that PATH(u, w, k/2) and PATH(w, v, k/2) are both true. The algorithm uses O(n) space to store the w it is checking, because log(q·n·g^n) = O(n). The two recursive calls to the PATH algorithm for each w are made one at a time, so the space usage is just O(n) plus the space for the recursive call. The initial value of k is a power of two greater than q·n·g^n, so the depth of the recursion is also log(q·n·g^n) = O(n) and thus the total space usage is O(n^2). For the base case of the recursion, the algorithm tests PATH(u, v, 1) by seeing whether either u = v or M can move from u to v in one step according to its rules.
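Here is a minimal Python sketch of the PATH recursion on a small explicit graph. For an NLBA the nodes would be configurations and the one-step relation would come from the machine's rules. In the sketch the graph is just a list of nodes plus a predicate `step(u, v)`, and both of those, along with the function names, are assumptions made for illustration.

```python
def path(u, v, k, nodes, step):
    """True iff there is a path of length at most k from u to v.
    k is assumed to be a power of 2."""
    if k == 1:
        # Base case: zero steps or one step.
        return u == v or step(u, v)
    half = k // 2
    # Try every possible midpoint w; the two halves are checked one at a
    # time, so each level of recursion only remembers u, v, w, and k.
    return any(
        path(u, w, half, nodes, step) and path(w, v, half, nodes, step)
        for w in nodes
    )


def reachable(u, v, nodes, step):
    """Start the recursion with a power of two greater than the node count."""
    k = 1
    while k <= len(nodes):
        k *= 2
    return path(u, v, k, nodes, step)


# A toy graph: 0 -> 1 -> 2 -> 3, with node 4 isolated.
EDGES = {(0, 1), (1, 2), (2, 3)}
NODES = list(range(5))


def toy_step(u, v):
    return (u, v) in EDGES
```

On the toy graph, `reachable(0, 3, NODES, toy_step)` is true while `reachable(3, 0, NODES, toy_step)` and `reachable(0, 4, NODES, toy_step)` are false. The running time is terrible, as it is in Savitch's Theorem, since the point of the algorithm is to save space rather than time.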
Last modified 25 May 2009