CMPSCI 401: Theory of Computation

Solutions to Final Exam, Spring 2009

David Mix Barrington

25 May 2009

Directions:

Answer the problems on the exam pages.
There are eight problems for 120 total points plus 15 extra credit. Actual scale is A = 100, C = 65.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first five questions are true/false, with five points for the correct boolean answer and up to five for a correct justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. Note that there is no reason not to guess if you don't know.

Question text is in black, solutions in blue.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 10+15 points
  Q7: 30 points
  Q8: 30 points

  Total: 120+15 points

The following formal languages are each used in one or more problems:

Recall from Sipser that an LBA is a deterministic one-tape Turing machine that is restricted to the portion of the tape that originally contains the input, and that A_LBA is the set of pairs (M, w) such that M is an LBA and w ∈ L(M). Similarly, we define an NLBA to be a nondeterministic one-tape Turing machine that is restricted to the portion of the tape that originally contains its input, and define A_NLBA to be the set of pairs (M, w) such that M is an NLBA and w ∈ L(M).
Sipser defines the language SUBSET-SUM to be {(s₁, ..., s_k; t): Each s_i is a positive integer written in binary, t is a positive integer written in binary, and there exists a subset of the s_i's that adds up to exactly t}. Although we did not present the proof in lecture, Sipser proves that SUBSET-SUM is an NP-complete language.
We define the language 3WS (for "three-way split") to be the set {(s₁, ..., s_k): Each s_i is a positive integer written in binary and there exists a partition of the multiset S of s_i's into three pairwise disjoint submultisets A, B, and C, where A ∪ B ∪ C = S and the sum of the numbers in each of A, B, and C is the same}.
The language SD (for "same direction") is the set of strings over the alphabet {A, F, L, R} such that if a marching band begins facing north and executes the sequence of commands given by the string, they will end up again facing north. The command A ("about face") means to turn 180 degrees, F ("forward march") means not to turn at all, L ("left face") means to turn 90 degrees to the left, and R ("right face") means to turn 90 degrees to the right.
The language RS (for "return to start") is the set of strings over the alphabet {N, E, S, W} such that if a marching band executes the sequence of commands given by the string, on an arbitrarily large flat field, they will return to their original position. The command N means to march one unit to the north, and similarly the commands E, S, and W mean to march one unit east, south, or west respectively.
The language CCFL ("contains a CFL") is the set {M: M is a Turing machine and there exists a context-free grammar G such that L(G) ⊆ L(M)}.

Question 1 (10): True or false with justification: Let L ⊆ {0, 1}^* be a language. Let f be a Turing computable function such that if w is a binary string denoting a positive integer n, then f(w) is a description of a boolean circuit C_n, with n inputs, such that if x is any binary string of length n, x ∈ L if and only if C_n outputs 1 on input x. Then L is a Turing decidable language.
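TRUE. A TM on input x can compute n, the length of x, then run f on the string w denoting n, and finally evaluate the circuit C_n = f(w) on the input x, accepting if and only if C_n outputs 1. Each of these steps takes finite time, since f is computable and a circuit has finitely many gates, so the TM always halts and decides L. (The empty string is not covered by f, but a decider can simply have its membership in L built in.)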
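A minimal Python sketch of this decider follows. It assumes a hypothetical circuit encoding (a list of IN, NOT, AND, and OR gates, each referring to earlier gates by index) and a hypothetical example function f_or whose circuit C_n is the OR of its n inputs; any computable f producing circuits in this encoding would be used the same way.

```python
from typing import Callable, List, Tuple

# Hypothetical circuit encoding: each gate is ("IN", i), ("NOT", a),
# ("AND", a, b) or ("OR", a, b), where a and b index earlier gates.
# The last gate is the output gate.
Gate = Tuple
Circuit = List[Gate]


def evaluate(circuit: Circuit, x: str) -> bool:
    """Evaluate the gates in order on the input bits x."""
    values: List[bool] = []
    for gate in circuit:
        op = gate[0]
        if op == "IN":
            values.append(x[gate[1]] == "1")
        elif op == "NOT":
            values.append(not values[gate[1]])
        elif op == "AND":
            values.append(values[gate[1]] and values[gate[2]])
        elif op == "OR":
            values.append(values[gate[1]] or values[gate[2]])
        else:
            raise ValueError(f"unknown gate {op!r}")
    return values[-1]


def decide(f: Callable[[str], Circuit], x: str, empty_in_L: bool) -> bool:
    """Compute n = |x|, build C_n = f(binary for n), and evaluate C_n on x."""
    n = len(x)
    if n == 0:
        return empty_in_L  # f is only specified for positive n
    w = format(n, "b")     # the binary string denoting n
    return evaluate(f(w), x)


def f_or(w: str) -> Circuit:
    """Example f: C_n is the OR of its n inputs, so L = strings containing a 1."""
    n = int(w, 2)
    circuit: Circuit = [("IN", 0)]
    for i in range(1, n):
        circuit.append(("IN", i))
        circuit.append(("OR", len(circuit) - 2, len(circuit) - 1))
    return circuit
```

For example, decide(f_or, "0001", False) is true and decide(f_or, "000", False) is false.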
Question 2 (10): True or false with justification: There exists a regular expression denoting the language SD defined above.
TRUE. SD has a DFA with a state for each of the four directions the band might be facing, where N is both the start state and the only final state. An F takes each state to itself; an A takes N to S, S to N, E to W, and W to E; an L takes N to W, W to S, S to E, and E to N; and an R takes N to E, E to S, S to W, and W to N. By Kleene's Theorem, since a DFA for the language exists, a regular expression for it must also exist.
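The DFA just described can be simulated directly. This is a minimal Python sketch, assuming headings are encoded as quarter-turns clockwise from north; the encoding and the function names are illustrative, not part of the exam.

```python
# Headings as quarter-turns clockwise from north: N=0, E=1, S=2, W=3.
HEADINGS = "NESW"
TURN = {"F": 0, "R": 1, "A": 2, "L": 3}  # a left face equals three right faces


def delta(state: str, command: str) -> str:
    """Transition function of the four-state DFA for SD."""
    return HEADINGS[(HEADINGS.index(state) + TURN[command]) % 4]


def in_SD(commands: str) -> bool:
    """Run the DFA from the start state N and accept iff it ends in N."""
    state = "N"
    for c in commands:
        state = delta(state, c)
    return state == "N"
```

For example, in_SD("LLLL") and in_SD("AA") are both true, while in_SD("RAL") is false.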
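Though it is possible to write such a regular expression down explicitly, you were only asked to prove that it exists. (For the record, here is one. Since L and R each turn the band 90 degrees while F and A turn it 0 or 180 degrees, a string in SD has an even number of L's and R's; pair them up in order, first with second, third with fourth, and so on. Let P = (F + AF^*A)^* and Q = (F + AF^*A)^*AF^*, which denote the strings over {F, A} with an even and an odd number of A's respectively. A string with an even number of L's and R's splits uniquely into units, where a unit is either a single F or A outside the pairs, or a pair together with the F's and A's between them. The units that turn the band 0 degrees are U₀ = F + RPL + LPR + RQR + LQL, the units that turn it 180 degrees are U₂ = A + RPR + LPL + RQL + LQR, and SD is denoted by (U₀ + U₂U₀^*U₂)^*, which allows any sequence of units containing an even number of half turns.) I'm glad no one spent the time during the exam to compute it.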
Question 3 (10): True or false with justification: There exists a three-state DFA that decides the language SD defined above.
FALSE. The four-state DFA given in the solution to Question 2 is minimal. You could prove this by running the state-minimization algorithm, or by observing that all four states are reachable from the start state and every pair of states is SD-distinguishable. The simplest way to prove the latter is to note that each of the strings A, F, L, and R takes exactly one of the four states to the final state N and takes each of the others to a non-final state, so for each pair of states one of these strings proves non-equivalence.
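As a sanity check of the distinguishability argument, the following Python sketch (function names hypothetical) encodes the headings N, E, S, W as 0, 1, 2, 3 quarter-turns clockwise and finds, for each pair of states, one of the one-letter strings A, F, L, R that sends exactly one of the two states to the final state N.

```python
from itertools import combinations

TURN = {"F": 0, "R": 1, "A": 2, "L": 3}  # quarter-turns clockwise
NAMES = "NESW"                           # heading h is named NAMES[h]


def run(h: int, s: str) -> int:
    """Heading after executing commands s starting from heading h."""
    for c in s:
        h = (h + TURN[c]) % 4
    return h


def distinguishing_letter(p: int, q: int) -> str:
    """A one-letter string taking exactly one of p, q to the final state N (= 0)."""
    for c in "AFLR":
        if (run(p, c) == 0) != (run(q, c) == 0):
            return c
    raise AssertionError("no one-letter witness")


# One witness per unordered pair of states.
witnesses = {(NAMES[p], NAMES[q]): distinguishing_letter(p, q)
             for p, q in combinations(range(4), 2)}

# Every state is reachable from the start state N by a single letter.
reachable = {NAMES[run(0, c)] for c in "FRAL"}
```

All six pairs receive a witness and all four states are reachable, so by Myhill-Nerode no DFA for SD can have fewer than four states.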
Question 4 (10): True or false with justification: The language CCFL defined above is not Turing decidable.
FALSE. This looks like the undecidable problem of taking two grammars and telling whether L(G) ⊆ L(G'), but it is different. Since there is a grammar that generates no strings (with the single rule "S → S", for example), the language ∅ is a CFL, and this language is a subset of any language at all. So CCFL is just the set of all valid Turing machine descriptions, and this set is Turing decidable.
Question 5 (10): True or false with justification: If P ≠ NP, there cannot be a polynomial-time reduction from the language RS to the language 3WS -- that is, the statement "RS ≤_p 3WS" is false. (Both these languages are defined above.)
FALSE. In fact "RS ≤_p 3WS" is true whether P = NP or not. In Question 6 (b) it is proved that RS is in the class L, which means that it is also in the class P and thus also in the class NP. By Question 7 (b), 3WS is NP-complete, so every language in NP reduces to it. We could also define the poly-time reduction directly: determine whether the input is in RS, and output (1, 1, 1), which is in 3WS, if it is, or (1, 1, 2), which is not in 3WS because its sum 4 is not divisible by 3, if it is not.
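The direct reduction can be sketched in Python. The function names are hypothetical, and the brute-force in_3WS check takes exponential time; it is included only to confirm that (1, 1, 1) is in 3WS and (1, 1, 2) is not, while the reduction itself runs in linear time.

```python
from itertools import product
from typing import Sequence, Tuple


def in_RS(x: str) -> bool:
    """x returns the band to its start iff #N = #S and #E = #W."""
    return x.count("N") == x.count("S") and x.count("E") == x.count("W")


def reduce_RS_to_3WS(x: str) -> Tuple[int, ...]:
    """Map yes-instances of RS to (1, 1, 1) and no-instances to (1, 1, 2)."""
    return (1, 1, 1) if in_RS(x) else (1, 1, 2)


def in_3WS(nums: Sequence[int]) -> bool:
    """Brute force over all 3^k assignments of numbers to parts (checking only)."""
    for labels in product(range(3), repeat=len(nums)):
        sums = [0, 0, 0]
        for x, part in zip(nums, labels):
            sums[part] += x
        if sums[0] == sums[1] == sums[2]:
            return True
    return False
```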
Question 6 (10+15): These three parts involve the language RS defined above.
- (a,10) Prove carefully that RS is not a regular language.
  Using Myhill-Nerode, the strings Nⁱ and N^j are RS-distinguishable whenever i ≠ j, because then NⁱSⁱ is in RS and N^jSⁱ is not. Since there are infinitely many equivalence classes, the language is not regular.
  Alternatively, using the Pumping Lemma, we can let p be the pumping length, consider the string w = N^pS^p, and note that the pumping string y must consist of one or more N's because |xy| ≤ p. So pumping up or down changes the number of N's but not the number of S's, yielding a string not in RS.
  Several of you tried to form a proof out of the fact that RS "contains" the language {NⁱSⁱ: i ≥ 0}, which is a relabeling of our favorite non-regular language. But you have to be careful using this argument, as you need a "reduction" that preserves regularity. It is not true, of course, that if X is non-regular and X ⊆ Y, then Y is non-regular (Y could be Σ^*, for example). It is true that if RS were regular, its intersection with the regular language N^*S^* would also be regular, and that intersection is {NⁱSⁱ: i ≥ 0}, the language you know is non-regular.
- (b, 5XC) Is RS in the class L (also known as DSPACE(log n))? Prove your answer.
  Yes, RS is in L. A string x is in RS if and only if the number of N's in x equals the number of S's, and the number of E's equals the number of W's. We can have a five-tape Turing machine keep its read-only input on tape 1 and keep binary counters of the number of each letter seen on tapes 2 through 5. When it has read the whole input, it compares the counters to decide whether to accept. Each counter contains a number that is at most n, so it needs only about log n tape cells (maybe 1 + log n, I suppose), and the total read/write tape usage is O(log n).
- (c, 10XC) Is RS a context-free language? Prove your answer.
  RS is not a CFL, though it is the intersection of two CFL's, one for the language of strings x with |x|_N = |x|_S and one for the analogous language with equal numbers of E's and W's. It is actually easy to prove RS to be a non-CFL with the CFL Pumping Lemma. Let w be the string N^pE^pS^pW^p, where p is the pumping length, and suppose w = uvxyz with |vy| > 0 and |vxy| ≤ p. Since vxy has length at most p, it cannot contain both an N and an S, nor both an E and a W. So pumping down must either change the number of N's without changing the number of S's, change the E's without the W's, change the S's without the N's, or change the W's without the E's, and any of these four things will take the string out of RS.
  Many of you constructed context-free grammars that generated lots of strings that were all in RS, but this proof shows that it is impossible to get all the strings in RS with a grammar without getting some bad strings as well.
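The counter algorithm of part (b) and the pumping argument of part (c) can both be checked with a short Python sketch (function names hypothetical). Python integers are unbounded, so in_RS only mirrors the space argument of four counters, each at most n. The function pumping_down_always_fails is a brute-force check of the part (c) argument for small fixed values of p; it illustrates the proof but does not replace it.

```python
def in_RS(x: str) -> bool:
    """One left-to-right pass keeping four counters (tapes 2 through 5)."""
    counts = {"N": 0, "E": 0, "S": 0, "W": 0}
    for c in x:
        counts[c] += 1
    return counts["N"] == counts["S"] and counts["E"] == counts["W"]


def pumping_down_always_fails(p: int) -> bool:
    """For w = N^p E^p S^p W^p, check that every split w = uvxyz with
    |vy| > 0 and |vxy| <= p gives a pumped-down string uxz not in RS."""
    w = "N" * p + "E" * p + "S" * p + "W" * p
    n = len(w)
    for start in range(n):                               # vxy = w[start:end]
        for end in range(start + 1, min(start + p, n) + 1):
            for v_end in range(start, end + 1):          # v = w[start:v_end]
                for y_start in range(v_end, end + 1):    # y = w[y_start:end]
                    if v_end == start and y_start == end:
                        continue                         # |vy| = 0 is excluded
                    pumped = w[:start] + w[v_end:y_start] + w[end:]
                    if in_RS(pumped):
                        return False
    return True
```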
Question 7 (30): This question involves the languages SUBSET-SUM and 3WS defined above. You may assume without proof that SUBSET-SUM is NP-complete (as this is proved in Sipser).
- (a,10) Prove carefully that 3WS is in the class NP.
  We can have a nondeterministic TM guess a string w in {A, B, C}^k, then deterministically add up all the s_i's such that character i of w is an A, then add up all the s_i's such that character i of w is a B, then similarly for C, then finally accept if and only if the three sums are all the same. This is clearly poly-time and has an accepting computation if and only if the input multiset is in 3WS.
  Equivalently, we can define a certificate w for S to be a string w as above that gives three equal sums, and observe that S is in 3WS if and only if a certificate for S exists. Then a deterministic TM can, in polynomial time, take input (S, w) and accept if and only if w is a certificate for S -- we just have to compute the three sums and compare them.
- (b,20) Complete the proof that 3WS is NP-complete, by building the appropriate poly-time reduction involving SUBSET-SUM, or otherwise.
  We must prove that SUBSET-SUM ≤_p 3WS -- since 3WS is in NP by 7 (a) and we are given that SUBSET-SUM is NP-complete, this suffices.
  Given an instance (s₁, ..., s_k; t) for SUBSET-SUM, we must construct an instance for 3WS such that either both of the instances, or neither, are in their respective languages. Define z to be the sum of the s_i's. If t > z, no submultiset of the s_i's can sum to t, so we let f output a fixed instance not in 3WS, such as (1, 1, 2). Otherwise t ≤ z, and we define f(s₁, ..., s_k; t) to be the multiset M = (s₁, ..., s_k, 2z, 2z-t, t+z), all of whose elements are positive; computing z and the three new elements clearly takes polynomial time. The sum of the elements of M is z + 2z + (2z-t) + (t+z) = 6z, so a successful three-way split of M must have each part sum to 2z. So the part containing 2z must contain nothing else, and the other two new elements must go in separate parts because together they add to 3z. To complete the three-way split, the s_i's must be divided between the two parts that still have room, with exactly t going into the part with the element 2z-t and exactly z-t going into the part with t+z. This of course is possible if and only if there is a submultiset of the s_i's that sums exactly to t (the remaining s_i's then automatically sum to z-t).
  (This is not the only way to carry out the construction, but it's the first one I came up with during the exam. It's simpler, of course, to have the three extra elements be of size z, z-t, and t, but this doesn't work because this multiset can always be partitioned to have A contain the old elements, B contain just z, and C contain the other two new elements. At least one person constructed a new element of size z-2t without checking first that 2t ≤ z -- it's quite possible for 2t > z to be true, and in that case you need another construction.)
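The certificate check from part (a) and the reduction from part (b) can be sketched in Python and tested against brute force on small instances. The function names and the tuple encoding of instances are assumptions for illustration, and the guard for t > z is an assumption of this sketch, added so that every output element is a positive integer. The brute-force checkers take exponential time and serve only to confirm that the reduction maps yes-instances to yes-instances and no-instances to no-instances.

```python
from itertools import product
from typing import Sequence, Tuple


def is_certificate(s: Sequence[int], w: str) -> bool:
    """Part (a): w in {A,B,C}^k puts s_i into part w[i]; compare the three sums."""
    if len(w) != len(s) or set(w) - set("ABC"):
        return False
    sums = {"A": 0, "B": 0, "C": 0}
    for x, part in zip(s, w):
        sums[part] += x
    return sums["A"] == sums["B"] == sums["C"]


def in_3WS(s: Sequence[int]) -> bool:
    """Brute force over all 3^k candidate certificates (checking only)."""
    return any(is_certificate(s, "".join(w)) for w in product("ABC", repeat=len(s)))


def in_subset_sum(s: Sequence[int], t: int) -> bool:
    """Brute force over all 2^k subsets (checking only)."""
    return any(sum(x for x, keep in zip(s, bits) if keep) == t
               for bits in product((0, 1), repeat=len(s)))


def reduce_ss_to_3ws(s: Sequence[int], t: int) -> Tuple[int, ...]:
    """Part (b): (s_1..s_k; t) -> (s_1..s_k, 2z, 2z-t, t+z), where z = sum(s)."""
    z = sum(s)
    if t > z:
        return (1, 1, 2)  # no subset reaches t; (1, 1, 2) is not in 3WS
    return tuple(s) + (2 * z, 2 * z - t, t + z)
```

For example, reduce_ss_to_3ws((1, 2), 2) gives (1, 2, 6, 4, 5), which splits as {6}, {4, 2}, {5, 1}.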
Question 8 (30): This question involves the language A_NLBA defined above.
- (a,10) Prove by any method that A_NLBA is Turing decidable. (Note that a correct answer to part (b) suffices.)
  The simplest way to decide A_NLBA with a Turing machine, without any resource constraints, is to construct a graph containing all possible configurations of M on input w and then, say, DFS this graph to determine whether it has a path from the start configuration to an accepting configuration. The number of configurations is large (see below) but finite because the space available to M is bounded.
  Several people made the following incorrect argument: we know that NDTM's can be converted into equivalent DTM's, and that the language A_LBA is Turing decidable. But when we convert an NLBA to a DTM using the construction from Chapter 3, we don't produce an LBA, so the decidability of A_LBA does not apply: the DTM needs a tape to record the entire choice sequence of the NLBA, and this could be as long as O(ngⁿ), where n is the input length and g is the size of the NLBA's tape alphabet.
- (b,20) Give a deterministic algorithm that decides A_NLBA and uses O(n²) space on inputs of size n.
  Achieving this space bound (as far as we know) requires the middle-first search algorithm of Savitch's Theorem. We first imagine the directed graph of all configurations described above. Since a configuration of M is defined by the state, the head position, and the tape contents, there are qngⁿ configurations where q is the number of states, n is the input length, and g is the size of the tape alphabet. We want to use O(n²) space to determine whether there is a path from the start configuration to the accepting configuration in this graph. (We can alter M to have a unique accepting configuration, or just add one more node to the directed graph.)
  Middle-first search answers questions of the form PATH(u, v, k), where u and v are nodes of the graph and k is a positive integer -- actually we will restrict ourselves to cases where k is a power of 2. The predicate PATH(u, v, k) is true if and only if there is a path of length k or less from u to v in the directed graph. The basic step of the algorithm for PATH(u, v, k) is to check whether there exists a node m such that PATH(u, m, k/2) and PATH(m, v, k/2) are both true. Each level of the recursion uses O(n) space to store u, v, k, and the node m it is checking, because log(qngⁿ) = O(n). The two recursive calls for each m are made one at a time, reusing the same space, so the space usage is just O(n) plus the space for one recursive call. The initial value of k is a power of two greater than qngⁿ, so the depth of the recursion is also log(qngⁿ) = O(n), and thus the total space usage is O(n²). For the base case of the recursion, the algorithm tests PATH(u, v, 1) by checking whether u = v or M can move from u to v in one step according to its rules.
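Both algorithms for A_NLBA can be sketched in Python on a toy machine. Everything here is an assumption for illustration: the NLBA encoding, the toy machine (which accepts the nonempty binary strings containing a 1), the requirement that w is nonempty and the start state is not the accept state, and the choice, mentioned above as an option, of merging all accepting configurations into one extra node. Middle-first search takes exponential time, so only tiny inputs are practical; the midpoints are generated one at a time rather than stored, in the spirit of the space bound.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, List, Tuple

Config = Tuple[str, int, Tuple[str, ...]]  # (state, head position, tape contents)
ACCEPT: Config = ("ACCEPT", -1, ())         # all accepting configurations, merged


@dataclass
class NLBA:
    # Hypothetical encoding: delta[(state, symbol)] lists (new_state, write, move),
    # move in {-1, +1}; a move off either end of the input leaves the head in place.
    states: Tuple[str, ...]
    gamma: Tuple[str, ...]
    delta: Dict[Tuple[str, str], List[Tuple[str, str, int]]]
    start: str
    accept: str

    def successors(self, c: Config) -> Iterator[Config]:
        """Configurations reachable from c in one step."""
        if c == ACCEPT:
            return
        state, head, tape = c
        for new_state, write, move in self.delta.get((state, tape[head]), []):
            if new_state == self.accept:
                yield ACCEPT
            else:
                new_tape = tape[:head] + (write,) + tape[head + 1:]
                new_head = min(max(head + move, 0), len(tape) - 1)
                yield (new_state, new_head, new_tape)

    def configs(self, n: int) -> Iterator[Config]:
        """Generate every node of the graph one at a time, without storing them."""
        for s in self.states:
            if s != self.accept:
                for h in range(n):
                    for t in product(self.gamma, repeat=n):
                        yield (s, h, t)
        yield ACCEPT


def accepts_dfs(m: NLBA, w: str) -> bool:
    """Part (a): depth-first search of the configuration graph."""
    first: Config = (m.start, 0, tuple(w))
    seen, stack = {first}, [first]
    while stack:
        c = stack.pop()
        if c == ACCEPT:
            return True
        for d in m.successors(c):
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return False


def path(m: NLBA, n: int, u: Config, v: Config, k: int) -> bool:
    """Middle-first search: is there a path of length at most k from u to v?
    Each recursion level keeps only u, v, k and the midpoint being tried."""
    if k == 1:
        return u == v or v in m.successors(u)
    return any(path(m, n, u, mid, k // 2) and path(m, n, mid, v, k // 2)
               for mid in m.configs(n))


def accepts_savitch(m: NLBA, w: str) -> bool:
    """Part (b): PATH(start, ACCEPT, k) for a power of two k >= number of nodes."""
    n = len(w)
    num_nodes = (len(m.states) - 1) * n * len(m.gamma) ** n + 1
    k = 1
    while k < num_nodes:
        k *= 2
    return path(m, n, (m.start, 0, tuple(w)), ACCEPT, k)


# Toy NLBA: scan right; on reading a 1, nondeterministically guess "accept".
toy = NLBA(
    states=("q0", "acc"),
    gamma=("0", "1"),
    delta={("q0", "0"): [("q0", "0", +1)],
           ("q0", "1"): [("q0", "1", +1), ("acc", "1", +1)]},
    start="q0",
    accept="acc",
)
```

On the toy machine both accepts_dfs(toy, "01") and accepts_savitch(toy, "01") are true, and both are false on "00"; the DFS stores every visited configuration, while the middle-first search trades that memory for repeated recomputation.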
Last modified 25 May 2009