CMPSCI 501: Theory of Computation

Solutions to Final Exam, Spring 2016

David Mix Barrington

5 May 2016

Directions:

Answer the problems on the exam pages.
There are eight problems (some with multiple parts) for 120 total points plus 10 extra credit. Actual scale was A = 100, C = 50.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first five questions are statements -- in each case say whether the statement is true or false and give a convincing justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. You get five points for the correct boolean answer (so there is no reason not to guess if you don't know) and up to five for the justification.

Question text is in black, solutions in blue.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 25 points
  Q7: 35 points
  Q8: 10+10 points
 Total: 120+10 points

If C is any class of computers, such as DFA's, CFG's, TM's, strange variant TM's, etc.:

A_C = {(M, w): M is a computer in C and w ∈ L(M)}
E_C = {(M): M is a computer in C and L(M) = ∅}
ALL_C = {(M): M is a computer in C and L(M) = Σ^*}

In particular, if C = P, then M must have a clock restricting it to some polynomial time bound, and if C = L, then M must have a marker restricting it to c log n worktape space for some constant c.

A language is Turing recognizable (TR) if it is equal to L(M) for some Turing machine M.

A language is Turing decidable (TD) if it is equal to L(M) for some Turing machine M that halts on every input.

A language is co-TR if and only if its complement is TR.

A function f from strings to strings is Turing computable if there exists a Turing machine M such that for any string w, M when started on w halts with f(w) on its tape.

Recall that if A and B are two languages, A is mapping reducible to B, written A ≤_m B, if there exists a Turing computable function f: Σ^* → Σ^* such that for any string w, w ∈ A ↔ f(w) ∈ B. If such an f exists that is computable in polynomial time, we say that A is poly-time reducible to B, written A ≤_p B. If f is computable in log space, we say that A is log-space reducible to B, written A ≤_L B.

The following languages are proved to be NP-complete in the text or in Exercises, and you may assume without proof that each of them is NP-complete.

3-SAT = {φ: φ is a satisfiable formula in 3-CNF}
3-COLOR = {G: G is an undirected graph and the vertices of G may each be assigned one of three colors so that no edge connects two vertices of the same color}
DHAMPATH = {(G, s, t): G is a directed graph in which there is a directed path from vertex s to vertex t such that the path visits each vertex exactly once}
UHAMPATH = {(G, s, t): G is an undirected graph in which there is an undirected path from vertex s to vertex t such that the path visits each vertex exactly once}
SUBSET-SUM = {(a₁,..., a_k, t): the a_i's and t are positive integers written in binary and there is a subset of the a_i's that adds to exactly t}
CLIQUE = {(G, k): G is an undirected graph and G has a set S of k vertices such that every two distinct vertices in S have an edge between them}
VERTEX-COVER = {(G, k): G is an undirected graph and G has a set S of k vertices such that every edge in G has at least one endpoint in S}

Recall that a quantified boolean formula is a statement of the form ∃x₁: ∀x₂: ∃x₃: ... ∀x_n: φ(x₁,..., x_n) where each x_i is a boolean variable and φ is a boolean formula in conjunctive normal form. It was proved in the text and in lecture that the language TQBF of true quantified boolean formulas is PSPACE-complete, and you may assume this fact without proof.

A DogShow object consists of a set D of dogs, a set E of events, and a relation C ⊆ D × E such that C(d, e) means "dog d competes in event e".

A Schedule object for a DogShow (D, E, C) is a positive integer t and a function S from E to {1,..., t}. (Assume that t ≤ |E|.) A Schedule is valid if there do not exist a dog d and two distinct events e and e' such that C(d, e), C(d, e'), and S(e) = S(e') are all true. (Thus a schedule is valid if no dog competes in two different events that are scheduled at the same time.)

The language DS-SCHED-OK is the set {(D, E, C, S): S is a valid schedule for (D, E, C)}.

The language DS-POSSIBLE is the set {(D, E, C, t): ∃S: (D, E, C, S) ∈ DS-SCHED-OK and S has parameter t}.

Let M be any machine that takes an input string in {0, 1}^*. We define a number of games for M. The solitaire word game for M has White write down a string w and win the game if and only if w ∈ L(M). The bounded solitaire word game for M and n further requires that |w| = n. The bounded alternating word game for M and n is similar, except that now w is formed alternately by White and Black naming letters in {0, 1} until n letters have been named. White still wins if and only if w is in L(M).

We define a number of languages based on these games. If C is any class of computers, SWG_C is the set of computers M in C such that White wins the solitaire word game on M. Similarly BSWG_C is the set {(M, 1ⁿ): M is a computer in C and White wins the bounded solitaire word game for M and n}, and BAWG_C is the similar language for the bounded alternating word game. (We make the second input component 1ⁿ rather than the binary for n so that the input size will be O(n).)

We define two particular context-free languages called the Dyck languages, which are essentially strings of balanced parentheses. The language D₁ has the context-free grammar with rules S → aSb, S → SS, and S → ε. The language D₂ has these three rules plus the additional rule S → cSd.

Question 1 (10): True or false with justification: The language D₁ is regular.
FALSE. Note that the string aⁱb^j is in D₁ if and only if i = j. So the set of strings {aⁱ: i ≥ 0} is a pairwise D₁-distinguishable set, because the string bⁱ distinguishes aⁱ from any other a^j. This language is also easily proved to be non-regular by the Regular Language Pumping Lemma -- given any p, take w to be the string a^pb^p, and note that because the pumped portion lies within the first p letters, pumping down removes one or more a's and yields a string not in D₁.
Question 2 (10): True or false with justification: The language D₁ is in the class L = DSPACE(log n), but the language D₂ is not.
FALSE. Both languages are in L, though only one of you had a correct proof of the latter. To test a string for membership in D₁, you can run the standard test for balanced parentheses, counting the number of a's seen minus the number of b's seen. The string is in D₁ if and only if this count is never negative for any prefix, and finishes at 0.
This algorithm puts D₁ in L because we can implement it by keeping just one number, which fits in O(log n) bits because it cannot exceed the input length n.
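The counter test can be sketched in Python as follows. This is only an illustration: the function name `in_d1` is an assumption, and a Python integer stands in for the O(log n)-bit counter on the worktape.

```python
def in_d1(w: str) -> bool:
    """Counter test for D1 over the alphabet {a, b} (illustrative sketch)."""
    count = 0  # number of a's seen minus number of b's seen so far
    for ch in w:
        if ch == "a":
            count += 1
        elif ch == "b":
            count -= 1
        else:
            return False  # letter outside the alphabet of D1
        if count < 0:
            return False  # some prefix has more b's than a's
    return count == 0  # the count must finish at 0
```

The only state carried between letters is `count`, whose value never exceeds the input length.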
The most obvious algorithm to test for membership in D₂ is to keep a stack, pushing a's and c's, popping an a for each b and a c for each d, rejecting if there is a mismatch, and accepting if the end is reached with an empty stack. This algorithm uses O(n) space, and so does not put D₂ into L, but it does not preclude the existence of another algorithm that does put it into L.
Many people gave the following incorrect algorithm -- keep two counters, one for the number of a's seen minus the number of b's, and the other for the number of c's minus the number of d's, and accept if neither counter goes negative and the end is reached with both counters at 0. This algorithm accepts all strings in D₂, but also strings not in D₂ such as acbd.
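To see the difference concretely, here is a small Python sketch of the stack algorithm next to the incorrect two-counter algorithm. The function names `stack_check` and `two_counter_check` are assumptions chosen for illustration.

```python
def stack_check(w: str) -> bool:
    """Stack test for D2: push a and c, pop on b and d, reject on a mismatch."""
    opener = {"b": "a", "d": "c"}  # which opening letter each closing letter needs
    stack = []
    for ch in w:
        if ch in ("a", "c"):
            stack.append(ch)
        elif ch in opener:
            if not stack or stack.pop() != opener[ch]:
                return False
        else:
            return False  # letter outside the alphabet
    return not stack  # accept only if the stack ends empty


def two_counter_check(w: str) -> bool:
    """The incorrect algorithm: one counter for a/b and one for c/d."""
    ab = cd = 0
    for ch in w:
        if ch == "a":
            ab += 1
        elif ch == "b":
            ab -= 1
        elif ch == "c":
            cd += 1
        elif ch == "d":
            cd -= 1
        else:
            return False
        if ab < 0 or cd < 0:
            return False
    return ab == 0 and cd == 0
```

On the string acbd the two functions disagree: `two_counter_check` accepts it, while `stack_check` rejects it because the b arrives when c is on top of the stack.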
But it turns out we can essentially simulate the stack algorithm in O(log n) space. It's easy to compute the size of the stack at each point -- it is just the number of a's and c's seen so far minus the number of b's and d's. What we need to confirm is that every time that algorithm sees a b or a d, the top letter on the stack is the matching a or c. If the current stack size is k, we just need to find the last letter that changed the stack size from k-1 to k. This can clearly be done by keeping a few counters with O(log n) bits each.
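One way to realize this idea is sketched below in Python. It is a sketch under assumptions, not necessarily the intended solution: the name `in_d2_counters` is hypothetical, and the backward rescan is one particular way to find the letter that last raised the stack size to k. Beyond read-only access to w, it keeps only the integers `j`, `i`, `depth`, and `balance`, each bounded by the input length.

```python
def in_d2_counters(w: str) -> bool:
    """Membership test for D2 using a constant number of counters (sketch)."""
    opener = {"b": "a", "d": "c"}  # which opening letter each closing letter needs
    depth = 0  # simulated stack size after reading w[0..j]
    for j in range(len(w)):
        ch = w[j]
        if ch in ("a", "c"):
            depth += 1
        elif ch in opener:
            if depth == 0:
                return False  # the stack would be empty here
            # Rescan leftward for the opening letter that would be on top of
            # the stack: the nearest opener not cancelled by a later closer.
            balance = 0  # closers seen during the rescan and not yet cancelled
            i = j - 1
            while True:
                if w[i] in opener:
                    balance += 1
                elif balance == 0:
                    break  # w[i] is the simulated top of the stack
                else:
                    balance -= 1
                i -= 1
            if w[i] != opener[ch]:
                return False  # mismatched pair, such as a ... d
            depth -= 1
        else:
            return False  # letter outside the alphabet
    return depth == 0
```

The rescan always stops at a valid index because `depth > 0` guarantees an uncancelled opener to the left. The price of giving up the stack is time: each closing letter triggers a rescan, so the running time is O(n²) rather than O(n).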
Question 3 (10): True or false with justification: The language SWG_TM is Turing decidable.
FALSE. A machine M is in SWG_TM if and only if it is not in the language E_TM, since if there is any word in L(M) White can play it and win, and if there is no word in L(M) White will definitely lose. We proved in lecture and the text that E_TM is not TD, so neither is its complement.
Question 4 (10): True or false with justification: The language SWG_L is Turing recognizable but not Turing decidable.
TRUE. A recognizer for SWG_L just needs to test every word in Σ^* for membership in L(M), never halting if L(M) is empty. But SWG_L is not TD, and we can prove this by showing A_TM ≤_m SWG_L. Given a machine M and a string w, we build a machine N such that L(N) is the set of accepting computation histories of M on w. We can build such an N that operates in log space because given an alleged computation history, it just has to check that the history starts with the initial configuration of M on w, that the last configuration is accepting, and that each configuration follows from the previous one by the rules of M. This last step can be accomplished by keeping two counters to mark the positions being compared in the two configurations, and these two counters take O(log n) space where n is the length of the input to N.
Question 5 (10): True or false with justification: The language SWG_DFA is in the class P.
TRUE. As we observed above, SWG_DFA is just the complement of E_DFA, and we actually proved the latter to be in NL, a subset of P. A DFA D is in SWG_DFA if and only if there is any path in D's state graph from the start state to any final state. We can test this by either depth-first search (in P) or by using nondeterminism to guess a path (in NL).
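The depth-first search can be sketched in Python as follows. The representation is an assumption made for illustration: `delta` is a dictionary from (state, letter) pairs to states, and the function name is hypothetical.

```python
def dfa_accepts_some_word(start, finals, delta) -> bool:
    """Return True if some final state is reachable from the start state."""
    # Forget the letters: only the state graph matters for reachability.
    successors = {}
    for (state, _letter), target in delta.items():
        successors.setdefault(state, set()).add(target)

    seen = {start}
    stack = [start]  # explicit stack for an iterative depth-first search
    while stack:
        state = stack.pop()
        if state in finals:
            return True
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False
```

Each state is pushed at most once, so the search takes time polynomial in the size of the DFA.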
Question 6 (25): These questions use the definitions of DogShow and Schedule objects above, and their associated languages.
- (a, 5) Suppose that a DogShow object (D, E, C) has at most n dogs and at most n events. Explain why the input strings to DS-SCHED-OK or DS-POSSIBLE for this object have size polynomial in n.
  The input to DS-SCHED-OK is a set D of dogs, a set E of events, a relation from D to E (a matrix of at most n² bits), and a function from E to {1,...,t} where t ≤ n (at most O(n log n) bits). This is O(n²) bits in all.
  The input to DS-POSSIBLE is just D, E, the relation C, and the single number t, and these can also all be written in only O(n²) bits.
- (b, 10) Prove that the language DS-SCHED-OK is in the class P.
  For every dog d, and every event e, we check every event e' with e' ≠ e, and reject if S(e) = S(e'), C(d, e), and C(d, e') are all true. If no triple causes a rejection, we accept. These are O(n³) possible checks of single attributes of the input string, so the whole test runs in polynomial time.
- (c, 10) Prove that the language DS-POSSIBLE is NP-complete.
  It is clear that DS-POSSIBLE is in the class NP because DS-SCHED-OK is a verifier for it (there exists an S with parameter t such that (D, E, C, S) is in DS-SCHED-OK if and only if (D, E, C, t) is in DS-POSSIBLE) and we showed in part (b) that this verifier is in P.
  We can reduce 3-COLOR to DS-POSSIBLE. Given an undirected graph (V, E), we let the set of events be V and let the set of dogs be E. The relation C(e, v) is true if and only if v is one of the endpoints of the edge e. We set t to be 3. Then the input (E, V, C, 3) is in DS-POSSIBLE if and only if the events can be divided into three groups such that no dog is in two events in the same group. And this is true if and only if no edge of the graph connects two vertices in the same group. The mapping is clearly poly-time.
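The validity check from part (b) and the reduction from part (c) can be sketched in Python. All names here are assumptions for illustration: `competes` is the relation C as a set of (dog, event) pairs, `schedule` is the function S as a dictionary, and edges are given as pairs of vertices. The brute-force search is exponential and is included only to try the reduction on tiny graphs.

```python
from itertools import combinations, product


def schedule_ok(dogs, events, competes, schedule) -> bool:
    """DS-SCHED-OK: no dog is in two distinct events with the same time slot."""
    for d in dogs:
        for e1, e2 in combinations(events, 2):
            if ((d, e1) in competes and (d, e2) in competes
                    and schedule[e1] == schedule[e2]):
                return False
    return True


def three_color_to_dogshow(vertices, edges):
    """Map a graph to a DS-POSSIBLE instance: edges are dogs, vertices are events."""
    dogs = list(edges)
    events = list(vertices)
    competes = {(edge, v) for edge in edges for v in edge}  # C(e, v): v is an endpoint of e
    return dogs, events, competes, 3


def ds_possible_bruteforce(dogs, events, competes, t) -> bool:
    """Try every function from events to {1, ..., t} (exponential time)."""
    for slots in product(range(1, t + 1), repeat=len(events)):
        if schedule_ok(dogs, events, competes, dict(zip(events, slots))):
            return True
    return False
```

For example, mapping a triangle gives an instance that `ds_possible_bruteforce` accepts, while mapping the complete graph on four vertices, which is not 3-colorable, gives an instance that it rejects.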
Question 7 (35): These questions all involve the solitaire and alternating word games defined above, and their associated languages.
- (a, 10) Prove that BSWG_TM ≤_m BAWG_TM.
  Given a machine M and a number n, we need to create a machine N and a number n' such that (M, n) is in BSWG_TM if and only if (N, n') is in BAWG_TM. We design N to ignore the even-numbered cells of its tape, and run M on the odd-numbered cells, and we take n' = 2n so that White, who names the odd-numbered letters, names exactly n letters in all. Then if (M, n) is in BSWG_TM, White can win the BAWG for (N, 2n) by playing her winning word for the BSWG on M, whatever letters Black puts into the even-numbered cells. Similarly if White wins the BAWG on (N, 2n), she must have a winning strategy that is independent of Black's moves, since Black's moves do not affect N's behavior at all. (For example, whatever she does to win when Black always plays 0 will be a winning strategy for any other Black moves.) The word she plays in her winning strategy must also be a winning word in the BSWG for (M, n).
- (b, 10) Prove that the language BSWG_L is NP-complete.
  We first show that BSWG_L is in NP, by giving a poly-time verifier for it. This verifier is the set of tuples (M, n, w, 1^p(n)) where p(n) is a polynomial time bound for M, |w| = n, and w is in L(M). Since M is a log-space machine with an explicit space bound, it also has an explicitly computable time bound. And simulating M on w for at most p(n) steps can be done in time polynomial in p(n), and thus polynomial in the length of the input to the verifier.
  To show that BSWG_L is NP-complete, we reduce 3-SAT to it. Given a 3-CNF formula φ with n input variables, we create a machine M_φ that takes a string w as input, rejects if w is not of length exactly n, and otherwise tests whether w satisfies φ. (We assume that φ has no redundant clauses and so has length O(n³).) Clearly White wins the BSWG for (M_φ, n) if and only if φ is satisfiable, and M_φ is a log-space machine because the only read-write memory it needs is a pointer into w -- it stores φ within its state table and stores w on its read-only input tape. The mapping from φ to M_φ is clearly poly-time.
- (c, 15) Prove that the language BAWG_L is PSPACE-complete.
  Given M and n, the game tree for the BAWG has depth n and size O(2ⁿ). We can evaluate the winner by a recursive algorithm, where the player to move at a given node of the tree has a winning strategy if and only if they have a winning strategy for at least one of the child nodes. The recursion has depth of n, and at each stage of the recursion we need to store only the configuration of M, which takes only O(n) space because M has made only O(n) moves. (Actually, since M is a log-space machine, each configuration can be stored in O(log n) space and the total stack space needed is O(n log n).) In any case, membership in BAWG_L can be determined in PSPACE.
  To prove completeness, we must reduce TQBF to BAWG_L. Given a quantified boolean formula ∃x₁...φ, we need to create a machine M and a number n such that White wins the BAWG (M, n) if and only if the formula is true, and M runs in log space. The game action consists of White and Black naming the values of the n variables, and when this is done M must determine whether the boolean formula φ is true for the given values. M must have the clauses of φ encoded within its state table, so that it can check each clause in turn. The string of values is the input to M, so it does not count against M's space bound.
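The recursive evaluation of the game tree, and the machines built from CNF formulas in parts (b) and (c), can be sketched in Python. Several things here are assumptions for illustration: a machine is modelled as a Python predicate on strings over {0, 1}, White is taken to name the first letter, the function names are hypothetical, and clauses are written as lists of nonzero integers where k stands for x_k and -k for its negation.

```python
from itertools import product
from typing import Callable, List


def cnf_machine(clauses: List[List[int]]) -> Callable[[str], bool]:
    """Stand-in for M_phi: accept w exactly when w satisfies every clause."""
    def accepts(w: str) -> bool:
        return all(
            any((w[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
            for clause in clauses
        )
    return accepts


def white_wins_bswg(accepts: Callable[[str], bool], n: int) -> bool:
    """Bounded solitaire game: is some word of length n accepted? (brute force)"""
    return any(accepts("".join(bits)) for bits in product("01", repeat=n))


def white_wins_bawg(accepts: Callable[[str], bool], n: int, prefix: str = "") -> bool:
    """Bounded alternating game: recursion of depth n over the game tree."""
    if len(prefix) == n:
        return accepts(prefix)
    white_to_move = len(prefix) % 2 == 0  # White names letters 1, 3, 5, ...
    outcomes = (white_wins_bawg(accepts, n, prefix + bit) for bit in "01")
    # White needs one good move; against Black, every move must still win.
    return any(outcomes) if white_to_move else all(outcomes)
```

With φ = (x₁ ∨ x₂) ∧ (¬x₁ ∨ ¬x₂), White wins the solitaire game (φ is satisfiable) but loses the alternating game, since ∃x₁: ∀x₂: φ is false. The recursion holds only the current prefix at each of its n levels, which mirrors the polynomial space bound.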
Question 8 (10+10): These questions involve the Dyck language D₂ defined above. Given a string w of length n, define a predicate P(i, j), for all i and j with 1 ≤ i ≤ j ≤ n, to be true if and only if the string w_i...w_j is in the language D₂.
- (a, 10) Give a recursive definition of the predicate P(i, j), with base cases for P(i, i) and an inductive case defining each value P(i, j) by boolean operations on other values P(k, l) and predicates of the form w_i = a, w_i = b, w_i = c, and w_i = d. Explain why you are guaranteed to reach a base case.
  P(i, i) is false. P(i, i+1) is true if and only if either w_i is a and w_{i+1} is b, or w_i is c and w_{i+1} is d. If i+1 < j, then P(i, j) is true if and only if either (1) there exists some k with i ≤ k < j such that both P(i, k) and P(k+1, j) are true, or (2) P(i+1, j-1) is true and either w_i = a and w_j = b, or w_i = c and w_j = d. We reach a base case because each recursive call is to a case where j-i is smaller than it was, and when j-i is 0 or 1 we can evaluate it without recursion.
- (b, 10XC) Describe a circuit family, based on your definition in part (a), such that the circuit C_n will input 4n boolean variables describing a string w ∈ {a, b, c, d}ⁿ and output a boolean that says whether w ∈ D₂. What are the size and depth of your circuits, in big-O terms as a function of n?
  We have a gate for each value P(i, j) where i < j. For the P(i, i+1) cases we have the OR of two AND-gates, computing (w_i = a AND w_{i+1} = b) OR (w_i = c AND w_{i+1} = d).
  We actually only need gates for odd values of j-i because the ones with even values of j-i are always false.
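  When j-i is an odd value greater than 1, our recursive definition says that P(i, j) is the OR of:
  - P(i, k) AND P(k+1, j), for each k with i < k < j and k-i odd, and
  - P(i+1, j-1) AND ((w_i = a AND w_j = b) OR (w_i = c AND w_j = d)).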
  This is an OR of about n/2 binary ANDs, where the input to the ANDs are either basic values or other P values.
  All in all, we have O(n²) values of P(i, j) to compute to get to our desired value of P(1, n). Each OR computation involves O(n) intermediate gates in the binary tree of binary ORs, so our total size is O(n³). The computation of P(1, n) may go through O(n) other values of P(i, j), and each of those may involve a binary tree of ORs of depth O(log n), so our total depth is O(n log n).
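The recursive definition from part (a) can be checked with a short memoized Python sketch. The function name `in_d2_by_recursion` is an assumption, and the handling of the empty string (which is in D₂ but falls outside the range 1 ≤ i ≤ j ≤ n) is added here for completeness. The memo table holds one entry per pair (i, j), mirroring the O(n²) values computed by the circuit.

```python
from functools import lru_cache


def in_d2_by_recursion(w: str) -> bool:
    """Evaluate P(1, n) for w using the recursive definition (1-based indices)."""
    n = len(w)
    if n == 0:
        return True  # the empty string is in D2; P is only defined for n >= 1

    def matched(i: int, j: int) -> bool:
        # (w_i = a AND w_j = b) OR (w_i = c AND w_j = d)
        return (w[i - 1], w[j - 1]) in {("a", "b"), ("c", "d")}

    @lru_cache(maxsize=None)
    def P(i: int, j: int) -> bool:
        if i == j:
            return False  # a single letter is never in D2
        if j == i + 1:
            return matched(i, j)
        if P(i + 1, j - 1) and matched(i, j):
            return True
        # Some split point k with both halves in D2.
        return any(P(i, k) and P(k + 1, j) for k in range(i, j))

    return P(1, n)
```

Every recursive call strictly decreases j-i, so the recursion bottoms out at the two base cases.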

Last modified 25 July 2016