CMPSCI 601: Theory of Computation

Solutions to Midterm Exam, Spring 2010

David Mix Barrington

16 March 2010

Directions:

Answer the problems on the exam pages.
There are five short problems, for ten points each, and three long problems for 25 points each. Attempt all the short problems and only two of the long ones -- the maximum score is thus 100. If you attempt all three long problems I will take the scores of the best two. Actual scale was A = 80, B = 50.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 25 points
  Q7: 25 points
  Q8: 25 points

  Total: max 100 points

Question text is in black, solutions in blue.

Question 1 (10): A two-dimensional Turing machine has a finite state set Q and a memory with a cell C(i,j) for each pair of non-negative integers i and j, where each cell holds an element of the tape alphabet Γ. Its transition function is δ: (Q × Γ) → (Q × Γ × {L,R,U,D}), so that based on the current state and the one cell that it sees it decides a new state, a new letter to write in this cell, and whether to move up, down, left, or right. We start the machine with the input word w in cells C(0,1) through C(0,n), and the blank character in all other cells. Two of the states in Q are halt states, one accepting and one rejecting.
Let P₂ be the class of languages that can be decided in polynomial time by a two-dimensional Turing machine. Prove that P₂ = P.
We first need to show that P is contained in P₂, which is easy but can't be forgotten. If X is an arbitrary language in P, we know from [AB] that X = L(M) for some poly-time single-tape Turing machine M. We can implement M on a 2DTM by using a machine that never moves its head up or down, and simulates the tape of M on the first row of its two-dimensional memory. The simulation is step-for-step and thus poly-time.
The harder direction is to simulate M, an arbitrary poly-time 2DTM, with an ordinary poly-time Turing machine M'. Let T(n) be a polynomial bound on the running time of M. We need to indicate how M', which can have only a constant number of tapes, can simulate the memory and actions of M. There are at least three good ways to do this:
Question 2 (10): Let M be a deterministic single-tape "Turing machine" with the following strange properties. The tape alphabet Γ = {a₁, a₂,...} is (countably) infinite rather than finite. The function δ from (Q × Γ) to (Q × Γ × {L,R}) is such that for any positive integer n and any state q ∈ Q, δ(q, a_n) can be computed by an always-halting ordinary Turing machine. Let X be the set of strings w such that M(w) = 1, and assume that M always halts. Prove that X is decidable in the ordinary sense.
We need to simulate the finite number of computation steps of M on some ordinary TM, called M'. The key concept of this simulation is how we simulate M's memory, a finite string of characters from an infinite alphabet, on the k tapes of M'. The simplest way to do this is to represent each character a_i of M by a string of characters from a fixed alphabet, such as a(binary for i) or abⁱ. Then a step of M is simulated by reading this character, transferring it to a worktape, running the given Turing machine to compute δ(q,a_i), and writing the string representation of the new character to the memory tape of M'. (This last may require moving text to make room.) Once we show how to simulate each step of M by finitely many steps of M' (including the steps of the δ-machine), we know that the entire computation of M' halts after finitely many steps.
Note that even if we assume that M and the δ-machine are polynomial-time TM's, we can't guarantee that this M' is polynomial time. If the transition function can change a letter a_i to another letter a_j whose string representation is twice as long, in polynomially many steps we could create a letter with an exponentially long string representation.
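The encoding idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the construction from lecture: the names `encode_symbol`, `decode_tape`, and `simulate` are hypothetical, the blank is assumed to be a₁, the tape is assumed one-way infinite (a left move at cell 0 stays put), and `delta` is an ordinary Python function standing in for the always-halting δ-machine.

```python
def encode_symbol(i: int) -> str:
    """Encode a_i over the finite alphabet {a, 0, 1} as 'a' + binary(i)."""
    if i < 1:
        raise ValueError("symbol indices start at 1")
    return "a" + format(i, "b")


def decode_tape(encoded: str) -> list[int]:
    """Recover the symbol indices; each 'a' marks the start of one symbol."""
    return [int(chunk, 2) for chunk in encoded.split("a") if chunk]


def simulate(delta, start_state, halt_states, input_indices, blank=1):
    """Run M until it halts; delta(q, i) returns (state, index, 'L' or 'R').

    Each cell of M is stored only in its encoded string form.
    Assumes, as in the question, that M always halts.
    """
    tape = [encode_symbol(i) for i in input_indices] or [encode_symbol(blank)]
    state, head = start_state, 0
    while state not in halt_states:
        i = int(tape[head][1:], 2)           # decode the scanned symbol
        state, j, move = delta(state, i)     # stands in for the delta-machine
        tape[head] = encode_symbol(j)        # may be longer than before
        head = head + 1 if move == "R" else max(head - 1, 0)
        if head == len(tape):
            tape.append(encode_symbol(blank))
    return state, "".join(tape)
```

A transition function that doubles every index it reads shows why an encoded cell can grow by one bit per visit, which is the source of the blow-up noted above.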
Question 3 (10): Let n be a positive integer. An n × n array of numbers, each number in the set {1,...,n}, is a Latin square if every row and column contains all n possible numbers. (The popular Sudoku and Kenken puzzles ask you to form Latin squares with certain additional properties.) Let PARTIAL-LATIN (or PL) be the set of partially filled n × n arrays (for any possible n) that can possibly be completed to a Latin square. Let UNIQUE-PARTIAL-LATIN (or UPL) be the set of partially filled n × n arrays that can be completed to a Latin square in exactly one way.
Prove that PL is in the class NP and that UPL is in the class PH. What is the size of your input string, as a function of n?
Let X be the partially filled array given as input. A certificate for PL consists of a complete array Y, and the verifier M(X,Y) returns 1 if Y is a Latin square and every filled entry of X is the same in Y. Testing the Latin square property takes O(n³) operations on numbers of O(log n) bits, and testing the agreement takes only O(n²) such operations, so the verifier is polynomial time and X is in PL if and only if ∃Y: M(X,Y) = 1.
A partially filled array X is in UPL if and only if there exists a Latin square Y that agrees with the filled entries of X, and there does not exist any complete array Z that is a Latin square, agrees with X, and is not identical to Y. We can make a poly-time verifier M(X,Y,Z) that outputs 1 if and only if Y is a Latin square extension of X and either Z is not a Latin square extension of X or (Y = Z) -- this is no more complicated than the verifier above. Then X is in UPL if and only if ∃Y: ∀Z M(X,Y,Z) = 1. We have thus shown UPL to be in the class Σ^p₂, which is contained in the class PH.
The input is an array with up to n² numbers, each requiring O(log n) bits to write since they are in {1,...,n}, so the input size is O(n² log n).
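The two verifiers can be sketched as follows. This is an illustrative sketch, assuming arrays are lists of lists with `None` marking an unfilled entry of X; the function names are hypothetical and not part of the exam.

```python
def is_latin_square(y):
    """True iff every row and column of the n x n array y contains 1..n."""
    n = len(y)
    full = set(range(1, n + 1))
    if not all(len(row) == n and set(row) == full for row in y):
        return False
    return all({y[r][c] for r in range(n)} == full for c in range(n))


def agrees(x, y):
    """True iff every filled (non-None) entry of x equals the entry of y."""
    return len(x) == len(y) and all(
        xe is None or xe == ye
        for x_row, y_row in zip(x, y)
        for xe, ye in zip(x_row, y_row)
    )


def verify_pl(x, y):
    """Verifier M(X, Y) for PL: Y is a Latin square extending X."""
    return is_latin_square(y) and agrees(x, y)


def verify_upl(x, y, z):
    """Verifier M(X, Y, Z) for UPL: Y extends X, and Z is no second extension."""
    return verify_pl(x, y) and (not verify_pl(x, z) or y == z)
```

X is in PL when some Y makes `verify_pl(x, y)` true, and in UPL when some Y makes `verify_upl(x, y, z)` true for every Z.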
Question 4 (10): The language MOD-POWER consists of those strings a^m b^x c^i d^y such that the integers xⁱ and y are congruent modulo m. (Recall that this means that m divides the integer xⁱ - y, or that the remainders xⁱ%m and y%m, in Java notation, are equal.) Prove that MOD-POWER is in the complexity class L = DSPACE(log n). (Hint: Doing all your arithmetic modulo m saves space.)
We first count the number of each kind of letter in the input with a binary counter, so that we have the numbers m, x, i, and y on the worktape. Since each of these numbers is at most n, they can each be represented in O(log n) bits. We then have to compute xⁱ modulo m and compare it with y modulo m.
We saw in lecture that we can multiply positive integers in logspace even if they have polynomially many bits, and we can reduce a number modulo m in logspace if it has O(log n) bits, by just repeatedly subtracting m from it until it is less than m. So we can get y modulo m easily, and we can get xⁱ modulo m by the pseudocode:
```
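   z = 1 % m;                  // x^0 = 1, reduced so that m = 1 also works
   for (j = 0; j < i; j++)
      z = (z*x) % m;           // after this step z = x^(j+1) mod m, so z < m
   // accept iff z == y % m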
```
There are faster ways to carry out the computation but we are only worried about space. About the only wrong way to do this would be to try to compute xⁱ first as an integer, then reduce it modulo m -- this could require up to i(log n) bits for the intermediate results, even using repeated squaring.
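A membership test along these lines might look like the sketch below. The function name `in_mod_power` is hypothetical, and the sketch assumes that m = 0 is rejected, since the question does not say how a zero modulus is treated. It also starts from `1 % m` so that the case m = 1 compares correctly. Every intermediate value stays below m².

```python
import re


def in_mod_power(w: str) -> bool:
    """Decide whether w = a^m b^x c^i d^y with x^i congruent to y modulo m."""
    match = re.fullmatch(r"(a*)(b*)(c*)(d*)", w)
    if match is None:
        return False
    # Counting each block of letters gives the four numbers, each at most |w|.
    m, x, i, y = (len(group) for group in match.groups())
    if m == 0:
        return False  # assumption: a modulus of zero is not allowed
    z = 1 % m
    for _ in range(i):
        z = (z * x) % m  # reduce after every multiplication
    return z == y % m
```

For example, `in_mod_power("aaaaa" + "bb" + "ccc" + "ddd")` checks whether 2³ and 3 are congruent modulo 5.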
Question 5 (10): The COMPETITIVE FACILITY LOCATION (or CFL) problem takes as input an undirected graph G, with a positive integer weight for each vertex, and a positive integer t called the target. The vertices represent possible locations for facilities, the weight of a vertex represents the value of a facility at that location, and an edge (u,v) means that vertices u and v cannot both have a facility. Two players will take turns each placing a facility on a vertex, where they may choose any vertex that does not have an edge to a vertex with an existing facility.
The input (G,t) is in the language CFL if and only if there is a strategy for the first player that lets her place facilities with total weight at least t, no matter how the second player plays.
Prove that the language CFL is in the class ATIME(p(n)) for some polynomial p. In what deterministic complexity class can you place this language?
We define an ATM game where White and Black alternately name vertices by guessing sequences of O(log n) bits. White wins the game if Black is the first to pick a vertex connected by an edge to a previously picked vertex, or if no edge violation occurs and her node weights total to at least t at the end. The game takes polynomial (at most n log n) time for the guesses, and the winner can clearly be checked in polynomial time by checking each pair of chosen vertices against the input graph and by adding up White's weights.
Clearly White wins the ATM game under optimal play if and only if the first player has a strategy to get total value at least t if and only if the input (G,t) is in the language CFL.
We know ATIME(p(n)) is contained in PSPACE, since a deterministic machine could evaluate the entire game tree for the game using recursion -- the depth of the recursion would be the number of moves (polynomial) and each step of the recursion would require only polynomial space on the stack. The language is probably not in the class PH, and is certainly not shown to be there by this ATM game, since the game has up to O(n) alternations between White and Black moves.
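The recursive evaluation of the game tree can be sketched as below. This is a sketch under assumptions that the question leaves open: a player with a legal move must place a facility, the game ends when the player to move has no legal vertex, and an occupied vertex cannot be chosen again. The name `first_player_wins` and the dictionary representation of G are hypothetical.

```python
def first_player_wins(adj, weight, t):
    """adj maps each vertex to its set of neighbours; weight maps it to its value.

    Returns True iff the first player can guarantee total weight at least t.
    The recursion depth is at most the number of vertices.
    """
    def legal(placed):
        return [v for v in adj if v not in placed and not (adj[v] & placed)]

    def win(placed, total, first_to_move):
        if total >= t:
            return True   # weights are positive, so the total never drops
        moves = legal(placed)
        if not moves:
            return False  # game over with the target not reached
        if first_to_move:
            # Existential step: some move of the first player must work.
            return any(win(placed | {v}, total + weight[v], False) for v in moves)
        # Universal step: every reply of the second player must fail to stop her.
        return all(win(placed | {v}, total, True) for v in moves)

    return win(frozenset(), 0, True)
```

On the path 1 - 2 - 3 with weights 2, 3, 2, the first player can guarantee 3 by taking the middle vertex but cannot guarantee 4.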
Question 6 (25): A simple path in a directed graph is a path that never visits any vertex more than once. Here are two problems involving directed graphs. LONG-PATH is the set of pairs (G,k) such that G is a directed graph and there exists a path of length at least k in G. LONG-SIMPLE-PATH is the set of pairs (G,k) such that G is a directed graph and there exists a simple path of length at least k in G.
- (a,5) Prove that LONG-SIMPLE-PATH is in the class NP. (What happens if k is very large?)
  A certificate showing a pair (G,k) to be in LONG-SIMPLE-PATH (LSP) is simply a listing of the sequence of vertices in a path of length k. If k ≤ n-1, a poly-time verifier can check that the sequence is a valid path in G, is simple, and has length k. If k > n-1, there can be no simple path with k edges (as it would have k+1 > n vertices on it), and we can reject the input whatever certificate is offered. Clearly (G,k) is in LSP if and only if a valid, polynomial-length certificate exists.
- (b,10) Prove that LONG-SIMPLE-PATH is NP-hard and thus NP-complete.
  We can reduce the language DHAMPATH, proved in [AB] to be NP-complete, to LSP. An input to DHAMPATH is a directed graph G, and G is in DHAMPATH if and only if there exists a simple path in G containing all n vertices and thus having exactly n-1 edges. So G is in DHAMPATH if and only if the pair (G,n-1) is in LSP, and clearly this reduction can be computed in polynomial time. We have that DHAMPATH ≤_p LSP, and thus LSP is NP-hard.
- (c,10) Prove that if the number k is given in unary, LONG-PATH is in the class NL.
  If n is the size of the input string (G,k), we know that k ≤ n and that G has at most n vertices. So we can store both a counter up to k and a constant number of vertex names on our logspace worktape. Define a valid read-once certificate to be a path of length k, given as a sequence of k+1 vertices. We can test that the path is valid in a read-once way by checking that each vertex has an edge to the next vertex (this requires having both on a worktape at once), and counting up to k to check the path length. In [AB] it is shown that a language is in NL if and only if we can define a read-once certificate language that can be checked in logspace.
  The restriction that k is in unary is actually not needed to solve the problem -- that is, LONG-PATH with k in binary is still in NL. If k is larger than the number of vertices in G, then a path of length k exists if and only if a cycle exists, and we could let any cycle be a certificate in that case.
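The read-once check in part (c) can be sketched as follows. This is an illustration rather than a logspace machine: Python stores the edge set in memory, whereas the real verifier would consult its read-only input. The point is that the loop keeps only one previous vertex and one counter while consuming the certificate a single time. The name `accepts_read_once` and the edge-set representation are assumptions.

```python
def accepts_read_once(edges, k, certificate):
    """edges is a set of (u, v) pairs; certificate is an iterator over vertices.

    Accepts iff consecutive certificate vertices are joined by edges of G
    and at least k edges are traversed.
    """
    previous = None  # one vertex name: O(log n) bits
    steps = 0        # counter up to the path length: O(log n) bits
    for vertex in certificate:  # each certificate symbol is read exactly once
        if previous is not None:
            if (previous, vertex) not in edges:
                return False
            steps += 1
        previous = vertex
    return steps >= k
```

Because repeated vertices are allowed, a certificate may walk around a cycle, as in `accepts_read_once({(1, 2), (2, 3), (3, 1)}, 5, iter([1, 2, 3, 1, 2, 3]))`.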
Question 7 (25): These questions all involve the language TQBF, the set of all true quantified boolean formulas. A quantified boolean formula has the form:
Q₁x₁:Q₂x₂:... Q_mx_m: φ(x₁,...,x_m),
where each Q_i is a quantifier (∃ or ∀), each x_i is a boolean variable, and φ is a formula using boolean variables, constants 0 and 1, and operators ∧, ∨, and ¬.
- (a,5) Explain why the language TQBF is hard for the class PSPACE under poly-time reductions.
  Given any deterministic (or even nondeterministic) TM M running in space S(n) and an input x of size n, we can imagine (not write down) a configuration graph with 2^O(S(n)) nodes, such that there is a path from the start node to the accept node of this graph if and only if x is in L(M). We can then define a quantified boolean formula expressing this path property using middle-first search -- a path of length 2z from vertex u to vertex v exists if and only if there exists a node w such that there are paths of length z both from u to w and from w to v. We recursively rewrite the formula, reusing variables to keep the length of the formula polynomial, until the base case refers to an edge of the configuration graph and we can express this with a quantifier-free formula. The function from x to this formula is the poly-time reduction, as the formula is true if and only if the path exists.
- (b,10) Prove that the complexity classes P^TQBF and NP^TQBF are equal. These classes are defined in terms of ordinary oracle machines, where the oracle query tape may be used as read/write space.
  The two classes are equal because each is equal to PSPACE. To show that PSPACE is contained in P^TQBF, we simply quote the result of part (a) and note that a poly-time deterministic oracle machine can take x, compute a single TQBF query using the reduction there, and get the answer to that query from the oracle. To show that NP^TQBF is contained in PSPACE, note that we can simulate every possible computation path of the NP machine using polynomial space, since the choice sequence of the NP machine and the space it uses on each path are each polynomial. Our simulation must also simulate the oracle that the NP machine uses, but we can do this in PSPACE as well -- given a poly-size quantified boolean formula, we can evaluate it recursively, by substituting both 0 and 1 for each quantified variable, using polynomial total space. Since P^TQBF is obviously contained in NP^TQBF, we have shown all three classes to be equal.
- (c,10) Let M be a polynomial-time nondeterministic oracle machine with oracle for the language SAT. (Thus L(M) is an arbitrary language in the class NP^SAT.) Prove that L(M) ≤_p TQBF, describing the polynomial-time reduction.
  Modify M to create a non-oracle nondeterministic TM M' as follows. M' simulates M step by step except that whenever M makes an oracle query, M' guesses the result of the query and continues computing as though that were the answer from the oracle. Then the input x is in L(M) if and only if:
  There exists a run r of M' such that every oracle query guessed true in r is a satisfiable formula and every oracle query guessed false in r is an unsatisfiable formula, or
  ∃ r: ∀φ: [((φ guessed true in r) → (∃z: φ(z) = 1)) ∧ ((φ guessed false in r) → (∀z: φ(z) = 0))]
  This can clearly be expressed as a quantified boolean formula -- a Σ₂ formula with a little more work.
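The recursive evaluation of a quantified boolean formula used in part (b) can be sketched as below. This is a minimal sketch with a hypothetical interface: the prefix is a list of (quantifier, variable) pairs with 'E' for ∃ and 'A' for ∀, and the matrix φ is supplied as a Python function on assignments rather than as a parsed formula.

```python
def eval_qbf(prefix, phi, assignment=None):
    """Evaluate Q1 x1: ... Qm xm: phi by trying both values of each variable.

    The recursion depth equals the number of quantifiers, and each level
    holds one partial assignment, so the total space is polynomial even
    though the running time is exponential.
    """
    assignment = {} if assignment is None else assignment
    if not prefix:
        return phi(assignment)
    (quantifier, var), rest = prefix[0], prefix[1:]
    branches = (
        eval_qbf(rest, phi, {**assignment, var: value})
        for value in (False, True)
    )
    return any(branches) if quantifier == "E" else all(branches)
```

For instance, ∀x ∃y: x ≠ y evaluates to true, while ∃y ∀x: x ≠ y evaluates to false.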
Question 8 (25): Recall that if A and B are n × n boolean matrices, their product AB is defined to be the n × n matrix C such that each entry C_i,j is the OR, over all k from 1 to n, of A_i,k ∧ B_k,j. If A is an n × n boolean matrix, the k'th power of A, called A^k, is defined by the rule A⁰ = I (the identity matrix) and A^(k+1) = A^k A.
- (a,10) Let G be a directed graph and let A be the adjacency matrix of G, except that each entry A_i,i is 1. Prove that for any non-negative integer k, there is a path in G of length at most k from vertex i to vertex j if and only if A^k_i,j = 1. (Here A^k_i,j is the (i,j) entry of the matrix A^k.)
  The base case is k=0 since the statement is for all non-negative integers k. A⁰_i,j is defined to be 1 iff i=j, and a path of length 0 exists from i to j iff i=j.
  For the inductive hypothesis, assume that for a fixed k and any i and j, A^k_i,j = 1 iff there is a path of length at most k from i to j. By the definition of paths in directed graphs, there is a path of length at most k+1 from u to v iff there exists a vertex z such that there is a path of length at most k from u to z and either z = v or there is an edge from z to v. By the inductive hypothesis and the definition of A (whose diagonal entries are all 1), there is such a path iff there exists z such that A^k_u,z = 1 and A_z,v = 1. But this is exactly the definition of when A^(k+1)_u,v = 1. This completes the induction and the proof.
- (b,5) Let BOOLEAN-MATRIX-POWER (BMP) be the language {(A,k,i,j): A^k_i,j = 1}. Assume, if you like, that k is given in unary. Prove, using part (a) as needed, that BMP is in the class NL.
  From (a), we can show BMP to be in NL by showing that the language X = {(G,k,i,j): there exists a path of length at most k from i to j in G} is in NL. Many of you incorrectly assumed that this was the language called PATH, and shown to be NL-complete, in [AB]. But PATH is the language {(G,i,j): there exists a path (of any length) from i to j in G}.
  So we have to show directly that this language X is in NL. This is essentially done in the solution to Question 6c above -- we guess a path and then verify it as a read-once certificate in logspace -- with the exceptions that our guessed path must now start at i and end at j, and that its length must be at most k.
  Again, the restriction to unary k is not needed to put BMP in NL. If there is any path of any length from i to j, there is a path of length at most n-1. Therefore we can insist that all certificates are paths of length at most n-1, and thus polynomial size even if k is very large.
- (c,10) Prove that BMP is NL-hard (under logspace reductions) and is thus NL-complete.
  We need to reduce a known NL-hard problem to BMP, and our only candidate from the book is PATH = {(G,i,j): there exists a path from i to j in G}. A path from i to j exists iff such a path exists with at most n-1 edges, where n is the number of vertices in G. So by part (a), the triple (G,i,j) is in PATH iff the 4-tuple (A,n-1,i,j) is in BMP, where A is the (modified) adjacency matrix of G. (Here using n in place of n-1 would still be correct, where it wasn't in Question 6b.) This reduction is clearly computable in deterministic logspace.
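The definitions in this question and the reduction of part (c) can be checked on small examples with the sketch below. It is a direct, polynomial-time transcription of the definitions, not a logspace algorithm. The function names are hypothetical, and vertices are assumed to be numbered 0 to n-1.

```python
def bool_mat_mul(a, b):
    """Boolean product: C[i][j] is the OR over k of A[i][k] AND B[k][j]."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def bool_mat_power(a, k):
    """A^0 = I and A^(k+1) = A^k A."""
    n = len(a)
    result = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = bool_mat_mul(result, a)
    return result


def in_bmp(a, k, i, j):
    """Membership of (A, k, i, j) in BMP."""
    return bool_mat_power(a, k)[i][j]


def path_to_bmp(n, edges, i, j):
    """Map a PATH instance to (A, n-1, i, j), with A's diagonal set to 1."""
    a = [[u == v or (u, v) in edges for v in range(n)] for u in range(n)]
    return a, n - 1, i, j
```

With edges 0 → 1 → 2 on four vertices, `in_bmp(*path_to_bmp(4, {(0, 1), (1, 2)}, 0, 2))` is true, while the same query with exponent 1 is false because the shortest path has two edges.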

Last modified 17 March 2010