CMPSCI 601: Theory of Computation

Practice Final Exam Solutions, Spring 2010

David Mix Barrington

Posted 3 May 2010

Directions:

Answer the problems on the exam pages.
There are five short problems, for ten points each, and three long problems for 25 points each. Attempt all the short problems and only two of the long ones -- the maximum score is thus 100. If you attempt all three long problems I will take the scores of the best two. Likely scale is A = 90, B = 60 but the actual scale will be determined after I grade the exam.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 25 points
  Q7: 25 points
  Q8: 25 points

  Total: max 100 points

Minor corrections to exam added on the evening of 29 April.

Exam text is in black, solutions in blue.

Question 1 (10): A parallel Turing processor is a network of f(n) different Turing machines, each of which has one worktape that is public. On a given time step it may read one cell of the public worktape of any other machine, by writing the number of the cell and the number of the machine on an address tape. One machine is designated the lead machine, and it gets the input string at the beginning of the computation and produces the output at the end of the computation. Define parallel-P to be the class of languages X such that there exists a parallel Turing processor M with f(n) processors and time bound t(n) on input of length n, where f and t are polynomials, such that L(M) = X. Prove that parallel-P = P.
First, P is clearly contained in parallel-P because a PTP could consist of a single processor and simulate an ordinary poly-time TM. So we must prove that a PTP can be simulated by an ordinary TM. To simulate one parallel time step on the PTP, the ordinary TM must execute one step on each of the f(n) individual machines, and before it can do this it must look up the public-tape information that each machine is accessing on that step. We can build a single tape that holds all the worktape contents of all the machines, as each machine can use at most O(t(n)) space and f(n)t(n) is still a polynomial. We have our single machine gather the public information needed by each machine, record it all in one place, and then carry out the step on each machine. (It's important that all the information be recorded before any steps are executed, as the steps on different machines are simultaneous.) We might need as much as O(f²(n)t(n)) steps to gather the information, as we might have to search the entire memory of size O(f(n)t(n)) for each processor. This is the dominant part of executing a single parallel step, so our total time for t(n) parallel steps is O(f²(n)t²(n)), still a polynomial.
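The two-phase structure of this simulation (gather every read first, then step every machine) can be sketched in Python. This is only a toy model under assumptions of my own, not a Turing machine simulator: each processor is represented by a hypothetical function that takes its own tape and the value it just read, updates the tape in place, and returns the next address `(machine, cell)` it wants to read. The names `simulate_parallel_step`, `simulate`, and `make_copier` are illustrative. Python lists give constant-time lookups, so the sketch does not reflect the O(f(n)t(n)) tape scan that a one-tape machine needs per lookup.

```python
from typing import Callable, List, Tuple

Address = Tuple[int, int]                     # (machine number, cell number)
StepFn = Callable[[List[int], int], Address]  # hypothetical processor behaviour


def simulate_parallel_step(tapes: List[List[int]],
                           requests: List[Address],
                           step_fns: List[StepFn]) -> List[Address]:
    # Phase 1: record every requested public cell before any machine moves,
    # because all machines of the parallel processor step simultaneously.
    gathered = [tapes[m][c] for (m, c) in requests]
    # Phase 2: only now execute one step of each machine in turn.
    return [step_fns[i](tapes[i], gathered[i]) for i in range(len(tapes))]


def simulate(tapes, requests, step_fns, t):
    # Run t parallel steps sequentially and return the final tapes.
    for _ in range(t):
        requests = simulate_parallel_step(tapes, requests, step_fns)
    return tapes


def make_copier(other: int) -> StepFn:
    # Toy processor: copy the value just read into own cell 0, then ask
    # for cell 0 of machine `other` again.
    def step(tape: List[int], value: int) -> Address:
        tape[0] = value
        return (other, 0)
    return step
```

With two copier machines reading each other, one parallel step swaps the two cells. Interleaving the reads and writes instead would let the second machine see the first machine's already overwritten cell, which is why the gather phase comes first.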
Question 2 (10): Let DCYCLE be the set of directed graphs that have a cycle, where a cycle is any path of one or more edges from a vertex to itself. Prove that DCYCLE is complete for the class NL under log-space reductions. (You may quote the fact that the language PATH is NL-complete, but for your proof you will probably want a slightly different language which may require revisiting that completeness proof.)
I think the easiest way to prove this is to use the fact that DAG-PATH, the restriction of the PATH problem to directed graphs with no cycles, is also NL-complete. To see this, consider the reduction from an arbitrary NL language to PATH and first put a clock on a worktape of the NL machine, so that it can never visit the same configuration twice in a single computation. This makes its configuration graph acyclic, so that the reduction to PATH becomes a reduction to DAG-PATH.
Now we just have to reduce DAG-PATH to DCYCLE. Given a DAG G and vertices s and t, create a graph f(G,s,t) with the same vertices, retaining the edges of G and adding an edge from t to s. (If G has an edge from t to s already, we know that the answer to PATH(G,s,t) is false because G is acyclic, so our reduction returns some fixed graph with no cycle.) Since G had no cycle, f(G,s,t) can only have a cycle including the new edge, and this must include a path from s to t that existed in G. Contrariwise, if G has a path from s to t it is clear that f(G,s,t) has a cycle. So we have reduced DAG-PATH to DCYCLE, proving that DCYCLE is NL-hard. It is clear that DCYCLE is in NL, because an NDTM can start at a guessed vertex s and nondeterministically explore a path, accepting only if it revisits s. Hence DCYCLE is NL-complete.
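The reduction itself is short enough to write out. The sketch below assumes vertices are numbered 0 to n-1 and edges are pairs; `reduce_dagpath_to_dcycle` and `has_cycle` are names chosen for illustration. The cycle checker is an ordinary depth-first search, used here only to test the reduction on small graphs. It is not the log-space nondeterministic algorithm from the proof.

```python
def reduce_dagpath_to_dcycle(n, edges, s, t):
    """Map a DAG-PATH instance (G, s, t) to a DCYCLE instance (n', edges')."""
    if (t, s) in set(edges):
        # G is acyclic and already has t -> s, so there is no s-to-t path:
        # return a fixed graph with no cycle.
        return 1, []
    return n, list(edges) + [(t, s)]


def has_cycle(n, edges):
    """Plain DFS cycle test (deterministic, linear space; for checking only)."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def visit(u):
        color[u] = GREY
        for v in adj[u]:
            # A grey neighbour is on the current DFS path, so we found a cycle.
            if color[v] == GREY or (color[v] == WHITE and visit(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in range(n))
```

For the DAG with edges 0→1, 1→2, 0→3, the instance (s, t) = (0, 2) maps to a graph with a cycle, while (3, 2) and (1, 0) map to acyclic graphs.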
Question 3 (10): Recall that PARITY is the set of binary strings with an odd number of ones. Prove that PARITY is in the class ATIME(log n), where of course the alternating Turing machine has random access to its input. (This means that it can write a number i on its address tape and then access input bit x_i.)
We define a game that can be played between White and Black using O(log n) bit moves, where White wins under optimal play iff the input string w is in PARITY. At any point in the game the leading bits u of an input address are written on a worktape, defining a substring of the input. (For example, if 110 were written on the worktape the substring would be bits 110000...000 through 110111...111.) The state of the machine records whether White is claiming that the substring is or is not in PARITY. Originally u is the empty string, the substring is all of w, and White is claiming that the substring is in PARITY. A White move is a bit 0 or 1 indicating whether White claims that the substring defined by u0 is in PARITY. A Black move is a bit to be appended to u -- if Black moves 0 then the game continues with White's claim about u0, and if Black moves 1 then the game continues with the claim about u1 that follows from White's claims about u and u0. The game ends when u is a complete address of a letter in w, whereupon White wins iff her final claim is correct. (This is thus a one-look ATM as well.) If White is initially telling the truth, she can win by continuing to tell the truth. If White is initially lying, Black has a winning strategy by continually challenging the one of White's two claims that is false. The game takes two bits per round, and the number of rounds is the ceiling of log(n) where n is the length of w.
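To check that the game has the right winner, we can evaluate its game tree directly. The sketch below is a deterministic evaluation of the alternation (an existential choice for White, a universal choice for Black), so it visits every address and takes polynomial time rather than the O(log n) alternating time of the ATM. It assumes addresses start at 0 and that address positions beyond the end of w hold 0, a padding convention the solution does not spell out. The function names are illustrative.

```python
def bit_at(w: str, u: str) -> int:
    # Input bit at binary address u; positions past the end of w count as 0
    # (an assumed padding convention).
    i = int(u, 2) if u else 0
    return int(w[i]) if i < len(w) else 0


def white_wins(w: str, u: str = "", claim: int = 1, depth=None) -> bool:
    """True iff White wins from prefix u while claiming parity `claim`."""
    if depth is None:
        depth = (len(w) - 1).bit_length()   # ceiling of log2(len(w))
    if len(u) == depth:
        # u is a complete address: a single look at the input decides the game.
        return claim == bit_at(w, u)
    # White (existential) names a claim c0 about u0; Black (universal) may
    # challenge either c0 on u0 or the implied claim (claim XOR c0) on u1.
    return any(
        white_wins(w, u + "0", c0, depth)
        and white_wins(w, u + "1", claim ^ c0, depth)
        for c0 in (0, 1)
    )
```

For every nonempty binary string w, `white_wins(w)` should agree with whether w has an odd number of ones.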
Question 4 (10): Recall that a pseudorandom generator of stretch s(n) is called secure if for any probabilistic polynomial-time function A, the probability that A(x) = 1, for a string x of length s(n) generated from a uniform random seed of length n, differs from the probability that A(y) = 1 for a uniform random string of length s(n) by a negligible function. Prove that no secure pseudorandom generator can have the property that its output depends only on the first O(log n) bits of its seed.
Fix a string x that is generated from some seed z. Since all seeds that agree with z on the first c log n bits (for some c) also produce x, the probability of x in the first distribution is at least n^-c. The probability that x is generated in the second distribution is exactly 2^-s(n). So define A so that A(y) = 1 iff y = x. The probability that A outputs 1 is larger for the first distribution by at least n^-c - 2^-s(n), a non-negligible function since n^-c is an inverse polynomial while 2^-s(n) is negligible. Hence this generator is not secure under the given definition.
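The gap can be computed exactly for a small example. The generator below is a made-up toy (it repeats the first k seed bits to fill s output bits), chosen only because its output depends on a short prefix of the seed. The parameters n = 8, k = 2, s = 10 are likewise arbitrary, and k plays the role of c log n.

```python
from fractions import Fraction
from itertools import product


def toy_generator(seed, k, s):
    # Hypothetical generator: the output depends only on the first k seed bits.
    prefix = seed[:k]
    return (prefix * (s // k + 1))[:s]


def acceptance_probabilities(n, k, s):
    """Exact Pr[A = 1] on generator output and on uniform strings."""
    x = toy_generator((0,) * n, k, s)      # the fixed target string

    def A(y):
        return y == x                      # distinguisher: accept iff y = x

    hits_gen = sum(A(toy_generator(z, k, s)) for z in product((0, 1), repeat=n))
    hits_uni = sum(A(y) for y in product((0, 1), repeat=s))
    return Fraction(hits_gen, 2 ** n), Fraction(hits_uni, 2 ** s)
```

For n = 8, k = 2, s = 10 the two probabilities are 1/4 and 1/1024, matching the bounds 2^-k and 2^-s in the argument above.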
Question 5 (10): Recall that a MOD-3 gate takes zero or more boolean inputs and outputs 1 iff the sum of its inputs (as integers) is not congruent to 0 modulo 3. Consider circuits with only MOD-3 gates that compute the parity function on n boolean inputs. Describe such a circuit of depth O(log n), and show that any such circuit must have depth Ω(log n).
For the upper bound, recall that the PARITY language is decided by a circuit of binary XOR gates of depth O(log n). We can build an XOR gate from a MOD-3 gate (of in-degree 3) because if x and y are bits, x+2y is congruent to 0 modulo 3 iff x = y, and so MOD-3(x,y,y) = XOR(x,y). So there is a MOD-3 circuit of depth O(log n) to decide membership in PARITY for inputs of length n.
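A quick check of this construction: the sketch below builds a balanced tree of XOR gates, each realized as MOD-3(x, y, y), and reports the depth it used. The names `mod3`, `xor_gate`, and `parity_circuit` are illustrative, and the tree layout (pair up neighbours, carry an odd leftover wire forward) is one choice among several.

```python
def mod3(*bits):
    # MOD-3 gate: outputs 1 iff the sum of its inputs is not 0 modulo 3.
    return 1 if sum(bits) % 3 != 0 else 0


def xor_gate(x, y):
    # x + 2y is 0 mod 3 exactly when x = y, so this is XOR(x, y).
    return mod3(x, y, y)


def parity_circuit(bits):
    """Evaluate a balanced XOR tree on a nonempty input; return (value, depth)."""
    layer, depth = list(bits), 0
    while len(layer) > 1:
        nxt = [xor_gate(layer[i], layer[i + 1])
               for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])          # odd wire passes to the next layer
        layer, depth = nxt, depth + 1
    return layer[0], depth
```

On n inputs the tree has depth equal to the ceiling of log₂ n, and its output equals the parity of the input.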
For the lower bound, we use the degree complexity measure from the proof of Smolensky's Theorem. Given any circuit of MOD-3 gates alone, we define a polynomial for each wire defining the function of the inputs that gives the bit transmitted on that wire. Since a MOD-3 gate computes the square of the sum of its inputs, the degree of its output is at most double the degree of its highest-degree input. Thus by a simple induction, the degree of the output of a depth-d circuit of just MOD-3 gates is at most 2^d. Since the PARITY function (the decision function for the PARITY language) is -1 plus the product for all i from 1 to n of (1 + x_i), it has degree n and thus d must be Ω(log n) for 2^d to be at least n.
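Both polynomial facts used in this argument are identities over Z₃ that can be verified by brute force on small inputs. The function names below are illustrative. The check covers only the two identities, not the degree lower bound itself.

```python
from math import prod


def mod3_gate(bits):
    # MOD-3 gate as defined in the question.
    return 1 if sum(bits) % 3 != 0 else 0


def squared_sum(bits):
    # Claimed polynomial for the gate: the square of the sum, taken modulo 3.
    return (sum(bits) ** 2) % 3


def parity_poly(bits):
    # Claimed polynomial for PARITY: -1 plus the product of (1 + x_i), modulo 3.
    return (prod(1 + b for b in bits) - 1) % 3
```

On boolean inputs, `squared_sum` agrees with `mod3_gate`, and `parity_poly` agrees with the parity of the input. The second identity holds because each factor (1 + x_i) is 1 or 2, and 2 is congruent to -1 modulo 3.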
Question 6 (25): Let E(k, p, 1^t) be the following partial encryption function. (It is a partial encryption function because for given inputs it may or may not produce a ciphertext.) E interprets the binary string k, if possible, as a Turing machine. It then runs k on the plaintext binary string p for at most t steps, and outputs the string, if any, that is the result of this computation.
The language VALID is the set of tuples (c, k, 1^a, 1^b) such that there exists a binary string p of length a such that E(k, p, 1^b) = c.
- (a,5) Argue that VALID is in the class NP.
  Let the certificate for (c, k, 1^a, 1^b) be a string p of length a such that E(k, p, 1^b) returns c. To check that a certificate is valid, we run k (as a TM) on p for b steps and accept if it outputs c -- this is polynomial time in the size of the input tuple (c, k, 1^a, 1^b, p). Clearly such a certificate exists for a tuple (c, k, 1^a, 1^b) iff it is in VALID, from the definition above.
- (b,10) Prove that if A is any language in the class NP, then A ≤_p VALID.
  Let A be an arbitrary language in NP and let B be a language in P such that for any string x of length n, x is in A iff there exists a string u (of length q(n) where q is a fixed polynomial) such that (x,u) is in B. For any string x, let M_x be a machine that outputs 1 on an input u iff (x,u) is in B. Our reduction will map x to the tuple (1, M_x, 1^q(n), 1^p(q(n))) where p is the polynomial time bound on M_x -- by the various definitions this tuple will be in VALID iff x is in A.
- (c,10) Let VALID' be the set of tuples (c, k, 1^a, b) where b is a binary number and there exists a binary string p of length a such that E(k, p, 1^b) = c. What is the complexity of the language VALID'? Justify your answer.
  VALID' is EXP-complete. To check whether a tuple (c, k, 1^a, b) is in VALID' we can run k on all inputs of length a for b steps each -- this takes O(2^a · b) time which is exponential in the input size. If A is an arbitrary language in EXP, the language of a TM M with time bound 2^p(n), we can test whether x is in A by testing whether the tuple (1, M_x, ε, 2^p(n)) is in VALID', where here M_x is a TM that runs M on x given the empty string as input.
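The brute-force test from part (c) can be sketched as follows. Decoding a binary string k as a Turing machine is outside the scope of a short example, so this sketch makes a loud simplifying assumption: a "machine" is a Python generator function that yields once per step and returns its output string when it halts. The names `E`, `valid`, and `reverser` are illustrative, and `reverser` is a made-up example machine.

```python
from itertools import product


def E(k, p, t):
    """Run the step-counted machine k on p for at most t steps.

    Assumption: k is a generator function that yields once per step and
    returns its output when it halts. Returns None if k has not halted.
    """
    run = k(p)
    try:
        for _ in range(t + 1):
            next(run)
    except StopIteration as done:
        return done.value                  # halted within the step budget
    return None


def valid(c, k, a, b):
    # Brute force over all 2^a plaintexts, b steps each: about 2^a * b work.
    return any(E(k, "".join(bits), b) == c for bits in product("01", repeat=a))


def reverser(p):
    # Toy machine: reverses its input, one step per symbol.
    out = ""
    for ch in p:
        out = ch + out
        yield
    return out
```

Here `valid("110", reverser, 3, 3)` holds because the plaintext 011 reverses to 110 in three steps, while a budget of two steps is too small for any plaintext of length three. When b is written in binary, as in VALID', the loop over b steps is itself exponential in the input length.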
Question 7 (25): A boolean circuit is levelled if the non-input gates can be divided into sets L₁, L₂,..., L_d, where the inputs to any gate in the set L_i are either inputs to the circuit (i.e., variables or negated variables) or are gates in the set L_{i-1}. The width of the circuit is the maximum number of gates in any set L_i.
- (a,10) Let A be any language in DSPACE(log n). Argue that A is decided by a family C₀, C₁,... of boolean circuits such that each circuit C_n has size polynomial in n and width O(log n).
  Consider a TM M for A that runs in time p(n) and uses space c log n on input x of length n. Build a circuit that has a gate G(t,i,a) that will output 1 iff the i'th letter of the configuration of M at time t is a. (So a might be a tape letter or a state of M.) By our proof of the Cook-Levin Theorem, we know that G(t,i,a) can be computed from O(1) values of the form G(t-1,j,b) and one bit of the input, and this computation can be done by a circuit of O(1) size and depth. Our circuit has polynomial size and can be levelled, with O(1) levels for each time step t. A given level contains O(1) gates for each possible i, and so has O(log n) gates. The output gate is G(p(n),1,q_acc).
- (b,5) Give an example of a language that has a circuit family as in part (a) but is not in DSPACE(log n).
  The language {1ⁿ: the n'th Turing machine (in some standard order) halts on blank input} is undecidable and hence is not in DSPACE(log n). But we can build two circuits for any fixed n such that one of them is correct, and both meet the conditions in part (a). If the n'th TM halts, our circuit of size n and width 1 computes the AND of the n input bits with n binary AND gates, each accessing one of the input bits. If the n'th TM does not halt, we have a circuit of size 1 that outputs 0 on any input of length n. Of course it is undecidable which circuit is correct, but there exists a correct circuit family meeting the conditions.
- (c,10) Argue carefully that if A has a circuit family as in part (a) and that family is log-space uniform, then A is in DSPACE(log n).
  Our log-space decider for A is the composition of a TM that computes the pair (C_n,x) from any input x of length n and a second TM that evaluates C_n on x given such a pair where C_n meets the conditions. It suffices to prove that the second machine is log-space, because the first one is by hypothesis and log-space functions are closed under composition.
  The second machine operates by computing the values of all gates on each level t, remembering them on a worktape until all the values for level t+1 have been computed. Remembering up to two levels takes O(log n) space by the width assumption, and the computation can be carried out in logspace because we just have to look up values from the previous level or the input to compute each gate of the new level -- we may need a few counters for this.
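The level-by-level evaluation can be sketched as below. The circuit encoding is an assumption made for the example: a circuit is a list of levels, each gate is a pair `(op, refs)` with `op` either "AND" or "OR", and each reference is a triple `("in", i, negated)` for an input literal or `("prev", j, False)` for gate j of the previous level. The output is taken to be gate 0 of the last level. The sketch holds the whole circuit in memory for convenience, whereas the log-space machine would recompute gate descriptions on demand. What it does mirror is that only the previous level's values are kept.

```python
def evaluate_levelled(levels, x):
    """Evaluate a levelled circuit on input bits x, one level at a time."""
    prev = []                              # values of the previous level only
    for level in levels:
        cur = []
        for op, refs in level:
            vals = [
                bool(x[i] ^ neg) if kind == "in" else prev[i]
                for (kind, i, neg) in refs
            ]
            cur.append(all(vals) if op == "AND" else any(vals))
        prev = cur                         # older levels are discarded
    return prev[0]


# Width-2 example: XOR of two input bits as an OR of two ANDs.
XOR2 = [
    [("AND", [("in", 0, False), ("in", 1, True)]),
     ("AND", [("in", 0, True), ("in", 1, False)])],
    [("OR", [("prev", 0, False), ("prev", 1, False)])],
]

# Width-1 example: AND of three input bits as a chain of binary ANDs.
AND3 = [
    [("AND", [("in", 0, False), ("in", 1, False)])],
    [("AND", [("prev", 0, False), ("in", 2, False)])],
]
```

At any moment the evaluator stores at most two levels of values, which is O(log n) bits when the width is O(log n).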
Question 8 (25): Let g(x₁,...,x_n) be a polynomial of degree d (where d is polynomial in n) over the field Z_p, where p is a prime of between n and 2n bits. Let h be the sum, over all 2ⁿ possible boolean settings of the variables x₁,...,x_n, of g(x₁,...,x_n). Describe a protocol whereby a computationally unbounded prover can convince a probabilistic verifier that h is equal to some number K, if this is true. Your protocol should have completeness 1 and soundness at most 1/2. Argue that your protocol is correct.
For any string w of length up to n, let h_w be the sum of g(x₁,...,x_n) for all strings x that have w as a prefix. (Thus our original h is h_ε.) Clearly h_w is the sum of h_w0 and h_w1 for any w of length less than n.
For any w of length i-1, let g_w be the function of x_i that is the sum of g(w₁,...,w_{i-1},x_i,x_{i+1},...,x_n) for all boolean values of x_{i+1},...,x_n. Clearly h_wb = g_w(b) for b = 0 or 1. Also note that each of these g_w polynomials has degree at most d, because it is a sum of copies of g with values substituted for every variable except x_i, and hence a sum of polynomials each of which has degree at most d in x_i.
The proof proceeds by the prover maintaining claims for the value of h_w, where w is a successively longer string of values in Z_p. Once w has length n, h_w can be evaluated by substituting values into g and the verifier will accept iff it has the claimed value.
The prover advances his claim about h_w by giving what he claims are the coefficients of g_w as a degree-d polynomial in x_i. The verifier checks that the prover's polynomial s satisfies the property that s(0) + s(1) is the claimed value for h_w and is thus consistent with the prior claim. The verifier then selects a uniform random value r from Z_p and the prover's new claim is that h_wr = s(r).
If the prover is originally telling the truth he has a proof strategy that will always cause the verifier to accept, which is simply to tell the truth about each g_w -- this will lead to his making true claims about each h_w. If the prover is lying about the original claim, he must select an s that is not equal to the true g_w, or otherwise the verifier will detect the inconsistency and reject. But a different polynomial of degree at most d can agree with the true one on at most d values within Z_p. Thus with probability at least 1 - d/p, the prover's new claim will also be incorrect. In n rounds, the prover's chance of getting a true claim "by accident" in some round is at most dn/p, which is less than 1/2 for sufficiently large n because d is polynomial in n while p, having at least n bits, is at least 2^(n-1). Unless this happens, the verifier will reject.
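The protocol can be run end to end on a small example. The sketch below departs from the description above in one respect, which I name plainly: the prover sends the values s(0), ..., s(d) rather than the coefficients of s, and the verifier evaluates s(r) by Lagrange interpolation. The two representations determine the same polynomial when p > d. Only the honest prover is modelled, g is assumed to be a Python function of n integer arguments, d is assumed to be at least 1, and the prime 101 in the usage note is a toy value far smaller than the n-bit prime the question requires. All function names are illustrative.

```python
import random
from itertools import product


def lagrange_eval(ys, r, p):
    """Value at r of the degree <= len(ys)-1 polynomial through (j, ys[j])."""
    total = 0
    for j, yj in enumerate(ys):
        num, den = 1, 1
        for m in range(len(ys)):
            if m != j:
                num = num * (r - m) % p
                den = den * (j - m) % p
        total = (total + yj * num * pow(den, -1, p)) % p
    return total


def honest_round(g, n, d, p, w):
    # Honest prover: the true g_w, given by its values at 0, 1, ..., d.
    rest = n - len(w) - 1
    return [
        sum(g(*w, v, *tail) for tail in product((0, 1), repeat=rest)) % p
        for v in range(d + 1)
    ]


def sumcheck(g, n, d, p, K, prover=honest_round, rng=random):
    """Verifier's side: True iff the claim h = K survives all n rounds."""
    w, claim = [], K % p
    for _ in range(n):
        s = prover(g, n, d, p, w)
        if (s[0] + s[1]) % p != claim:     # consistency with the prior claim
            return False
        r = rng.randrange(p)               # uniform random challenge in Z_p
        claim = lagrange_eval(s, r, p)     # new claim: h_wr = s(r)
        w.append(r)
    return g(*w) % p == claim              # final check by evaluating g


def example_g(x1, x2, x3):
    # Made-up degree-3 polynomial whose sum over the boolean cube is 13.
    return x1 * x2 * x3 + 2 * x1 + x2 * x2
```

With `example_g`, n = 3, d = 3, and p = 101, the verifier accepts the true claim K = 13 for every choice of challenges. For K = 14 the honest prover's first message already fails the s(0) + s(1) check, so the verifier rejects.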

Last modified 3 May 2010