CMPSCI 601: Theory of Computation

Solutions to Practice Midterm Exam, Spring 2010

David Mix Barrington

6 March 2010

Directions:

Answer the problems on the exam pages.
There are five short problems, for ten points each, and three long problems for 25 points each. Attempt all the short problems and only two of the long ones -- the maximum score is thus 100. If you attempt all three long problems I will take the scores of the best two. Likely scale is A = 90, B = 70 but the actual scale will be determined after I grade the exam.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 25 points
  Q7: 25 points
  Q8: 25 points

  Total: max 100 points

Question 8 modified 4 March 2010. Error in Question 8 (c) corrected 10 March 2010.

Question text is in black, solutions in blue.

Question 1 (10): A clocked Turing machine is one that is guaranteed to halt in some polynomial number p(n) of steps on any input of length n. Let X be the following language: {(M,x): M is a clocked TM and there exists a string y such that M(x,y) = 1}. Prove that X is an undecidable language. (Hint: Note that y is part of the input to M, and that there is no restriction on the length of y.)
We know that the language U = {(M,x): M is a TM and M(x) = 1} is undecidable -- we show that X is undecidable by reducing U to X. Given an input (M,x) for U, we let the string f((M,x)) be the pair (M_check,(M,x)) where M_check is the following specific clocked TM. M_check takes three inputs M, x, and u, and checks whether u is a series of snapshots of an accepting computation of M on x. As we saw in the proof of the Cook-Levin Theorem, whether u is this series of snapshots can be determined by evaluating a CNF formula that can be produced in polynomial time (in the length of u) and can also be evaluated in polynomial time. So M_check runs in a time that is polynomial in the length of its input, though this may be a very long running time in terms of the length of x. Since M accepts x iff some string u is such a series of snapshots, (M,x) is in U iff f((M,x)) is in X. The function f is clearly computable, so a decider for X would give us a decider for U, which cannot exist.
Question 2 (10): In Arora-Barak's proof of the Cook-Levin Theorem, they construct a CNF formula whose variables specify a series of "snapshots" and that is true if and only if the snapshots represent a valid computation. Part of the formula checks each snapshot z against the latest prior snapshot when the head was in the same position as it was for z. What must it check and why? And how does it determine which prior snapshot to check?
It must check that the character that the machine M reads in snapshot z is the same one that was written by the machine at the time of snapshot z_prev, according to the machine's state and characters read at that time. For the series of snapshots to represent a single valid computation of M, this must be the case.
The reduction constructing the formula can determine the time for that prior snapshot because M is an oblivious TM, meaning that its head movement is the same for any input of a given length. The input to M is a pair (x,u), where x is the string that we are checking for membership in the NP language, and u is a certificate for its membership. The length of u is fixed as a function of the length of x, and thus we know the length of the input to M although we do not know any particular string u. We thus can simulate M on the pair (x,u) for any u that has the right length, and the head movement will be the same. We can thus determine the right snapshot from this simulation.
Question 3 (10): The facility location problem takes as input an undirected graph with a positive integer weight for each vertex, and a positive integer t. The vertices represent possible locations for facilities, the weight of a vertex represents the value of a facility at that location, and an edge (u,v) means that vertices u and v cannot both have a facility. The output is a boolean that says whether it is possible to place facilities of total value at least t without any conflicts. Prove that this problem is NP-complete.
The language FLP = {(G,t): weighted graph G has an independent set of weight at least t} is clearly in NP -- a certificate is the set of nodes and checking the certificate consists of making sure it has no edge from one vertex in the set to another, and that the total weight of the set is at least t.
We can most easily show FLP to be NP-complete by reducing the known NP-complete language IND-SET to it. IND-SET is the language {(G,k): G is an undirected graph and G has an independent set of size at least k}. If we let f(G,k) be (H,k) where H is a weighted undirected graph with the same vertices and edges as G and with every vertex given weight 1, then clearly (G,k) is in IND-SET iff (H,k) is in FLP, because the weight of a vertex set in H equals its size. The function f only copies G and attaches a weight to each vertex, so it is computable in polynomial time.
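The two halves of this argument can be sketched in Python. This is only an illustration under assumed conventions: the names is_valid_placement, reduce_indset_to_flp, and has_independent_set are hypothetical, graphs are given as a vertex list plus an edge list, and the brute-force search is exponential and is there only to check the reduction on a tiny example.

```python
from itertools import combinations

def is_valid_placement(edges, weights, t, chosen):
    """Certificate check for FLP: no chosen vertices are adjacent, total weight >= t."""
    chosen = set(chosen)
    for u, v in edges:
        if u in chosen and v in chosen:
            return False  # conflicting facilities
    return sum(weights[v] for v in chosen) >= t

def reduce_indset_to_flp(vertices, edges, k):
    """Map an IND-SET instance (G, k) to the FLP instance (H, k) with unit weights."""
    weights = {v: 1 for v in vertices}
    return vertices, edges, weights, k

def has_independent_set(vertices, edges, k):
    """Exponential brute-force test for IND-SET (assumption: used for checking only)."""
    for subset in combinations(vertices, k):
        s = set(subset)
        if not any(u in s and v in s for u, v in edges):
            return True
    return False
```

The certificate check makes one pass over the edges and one over the chosen vertices, which is the polynomial-time verification claimed for membership in NP.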
Question 4 (10): Let A and B be two n by n matrices with boolean (0 or 1) entries. The matrix product AB is the n by n matrix C, where each entry C_i,j is the OR, over all k from 1 to n, of (A_i,k AND B_k,j). (Note that this is the same as the product of A and B under ordinary matrix multiplication using boolean arithmetic, with 1 + 1 = 1.) Prove that a deterministic Turing Machine can take input A and B and produce AB (on a write-only output tape, say) using space O(log n).
For every pair (i,j) with 1 ≤ i,j ≤ n, in row-major order, our log-space machine will compute the (i,j) entry of AB and write it to the tape. For any fixed i and j, we know that AB_i,j is the OR of (A_i,k AND B_k,j) for k ranging from 1 to n. We can compute this by computing this AND for each k until or unless we get a value of 1, writing 1 to the tape and stopping the loop if we ever do and writing 0 to the tape if we complete the loop.
All these computations require us to store the three numbers i, j, and k on worktapes, and to reference entries of A or B given by some two of these three numbers. Since the numbers i, j, and k are each log n bits and there is no other memory needed, this is an O(log n) space computation.
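The loop structure described above can be sketched as follows. This is an illustration, not a Turing machine: the function name boolean_product_entries is hypothetical, indices run from 0 rather than 1, and the generator stands in for the write-only output tape. The point is that the only working storage is the three counters i, j, and k plus one bit.

```python
def boolean_product_entries(A, B):
    """Yield the entries of the boolean product AB in row-major order.

    Apart from read-only access to A and B, the only state kept is the
    three indices i, j, k and a single output bit.
    """
    n = len(A)
    for i in range(n):
        for j in range(n):
            bit = 0
            for k in range(n):
                if A[i][k] and B[k][j]:
                    bit = 1   # found a witness k, so the OR is 1
                    break
            yield bit         # "write" entry (i, j) to the output tape
```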
Question 5 (10): Let G be a directed graph and v be a vertex in G. If k is any non-negative integer, the function NSIZE(G,v,k) gives the number of vertices w in G such that there is a path of length at most k from v to w. Outline a proof that the language {(G,v,k,m): NSIZE(G,v,k) = m} is in the class NL = NSPACE(log n).
This is of course a part of the proof of the Immerman-Szelepcsenyi Theorem. We can define a nondeterministic computation that takes (G,v,k) as input and successively guesses NSIZE(G,v,1), NSIZE(G,v,2),..., up to NSIZE(G,v,k) in such a way that if any value guessed is incorrect, the computation rejects. (We start from NSIZE(G,v,0) = 1, since only v itself is reachable by a path of length 0.) We need to define this computation so that if the value for NSIZE(G,v,i) is correct, then the value for NSIZE(G,v,i+1) is also correct or the computation rejects. At the end, the computation accepts iff the value it has verified for NSIZE(G,v,k) equals m.
What the nondeterministic computation does is this: it sets a counter to zero and then considers each vertex w of G in turn. For each w it guesses whether there is a path of at most i+1 edges from v to w. If it guesses "yes" it verifies the guess by giving the path. If it guesses "no" it verifies this by listing NSIZE(G,v,i) different nodes u with paths of length at most i from v to u, such that none of these nodes is w itself or has an edge to w. (It can ensure the nodes are different by listing them in increasing order, and it verifies each one by giving a path to it.) If the guessed value of NSIZE(G,v,i) is correct, then these are all the nodes with paths from v of length at most i, and thus w cannot have a path from v of length at most i+1. Each time the computation guesses "yes" it increments the counter, and at the end of the loop the value of the counter must be the guessed value of NSIZE(G,v,i+1) or the computation rejects. At any time the computation stores only a constant number of vertex names and counters, each of O(log n) bits.
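The inductive step can be illustrated with a deterministic reference computation. This sketch is not the NL algorithm: it stores the whole set of nodes within distance i, which takes linear space, where the nondeterministic machine instead re-guesses and re-verifies those nodes one at a time. The name nsize_sequence and the adjacency-list dictionary (with every vertex as a key) are assumptions made for the example.

```python
def nsize_sequence(adj, v, k):
    """Return [NSIZE(G,v,0), ..., NSIZE(G,v,k)] for a directed graph.

    adj maps every vertex to the list of its out-neighbours.
    """
    reach = {v}          # nodes with a path of length at most i from v
    sizes = [1]          # NSIZE(G,v,0) = 1
    for _ in range(k):
        nxt = set()
        for w in adj:
            # w is within distance i+1 iff it is within distance i
            # or some node u within distance i has an edge u -> w.
            if w in reach or any(w in adj[u] for u in reach):
                nxt.add(w)
        reach = nxt
        sizes.append(len(reach))
    return sizes
```

The test applied to each w is exactly the condition that the "no" branch of the nondeterministic computation must refute for every one of the NSIZE(G,v,i) listed nodes.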
Question 6 (25): A one-d cellular automaton is a linear array of processors stretching arbitrarily far in both directions, and operating synchronously. Each processor has the same state set Q, and on each time step each processor changes its state from its current state q to a new state δ(p,q,r), where p is the state of the processor to its left and r the state of the processor to its right. One of the states is a halt state, and the entire automaton stops when any processor enters the halt state. We start the automaton on a binary input string w = w₁...w_n by placing each letter w_i in processor i. That is, the initial state of processor i is 0 if w_i = 0, 1 if w_i = 1, and 2 if there is no letter w_i.
- (a,5) Argue that the language {(M,w): Cellular automaton M eventually halts on input w} is also the language of some Turing Machine (i.e., it is Turing recognizable).
  At time step 0, only processors 1 through n (n processors in all) have a state other than 2. By induction, at time t the only processors that can be affected by the contents of w are those from 1-t through n+t, n+2t processors in all. The rest are in whatever state results from a line of processors all in state 2 after t steps.
  A TM can set up a tape with the states of the n processors containing w in n successive tape cells, then on each time step t make a pass over cells 1-t through n+t, updating so that cell i matches the proper state of processor i in M. This takes time O(n+2t) for the pass, and thus time O(nt + t²) for the first t passes. This simulation is by a TM with a tape infinite in both directions, but this can be simulated by an ordinary TM with quadratic time overhead according to a construction in the book.
  If a cell of M eventually reaches a halt state, the simulation will discover this and halt. If this never happens, the simulation will continue computing forever.
- (b,10) Argue that if M halts on w in p(n) steps, where p is a polynomial, then your simulating Turing machine from part (a) will halt within q(n) steps, where q is some other polynomial.
  The description of the simulation above shows that if M halts in p(n) steps, the simulation will halt in O(np(n) + p²(n)) steps. This is a polynomial in n if p(n) is.
- (c,10) Argue that if X is a language in P, there is a cellular automaton M such that on any input w of length n, M will halt within r(n) steps (where r is some polynomial) and the state of processor 0 when M halts will be 0 if w is not in X and 1 if w is in X.
  Let M' be a poly-time TM deciding the language X. By a construction from the book, we can assume that M' is a single-tape TM (with a read/write tape) and still poly-time. We create M so that it simulates the action of M' on its tape cell by cell, with one processor of M for each cell of M'. A processor's state records the letter written in the corresponding cell, whether the head of M' is in that cell, and what state M' is in if the head is there. The transition function of M keeps each processor state the same except for the three processors where the head is (at the head, just to the left, and just to the right) -- some of these states will change according to the action of the head of M'. There is just one time step of M needed to simulate each step of M', and we know that M' is poly-time, so M terminates in a polynomial number of steps.
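The simulation in part (a) can be sketched in Python as a growing window of cells. This is an illustration under assumptions: the name run_automaton, the max_steps cutoff (a true recognizer would simply keep running), and passing δ as a Python function are all choices made for the example. The variable background tracks the common state of every processor outside the window, which evolves as δ applied to three copies of itself.

```python
def run_automaton(delta, halt, w, max_steps):
    """Simulate a one-d cellular automaton on binary string w.

    Returns the first time step at which some processor is in the halt
    state, or None if that has not happened within max_steps steps.
    """
    cells = [int(c) for c in w]   # processors 1..n at time 0
    background = 2                # state of every processor outside the window
    for t in range(1, max_steps + 1):
        padded = [background, background] + cells + [background, background]
        # The new window covers processors 1-t through n+t.
        cells = [delta(padded[i - 1], padded[i], padded[i + 1])
                 for i in range(1, len(padded) - 1)]
        background = delta(background, background, background)
        if halt in cells or background == halt:
            return t
    return None
```

Step t touches n+2t cells, which matches the O(nt + t²) total time for t steps in the analysis above.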
Question 7 (25): In this question you will prove a special case of the (deterministic) Space Hierarchy Theorem, that DSPACE(log n) is properly contained in DSPACE(log² n).
- (a,10) Let M be a deterministic TM with k states and a tape alphabet of size m that uses space at most c log n on any input of length n. Explain briefly why a universal Turing machine, with a single state set and tape alphabet, can simulate M on inputs of size n using at most c' log n space, where c' is a constant depending only on k, m, and c.
  We use a string of size O(log m), a constant, to store each letter on the tape of M, use O(log k) cells of worktape to store the state of M, and use some constant amount of space to store the description of M. (We need km entries in a table, each of size log m + log k + 1, to store the transition function of M.) We assume that the read-only input is re-encoded into the universal TM's alphabet but remains read-only so it doesn't count against the space bound. Thus the total space usage is O(log m)(c log n) plus a constant, and a large enough c' will ensure that c' log n will be bigger than this.
- (b,5) Let M₁, M₂,... be a listing of all possible Turing machines (with read-only input and a single work tape), such that each machine appears infinitely often in the list. Describe a Turing machine D with space bound O(log² n) whose language is not in DSPACE(log n). (Justifying this latter claim is part (c).)
  On an input string of length n representing the non-negative integer i, D will simulate the TM M_i until it either halts or attempts to use more than (log n)^1.5 space. If M_i halts, D will also halt and give a different answer. If the space bound is violated, D will halt and give the answer 0. Clearly D uses O(log² n) space on inputs of length n, as desired.
- (c,10) Argue carefully that if any machine M_i always halts using at most c log n space for some c, your machine D cannot decide the same language as M_i.
  Suppose that M_i is such a machine. Consider all the numbers i' such that M_i and M_i' are equivalent machines, meaning in particular that they give the same output and use the same amount of space on any given input. Remember that there are infinitely many numbers i' such that this is the case. For large enough i', the space usage of M_i' on input i', which is at most c log (log i'), will be below the space D allows on input i', which is log^1.5 (log i'). Thus D's simulation will finish, D will halt with a different output from that of M_i' on input i', and so L(D) ≠ L(M_i') = L(M_i).
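The phrase "for large enough i'" in part (c) can be made explicit with a short calculation. Here n is the length of the input string to D, c' is the simulation constant from part (a), and logarithms are taken base 2 with log n > 0; these conventions are assumptions made for the worked inequality.

```latex
\[
  c' \log n \le (\log n)^{1.5}
  \iff c' \le (\log n)^{0.5}
  \iff \log n \ge (c')^{2}
  \iff n \ge 2^{(c')^{2}} .
\]
```

Because the machine appears infinitely often in the listing, some index i' has an encoding of length at least 2^((c')²), and on that input the simulation fits within D's allowance.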
Question 8 (25): A space-oracle TM is an oracle machine with a write-only oracle tape. If it has an oracle for a language B, it can write a string w onto its oracle tape and then determine in one step whether w is in B, whereupon the oracle tape is cleared. It also has a read-only input tape and a work tape, and only the work tape counts as space. If B is any language, we define L^B to be the languages decided by deterministic oracle machines for B using space O(log n), and NL^B to be the languages of nondeterministic oracle machines for B using space O(log n).
- (a,10) Argue that if B is in the class L, then L^B = L.
  Let M be a logspace oracle machine with oracle for B -- we need to simulate M by a logspace machine M' with no oracle. Let M'' be the logspace machine whose language is B. M' proceeds by simulating M, but when M makes an oracle call, M' must pause and simulate M'' in order to determine the result of the oracle call, which is the answer as to whether some string w is in B. This is somewhat complicated, because M'' would have the string w on its read-only input tape and M' cannot afford to use a work tape to store w as w might well be polynomial length in n.
  So what M' will do is record its state after each oracle call, so that it can reset to that state many times to produce bits of w whenever they are needed by M''. That is, when M'' needs a bit of w, M' resets its simulation of M to the last time the oracle tape was cleared, starts a counter, runs M until the needed bit of w is produced, and gives that bit to M''. Once M'' is finished, M' can take the answer to the oracle call and continue its simulation of M past that point. The space needed to do all this is O(log n) to simulate M, another O(log n) to record one prior configuration of M, plus O(log |w|) = O(log n) to simulate M'' -- this is O(log n) total.
- (b,5) Argue that if B is in the class NL, then L^B ⊆ NL.
  This proceeds as in part (a), except that M'' is now a nondeterministic logspace machine which can accept a string w iff it is in the language B. By the Immerman-Szelepcsenyi Theorem, there exists another nondeterministic logspace machine M''' whose language is the complement of B -- it can accept w iff w is not in B.
  M' will use its own nondeterminism in two ways -- for each oracle query for a string w, M' will guess whether w is in B, and then simulate either M'' or M''' as appropriate until it accepts w. As in part (a), it will reset and re-simulate M as necessary to provide bits of w for M'' or M'''.
  I originally asked you to prove that NL^B = NL if B is in NL, but I am not sure that this is even true. I can't complete an argument similar to the one above if M is an NDTM, because I can't be sure that M is consistently describing a single string w to M'' or M''' -- it could say that w_i is 0 and later say that the same bit w_i is 1, by making different nondeterministic choices.
- (c,5) Argue that NP ⊆ L^SAT, quoting results from lecture as necessary.
  We know that if X is any language in NP, there is a logspace function f such that for any string w, w is in X iff f(w) is in SAT. So a logspace machine with oracle for SAT can decide whether w is in X by computing f(w), using only O(log n) space, and feeding it to the SAT oracle. The answer to the oracle query will be 1 if w is in X and 0 otherwise.
- (d,5) It is probably not true that if B is in NP, then P^B ⊆ NP. Why doesn't an argument like that of part (b) show this to be true?
  In the case of B being in NL, we had an NL machine whose language was B and another NL machine whose language was the complement of B. Thus we could verify either w's membership or non-membership in B with the simulating NL machine. But as far as we know, an NP machine is of no use in verifying the non-membership of a string in an NP language.
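The reset-and-recompute idea from part (a) can be sketched in Python. Everything here is a hypothetical stand-in: produce_query plays the role of M writing the query string from a saved configuration, and the palindrome test plays the role of M'' for some logspace language B. What the sketch shows is that the consumer never stores w, only a few counters, and re-runs the producer each time it needs a bit.

```python
def produce_query(config):
    """Stand-in for M: from a saved configuration, emit the bits of w in order."""
    for i in range(config):
        yield i % 2

def query_bit(producer, saved_config, index):
    """Re-run the producer from saved_config; return bit `index`, or None if w is shorter."""
    count = 0
    for bit in producer(saved_config):
        if count == index:
            return bit
        count += 1
    return None

def query_length(producer, saved_config):
    """Length of w, found by re-running the producer and counting."""
    return sum(1 for _ in producer(saved_config))

def oracle_answer(producer, saved_config):
    """Stand-in for M'': decide whether w is a palindrome using two indices only."""
    n = query_length(producer, saved_config)
    for i in range(n // 2):
        if query_bit(producer, saved_config, i) != query_bit(producer, saved_config, n - 1 - i):
            return False
    return True
```

This works because the deterministic producer emits the same w on every re-run, which is the consistency that is lost when M is nondeterministic.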

Last modified 10 March 2010