CMPSCI 501: Theory of Computation

Solutions to Final Exam, Spring 2017

David Mix Barrington

Exam given May 2017

Solutions posted 19 May 2017

Directions:

Answer the problems on the exam pages.
There are eight problems (some with multiple parts) for 120 total points plus 10 extra credit. Actual scale was A = 100, C = 50.
If you need extra space use the back of a page.
No books, notes, calculators, or collaboration.
The first six questions are statements -- in each case say whether the statement is true or false and give a convincing justification of your answer -- a proof, counterexample, quotation from the book or from lecture, etc. You get five points for the correct boolean answer (so there is no reason not to guess if you don't know) and up to five for the justification.

  Q1: 10 points
  Q2: 10 points
  Q3: 10 points
  Q4: 10 points
  Q5: 10 points
  Q6: 10 points
  Q7: 20+10 points
  Q8: 40 points
 Total: 120+10 points

Exam text is in black, solutions in blue.

If C is any class of computers, such as DFA's, CFG's, TM's, strange variant TM's, etc.:

A_C = {(M, w): M is a computer in C and w ∈ L(M)}
E_C = {(M): M is a computer in C and L(M) = ∅}
ALL_C = {(M): M is a computer in C and L(M) = Σ^*}
EQ_C = {(M₁, M₂): M₁, M₂ are both computers in C and L(M₁) = L(M₂)}

A language is Turing decidable (TD) if it is equal to L(M) for some Turing machine M that halts on every input.

It is Turing recognizable (TR) if it is equal to L(M) for any Turing machine M.

A function f from strings to strings is Turing computable if there exists a Turing machine M such that for any string w, M when started on w halts with f(w) on its tape.

Recall that if A and B are two languages, A is mapping reducible to B, written A ≤_m B, if there exists a Turing computable function f: Σ^* → Σ^* such that for any string w, w ∈ A ↔ f(w) ∈ B. If such an f exists that is computable in polynomial time, we say that A is poly-time reducible to B, written A ≤_p B. If f is computable in log space, we say that A is log-space reducible to B, written A ≤_L B.

A homomorphism from Σ^* to Σ^* is a function f that obeys the rule f(xy) = f(x)f(y) for any strings x and y. It is determined by the strings f(a) for each letter of Σ and f(ε) must be equal to ε.

A boolean matrix is one whose entries are each 0 or 1, and where we define "addition" and "multiplication" as the boolean operators OR and AND, respectively. If A and B are each n × n boolean matrices, the matrix product AB is the matrix C such that for each i and j, the boolean value C_ij is the OR, over all k from 1 to n, of A_ik ∧ B_kj.

Given this definition of matrix product, we define the language BMM (for "boolean matrix multiplication") to be {(A, B, C): AB = C} and the language IBMM (for "iterated boolean matrix multiplication") to be {(A₁,..., A_n, C): A₁A₂...A_n = C}. The variables A, B, A_i, and C each range over n × n boolean matrices. That is, all the matrices in a product must be square matrices of the smae size, and the number of matrices in an iterated product must equal the size of the matrices.

The following language is proved to be NP-complete in the text or in Exercises, and you may assume without proof that it is NP-complete. (There are lots of other languages proved NP-complete in the course -- this is the one you will need on this exam.

3-COLOR = {G: G is an undirected graph and the vertices of G may each be assigned one of three colors so that no edge connects two vertices of the same color}

The language X over the alphabet {0, 1, #} is the set of strings in which every pair of #'s has at least one 1 between them.

The language Y over the alphabet {0, 1, #} is the set of strings w for which there exists a string v in {0, 1}^* and strings x, y, and z in {0, 1, #}^* such that w = x#v#y#v#z#.

The language Z is the set of encodings of undirected graphs G such that if the nodes of G are partitioned into any five sets, then there is at least one edge of G that has both its endpoints in the same set.

Given a directed acyclic graph G and a goal node g, we define three versions of a Race Game on G. A position in the Race Game is a tuple (p, x, y) where p is a player (White or Black), x is a node where White's token is, and y is a node where Black's token is. A move in the game from posiiton (p, x, y) is for player p to move their token from x or y along an edge of G. A player wins (and ends the game) by moving their token to g. If a player cannot move, they lose their turn but the game continues as long as the other player can move. If neither player can move, the game ends in a draw. Note that since the graph is acyclic, the game cannot continue forever.

In Version 1 of the game, there are no restrictions on the players' moves beyond the definition above. In Version 2, a move is prohibited if it would place the two tokens on the same node. In y Version 3, a move is prohibited if it would take one player's token to a node that was ever occupied by the other player's token.

Question 1 (10): True or false with justification: Assuming that P ≠ NP, the language EQ_DFA is in P.
TRUE. The assumption is irrelevant, as there is a poly-time algorithm that inputs two DFA's and decides whether they are equivalent. We first minimize both DFA's, using the algorithm presented in lecture which is clearly poly-time, since one iteration makes a single pass over the set of states, and there can be at most n-1 iterations before the division of states into classes remains the same.
But there is still the problem of taking two minimal DFA's and determining whether they have the same language. To do this we try to construct an isomorphism between the state sets of the DFA's. If we succeed, of course the languages are the same, and if we fail, we will have found proof that they are not. We begin by mapping the start state of the first DFA to the start state of the second, rejecting if one is final and the other non-final. We then, for each letter a, map the end of the a-arrow from one start state to the end of the a-arrow of the other. We reject if this map fails to be one-to-one (since then there are two strings equivalent for one language and not for the other) or if we even map a final state to a non-final state or vice versa. We continue with each mapped state, mapping the endpoint of each arrow to the endpoint of the matching arrow in the other DFA. If we complete a bijection of the states that maps each arrow to a matching arrow, we have our isomorphism, and other wise we reject.
Question 2 (10): True or false with justification: The language EQ_CFG is Turing recognizable.
FALSE. This language was proved in lecture and in the text to not be TD, by reduction from ALL_CFG. But EQ_CFG is easily seen to be co-TR, as two grammars G and H are in the complement of EQ_CFG if and only if there exists a string w that is in L(G) or L(H) but not both, and the language E_CFG is TD. If EQ_CFG were TR, therefore, it would also be TD, and it isn't.
Question 3 (10): True or false with justification: The Myhill-Nerode equivalence relation of the language X (defined above) has exactly three equivalence classes.
TRUE. The minimal DFA has three states i, p, and d, with i the start state and only final state, and transitions (i, 0, i), (i, 1, i), (i, #, p), (p, 0, p), and (p, 1, i), with all other arrows going to d. This DFA goes to its death state if it ever sees two #'s without a 1 in between, and it goes to p if it sees a # and is waiting for the 1.
Question 4 (10): True or false with justification: The language Y (defined above) is context-free.
FALSE. The key point is that the two copies of v in the string must be identical, making this language similar to {ww: w ∈ Σ^*} which we know is not a CFL. The proof uses the CFL pumping lemma. Assume Y is a CFL, let p be Y's alleged pumping constant, and let w = #0^p1^p##0p1^p# which is in Y. If this w is broken into five strings as specified in the CFLPL, we can show that the conclusion of the CFLPL is false. If either the second or fourth strings contains a #, pumping down leaves Y because every string in y has at least four #'s. Otherwise the second and fourth strings may intersect at most two of the four groups of p consecutive letters, and pumping down will destroy the property that the first group matches the third and the second matches the fourth.
Question 5 (10): True or false with justification: Assuming that NP and co-NP are not equal, the language Z (defined above) is NP-complete.
FALSE. We prove that Z is co-NP-complete, which would force NP = co-NP if it were also NP-complete. (Proving that Z is in co-NP is not sufficient for this conclusion.) Z is the complement of the language 5-COLOR, of graphs that can have their nodes divided into five sets such that no edge has two endpoints in the same set. (Many of you misread the definition of Z and argued that it was in NP, claiming that a division into five sets would be a certificate. But it isn't, since the definition says that every division into five sets has a certain property, not just one.)
5-COLOR itself is clearly in NP. with the 5-coloring being the certificate. It is easily proved to be NP-complete by reduction from 3-COLOR. Given any undirected graph G, we must construct a graph H that is 5-colorable if and only if G is 3-colorable. One way to do this is to have H consist of G and two new nodes, with edges from each new node to each old node and between the two new nodes. In any coloring of H, the two new nodes must have two colors and the old nodes must have colors chosen from the other three, so the induced coloring of G is a valid 3-coloring. And clearly any 3-coloring of G may be extended to a 5-coloring of H.
Question 6 (10): True or false with justification: Let Σ = {a, b}. Let f be any homomorphism from Σ^* to Σ^*. (Homomorphisms are defined above.) Then for any language R, R is a regular language if and only if {f(w): w ∈ R} is a regular language.
FALSE. This was a trick question of sorts, because it appears to be asking whether "the regular languages are closed under homomporphism", and you proved on the homework that they are. But the given statement also says that the non-regular languages are closed under homomprphism, because if R is not regular, f(R) must not be regular to satisfy the "if and only if".
And in fact it is easy to map a non-regular language to a regular one by a homomorphism. The simplest way is to have f(a) = f(b) = ε, so that f(R) is the regular language {ε} for any non-empty R, regular or not. Another example is to let R be the standard non-regular language {aⁿbⁿ: n ≥ 0}, and let f(a) = f(b) = b, so that f(R) is the regular language (aa)^*.
Question 7 (20+10): The definition of boolean matrix multiplication and of the languages BMM and IBMM are given above.
- (a, 10) Prove that the language BMM is in the circuit class AC⁰. Don't worry about uniformity.
  For every i and j, let D_ij be the OR, over all k, of A_ik AND B_kj. We can calculate each D_ij with an OR gate that receives the output of n AND gates. Then, for each i and j, we determine whether D_ij and C_ij are equal with an AND or two OR's or an OR of two AND's. We AND together the result of these n² comparisons, and that is our output. The depth of this circuit is 5 (AND, two for the comparison, OR, AND) and its size is O(n³).
- (b, 10) Prove that the language IBMM is in the circuit class AC¹. Don't worry about uniformity.
  For each pair of input matrices A_2i-1 and A_2i, we compute their product in depth O(1) and polynomial size as in part (a). We then pair up these products and compute n/4 matrices, each the product of four original matrices. We pair those up, and so on, forming a balanced binary tree of product operations. The tree has O(n) subcircuits each of O(1) depth and O(n³ size, and the whole circuit thus has depth O(log n) and size O(n⁴). We compare the final result with C, using an additional depth 2 and size O(n²).
- (c, 10XC) Prove that the language IBMM is complete for the class NL under ≤_L reductions.
  We first need to show that IMBB is in the class NL. My preferred way to do this is with an alternation game, using O(log n) space and O(1) alternations, appealing to the result on HW#6 that we can find the winner of such a game in NL. We will design the game so that White has a winning strategy if and only if the product of the A_i's is equal to C. Black moves first, and names nodes s and t such that (he claims) the entry C_st is not equal to the s-t entry of the product. We then construct a graph with n+1 columms S₀,..., S_n of n nodes each. We place an edge from node x of S_i-1 to node x of S_i if and only if either x = y or the x-y entry of A_i is 1. There is a path from the s node of S₀ to the t entry of S_n if and only if the s-t entry of the product is 1. So now if C_st is 0, Black can win by exhibiting such a path, and if C_st is 1, White can win by exhibiting such a path.
  If you don't like the alternation game, it should still be clear that a single NL machine can go through the entries of C in turn, proving each 1 entry correct by exhibiting a path in this graph and proving each 0 entry correct by executing the Immerman-Szelepcsenyi NL procedure for the complement of PATH.
  To show completeness, We reduce the known NP-complete problem PATH to IBMM. Given a directed graph G of n nodes, and nodes s and t of G, we need to construct an instance of IBMM that is in the language if and only if there is a path from s to t in G. Let G' be the directed graph obtained from G by adding a loop at each vertext that has no loop already. Let A' be the adjacency matrix of G'. It is well-known that there is a path from s to t in G if and only if the s-t entry of the matrix A'_n-1, defined using boolean matrix multiplication, is 1. We are not quite done, because we need to construct an instance where the product is exactly C if and only if the path exists. Let D be a matrix whose only 1 entry is D_ss, and let E be a matrix whose only 1 entry is E_tt. Then the product DA'A'...A'E, where there are exactly n-1 copies of A', is equal to C, whose only 1 entry is C_st, if and only if the path exists. Because we have n+1 matrices in our product, we must make all the matrices n+1 by n+1 to have a proper instance of IBMM. We do this by adding a single isolated node to G to give it n+1 vertices without affecting the path question.
Question 8 (40): These questions involve the three versions of the Race Game defined above.
- (a, 10) In Version 1 of the game, there are no restrictions on the two tokens occupying the same node. Prove that the set of positions in Version 1 of the game, for which White has a winning strategy, is in the class NL.
  Given any directed graph G, nodes x and y, and number k, it is clear that the predicate PATH(G, x, y, k), stating that there is a path of at most k edges from x to y in G, is in NL. By Immerman-Szelepcenyi, its complement is also in NL. White wins Version 1 of the game from position (w, x, y) if and only if there is a number k such that PATH(G, x, g, k) and NOT PATH(G, y, g, k-1). From position (b, x, y), White wins if and only there is a path from x of length k and no path from y of length k. This is clearly in NL.
  Many of you misread the definition of the game somehow to ignore Black, saying that White wins the game if and only if a path from x to g exists. But White does not win if Black reaches the goal node first.
- (b, 10) Prove that the set of positions of Version 1 of the game, for which White has a winning strategy, is complete for NL under ≤_L reductions. (You may assume the result of part (a).)
  We need only reduce the known NL-complete language PATH to Version 1. Given G, s, and t, we need to set up a Version 1 game that White indeed wins if and only if the path exists. We can do this most easily by adding a new isolated node to G and making it y, so that Black can never move and White either wins if she can or else draws. There is a complication, though, in that the game requires G to be a directed acyclic graph and the PATH problem does not. It is easy to adapt the NL-completeness proof for PATH to make the graph acyclic, by placing a clock on the Turing machine's tape so that no configuration can be repeated. Alteratively, we can reduce PATH to PATH_DAG by mapping (G, s, t) to (G', s, t) where G' is made from n+1 copies of G as in the solution to Question 7 (c) above. The reduction is pretty clearly in deterministic log space.
- (c, 10) In Version 2 of the game, the two tokens may not occupy the same position. Prove that the set of White-winning positions of Version 2 of the game is in the class P.
  So it turns out that all three versions of the game are in NL, which explains why I could not get my P-completeness proof for Version 2, or my PSPACE-completeness proof for Version 3, to work. Proving this of course suffices to answer parts (c) and (d). The fact is that the PATH conditions laid out in my solution to part (a) are still necessary and sufficient for White to win in the other versions. Suppose White has a path of length k, and Black has no better path. White can win in k moves by simply traveling along her path. Black cannot win earlier since he has no shorter path, and Black cannot prevent White from taking her path because if he moved to a node on White's path before White got there, he could then proceed to g and demonstrate the existence of a path shorter than White's.
  I think, but haven't verified, that the matching completeness results are true if White and Black have separate goal nodes.
  The proof I had in mind, that Version 2 is in P, is still valid. The simplest thing is to define an alternation game that can be played in O(log n) space and appeal to the result that AL is contained in P. To play the game, we need remember only the player whose move it is, the name of node g, and the current locations of the two players. We could also recapitulate this part of the Alternation Theorem proof by defining a game graph, with nodes representing positions in the game, and marking each position as White-winning or not.
- (d, 10) In Version 3 of the game, no player may move their token to a node previously occupied by the other player. Prove that the set of White-winning positions of Version 3 of the game is in the class PSPACE.
  As I said in part (c), determining the winner in Version 3 is also in NL. But we can still prove it to be in PSPACE without noticing that the restriction has no effect. The game can be played in polynomial time, because there can be at most n-1 moves by each player until one player has won or both have reached sink nodes. Thus, since AP is contained in PSPACE, we can find the winner in PSPACE. To recapitulate the proof of that part of the Alternation Theorem, We can evaluate the game tree by a recursive algorithm, where now a position includes the set of nodes that have been visited by each player. The recursion requires an activation record at each step that is of polynomial size, and since there are only polynomially many records on the stack at one time, the entire algorithm uses only polynomial space.

Last modified 19 May 2017