CMPSCI 601: Theory of Computation

David Mix Barrington

Spring, 2010

Friday Discussion Notes

Friday class meetings in CMPSCI 610 will be devoted to background or tangential topics, as opposed to the material from the Arora-Barak book in the Monday and Wednesday lectures. These notes describe the topics presented in each meeting. Note that attendance on Fridays is optional and material from these classes will not appear on exams or homework assignments except as it is also referred to in lecture.

22 January 2010: I talked about two ideas to unify mathematics -- the 20th-century effort to define all mathematical objects in terms of set theory, so that the consistency of each theory could be reduced to that of set theory, and the quest to find common methodologies in different mathematical theories. Toward the latter goal, we looked at some basic mathematical objects both as sets and as more complicated structures. Two sets have the same size if there is a bijection between them, but two structures are isomorphic only if there is an isomorphism between them -- a bijection that is well-behaved with respect to the relevant properties of the structure. (For example, an isomorphism of vector spaces must be a homomorphism, i.e., a linear function.)
We can characterize sets by size, also called cardinality. A finite set has a size that is a natural number, and there are many sets that have the same size as the natural numbers -- the integers, the rational numbers, the set of all finite strings over a finite alphabet, and even the set of all finite sequences of natural numbers. Cantor called this cardinality "aleph-zero". But the set of functions from the naturals to {0,1}, called 2^ω, has a larger cardinality, as shown by Cantor's diagonal argument. (We will apply the diagonal argument to computability and complexity in Chapter 3 of Arora-Barak.) The real numbers also have this cardinality, called 2^aleph-zero. There might or might not be sets that have cardinality strictly between aleph-zero and 2^aleph-zero -- the assertion that there are not is called "Cantor's continuum hypothesis" and provably cannot be proved or disproved from the usual axioms of set theory.
In probability theory, we might take a subset X of the unit interval and ask the probability that a random real number in the interval is in X. This number is called the measure of X and cannot be defined for all possible X, only for relatively simple sets. The "Cantor set" C is the set of all real numbers in the unit interval whose base-3 expansion contains only 0's and 2's. The probability that a randomly chosen real from the interval is in C is zero, but C is an uncountable set because it has a bijection with 2^ω. In fact this bijection is a homeomorphism -- it also preserves the topology of the sets -- but that is another story.
I was asked for a reference for this material and don't have one immediately to hand, though there are many textbooks on set theory and many popular surveys of mathematics cover this material. I will have a look for a particular book I can recommend.
29 January 2010: I described the Chomsky Hierarchy of complexity classes, ranging from Type 0 (Turing-acceptable or recursively enumerable languages), through Type 1 (context-sensitive) and Type 2 (context-free) to Type 3 (regular languages). Each of these classes is defined by a particular type of grammar and by a particular type of machine. Because "w can be generated by grammar G" is an inherently nondeterministic definition, it is not surprising that the machines are nondeterministic, so that we say "w can be accepted by machine M" rather than "w is accepted by machine M". In the case of Types 0 and 3, both general Turing machines and finite automata are known to accept the same class of languages in either their deterministic or nondeterministic versions. In the case of Type 2 it is known that deterministic pushdown automata cannot accept all the languages that nondeterministic ones can, and in the case of Type 1 the problem remains open.
It is a natural question whether each class of languages is closed under complement, so that {w: w is not in L} is Type i whenever L is Type i. In the case of Types 0 and 2 this closure property is known to be false, and in the case of Types 1 and 3 it is known to be true. The question for Type 1 was posed in the 1960's and not solved until the late 1980's by Immerman (yes, our Immerman) and Szelepcsenyi, in a result we will see in Chapter 4 (it is a simple consequence of Theorem 4.20).
I mentioned that a one-tape Turing machine has three capabilities that a finite automaton does not: (1) it can move either way on its tape, (2) it can rewrite the contents of the portion of its tape originally containing the input, and (3) it can access as much additional read-write memory as it wants to the right of the input. If you give a finite automaton just capability (1), it becomes a two-way finite automaton, and these accept only regular languages. Adding capability (2) gives us a linear bounded automaton, which in its nondeterministic version defines Chomsky's Type 1 languages. In the language of this course, Type 1 is the class we will call NSPACE(n).
These definitions and statements of the results are fairly easy to find -- the article Wikipedia article on the Chomsky Hierarchy gets you to most of them -- and older formal language theory books will have the proofs. This hierarchy is no longer the principal way to classify complexity, having been largely supplanted by the classes that we will study in the main course except when grammars are the main concern.
5 February 2010: I listed several possible interpretations of the semantics of nondeterministic computation. Let N be a machine that may have zero, one, or more than one possible computation on each input x, where each such computation has an output of 0 (false) or 1 (true). (The semantics are the same for unbounded-resource nondeterministic TM's, poly-time nondeterministic TM's, or the formal language models from last Friday like (nondeterministic) pushdown automata or NFA's.)
1. Formal model: We define L(N) to be the set {x: there exists a computation of N on x with output 1}. Here we cannot say what N will do on x but only what it could do, and x is in L(N) if an output of 1 is possible.
2. Deterministic model: Given the above definition, if we want to decide whether x is in L(N) using only deterministic computation, we can simulate all possible computations of N on input x. For example, we could breadth-first search the binary tree of all possible beginnings of computations, first looking at all 1-step ones, then all 2-step, and so forth. This gives us a proof that L(N) for an arbitrary NDTM is a Turing acceptable (or recursively enumerable) language, because this BFS will eventually succeed if and only x is in L(N). For a poly-time N, of course, the BFS will in general take exponential time.
  It may be that the large number of nodes on one level of this tree actually refer to a much smaller number of computations. For example, if N is an NFA with k states, after computing on input x for t steps there are at most k configurations possible, because the input head position is fixed and the only variable is the state. We can thus simulate the BFS by recording the subset of the k states that is possible after i steps, for each i, and thus (if k is constant) determine in O(n) time exactly which states are possible after reading all n letters of x and thus whether x is in L(N). We will see later that if N is a bounded-space TM, we can similarly use dynamic programming to determine whether x is in L(N) without simulating the whole tree.
3. Probabilistic model: A natural way to imagine nondeterminism is for the machine to make its choices randomly, for example choosing each possible transition with probability 1/2, independently for each step. This makes the output of N on x a random variable, and x is in L(N) if and only if the probability that N(x) = 1 is greater than zero. As we will see in Chapter 7, this is quite different from practical randomized computation, because if this probability is very small there is no practical way to tell whether it is zero or not, unless we get very lucky.
4. Parallel model: We can imagine our computation "forking" on every step -- the original process could take the first option and create a clone of itself that takes the second option. (This is similar to the "branch prediction" in real-life multicore machines.) Then x is in L(N) if and only if some process eventually outputs 1. This is not practical in general because the number of processes grows expontially in the number of steps of N being simulated. We can take advantage of dynamic programming, if a process can recognize that it is duplicating the computation of a lower-numbered process, and terminate itself. In the simulation of NFA's, we would need at most k processes and thus this method would be practical. For bounded space machines, in some cases the number of processes can be kept reasonable.
5. Game model: Given N and x, define a one-player game where White chooses an option for N at each step of computation, and White wins the game if and when N outputs 1. Clearly White has a winning strategy for this game if and only if x is in L(N). So if White has arbitrary computational resources, White will win the game if and only if x is in L(N). We sometimes describe a nondeterministic computation by saying that it guesses some string and then computes deterministically on it -- this is like saying that White chooses any string of the correct length, so that the string chosen will lead to an accepting computation if and only if x is in L(N). In Chapter 5 we will see alternation, which extends this model by adding a second player, Black, who wants the computation to not output 1, and who has control of N's choices for some of its steps. Again, we say that x is in L(N) if and only if White has a winning strategy for this game. There are even more complicated games where N's choices may be random as well as being determined by one of the two players, as we will see.
The other concrete result I mentioned in this session was the simulation of two-way NFA's (or nondeterministic TM's with O(1) work tape space) by DFA's. The idea of this proof, like that for the simulation of deterministic two-way DFA's by ordinary DFA's, is that at any physical point in the input we only care what the two-way machine will do (or might do) if it crosses that point in a particular direction in a particular state. An ordinary DFA, with in general many more states than the two-way machine, can calculate all these actions or possible actions for each prefix of the input in turn, and thus decide what the two-way machine will do (or might do) on the entire input once the DFA has read it all.
19 February 2010: Today I talked about the similarity between the polynomial hierarchy of complexity classes including P and NP and the arithmetic hierarchy of classes including TA, the Turing Acceptable languages. The polynomial hierarchy is the principal topic of Chapter 5 of [AB], so we will see it much more later, but Exercise 2.33 was an introduction to it. Both these hierarchies of classes have definitions in terms of first-order logic, and these definitions help to explain the similarities between the two.
We defined NP as the class of languages A such that there is a poly-time DTM M such that for any string x, x is in A iff ∃y: M(x,y) = 1. We talked briefly about co-NP, the class of languages with complements in NP. A language A is in co-NP iff there exists a poly-time M such that x is in A iff ∀y: M(x,y) = 1. In terms of the game semantics from last Friday, an NP language is one where x is in A iff White, a computationally unbounded player who controls all choices of a poly-time NDTM and wants it to accept, can make it do so. Similarly if A is in co-NP, there is a poly-time NDTM M such that x is in A iff M accepts a when Black, a computationally unbounded player who wants M to reject, controls all M's choices.
Exercise 2.33 defined the language Σ₂-SAT in terms of a formula φ(x,y), where x and y are sequences of boolean variables, so that φ is in Σ₂-SAT iff there exists a bitvector x such that for all bitvectors y, φ(x,y) = 1. (By the way, as a student noticed, [AB] define this in terms of a CNF formula instead of an arbitrary boolean formula, which is a mistake -- under their definition Σ₂-SAT is in NP without any conditions.) We can define a two-player game to determine whether φ is in Σ₂-SAT: White names x, Black names y, then White wins iff φ(x,y) = 1. White wins this game, given optimal play by both White and Black, iff φ is in the language.
The reason for the name "Σ₂-SAT" is that the first-order formula ∃x:∀y:φ(x,y) is in "Σ₂ form". It has two quantifiers, the first existential and the second universal. A formula is Σ_i if there are i blocks of quantifiers beginning with an existential (the Σ is for "sum" in boolean algebra, or "or") and is Π_i if there are i blocks beginning with a universal (the Π is for "product" in boolean algebra, or "and"). A "block" is a sequence of quantifiers of the same type.
So the class NP is also called &Sigma₁^p, and there is a whole hierarchy of classes Σ_i^p and Π_i^p, called the polynomial hierarchy. The name Δ_i^p is also used for Σ_i^p ∩ Π_i^u. Note that P is contained in Δ₁^p but is not known to be equal to it -- you have proved that PRIMES is in NP (Exercise 2.5) and in co-NP (obvious) but this language was only recently proved to be in P itself.
So now to the arithmetic hierarchy. Using Sipser's terminology, a language L is TD if there is an always-halting TM that decides it, and TA if there is any TM M, perhaps not always halting, such that M(x) = 1 iff x is in L. We can rewrite the latter definition in Σ₁ form: x is in L iff ∃c: c is a computation proving M(x) = 1, or ∃c: CheckM (x, c). The language CheckM is not only TD but actually in the class P, because if we are given x and c as input it is linear time in the length of c to determine whether c is a valid computation of M with input x and output 1.
The class TA is thus called Σ₁⁰, and there are similar classes Σ_i⁰ and Π_i⁰ for every i, forming the arithmetic hierarchy. Unlike the polynomial hierarchy, the arithetic hierarchy is known not to collapse -- all of its classes are known to be distinct. In Chapter 1 we proved that the language D, which you can show to be in Π₁⁰ or co-TA, is not in TA.
Also unlike the polynomial hierarchy, the arithmetic hierarchy has a bottom level that we fully understand. The class Δ₁⁰ is the set of languages that are both TA and co-TA, and we saw in lecture that this class is exactly the set of TD languages.
In Exercise 2.14 [AB] define Cook reductions in terms of oracle machines, which have an extra query tape and the power to determine in one step whether the string on the query tape is in some language L. If we had an oracle for the undecidable language HALT, for example, we could decide the language D. But consider the language HALT^HALT, the set {(M,x): M is a TM with oracle for HALT and M(x) = 1}. This turns out to be a Σ₂⁰ language, and it is not decidable even if you are given an oracle for HALT. The proof is by diagonalization: just as HALT is not TD, HALT^HALT is not TD^HALT. In this way it can be shown that the classes in the arithmetic hierarchy are all distinct.
On Monday we will talk a bit about oracles for decidable languages that allow us to decide relativized versions of questions like whether P = NP. This is [AB]'s Section 3.4.
5 March 2010: Review for Midterm based on Practice Midterm
12 March 2010: Post-Mortem Review of Midterm
26 March 2010: Branching Programs
I talked about the branching program model, defined on page 300 of [AB]. Branching programs are another combinatorial model of computation, where there is a separate structure for each input size, deciding whether inputs of that size are in the language. A braching program is a directed graph of out-degree either two or zero. Each internal node is labeled by an input variable, and has a 0-edge and a 1-edge leaving it, telling where control goes from that node in the case where that variable is 0 or 1 respectively. Leaf nodes are labeled "accept" or "reject".
I proved two theorems similar to Chapter 6's main results about circuits. A language has a branching program family of polynomial size iff it is in the class L/poly, and it has an L-uniform branching program family of polynomial size iff it is in L. The proofs are very simple -- in each case where we start with the branching program, we can follow the path of control through the program if we have O(log n) space available to remember where we are. And if we have a logspace machine, its configuration graph has polynomially many nodes and has the edge structure for a branching program. In any configuration of the TM, there is an input variable that the machine is seeing, and there are two successor configurations, one if that variable is 0 and the other if it is 1.
To have concrete examples of both circuit and branching program computation, I considered the language PARITY, the set of binary strings with an odd number of ones. This is a regular language, decidable by a two-state DFA. To decide it on inputs of length n by a circuit, we can first imagine a binary tree of XOR gates, of depth O(log n), with the n inputs at the leaves. Since the XOR or "addition mod 2" operator is associative, the output node of this tree emits a 1 iff the input is in PARITY. To make a proper circuit of AND, OR, and NOT gates, we replace every binary XOR gate with a size-5 circuit of these gates that you can easily construct yourself.
This circuit has size O(n) and depth O(log n), so we have shown that PARITY is in the class NC^1. A similar argument can show that every regular language is in NC^1. What if we want to make our circuit depth 2, with unbounded fan-in AND's and OR's? This means constructing the CNF or DNF circuit for the n-way XOR function, and it is not hard to see that each of these circuits has 2^n-1 gates at the level next to the inputs. So PARITY is "easy" for general circuits but "hard" for depth-2 circuits.
What about branching programs? We can make a program of size about 2n, in n levels of two nodes each. Nodes on level i access input variable x_i and their edges go to level i+1. We call the two nodes on each level 0 and 1, and the 0-edges from 0 and 1 go to 0 and 1 respectively on the next level. The 1-edges from 0 and 1 go to 1 and 0 on the next level respectively. On level n+1 we have the accept node (1) and the reject node (0). This branching program is in effect the configuration graph of the Turing machine (with two states and no worktape) that simulates the two-state DFA for this language. The width of a branching program is the size of the largest level if it is divided into levels like this, with edges from level i only going to level i+1. An argument similar to this shows that any regular language has a family of branching programs of constant width and polynomial (actually linear size). As proven in my thesis and as we may see later, the class of languages with branching programs of polynomial size and constant width is exactly NC¹.
2 April 2010: Randomness and Reachability
We've just defined the randomized complexity classes based on polynomial time, such as PP, BPP, RP, co-RP, and ZPP. Each of these classes has an analog based on log space, which we may call PL, BPL, RL, co-RL, and ZPL. As always with log-space classes, our analysis uses configuration graphs, which have polynomially many nodes. We know that the PATH problem on general directed graphs is complete for NL. Early researchers looked at the UPATH problem, on undirected graphs, and defined the class SymL as the closure of this problem under log-space reductions. Lewis and Papadimitriou proved that SymL is also the class of languages of nondeterministic logspace TM's that are symmetric, in that their legal moves always include reversing the preceding move. In fact we now know SymL to be equal to L -- this is proved in Chapter 21 of [AB] and we will say something about it here.
It was also noted long ago that SymL is contained in the class RL, the subset of NL where the probability of an accepting path, for inputs in the language, is at least 1/2. This is a consequence of the analysis of random walks on undirected graphs. If we define a random process (a Markov chain) where at each node of an undirected graph we move to one of its neighbor nodes with equal probability, this process will approach a unique steady-state probability distribution from any initial distribution unless the Markov chain is reducible (the graph is not connected) or periodic (for large values of k, we can have paths from some node s to some node t only if k satisfies some congruence, such as being even). If we adjust the Markov process to include a chance of staying at the same node, the periodicity problem disappears. If we continue on a random walk for long enough, we can ensure that the probability that we have reached all reachable nodes is at least 1/2.
In the simple example where the undirected graph has degree 2, and so is made up of connected components that are either lines or cycles, we know that after O(n²) moves we are likely to have gone at least n in either direction from our start node, and thus reached all reachable nodes.
We showed that the PATH problem for general graphs can be reduced to that for 3-regular graphs, i.e., graphs where each node has degree exactly 3. We replace each node of degree 0 by a complete graph with four nodes, each node of degree 1 by a particular five-node graph, each node of degree 2 by a four-node graph, and each node of degree d ≥ 4 by a cycle of d nodes, one former edge coming to each node in the cycle. We pick one node in each of these new structures to represent the old node -- the new graph has the same PATH relation among these representatives as does the old graph among the old nodes. In a connected 3-regular graph, the steady-state distribution is the uniform distribution on the nodes.
Why is there a steady-state distribution? Linear algebra tells us that the space of vectors with n real entries has a basis of eigenvectors for the transformation matrix of an irreducible, aperiodic Markov chain. An eigenvector v for a matrix A is one with the property that vA = λv for a real number λ called its eigenvalue. A basis for a Markov chain has one eigenvector with eigenvalue 1 and others whose eigenvalues are strictly between -1 and 1. So when any vector is repeatedly multiplied by the matrix, the component in the eigenvalue-1 direction is preserved and the other components get successively smaller. Any vector for a probability distribution has a component in the eigenvalue-1 direction that is the same steady-state distribution.
An ε-expander is a graph where any set S of k nodes, with k ≤ n/2, has at least (1+ε)k neighbors (nodes in S or next to nodes in S). As shown in Chapter 21, we can also talk about λ-algebraic expanders, the eigenvalues of whose Markov chain, except for 1, have absolute value at most λ. Any graph with n nodes is a (1-&Omega(1/n²))-algebraic expander, and it is a λ-algebraic expander for a constant λ < 1 iff it is an ε-expander for some ε > 0. Reingold in 2004 proved that any constant-degree graph can be modified in log-space to get a constant-degree expander with the same connectivity properties. On the new graph, any node reachable from any node s must have a path of length O(log n), and so we can test reachability in deterministic log space by checking all paths of that length. This shows SymL = L.
On Exercise 7.10 of HW#6 you will show that the random walk technique does not work on general directed graphs, so that it does not suffice to put NL inside RL. It is considered quite conceivable that some technique like Reingold's might collapse RL or BPL to L, but it is much less conceivable that L and NL are equal.
9 April 2010: After giving some clarifications on the homework, I looked at two extensions of the interactive proof model, multiple provers and probabilistically checkable proofs. I followed [AB] fairly closely -- they discuss multiple provers and the class MIP in section 8.5, where they assert that MIP = NEXP. The main use of multiple provers, who each have arbitrary computational power but cannot communicate with each other, is that they can be forced to commit to a single exponential size proof even though the verifier cannot look at more than a polynomial amount of it. (The verifier makes sure that the two provers agree on the random bits of the proof she checks, and rejects if they don't, so they have to agree on the same proof.)
A poly-time random verifier can use polynomially many random bits and look at polynomially many bits of the input, so that the class MIP is also called PCP(poly, poly) for "probabilistically checkable proof". Chapter 11 of [AB] gives an introduction to the PCP Theorem, due to Arora and various others, which says that PCP(log n, 1) = NP. If a verifier can only use O(log n) random bits to pick O(1) bits of the proof to check, we are clearly within NP because the relevant portion of the proof is only poly length and a deterministic poly-time verifier could determine whether this limited randomized verifier would accept the proof. The surprising part of the PCP theorem is that proof systems with such limited verifiers are complete for NP. Given any NP language, there is a proof system where the verifier selects O(1) bits of the proof at random to examine -- if the input is in the language there is a completely convincing proof, and any proof of a false membership statement will convince the verifier only half the time.
Following Chapter 11, I presented the relationship between the PCP Theorem and approximation of NP-hard optimization problems, such as the problem MAX3SAT where the input is a 3CNF formula and the output is the maximum number of clauses that can be satisfied by any assignment. (All m clauses can be satisfied iff the input is in the language 3SAT, so it is clearly NP-hard to determine the number.) If we construct a 3CNF formula from an arbitrary NP membership question using the Cook-Levin Theorem, we can easily imagine an unsatisfiable formula where all but one of the clauses can be satisfied, corresponding to an alleged accepting computation history with a single flaw. But the PCP Theorem is equivalent to the statement that from the same arbitrary NP membership problem, we can construct a 3CNF formula such that either it is satisfiable (if the input is in the language) or no more than a 1 - ε fraction of the clauses can be satisfied. (Note an easy part of the equivalence -- given such a 3CNF formula, a satisfying assignment would constitute a probabilistically checkable proof because if we checked enough random clauses of an assignment to an unsatisfiable formula, we would be likely to find a clause that was not satisfied.)
The proof of the PCP Theorem is rather complicated and is presented in Chapter 22 of [AB]. When proved in the mid-1990's, it solved many of the outstanding open problems about whether (assuming P ≠ NP) various NP-hard optimization problems can be approximated in polynomial time. For these problems, including MAX3SAT and MAX-CLIQUE, there is a constant ε such that (again, if P ≠ NP), no poly-time algorithm can guarantee getting within a 1 + ε multiplicative factor of the correct answer.

Last modified 11 April 2010