The following algorithm is an approximation algorithm for MIN-VERTEX-COVER:
Input: undirected graph G
for all nodes v in G
{mark(v) = false;}
boolean done = false;
while (!done)
{done = true;
 for all edges (x,y) in G
if ((mark(x) == false) && (mark(y) == false))
{mark(x) = true;
mark(y) = true;
done = false;
break;
}
}
return set of v such that mark(v) == true;
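(To make this concrete, here is one way to render the pseudocode as runnable Java. The edge-list representation of the graph is my own choice for illustration, not part of the problem.)

import java.util.*;

public class VertexCoverApprox {
    // Vertices are 0..n-1; each edge is a pair of vertex indices.
    static Set<Integer> approxCover(int n, int[][] edges) {
        boolean[] mark = new boolean[n];     // mark(v) = false initially
        boolean done = false;
        while (!done) {
            done = true;
            for (int[] e : edges) {
                if (!mark[e[0]] && !mark[e[1]]) {
                    mark[e[0]] = true;       // mark both endpoints of an
                    mark[e[1]] = true;       // edge with both ends unmarked
                    done = false;
                    break;
                }
            }
        }
        Set<Integer> cover = new TreeSet<>();
        for (int v = 0; v < n; v++) if (mark[v]) cover.add(v);
        return cover;
    }

    public static void main(String[] args) {
        // The single-edge example discussed below: the optimum cover has
        // one vertex, but the algorithm marks both, giving the 2:1 ratio.
        System.out.println(approxCover(2, new int[][]{{0, 1}}));  // [0, 1]
    }
}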
If the algorithm terminates, the condition of the while loop must be
false, meaning that "done" was not set to false during the inner loop.
This can only happen if the for loop never encounters an edge with both
its vertices unmarked. Thus if we terminate, the marked vertices
constitute a vertex cover as every edge has at least one of its vertices
marked.
We are guaranteed to terminate because every pass through the while loop
either marks two previously unmarked vertices or gets us out of the while
loop. The first action can happen at most v/2 times (where v is the number
of vertices) because if all the vertices are marked there cannot be an edge
with both ends unmarked.
The outer loop, as I just said, can be executed at most v/2 times. The
inner loop can be executed at most e times (the number of edges) and the
statements inside the inner loop are O(1) time in all. Thus the total time
is at most O(ve), which is O(n^2) as both v and e are bounded above
by the length of the input in either a list or matrix representation. The
running time is thus polynomial.
Let c be the number of edges chosen by the algorithm (the number of times the while loop is executed, not counting the last pass). The algorithm thus marks 2c vertices. But any vertex cover must contain at least c vertices, because the c edges chosen by the algorithm are vertex disjoint (an edge is chosen only when both its endpoints are still unmarked, so no two chosen edges share a vertex), and a distinct vertex is needed to cover each of them. So the algorithm's performance ratio is at worst (2c - c)/2c = 1/2.
The simplest example is a graph with two vertices and a single edge. The algorithm will mark both vertices, but marking either vertex alone gives you a vertex cover. The algorithm thus marks twice as many vertices as there are in the optimal vertex cover.
As in the simulation of a poly-time NDTM by a poly-space DTM, run M on each
of the exponentially many possible choice sequences in turn, using a separate
tape to keep track of which sequence is currently being tried. We also keep
a counter which is initially zero and is incremented each time one of the runs
of M terminates and accepts. After all choice sequences have been tried we
output the counter, which by then has the value of count_M(w).
The space required is (a) poly space to simulate the poly-time M, (b)
poly space for the tape keeping track of the current choice sequence (a bit
for each of the poly-many time steps of M), and (c) poly space for the counter
(the counter might reach a value as large as 2^p(n) as there are
that many different choice sequences for a p(n)-time machine, but the log
of this is just p(n)). This is poly space in all.
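(As an illustration of this loop, here is a Java sketch for a toy machine whose nondeterminism is a sequence of p binary choices. The predicate accepts stands in for the deterministic simulation of M on w under a fixed choice sequence; the one used here is made up for the demonstration.)

import java.math.BigInteger;

public class CountRuns {
    // Stand-in for "simulate M on w with this choice sequence"; any
    // deterministic poly-time predicate could go here.  This toy one
    // accepts iff the number of 'true' choices equals the length of w.
    static boolean accepts(String w, boolean[] choices) {
        int k = 0;
        for (boolean b : choices) if (b) k++;
        return k == w.length();
    }

    static BigInteger count(String w, int p) {
        boolean[] choices = new boolean[p];    // the extra tape
        BigInteger counter = BigInteger.ZERO;  // needs only p bits
        while (true) {
            if (accepts(w, choices)) counter = counter.add(BigInteger.ONE);
            // advance the choice sequence as a binary counter
            int i = 0;
            while (i < p && choices[i]) { choices[i] = false; i++; }
            if (i == p) break;                 // all 2^p sequences tried
            choices[i] = true;
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(count("ab", 4));    // C(4,2) = 6
    }
}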
We must define a game that can be played on a poly-time alternating machine,
where White will win iff t is the correct count. So we have White claiming
that the count is correct, and Black opposing this claim.
At every point the ATM will store a configuration of M, initially the
start configuration. (We'll assume that M has a clock so that the
configuration includes the number of steps since the start and a configuration
may never be re-visited by M on a run.) At every point White will be making
a claim about the number of accepting runs of M that are possible from
the current configuration, and Black will oppose this claim.
A White move will be to claim a number of accepting paths starting from
each successor configuration of the current configuration. If these separate
claims do not add up to the current claim, White loses. Black's move is to
challenge one of these successor configuration claims. The challenged
successor becomes the new current configuration, and the challenged claim
the new current claim.
The game proceeds until the current configuration is a terminating one.
If it is accepting, White wins iff her claimed number is "1", and if it is
rejecting, White wins iff her claimed number is "0".
If White's initial claim is true, she has a winning strategy that consists
of always claiming the true number of accepting paths starting from each
configuration. If White's initial claim is false, it is impossible for her
to claim a set of true numbers for the successors that will add up to her
false number, so at least one of her successor claims will be false. Black
then has a winning strategy of choosing one of the false claims at every turn,
until he wins by forcing White to make a false claim about a terminating
configuration.
The game is poly-time because there are p(n) moves (one for each step of
M on w), and each move consists of White writing down O(1) numbers of at most
p(n) bits each, and Black choosing from among O(1) alternatives.
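(White's honest strategy is just the obvious recursion on configurations. Here is a sketch in Java, with a made-up tree of configurations standing in for the real run tree of M; the number White should claim at each point is what countAccepting returns.)

public class HonestClaim {
    // Hypothetical stand-in for a clocked configuration of M: a node in
    // the (finite) tree of runs.
    static class Config {
        boolean accepting;       // meaningful only at terminal configurations
        Config[] successors;     // empty at terminal configurations
        Config(boolean acc, Config... succ) { accepting = acc; successors = succ; }
    }

    // The number White should honestly claim at a configuration: the
    // number of accepting runs of M from it.  A true claim always splits
    // into true claims about the successors (they sum correctly), which
    // is exactly her winning strategy; a false claim must create at
    // least one false successor claim for Black to challenge.
    static long countAccepting(Config c) {
        if (c.successors.length == 0) return c.accepting ? 1 : 0;
        long total = 0;
        for (Config next : c.successors) total += countAccepting(next);
        return total;
    }

    public static void main(String[] args) {
        Config run = new Config(false,
            new Config(false, new Config(true), new Config(false)),
            new Config(false, new Config(true), new Config(true)));
        System.out.println(countAccepting(run));  // 3 accepting runs
    }
}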
N will simulate M on w some number of times r, each time with independent
coin flips. N will accept w iff the majority of these runs of M accept it.
Since each run of M is correct with probability at least 2/3, it is very
unlikely that fewer than r/2 of the runs of M are correct.
Here is one derivation of a suitable r. The probability that the majority
is wrong is at most the sum for i from 0 to r/2 of (r choose i) times
(2/3)^i times (1/3)^(r-i). Since each (r choose i) is at most (r choose r/2)
and (2/3)^i (1/3)^(r-i) = 2^i (1/3)^r, this is less than (1/3)^r times
(r choose r/2) times the sum for i from 0 to r/2 of 2^i. This in turn is
less than (1/3)^r times 2^r times 2^((r/2)+1), which is 2(0.94281...)^r
because (1/3)(2)(sqrt 2) = 0.94281... . Pick some number c such that
(0.94281)^c is less than 1/2; then picking r = c(n+2) makes the probability
less than (1/2)^(n+1).
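(In code, N is just a majority vote. Here is a sketch; runOnce stands in for one run of M with fresh coin flips, and its 2/3 success probability is merely simulated for the demonstration.)

import java.util.Random;

public class Amplify {
    static final Random rng = new Random();

    // Stand-in for one run of M on w with fresh coin flips: gives the
    // right answer with probability 2/3.  Here the "right answer" is
    // hard-coded to true for the demonstration.
    static boolean runOnce(String w) {
        return rng.nextInt(3) < 2;     // correct with probability 2/3
    }

    // N: run M r times independently and take the majority vote.
    static boolean majorityVote(String w, int r) {
        int accepts = 0;
        for (int i = 0; i < r; i++) if (runOnce(w)) accepts++;
        return 2 * accepts > r;
    }

    public static void main(String[] args) {
        // With r around c(n+2) for a suitable c, the error probability
        // drops below (1/2)^(n+1); here we just observe that even a
        // modest r makes a wrong majority rare.
        int wrong = 0;
        for (int trial = 0; trial < 10000; trial++)
            if (!majorityVote("w", 99)) wrong++;
        System.out.println(wrong + " wrong majorities in 10000 trials");
    }
}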
Consider all possible sequences of coin flips for N on an input of length n. For each input string w of length n, at most a (1/2)^(n+1) fraction of these sequences cause N to give the wrong answer. The total fraction that cause N to give the wrong answer on at least one w is thus at most 2^n times (1/2)^(n+1) = 1/2. So at least half the sequences cause N to be correct on all w, and we may pick any of these to be S.
A deterministic poly-time TM may be simulated by a poly-size circuit as we saw in lecture. N, together with S, acts like a deterministic machine in that its actions are determined by the input and by S, so we can construct a poly-size circuit from N and S. But since we did not explicitly construct S (we only proved its existence in part (b)), we do not necessarily have a uniform circuit.
We can decide the NP-complete language 3-SAT in poly-time if such an algorithm exists. Given a 3-CNF formula, run the given algorithm on it to solve MAX-3-SAT for it. If the algorithm produces a setting satisfying all clauses, we know that the formula was in 3-SAT. If it produces a setting that doesn't satisfy all the clauses, we know that the formula was not in 3-SAT because this setting satisfied as many clauses as was possible. So our decision procedure for 3-SAT is to check whether the algorithm's setting satisfies all the clauses, which is clearly poly-time. Since an NP-complete language is in P under this hypothesis, P=NP in this case.
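(Here is the decision procedure as a Java sketch. The function maxThreeSat is the hypothetical poly-time MAX-3-SAT algorithm whose existence we are assuming, so it is left as a stub; the clause representation, a triple of signed literals, is my own choice for illustration.)

public class ThreeSatDecider {
    // Literal +i means variable i true, -i means variable i false
    // (variables numbered from 1); each clause is a triple of literals.
    static boolean satisfies(boolean[] assignment, int[][] clauses) {
        for (int[] clause : clauses) {
            boolean ok = false;
            for (int lit : clause)
                if (assignment[Math.abs(lit)] == (lit > 0)) ok = true;
            if (!ok) return false;
        }
        return true;
    }

    // The hypothetical poly-time MAX-3-SAT algorithm assumed to exist.
    static boolean[] maxThreeSat(int numVars, int[][] clauses) {
        throw new UnsupportedOperationException("assumed, not known");
    }

    // Decide 3-SAT: the formula is satisfiable iff the best possible
    // setting satisfies every clause.
    static boolean inThreeSat(int numVars, int[][] clauses) {
        return satisfies(maxThreeSat(numVars, clauses), clauses);
    }
}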
To achieve a threshold of 1/2, we merely go through the variables one at
a time, choosing a setting for each that satisfies at least as many clauses
as it refutes. A setting of a single variable can refute a clause only if
its other two literals have already been set to false. Thus that same clause
could be satisfied by setting the new variable the other way. Therefore one
setting or the other of the new variable will satisfy at least as many clauses
as it refutes, and setting each variable in turn this way will satisfy at least
half the clauses. This is within 1/2 of optimal because the optimal setting
can do no better than to satisfy all the clauses.
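(Here is a sketch of this greedy procedure in Java, using the same clause representation as above: a triple of signed literals, +i for variable i and -i for its negation. The representation choices are mine, made for illustration.)

public class GreedyHalf {
    // Returns an assignment satisfying at least half the clauses
    // (index 0 of the result is unused; variables are numbered from 1).
    static boolean[] greedy(int numVars, int[][] clauses) {
        Boolean[] value = new Boolean[numVars + 1];    // null = unset
        for (int v = 1; v <= numVars; v++) {
            // net effect (newly satisfied minus newly refuted clauses)
            // of each setting of v, given the settings made so far
            int netTrue = net(v, true, value, clauses);
            int netFalse = net(v, false, value, clauses);
            value[v] = netTrue >= netFalse;            // the better choice
        }
        boolean[] result = new boolean[numVars + 1];
        for (int v = 1; v <= numVars; v++) result[v] = value[v];
        return result;
    }

    static int net(int v, boolean b, Boolean[] value, int[][] clauses) {
        int net = 0;
        for (int[] clause : clauses) {
            boolean alreadySat = false, hasV = false, vLitTrue = false;
            int unsetOthers = 0;
            for (int lit : clause) {
                int var = Math.abs(lit);
                if (var == v) { hasV = true; vLitTrue = (lit > 0) == b; }
                else if (value[var] == null) unsetOthers++;
                else if (value[var] == (lit > 0)) alreadySat = true;
            }
            if (!hasV || alreadySat) continue;
            if (vLitTrue) net++;                       // newly satisfied
            else if (unsetOthers == 0) net--;          // newly refuted:
        }                                              // other two literals
        return net;                                    // already false
    }

    public static void main(String[] args) {
        int[][] clauses = {{1, 2, 3}, {-1, -2, -3}, {1, -2, 3}};
        System.out.println(java.util.Arrays.toString(greedy(3, clauses)));
    }
}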
A more sophisticated approach can deterministically achieve the 1/8 bound
proved to be possible in part (c). (See p. 302 of [P].) By examining the
original formula, we can determine that a random setting would satisfy, in
expectation, 7/8 of its clauses. We can also determine the expected number
of clauses satisfied by a random setting of the remaining variables given
that x1 is set to true, and given that it is set to false. We then set x1
so as not to decrease this expectation. Then we set x2 in the same way, and
all the other variables in turn. Since the expected number of satisfied
clauses starts at 7/8 of the total and can only go up given our choices, at
the end of the process (when nothing is left to chance) we must have
satisfied at least that many clauses.
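(Here is a sketch of this process, often called the method of conditional expectations, using the same clause representation as above. The conditional expectation is computed exactly, clause by clause.)

public class CondExpect {
    // Probability that a clause is satisfied when the unset variables
    // are set uniformly at random, given the partial assignment.
    static double probSat(int[] clause, Boolean[] value) {
        double probAllFalse = 1.0;
        for (int lit : clause) {
            Boolean v = value[Math.abs(lit)];
            if (v == null) probAllFalse *= 0.5;        // unset: false w.p. 1/2
            else if (v == (lit > 0)) return 1.0;       // literal already true
            // else literal already false: factor 1
        }
        return 1.0 - probAllFalse;
    }

    static double expectedSat(int[][] clauses, Boolean[] value) {
        double e = 0;
        for (int[] c : clauses) e += probSat(c, value);
        return e;
    }

    // Set each variable so the conditional expectation never drops; the
    // final (deterministic) count is then at least the initial 7m/8.
    static boolean[] derandomize(int numVars, int[][] clauses) {
        Boolean[] value = new Boolean[numVars + 1];
        for (int v = 1; v <= numVars; v++) {
            value[v] = true;
            double eTrue = expectedSat(clauses, value);
            value[v] = false;
            double eFalse = expectedSat(clauses, value);
            value[v] = eTrue >= eFalse;
        }
        boolean[] result = new boolean[numVars + 1];
        for (int v = 1; v <= numVars; v++) result[v] = value[v];
        return result;
    }

    public static void main(String[] args) {
        int[][] clauses = {{1, 2, 3}, {-1, 2, -3}, {-1, -2, 3}, {1, -2, -3}};
        boolean[] a = derandomize(3, clauses);
        int sat = 0;
        for (int[] c : clauses)
            for (int lit : c)
                if (a[Math.abs(lit)] == (lit > 0)) { sat++; break; }
        System.out.println(sat + " of " + clauses.length + " clauses satisfied");
    }
}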
Each clause is satisfied by exactly 7/8 of the possible settings. Thus the expected number of clauses satisfied by a random setting is 7/8 of the total number of clauses. There must exist a particular setting that satisfies at least this many clauses, because if all settings satisfied fewer than 7/8 of them then the average number satisfied would also be smaller than 7/8. Note that this is an existence proof only and does not tell us how to find such a setting (except that repeated trial and error is likely to find one fairly soon).
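(The 7/8 fraction is easy to confirm by brute force for a clause on three distinct variables, as this little Java check illustrates.)

public class SevenEighths {
    public static void main(String[] args) {
        // A clause on three distinct variables is false under exactly
        // one of the 8 settings of those variables: the one making all
        // three literals false.
        int satisfied = 0;
        for (int s = 0; s < 8; s++) {
            boolean x = (s & 1) != 0, y = (s & 2) != 0, z = (s & 4) != 0;
            if (x || !y || z) satisfied++;   // the clause (x OR NOT y OR z)
        }
        System.out.println(satisfied + " of 8 settings satisfy the clause");
    }
}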
We earlier showed that (D,D') is in EQDFA iff a related machine (one that decides the symmetric difference of L(D) and L(D')) is in EDFA, the set of DFA's with empty languages. In a Spring 2003 homework this latter language was shown to be NL-complete and thus in NL. Since EQDFA is log-space reducible to a language in NL, it is in NL itself.
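(Here is a sketch of that reduction together with the reachability test, done here as an ordinary BFS over the product automaton rather than in log space. The transition-table representation of the DFA's is my own choice for illustration.)

import java.util.*;

public class EqDfa {
    // A DFA over a fixed alphabet: delta[state][symbol], accepting set;
    // the start state is state 0.
    static class Dfa {
        int[][] delta;
        boolean[] accepting;
        Dfa(int[][] d, boolean[] a) { delta = d; accepting = a; }
    }

    // L(D) = L(D') iff no reachable product state (q, q') has exactly
    // one of q, q' accepting -- i.e. the symmetric-difference DFA has
    // an empty language.
    static boolean equivalent(Dfa d1, Dfa d2, int alphabetSize) {
        int n2 = d2.delta.length;
        boolean[] seen = new boolean[d1.delta.length * n2];
        Deque<int[]> queue = new ArrayDeque<>();
        queue.add(new int[]{0, 0});
        seen[0] = true;
        while (!queue.isEmpty()) {
            int[] p = queue.remove();
            if (d1.accepting[p[0]] != d2.accepting[p[1]]) return false;
            for (int a = 0; a < alphabetSize; a++) {
                int q1 = d1.delta[p[0]][a], q2 = d2.delta[p[1]][a];
                if (!seen[q1 * n2 + q2]) {
                    seen[q1 * n2 + q2] = true;
                    queue.add(new int[]{q1, q2});
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two identical 2-state DFA's over {0,1} accepting "odd number of 1's".
        Dfa d1 = new Dfa(new int[][]{{0, 1}, {1, 0}}, new boolean[]{false, true});
        Dfa d2 = new Dfa(new int[][]{{0, 1}, {1, 0}}, new boolean[]{false, true});
        System.out.println(equivalent(d1, d2, 2));   // true
    }
}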
By Savitch's Theorem, any language in NL is also in DSPACE(log^2 n). By the Space Hierarchy Theorem, this latter class is strictly contained in DSPACE(f) for any f with log^2 n = o(f), hence is strictly contained in PSPACE. So the set of languages log-space reducible to EQDFA is strictly contained in PSPACE. But if EQDFA were PSPACE-complete, every language in PSPACE would be log-space reducible to it.
Given two NFA's with n states each, they each have equivalent DFA's with up to 2^n states each. The construction above would reduce the original EQNFA problem to an EDFA problem on a DFA with exponentially many states. We cannot write this DFA down in poly space because it is too big. But we can keep track of one of its states at a time in poly space, and compute the next state from the current state. We can thus simulate this DFA on a guessed input with an NPSPACE machine and see whether it accepts. This puts the complement of the desired language in NPSPACE. But then by Immerman-Szelepcsenyi the desired language is in NPSPACE, and by Savitch it is then in PSPACE.
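(Keeping track of one state at a time amounts to computing the subset-construction step function without ever writing the DFA down. Here is a sketch, with the current DFA state stored as a set of NFA states; epsilon moves are ignored for simplicity, an assumption of mine.)

import java.util.BitSet;

public class OnTheFly {
    // next[q][a] = set of NFA states reachable from q on symbol a.
    // One transition of the exponential-size DFA, computed from the
    // current subset alone: poly space, no matter how many subsets
    // are reachable overall.
    static BitSet step(BitSet current, int symbol, BitSet[][] next) {
        BitSet result = new BitSet();
        for (int q = current.nextSetBit(0); q >= 0; q = current.nextSetBit(q + 1))
            result.or(next[q][symbol]);
        return result;
    }

    public static void main(String[] args) {
        // Tiny 2-state NFA over {0,1}: from state 0 on '1' go to {0,1}.
        BitSet[][] next = new BitSet[2][2];
        for (int q = 0; q < 2; q++)
            for (int a = 0; a < 2; a++) next[q][a] = new BitSet();
        next[0][0].set(0);
        next[0][1].set(0); next[0][1].set(1);
        BitSet start = new BitSet(); start.set(0);
        System.out.println(step(start, 1, next));    // {0, 1}
    }
}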
Because of the potential exponential increase in input size when the NFA is converted to a DFA, the NL algorithm for EQDFA uses nondeterministic poly space in terms of the original input. So EQNFA is not shown to be in NL by this argument, and there is no contradiction with its being PSPACE-complete.
Last modified 19 August 2003