# Homework Assignment #3

#### (Due for off-campus by 3:00 pm EDT Tuesday 15 November 2005)

Edits in orange made 28 October 2005.

Edits in green made 30 October 2005. (Minor clarifications only.)

Edit in purple made 31 October 2005 (also minor).

Edits in pink made 4 November 2005 (more significant).

There are five questions for 100 total points. Most are based on lectures 12-15, and thus on Chapter 6 of the Adler notes. Question 1 is based on lectures 8-11 (Chapter 5). Many of these problems, like much of Adler's Chapter 6, are taken from Randomized Algorithms by Motwani and Raghavan.

Students are responsible for understanding and following the academic honesty policies indicated on the course main page.

• Problem 3.1 (20): Kleene's Theorem says that a language A ⊆ Σ* is the language of an NFA or DFA iff it is denoted by a regular expression. When I present this in CMPSCI 250, I normally prove that NFA's can be simulated by regular expressions using the state elimination method, which is better for hand calculation. Here you'll describe and analyze another method, using dynamic programming.

Let N be an NFA with n states, called {1,...,n}. Given any states s and t and any number i with 0≤i≤n, we define the language L(s,t,i) to be the set of strings that could be read by N, starting in state s, ending in state t, and using no intermediate state numbered greater than i (recall the definition of intermediate state from the Floyd-Warshall algorithm).

Show that using dynamic programming, we can calculate regular expressions for each language L(s,t,i) using a number of regular-expression operations that is polynomial in n. State and justify a bound on how long the regular expressions might be, in terms of n. (Originally said "time polynomial in n".)

(A regular expression is a letter or ∅, the union or concatenation of two regular expressions, or the star of a regular expression. The letters and ∅ are defined to have length 1. If R has length r and S has length s, then R+S and RS are each defined to have length r+s+1, and R* is defined to have length r+1.)
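
To make the length measure above concrete, here is a minimal Python sketch that applies the stated rules to a small expression tree. The class names (`Atom`, `Plus`, `Concat`, `Star`) and the function `regex_length` are hypothetical names chosen for illustration; they are not part of the assignment, and the sketch only measures length, it does not construct the expressions for L(s,t,i).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str  # a single letter, or "∅" for the empty set


@dataclass(frozen=True)
class Plus:  # union R+S
    left: object
    right: object


@dataclass(frozen=True)
class Concat:  # concatenation RS
    left: object
    right: object


@dataclass(frozen=True)
class Star:  # Kleene star R*
    inner: object


def regex_length(r):
    """Length of a regular expression under the rules stated in the problem."""
    if isinstance(r, Atom):
        return 1  # letters and ∅ have length 1
    if isinstance(r, (Plus, Concat)):
        # R+S and RS both have length r + s + 1
        return regex_length(r.left) + regex_length(r.right) + 1
    if isinstance(r, Star):
        return regex_length(r.inner) + 1  # R* has length r + 1
    raise TypeError(f"not a regular expression: {r!r}")


# (a+b)*a: a and b have length 1, a+b has 3, (a+b)* has 4, the whole thing 4+1+1.
example = Concat(Star(Plus(Atom("a"), Atom("b"))), Atom("a"))
print(regex_length(example))
```

Under these rules the length counts every operator as well as every letter, so it is the quantity to track when bounding how the expressions grow from one value of i to the next.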

• Problem 3.2 (30): Consider a full ternary tree of height h, whose $3^h$ leaves are each labeled with 0 or 1. We want to evaluate this tree in the following way -- each internal node is to be given a label that is the majority element of the labels of its three children.

• (a,15) Prove that given any deterministic algorithm that correctly evaluates the root node of the tree, there exists an assignment of 0's and 1's to the leaves that requires the algorithm to look at all $n = 3^h$ leaves in the worst case. (Hint: Describe how, as an adversary of the algorithm, you could give consistent answers to the algorithm's queries about the leaves that leave the outcome in doubt until the last query.)
• (b,15) Here is a randomized algorithm for evaluating the root node of the tree. Given any internal node, pick two of its three children uniformly at random and evaluate them recursively. Then, only if these nodes disagree, evaluate the third child. Prove that given any assignment to the leaves, the expected number of leaves queried by this algorithm is less than $n^{0.9}$.
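
The randomized procedure in Problem 3.2(b) can be sketched in Python as follows. This is only an illustration for experimenting with small trees, not a proof of the bound. The tree representation (an `int` leaf, or a list of three subtrees) and the names `evaluate`, `majority_exact`, and `random_tree` are assumptions made for this sketch.

```python
import random


def evaluate(node, rng, counter):
    """Randomized majority evaluation; counter[0] counts the leaves queried."""
    if isinstance(node, int):  # a leaf labeled 0 or 1
        counter[0] += 1
        return node
    # A uniformly random ordering of the children; the first two form a
    # uniformly random pair.
    first, second, third = rng.sample(node, 3)
    a = evaluate(first, rng, counter)
    b = evaluate(second, rng, counter)
    if a == b:
        return a  # two equal labels already determine the majority
    return evaluate(third, rng, counter)  # tie: the third child decides


def majority_exact(node):
    """Reference evaluation that reads every leaf."""
    if isinstance(node, int):
        return node
    return int(sum(majority_exact(child) for child in node) >= 2)


def random_tree(height, rng):
    """Full ternary tree of the given height with random 0/1 leaves."""
    if height == 0:
        return rng.randint(0, 1)
    return [random_tree(height - 1, rng) for _ in range(3)]


rng = random.Random(2005)
tree = random_tree(4, rng)  # 3**4 = 81 leaves
counter = [0]
print(evaluate(tree, rng, counter), "after", counter[0], "leaf queries")
```

Averaging `counter[0]` over many runs on a fixed tree gives an empirical estimate of the expectation that part (b) asks you to bound.
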
• Problem 3.3 (20): If n is any integer, φ(n) is the number of elements of the group $Z_n^*$, which is the product of the numbers $p^e - p^{e-1}$ for every maximal prime-power factor $p^e$ of n. If n is prime, φ(n) = n-1, and if n = pq with p and q prime, φ(n) = (p-1)(q-1). (The definition of φ(n) originally posted was incorrect in the case where n is not the product of distinct primes.)

• (a,5) Show that if we are given both n and φ(n) and if there is any prime p such that $p^2$ divides n (that is, if n is not the product of distinct primes), then we can find at least two nontrivial factors in deterministic polynomial (in log n) time.
• (b,5) Show that if n = pq with p and q prime, and you are given both n and φ(n), we can factor n completely in deterministic polynomial time.
• (c,10) Suppose that you are given n and φ(n), that φ(n) ≠ n-1, and that the algorithms of parts (a) and (b) fail, so you know that n is the product of at least three distinct primes. Let φ(n) = $2^r s$ with s odd. Describe a polynomial-time randomized algorithm to factor n. (Hint: Pick a random a and look at the numbers $a^s, a^{2s}, a^{4s}, \ldots, a^{\varphi(n)}$. Show that there is a good chance that one of these numbers will help you to find a nontrivial factor of n. You will probably find it useful to quote a specific fact from Lecture #14.)
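
As a check on the definition of φ(n) used in Problem 3.3, the following Python sketch computes φ(n) from the prime-power factorization and compares it with a direct count of the units modulo n. The helper names are hypothetical, and the trial-division factoring is for small illustrative inputs only: it is exponential in log n and is not one of the algorithms the problem asks for.

```python
from math import gcd


def prime_power_factors(n):
    """Map each prime p dividing n to its exponent e (trial division, small n only)."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever remains is a single prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def phi(n):
    """Product of p**e - p**(e-1) over the maximal prime powers p**e dividing n."""
    result = 1
    for p, e in prime_power_factors(n).items():
        result *= p**e - p**(e - 1)
    return result


def phi_by_counting(n):
    """Number of a in 1..n with gcd(a, n) == 1, i.e. the size of the unit group."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)


for n in (7, 12, 15, 45):
    print(n, phi(n), phi_by_counting(n))
```

The case n = 12 shows why the prime-power form matters: 12 = 2²·3 gives (4-2)(3-1) = 4, whereas treating 12 as a product of distinct primes would give the wrong value.
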
• Problem 3.4 (20): Here are two applications of Chernoff bounds -- you may use the results quoted in the Adler text without proof. In each part ε is a fixed positive real number, to be treated as a constant with respect to n.

• (a,10) Find a number c such that you can prove that at most $\varepsilon 2^n$ binary strings of length n have more than (n/2) + c√n ones. Express your c in terms of ε and n. (Originally said "fewer than $\varepsilon 2^n$".)
• (b,10) Let k be any positive integer and suppose that you have a Monte Carlo algorithm that decides whether a string x of length n is in a language A, with success probability at least $(1/2) + n^{-k}$. Find a polynomially-bounded function f(n), in terms of ε, n, and k, such that the probability that the majority of f(n) trials of this Monte Carlo algorithm is correct is at least 1 - ε.
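
For small parameters, the two quantities in Problem 3.4 can be computed exactly, which may help in sanity-checking a candidate c or f(n) before proving anything with Chernoff bounds. The sketch below is a brute-force calculator under that assumption; the function names are hypothetical and it does not derive the bounds the problem asks for.

```python
from math import comb, sqrt


def strings_with_many_ones(n, c):
    """Count binary strings of length n with more than n/2 + c*sqrt(n) ones."""
    threshold = n / 2 + c * sqrt(n)
    return sum(comb(n, k) for k in range(n + 1) if k > threshold)


def majority_correct_probability(p, trials):
    """Probability that strictly more than half of `trials` independent runs,
    each correct with probability p, give the correct answer."""
    need = trials // 2 + 1
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(need, trials + 1)
    )


n = 100
print(strings_with_many_ones(n, 1.0) / 2**n)  # fraction of strings far above n/2
print(majority_correct_probability(0.6, 11))
print(majority_correct_probability(0.6, 101))
```

Note that in part (b) the per-trial advantage $n^{-k}$ shrinks with n, so the number of trials needed grows with n; the calculator only evaluates fixed values of p.
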
• Problem 3.5 (10): (from CLRS) We have n oil wells in a square area, each with an x-coordinate and a y-coordinate. We are going to build an east-west pipeline across the region, and connect each well to the pipeline by a north-south pipe. How can we choose a y-coordinate for the east-west pipeline that will minimize the total length of the north-south pipes? Describe an algorithm to do so and analyze its running time. Argue that no other algorithm can be asymptotically faster than yours.
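
The objective in Problem 3.5 can be written down directly: for a pipeline at height y, each well contributes the vertical distance from its own y-coordinate to y. The Python sketch below only evaluates that cost for a given candidate; choosing the best y, and arguing about the running time, is the exercise. The function name and the sample coordinates are made up for illustration.

```python
def total_pipe_length(wells, pipeline_y):
    """Total north-south pipe needed to connect every well to an east-west
    pipeline at height pipeline_y. Each well is an (x, y) pair; the
    x-coordinates do not affect the cost."""
    return sum(abs(y - pipeline_y) for _x, y in wells)


wells = [(0, 1), (2, 4), (5, 9)]  # hypothetical well positions
for candidate in (0, 3, 9):
    print(candidate, total_pipe_length(wells, candidate))
```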