Solutions to Practice Midterm for CMPSCI 611, Fall 2005

Solutions to Practice Midterm Exam

24 October 2005

Questions in black, solutions in blue.

```
Q1: 20 points
Q2: 25 points
Q3: 15 points
Q4: 20 points
Q5: 20 points
```

• Question 1 (20): Let G be an undirected graph with edge set E. For each of the following families I of subsets of E, either prove that (E,I) is a matroid or give an example of a G such that (E,I) is not a matroid.

• (a,10) I = {X: X is a set of edges that contains no triangle} (A triangle is a set of three edges {(u,v), (u,w), (v,w)} where u, v, and w are distinct vertices.)

This is not a matroid in general, though it is a subset system. The easiest way to show this is to give an example with two maximal independent sets of different sizes, so that neither the Exchange Property nor the Cardinality Property holds.

Let G be a graph with four vertices {a,b,c,d} and five edges, namely all possible edges except (a,d). The sets of edges {(a,b), (b,c), (c,d)} and {(a,b), (a,c), (b,d), (c,d)} are each maximal independent, because neither contains a triangle but adding any other edge of E to either creates a triangle. These two sets have different sizes (3 and 4).
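The counterexample above can be checked mechanically. The following is a minimal sketch, not part of the original solution; the helper names norm, has_triangle and is_maximal_triangle_free are made up for illustration.

```python
from itertools import combinations

def norm(u, v):
    """Store an undirected edge as a sorted pair."""
    return tuple(sorted((u, v)))

def has_triangle(edges):
    """True iff some three vertices have all three edges among them present."""
    es = {norm(u, v) for u, v in edges}
    verts = sorted({x for e in es for x in e})
    return any(
        {norm(u, v), norm(u, w), norm(v, w)} <= es
        for u, v, w in combinations(verts, 3)
    )

def is_maximal_triangle_free(subset, all_edges):
    """True iff subset is triangle-free and every other edge of G creates a triangle."""
    sub = {norm(*e) for e in subset}
    full = {norm(*e) for e in all_edges}
    if has_triangle(sub):
        return False
    return all(has_triangle(sub | {e}) for e in full - sub)

# The graph from the solution: all edges on {a,b,c,d} except (a,d).
E = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
X1 = [("a", "b"), ("b", "c"), ("c", "d")]
X2 = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
```

Both X1 and X2 pass is_maximal_triangle_free, yet they have sizes 3 and 4, which is exactly the failure of the Cardinality Property.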

• (b,10) I = {X: X contains no path from s to t} where s and t are fixed distinct vertices of G.

This is also a subset system but not a matroid. We find a counterexample to the Cardinality Property as in (a). We can actually use the same graph, renaming a as s and d as t. Then each of the two triangles is a maximal independent set, as is {(s,b), (c,t)}, because none of these sets has a path from s to t but adding any new edge creates such a path. There are maximal independent sets of size 2 and of size 3, so this is not a matroid.
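As with part (a), the three sets named above can be verified by brute force. This is an illustrative sketch only; has_path and is_maximal_pathless are hypothetical helper names, and edges are stored as frozensets because G is undirected.

```python
def has_path(edge_set, s, t):
    """Depth-first search for an s-t path using only the given undirected edges."""
    adj = {}
    for e in edge_set:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def is_maximal_pathless(subset, all_edges, s, t):
    """True iff subset has no s-t path but adding any other edge of G creates one."""
    sub = {frozenset(e) for e in subset}
    full = {frozenset(e) for e in all_edges}
    if has_path(sub, s, t):
        return False
    return all(has_path(sub | {e}, s, t) for e in full - sub)

# Same graph as in (a), with a renamed to s and d renamed to t.
E = [("s", "b"), ("s", "c"), ("b", "c"), ("b", "t"), ("c", "t")]
T1 = [("s", "b"), ("s", "c"), ("b", "c")]   # triangle on s, b, c
T2 = [("b", "c"), ("b", "t"), ("c", "t")]   # triangle on b, c, t
P = [("s", "b"), ("c", "t")]                # two edges only
```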

• Question 2 (25): Let G be a directed graph with distinct specified vertices s and t. We know that if we make G into a flow network N by setting the capacity of each directed edge to 1, then there is a flow of size k in N iff there is a set of k edge-disjoint paths from s to t.

• (a,15) Describe how to build a flow network N' from G so that there is a flow of size k in N' iff there is a set of k vertex-disjoint paths from s to t in G. (Note added for solutions: Of course "vertex-disjoint" doesn't refer to sharing s and t, I should have mentioned this.)

Replace each vertex v of G, other than s and t, with two vertices v and v'. Each edge that enters v in G still enters v in N', but each edge that leaves v in G now leaves v' instead. We add an edge of capacity 1 from v to v', and leave all other capacities the same (1).

Now suppose there is a flow of size k in N'. As in N, this can be written as the sum of k edge-disjoint paths. But no two paths could use the same vertex v (other than s or t) because then both would have to use the only edge out of v in N', to v', and it has capacity only 1.

If we have k vertex-disjoint paths from s to t in G, we make a flow in N' in the obvious way. For each v that a path uses other than s or t, we route the flow across the edge from v to v', and the capacity of 1 lets us do this.

• (b,10) Suppose that each vertex v other than s and t is assigned a non-negative integer h(v). Describe how to build a flow network N'' such that there is a flow of size k in N'' iff there is a set of k edge-disjoint paths from s to t in G, such that for each vertex v, no more than h(v) of the paths pass through v.

N'' is just like N' except that the edge from v to v' in N'' has capacity h(v). Then a flow through N'' must have some flow x across the edge from v to v', and therefore uses x edges into v and x edges out of v' (since all those edges have capacity 1). When we write the flow as the sum of k paths, exactly x of these paths in G use the vertex v, and x ≤ h(v) because of the capacity constraint on the edge in N'' from v to v'.
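The vertex-splitting construction can be sketched as follows. This is an illustration under stated assumptions, not the course's reference code: the split copies of v are named (v, "in") and (v, "out") rather than v and v', a vertex missing from h defaults to capacity 1 (so an empty h gives the network N' of part (a)), and max_flow is a plain Edmonds-Karp routine written for this example.

```python
from collections import defaultdict, deque

def build_split_network(edges, s, t, h):
    """Capacity map of N'': each v other than s, t becomes (v,'in') -> (v,'out')."""
    def tail(v):  # node that edges out of v leave from
        return v if v in (s, t) else (v, "out")
    def head(v):  # node that edges into v arrive at
        return v if v in (s, t) else (v, "in")
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[tail(u)][head(v)] += 1          # original edges keep capacity 1
    for v in {x for e in edges for x in e} - {s, t}:
        cap[(v, "in")][(v, "out")] = h.get(v, 1)   # vertex capacity h(v)
    return cap

def max_flow(cap, s, t):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    res = defaultdict(lambda: defaultdict(int))
    for u in cap:
        for v, c in cap[u].items():
            res[u][v] += c
            res[v][u] += 0                  # make sure the reverse arc exists
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck

# Every s-t path in this small example passes through c.
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"),
         ("c", "t"), ("c", "d"), ("d", "t")]
```

In this example there are two edge-disjoint paths but only one vertex-disjoint path, so the flow in N' is 1, while raising h(c) to 2 allows a flow of 2.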

• Question 3 (15): Suppose that G is a weighted directed graph with n vertices and e edges, and that each weight is an integer in the set {1,...,n}. Describe an unweighted graph H, containing a vertex for each vertex of G plus perhaps more vertices, such that for any vertices s and t of G, the unweighted distance from s to t in H is the same as the weighted distance from s to t in G. Determine the time needed to solve the single-source shortest path problem in G by conducting a breadth-first search in H, in terms of n and e. Compare this time with the time required to solve the single-source shortest path problem in G by Dijkstra's algorithm.

Replace each edge of weight k in G by a path of k edges in H, introducing k-1 new intermediate vertices along this path. Since each weight is at most n, H may have as many as n + e(n-1) = O(ne) vertices and ne = O(ne) edges. The breadth-first search takes time linear in the size of H, which is O(ne). Dijkstra's algorithm took O(ne) by the most naive implementation of the priority queue, but O(e log n) if we used simple heaps. So Dijkstra with heaps is faster asymptotically.
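To make the comparison concrete, here is a small sketch of the edge-subdivision construction next to a heap-based Dijkstra. The function names expand, bfs_dist and dijkstra, the numbering of vertices as 0..n-1, and the example graph are all assumptions made for illustration.

```python
import heapq
from collections import deque

def expand(n, wedges):
    """Build H: each edge (u, v, w) becomes a path of w unit edges via w-1 new vertices."""
    adj = [[] for _ in range(n)]
    for u, v, w in wedges:
        prev = u
        for _ in range(w - 1):
            adj.append([])              # new intermediate vertex
            new = len(adj) - 1
            adj[prev].append(new)
            prev = new
        adj[prev].append(v)
    return adj

def bfs_dist(adj, s, n):
    """Unweighted distances from s in H, reported only for the original n vertices."""
    dist = [None] * len(adj)
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[:n]

def dijkstra(n, wedges, s):
    """Weighted distances from s in G using a binary heap."""
    adj = [[] for _ in range(n)]
    for u, v, w in wedges:
        adj[u].append((v, w))
    dist = [None] * n
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if dist[u] is not None:
            continue                    # stale heap entry
        dist[u] = d
        for v, w in adj[u]:
            if dist[v] is None:
                heapq.heappush(heap, (d + w, v))
    return dist

n = 4
wedges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 4)]
H = expand(n, wedges)
```

Both methods give the same distances from vertex 0, but H has 11 vertices where G has 4, which is the blow-up that makes the BFS approach slower in general.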

• Question 4 (20): Let Σ be a finite alphabet and let N be a nondeterministic finite automaton with alphabet Σ. Specifically, Q is a set of n states and for every pair of states i and j, A_ij is defined to be a subset of Σ -- the letters that allow N to go from state i to state j. Let A be the matrix whose (i,j) entry is A_ij, thought of as a set of strings.

• (a,10) Suppose that we define "addition" of sets of strings to be the union operation, and "multiplication" to be concatenation of sets. That is, if X and Y are sets, then "XY" is the set {xy: x ∈ X and y ∈ Y}. Let k be a positive integer. Using the Path-Matrix Theorem, describe the set given by the (i,j) entry of the matrix A^k in terms of N.

Let G be the labeled graph that is the diagram of the NFA, where the edge (i,j) is labeled with the set A_ij. By the Path-Matrix Theorem, the (i,j) entry of A^k is the "sum", over all paths of length k from i to j in G, of the "product" of the entries along the path. Along a given path, when we take the concatenation of the sets for each edge in order, we get a set of strings of length k where the m'th letter of each string is taken from the set for the m'th edge of the path. This is exactly the set of strings of length k that the NFA could read while taking that path. The union over all such paths is the set of strings of length k that the NFA could read while moving from state i to state j. This set of strings is the value of the (i,j) entry of A^k.
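A small sketch of this computation, with Python sets of strings as matrix entries, union as "addition" and concatenation as "multiplication". The two-state NFA and the names mat_mul and mat_pow are made up for illustration; repeated squaring is used here because matrix multiplication over this semiring is associative.

```python
def mat_mul(X, Y):
    """Matrix product where + is set union and * is concatenation of string sets."""
    n = len(X)
    return [[{x + y for m in range(n) for x in X[i][m] for y in Y[m][j]}
             for j in range(n)]
            for i in range(n)]

def mat_pow(A, k):
    """A^k for k >= 1 by repeated squaring."""
    result, base = None, A
    while True:
        if k & 1:
            result = base if result is None else mat_mul(result, base)
        k >>= 1
        if k == 0:
            return result
        base = mat_mul(base, base)

# Hypothetical NFA on states 0 and 1 over the alphabet {a, b}:
# 0 -a-> 0, 0 -a,b-> 1, 1 -b-> 1, and no moves from 1 to 0.
A = [[{"a"}, {"a", "b"}],
     [set(), {"b"}]]
A2 = mat_pow(A, 2)
A3 = mat_pow(A, 3)
```

Here A2[0][1] is the set of length-2 strings that can take the NFA from state 0 to state 1, and A3[1][0] is empty because state 0 is unreachable from state 1.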

• (b,10) How many bits of space might we need to carry out the computation of A^k, in terms of |Σ|, n and k?

Raising an n by n matrix to the k'th power takes O(n^3 log k) operations on individual entries by standard matrix multiplication. (And we have no subtraction available with which to use Strassen or other methods.) An operation on an entry involves unioning or concatenating sets of strings, which are linear-time operations in the number of strings in the result. How large could one of these sets be? The worst case would be if all strings of length k were in the result. There are |Σ|^k strings of length k, so our bound on the total running time is O(n^3 (log k) |Σ|^k). The only way this is polynomial in k is if Σ is a one-letter alphabet.

I see I've answered the wrong question here! The space is what I asked about, and what we need is the space to hold O(n^2) matrix entries. By the discussion above, one matrix entry might need |Σ|^k bits, if we do the simplest thing and use a bitvector to say whether each of the possible strings is in the set. This makes the total space O(n^2 |Σ|^k), which is polynomial in n but is exponential in k unless Σ has only one letter.

• Question 5 (20): A groupoid is a set G, of n elements, and an arbitrary function from (G times G) to G. Let w = w_1...w_k be a sequence of k elements of G. Since the function need not be associative, we can "multiply out" w to get a single element of G in multiple ways, and need not get the same answer each time. (For example, a(bc) need not equal (ab)c.) Present and analyze an algorithm, whose running time is polynomial in n and k, to take as input both w and a multiplication table for G and output the set of possible products of w.

This is very similar to the CKY algorithm for determining whether a string is in a given context-free language. For each i and j with 1 ≤ i ≤ j ≤ k, and each element a of G, we define a boolean variable A(i,j,a) that is true iff it is possible to multiply the elements w_i through w_j to get the result a. We then evaluate each of these O(k^2 n) booleans by dynamic programming, computing them in increasing order of the parameter j-i.

We first calculate all the kn values A(i,i,a), which take only O(1) time each because we just need to look up whether w_i is equal to a.

Now assume that we have already calculated A(i,j,a) for every i and j with j-i ≤ t, and fix a particular i and j with j-i = t+1. For each a, the boolean A(i,j,a) is true iff there exists a split point m such that:

• i ≤ m < j, and
• there exist elements b and c of G such that
  • A(i,m,b) and A(m+1,j,c) are true, and
  • b times c = a in G

Searching for this situation with a simple loop takes O(kn^2) time to compute each entry, making our time O(k^3 n^3) in all. Depending on the relationship between n and k, there are better ways to go about it -- for example, if n is large we might want to precompute the set of (b,c) pairs with bc=a for each a.
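The dynamic program can be sketched as below. This is one possible rendering rather than the only one: it stores, for each interval, the set of reachable elements instead of n separate booleans A(i,j,a), it uses 0-based indices, it assumes w is non-empty, and the example groupoid (subtraction mod 3) is chosen only because it is not associative.

```python
def possible_products(w, table):
    """Set of all elements obtainable by fully parenthesizing w.

    w is a non-empty list of elements; table maps (b, c) to the product b*c.
    """
    k = len(w)
    # P[i][j] holds every a for which A(i, j, a) is true (0-based, inclusive).
    P = [[set() for _ in range(k)] for _ in range(k)]
    for i in range(k):
        P[i][i] = {w[i]}                    # base case: a single element
    for span in range(1, k):                # increasing order of j - i
        for i in range(k - span):
            j = i + span
            for m in range(i, j):           # split into w[i..m] and w[m+1..j]
                for b in P[i][m]:
                    for c in P[m + 1][j]:
                        P[i][j].add(table[(b, c)])
    return P[0][k - 1]

# Hypothetical non-associative groupoid on {0, 1, 2}: b * c = (b - c) mod 3.
table = {(b, c): (b - c) % 3 for b in range(3) for c in range(3)}
```

For w = [0, 1, 1], the two parenthesizations give (0*1)*1 = 1 and 0*(1*1) = 0, so the returned set is {0, 1}.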