CMPSCI H04 is a one-credit honors course attached to CMPSCI 311, Theory of Algorithms, in Fall 2003. It is open to Commonwealth College students also enrolled in 311, and to non-Commonwealth students from 311 if space permits.

The exact content of the course will develop as the semester goes on, but the basic idea is that we will explore additional topics in the theory of algorithms. Grading will be based on attendance, class participation, a few problem sets, and probably end-of-term individual presentations of some kind.

The honors section meets Fridays 11:15-12:05, immediately after
the Friday lecture meeting of 311, in room **A339 LGRC**.

**3 November 2003**
As promised long ago, here are some problems concerning the `Permutation` class defined in Discussion #6. Given the existing class (but ignoring the "arrow" field if that's more convenient), implement the following methods:

```
public boolean isIdentity()
    // returns true iff "this" is the identity permutation

public Permutation compose(Permutation p)
    // returns "this followed by p"; throws an exception if the two n's differ

public Permutation inverse()
    // returns p such that p.compose(this).isIdentity() is true

public Permutation power(int k)
    // returns the composition of k copies of "this"; k may be positive,
    // zero, or negative; use repeated squaring or something faster

public int order()
    // returns the least positive k such that this.power(k) is the identity
    // _don't_ do this by brute force; use the tools below

public int apply(int i)
    // returns the result of "this" applied to element i

public int elementOrder(int i)
    // returns the least positive k such that this.power(k).apply(i) == i
```
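For concreteness, here is a minimal sketch of one possible representation (an int array `map`, with `map[i]` the image of `i` -- my choice for illustration, not necessarily the Discussion #6 representation), with the two easiest methods and `compose` filled in:

```java
// A sketch only: stores a permutation of {0, ..., n-1} as an int array "map",
// where map[i] is the image of i. Field and constructor names are mine.
public class Permutation {
    private final int[] map;

    public Permutation(int[] map) {
        this.map = map.clone();
    }

    // returns true iff "this" is the identity permutation
    public boolean isIdentity() {
        for (int i = 0; i < map.length; i++)
            if (map[i] != i) return false;
        return true;
    }

    // returns the result of "this" applied to element i
    public int apply(int i) {
        return map[i];
    }

    // returns "this followed by p"; throws an exception if the sizes differ
    public Permutation compose(Permutation p) {
        if (p.map.length != map.length)
            throw new IllegalArgumentException("sizes differ");
        int[] result = new int[map.length];
        for (int i = 0; i < map.length; i++)
            result[i] = p.map[map[i]];   // apply "this" first, then p
        return new Permutation(result);
    }
}
```

The remaining methods are the exercise; note that `compose` fixes one of the two possible composition orders, and your `inverse` and `power` should be consistent with it.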

**31 October 2003**
We talked more about what special cases of the path problem might be
solvable in O(log n) space. I showed a reduction from the general problem
to the problem for DAG's. That is, given any directed graph G with marked
vertices s and t, I showed how to build a DAG D with marked vertices s' and
t' such that there is a path from s to t in G iff there is a path from s' to
t' in D. Nodes of D are pairs (v,i) where v is a vertex of G and i is a number
from 0 through n-1. There is an edge in D from (v,i) to (w,i+1) whenever
there is an edge from v to w in G. There are also edges from (t,i) to (t,i+1)
for every i. There are no other edges in D. The vertex s' is (s,0) and t'
is (t,n-1).
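The construction can be checked mechanically on small examples. Here is a sketch (class and method names are mine) that builds D with node (v,i) encoded as the integer v*n + i, so s' is s*n and t' is t*n + (n-1); ordinary BFS then confirms that reachability agrees in G and D.

```java
import java.util.*;

// Sketch of the reduction: G is a digraph on vertices 0..n-1 given by
// adjacency lists; D lives on pairs (v,i) encoded as the int v*n + i.
public class LayeredDag {

    // edges of D: (v,i) -> (w,i+1) for each edge v -> w of G,
    // plus (t,i) -> (t,i+1) for every i; no other edges
    public static List<List<Integer>> build(List<List<Integer>> g, int t) {
        int n = g.size();
        List<List<Integer>> d = new ArrayList<>();
        for (int k = 0; k < n * n; k++) d.add(new ArrayList<>());
        for (int v = 0; v < n; v++)
            for (int i = 0; i + 1 < n; i++) {
                for (int w : g.get(v)) d.get(v * n + i).add(w * n + i + 1);
                if (v == t) d.get(t * n + i).add(t * n + i + 1);
            }
        return d;
    }

    // plain BFS reachability, used only to check the reduction on small cases
    public static boolean reachable(List<List<Integer>> adj, int from, int to) {
        boolean[] seen = new boolean[adj.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        seen[from] = true;
        queue.add(from);
        while (!queue.isEmpty()) {
            int v = queue.remove();
            for (int w : adj.get(v))
                if (!seen[w]) { seen[w] = true; queue.add(w); }
        }
        return seen[to];
    }
}
```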

But the notion of "reduction" we discussed in lecture today requires that the composition of two "easy" functions be "easy", and this is not obvious if "easy" means "logspace". This led to our problem for the day. Define a "logspace" Java method, computing a function from strings to strings, as follows. The method has one String parameter x, which it accesses by the charAt method. It has a WriteOnlyString object w, which supports only two methods -- the constructor and an "append" method that appends a character to it. The method ends by returning w. It may not alter x. Apart from x and w, its memory consists of O(1) objects each storable in O(log n) bits.
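As a tiny illustration of the model (the `WriteOnlyString` below is my own stand-in for the class described above), here is a method that doubles every character of its input while keeping only one int of working memory besides x and w:

```java
// A toy logspace method in the sense defined above: it reads x only through
// charAt, writes only through append, and besides x and w keeps a single
// int counter, which fits in O(log n) bits. WriteOnlyString is a stand-in.
public class Logspace {

    public static class WriteOnlyString {
        private final StringBuilder sb = new StringBuilder();
        public void append(char c) { sb.append(c); }
        public String toString() { return sb.toString(); }
    }

    // doubles each character: "ab" becomes "aabb"
    public static String doubleChars(String x) {
        WriteOnlyString w = new WriteOnlyString();
        for (int i = 0; i < x.length(); i++) {   // i is the only working memory
            w.append(x.charAt(i));
            w.append(x.charAt(i));
        }
        return w.toString();
    }
}
```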

The problem is this: given two logspace computable methods f and g, show that the composition of their functions can be implemented as a logspace method. That is, build a method h such that for all strings x, h(x) = g(f(x)), and h follows the rules for a logspace function.

We observed that a logspace function whose input is n letters long can only
run for n^{O(1)} steps, because if it runs for more steps than it has
"states" (memory contents and memory head position) it must be in an infinite
loop and doesn't compute a total function.

I see I owe you some problems on the Permutation class; I will try to get to this.

**24 October 2003**
We examined the Cayley graphs for S_{3}, S_{4}, and
S_{5} with all transpositions as generators. We noted that
the distance to the identity node in the graph depended only on the
permutation's *cycle decomposition*. For next time you should
discover and prove a rule that determines this distance from the cycle
decomposition. The division of all permutations into *even* and
*odd* follows from this rule -- a permutation is even iff its
distance to the identity is an even number.

I will soon post some problems here on the Permutation class from Discussion #6, asking you to implement some new methods.

I raised the new topic of algorithms for the path problem (given
G, x, and y, is there a path from x to y in G). We've seen how to solve
this quickly using linear space, by DFS or BFS. Now we're interested in
limited space. For what special classes of graphs can we do it in O(log n)
space? Mike gave a recursive algorithm that worked for full binary trees,
or in general for trees with branching factor O(1) and depth O(log n).
We then saw that (assuming you can reverse edges) we can solve the problem
for *any* tree in O(log n) space by putting a pointer at y and then
repeatedly moving the pointer back along the unique edge ending where it is,
until we reach x and return true or reach a source node and return false.
Can we do a DAG in O(log n) space? What about a directed or undirected
*grid graph*? And how close to O(log n) space can we come for a general
graph?

**17 October 2003**
We began with a discussion of the pancake flipping problem. We saw
the 2n - 3 upper bound, and Mike S. showed us a lower bound of n.
(The lower bound based on the number of possible moves comes out
slightly less than n.) I reported that I'd found T(4) = 4 by
exhaustive search. (I found a
web page that reports on exhaustive searches showing T(5) = 5,
T(6) = 7, T(7) = 8, T(8) = 9, T(9) = 10, T(10) = 11, T(11) = 13, and
T(12) = 14. Since each additional pancake can be put in place with at
most two flips, this gives an upper bound of 2n - 10 for n >= 12.)
I'm interested in any better upper and lower bounds, from the web or
elsewhere. Apparently Bill Gates worked on this problem as an undergrad.
(Added note: follow the link from this page to the
On-Line Encyclopedia of Integer Sequences to get
references including Gates -- a great place to start a project.)
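For small n these values can be reproduced by breadth-first search over the pancake graph: since the graph is vertex-transitive, the greatest distance from the sorted stack is the diameter T(n). A sketch (my own code, not from class):

```java
import java.util.*;

// BFS from the sorted stack over all prefix reversals. The pancake graph is
// vertex-transitive, so the greatest BFS distance equals its diameter T(n).
public class Pancake {

    // reverse the first k characters of s (one "flip")
    static String flip(String s, int k) {
        return new StringBuilder(s.substring(0, k)).reverse().toString()
                + s.substring(k);
    }

    public static int diameter(int n) {
        StringBuilder id = new StringBuilder();
        for (int i = 0; i < n; i++) id.append((char) ('a' + i));
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(id.toString(), 0);
        queue.add(id.toString());
        int max = 0;
        while (!queue.isEmpty()) {
            String s = queue.remove();
            int d = dist.get(s);
            max = Math.max(max, d);
            for (int k = 2; k <= n; k++) {       // the n-1 possible flips
                String t = flip(s, k);
                if (!dist.containsKey(t)) {
                    dist.put(t, d + 1);
                    queue.add(t);
                }
            }
        }
        return max;
    }
}
```

This is feasible up to about n = 10 or so before the n! states get out of hand.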

This led to talking about *Cayley graphs* for groups with
a given set of generators. If G is a group and X a generating set,
the Cayley graph has a node for each element g of G and an edge from
g to xg for every x in X. The pancake graph is the Cayley graph of
S_{n} (the group of all permutations) with generating set equal
to the n-1 different flipping operations. The pancake number is the
diameter of this graph, which is the longest distance from one point to
another. The n=3 pancake graph is a hexagon, and the n=4 graph with
24 vertices looks a bit like a soccer ball though we don't have a good
picture of it. (Vitaly reported that in Math 411 they've been looking
at the Cayley graph of S_{4} with generating set the three
transpositions (1 2), (2 3), and (3 4), which can be drawn as a
three-dimensional figure whose sides are eight hexagons and six squares.)

Generalizing Vitaly's generating set, we noted that the diameter of this graph is (n choose 2) because bubblesort and insertion sort operate by exchanging adjacent items, and we proved that any such sort needs (n choose 2) exchanges in the worst case.

What about the generating set consisting of all (n choose 2) transpositions?
We found an upper bound of n-1 for the diameter of this Cayley graph, because
you can always sort by moving each item into place in turn, exchanging it
with a smaller item. (Once all but one item is in place the last item must
be in place as well.) We showed a lower bound of n/2, because if all items
are out of place at the start each exchange can only fix two of them. Your
**problem**, to **write up** if you get anywhere, is to improve the
lower bound to match the upper bound. A hint: it is helpful to consider
the *cycle structure* of each permutation, as I mentioned in class.
For example, the permutation that takes 1 to 3, 2 to 4, 3 to 5, 4 to 2, and
5 to 1 is written in cycle terms as "(1 5 3) (2 4)". What can multiplying
by a transposition do to the cycle structure?
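The n-1 upper bound argument can also be watched in action: put item i into position i with one exchange, for each i in turn, and count the exchanges. A sketch (my own code):

```java
// The n-1 upper bound in action: for each position i in turn, if value i is
// not already there, exchange it into place. Once all but one item is in
// place, the last is in place as well, so at most n-1 exchanges occur.
public class TranspositionSort {

    // sorts a permutation of 0..n-1 in place; returns the number of exchanges
    public static int sortCount(int[] a) {
        int count = 0;
        int[] position = new int[a.length];   // position[v] = index of value v
        for (int i = 0; i < a.length; i++) position[a[i]] = i;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != i) {
                int j = position[i];          // where value i currently sits
                position[a[i]] = j;           // the displaced value moves to j
                position[i] = i;
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                count++;
            }
        }
        return count;
    }
}
```

Watching which permutations need many exchanges and which need few may suggest the rule for the lower bound.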

**10 October 2003** (Sorry to be missing three of you!)
We showed an upper bound of n + 2*ceiling(log n) + ceiling(log log n) -
5 for finding the first, second, and third of n items. My recollection
from the literature is that this is tight, but we first have to solve
the easier problem of a lower bound to match the upper bound for finding
first and second. We discussed ideas for this; please try to solve
this problem completely and **hand in a solution** if you do.

We talked briefly about the general selection problem, finding the k'th biggest of n items. Generalizing our upper bounds for k=2 and k=3 would suggest n + (k-1)(log n) + o(log n). But at least when k becomes Ω(n), sorting becomes a competitive option.

You might conjecture from this that the median (k=n/2) is as hard as
sorting, but it is not. Soon in lecture we'll do quickselect, which uses
the quicksort idea to get an expected linear number of comparisons (see
Levitin). A possible end-of-term project would be to implement and/or
present the *deterministic* linear-time selection algorithm presented
in CLRS.
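As a preview, here is one common way to sketch quickselect with a random pivot (my own code, phrased as finding the k'th smallest; it is not the deterministic CLRS algorithm):

```java
import java.util.*;

// Quickselect sketch: partition around a random pivot as in quicksort, then
// recurse (iteratively) into the one side containing the k'th smallest.
// Expected number of comparisons is linear in a.length.
public class QuickSelect {

    // returns the k'th smallest element of a (k is 1-indexed); permutes a
    public static int select(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        Random rng = new Random();
        while (true) {
            if (lo == hi) return a[lo];
            int p = a[lo + rng.nextInt(hi - lo + 1)];
            // three-way partition: a[lo..i-1] < p, a[i..j] == p, a[j+1..hi] > p
            int i = lo, j = hi, m = lo;
            while (m <= j) {
                if (a[m] < p)      { int t = a[m]; a[m++] = a[i]; a[i++] = t; }
                else if (a[m] > p) { int t = a[m]; a[m] = a[j]; a[j--] = t; }
                else m++;
            }
            if (k - 1 < i)      hi = i - 1;   // answer is left of the pivot run
            else if (k - 1 > j) lo = j + 1;   // answer is right of the pivot run
            else return p;                    // pivot is the answer
        }
    }
}
```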

We concluded by defining the *pancake flipping problem*, given
by Levitin as Exercise 5.6.10 on page 190. He asks the easy question, of
putting an *upper bound* on the number of flips needed to sort any
input. I'm interested in the *lower bound*. We observed in class that
since you have n-1 possible moves, you need at least log_{n-1}(n!)
moves to have enough choices to reach all n! permutations. This gives an
Ω(n) asymptotic lower bound, but not an optimal one. If T(n) is the
number of flips needed to sort n pancakes, we saw that T(1)=0, T(2)=1,
and T(3)=3. Since log_{3}(4!) < 3, we probably need a different
argument to get matching upper and lower bounds for n=4. Work on this
problem and **hand in** any results you find. (You should at least hand
in a good upper bound!)

**3 October 2003**: We proved that finding the minimum and
maximum of n numbers requires ceiling(3n/2) - 2 comparisons. The
new problem is finding the *first and second* largest of n elements.
We have an upper bound of n + ceiling(log n) - 2
comparisons for this problem, as follows:

- Run a binary tournament to find the maximum, using n-1 comparisons.
- Consider the elements that lost a comparison directly to the eventual maximum. There are at most ceiling(log n) of these.
- Find the maximum of this set, using at most ceiling(log n) - 1 comparisons. This is the second largest element.
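The three steps above can be sketched in code with the comparisons counted explicitly (my own illustration; it assumes the items are distinct, since it indexes the losers by value):

```java
import java.util.*;

// Binary tournament for the maximum (n-1 comparisons), then a maximum over
// the items that lost directly to the winner (at most ceiling(log n) of
// them, hence at most ceiling(log n) - 1 more comparisons).
public class SecondLargest {
    static int comparisons;

    public static int secondLargest(int[] a) {
        comparisons = 0;
        // losers.get(v) holds the items that lost a comparison directly to v
        Map<Integer, List<Integer>> losers = new HashMap<>();
        List<Integer> round = new ArrayList<>();
        for (int v : a) { round.add(v); losers.put(v, new ArrayList<>()); }
        while (round.size() > 1) {
            List<Integer> next = new ArrayList<>();
            for (int i = 0; i + 1 < round.size(); i += 2) {
                comparisons++;
                int win = Math.max(round.get(i), round.get(i + 1));
                losers.get(win).add(Math.min(round.get(i), round.get(i + 1)));
                next.add(win);
            }
            if (round.size() % 2 == 1) next.add(round.get(round.size() - 1));
            round = next;
        }
        // the second largest must have lost directly to the winner
        List<Integer> cands = losers.get(round.get(0));
        int second = cands.get(0);
        for (int i = 1; i < cands.size(); i++) {
            comparisons++;
            second = Math.max(second, cands.get(i));
        }
        return second;
    }
}
```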

Your assignment *to write up* is to prove that this bound cannot
be improved -- that you can design an adversary algorithm to prevent any
algorithm that uses fewer than n + ceiling(log n) - 2 comparisons from
finding the first and second largest.

If you make no progress on this problem, here's an alternate.
Give an upper bound, the best you can, for the problem of finding the
*first, second and third* largest elements. Then think about the
matching lower bound...

**26 September 2003**: I talked about lower bounds in the decision
tree model for the worst-case number of comparisons needed to sort
(log (n!)) or find the smallest element (n-1) of a set of n elements.
We calculated this bound for sorting:

B(0)=0, B(1)=0, B(2)=1, B(3)=3, B(4)=5, B(5)=7, B(6)=10, B(7)=13, B(8)=16

and then looked at the number of comparisons used by Mergesort:

M(0)=0, M(1)=0, M(2)=1, M(3)=3, M(4)=5, M(5)=8, M(6)=11, M(7)=14, M(8)=17

I showed the decision tree arising from Mergesort with n=4 and began the construction of a decision tree of depth 7 with n=5.
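Both tables are easy to regenerate: B(n) = ceiling(log_2(n!)), and M(n) satisfies M(n) = M(floor(n/2)) + M(ceil(n/2)) + n - 1, since merging two runs of total length n takes at most n - 1 comparisons. A sketch (my own code):

```java
// Regenerates the two tables above: the decision-tree bound
// B(n) = ceiling(log2(n!)) and the Mergesort worst case
// M(n) = M(floor(n/2)) + M(ceil(n/2)) + n - 1.
public class SortBounds {

    public static int decisionTreeBound(int n) {
        double logFactorial = 0;
        for (int i = 2; i <= n; i++) logFactorial += Math.log(i) / Math.log(2);
        // tiny epsilon guards against rounding up exact values like log2(2!)
        return (int) Math.ceil(logFactorial - 1e-9);
    }

    public static int mergesortComparisons(int n) {
        if (n <= 1) return 0;
        // worst case: merging two runs of total length n costs n-1 comparisons
        return mergesortComparisons(n / 2) + mergesortComparisons(n - n / 2) + n - 1;
    }
}
```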

**Assignment to write up and hand in:** Finish the proof that there
is a correct depth 7 with n=5, and prove that there is a correct tree of
depth 10 with n=6.

I then posed the question of finding the maximum and minimum element of the same set with a decision tree. Clearly from the bounds for maximum and minimum separately you get a lower bound of n-1 and an upper bound of 2n-2 for finding both. (Actually you can modify the latter to 2n-3 by first finding the maximum and then finding the minimum of the other n-1 elements.) We showed two algorithms that each achieved 3n/2 - 2 comparisons for even n and (3n-3)/2 comparisons for odd n.
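The pairing idea behind those 3n/2 bounds can be sketched as follows (my own code): compare the items in pairs, then compare each pair's winner with the running maximum and each loser with the running minimum.

```java
// Finds min and max together. Each pair costs 3 comparisons (one within the
// pair, one against the running max, one against the running min), giving
// 3n/2 - 2 comparisons for even n and (3n-3)/2 for odd n.
public class MinMax {
    static int comparisons;

    public static int[] minAndMax(int[] a) {   // returns {min, max}
        comparisons = 0;
        int n = a.length, start, min, max;
        if (n % 2 == 1) { min = max = a[0]; start = 1; }
        else {
            comparisons++;
            min = Math.min(a[0], a[1]);
            max = Math.max(a[0], a[1]);
            start = 2;
        }
        for (int i = start; i + 1 < n; i += 2) {
            comparisons++;                     // compare the pair
            int small = Math.min(a[i], a[i + 1]);
            int big = Math.max(a[i], a[i + 1]);
            comparisons++; min = Math.min(min, small);
            comparisons++; max = Math.max(max, big);
        }
        return new int[]{min, max};
    }
}
```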

**Assignment to think about:** How can you prove that these upper
bounds cannot be improved? That is, how can we prove a lower bound to match
these upper bounds?

**19 September 2003**: We discovered that the two functions:

f(n) = n! if n is odd, (n-1)! if n is even, and

g(n) = (n-1)! if n is odd, n! if n is even

are the desired pair of nondecreasing positive functions neither of which is big-O of the other.

We looked at the D and I operators on sequences of numbers. We
answered the problem from last time by showing that D(2^{n})
= 2^{n} and I(2^{n}) = 2^{n} - 1. We
found that D((n choose d)) = (n choose d-1) and that I((n choose d)) =
(n choose d+1). The way to differentiate or integrate an arbitrary
polynomial is to write it as a linear combination of binomial coefficients
and then differentiate or integrate these term by term. We also used
these operations to extend the definition of "(n choose d)", and hence
of Pascal's Triangle, beyond its normal range to allow n and/or d to be
negative.
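These operators are easy to experiment with on finite prefixes of a sequence. The sketch below (my own code) computes D and I and can be used to check, for example, that D(2^n) = 2^n, I(2^n) = 2^n - 1, and D((n choose 3)) = (n choose 2).

```java
// The D and I operators on a finite prefix of a sequence f(0), f(1), ...
// Df(n) = f(n+1) - f(n); If(n) = f(0) + ... + f(n-1), with If(0) = 0.
public class SequenceOps {

    public static long[] d(long[] f) {          // one entry shorter than f
        long[] out = new long[f.length - 1];
        for (int n = 0; n < out.length; n++) out[n] = f[n + 1] - f[n];
        return out;
    }

    public static long[] integrate(long[] f) {  // one entry longer than f
        long[] out = new long[f.length + 1];
        for (int n = 1; n < out.length; n++) out[n] = out[n - 1] + f[n - 1];
        return out;
    }
}
```

Running `d` then `integrate` (or vice versa) on a prefix also illustrates the identities DIf = f and IDf(n) = f(n) - f(0) from the 12 September entry.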

I promised to post a take-home problem here but never did so.

**12 September 2003**: We solved the first problem in class,
discovering that f(n) = (log n)/(log log n) works. We solved the
second problem as posed, with the functions f(n) = sin n and g(n) =
cos n. We could get both functions to be positive by taking f(n) =
1 + sin n and g(n) = 1 + cos n. The next question was to get such an
f and g that were also both *increasing* functions (meaning that
if m < n then f(m) < f(n) and similarly for g). We didn't get one
in class and left this as a problem to work on over the week.

I then defined two operators whose input and output are sequences of real numbers (or functions from non-negative integers to real numbers). If f is such a sequence, "Df" is defined to be the sequence such that Df(n) = f(n+1) - f(n). "If" (not the word "if" but the operator "I" applied to the function "f") is defined to be the sequence such that If(n) is the sum for i from 0 to n-1 of f(i).

I showed in class that for any function f, DIf(n) is just f(n) and IDf(n) is f(n) - f(0). I left you with the question of finding f's such that If = f or Df = f.

**5 September 2003**: The six of you who need to add the course
should be able to do so on SPIRE Monday or Tuesday.

I gave you a definition and two problems to think about before
next class. The definition is as follows. Let f(n) be a function
from positive integers to reals. We say "f is polynomial", or
"f = n^{Θ(1)}", if there exists a number k such that
f = O(n^{k}) and there exists a positive real number ε
such that f = Ω(n^{ε}). The two problems were:

- Find a specific function f(n) such that f(n)^{f(n)} is polynomial. We observed that a constant function is too small, and that f(n) = n is too big. What about f(n) = log n?
- Can you find two functions f(n) and g(n) such that the statements "f = O(g(n))" and "g = O(f(n))" are both false?

Last modified 31 October 2003