CMPSCI 187: Programming With Data Structures
============================================

Today's topics
--------------

-   administrivia

-   DFS
-   weighted graphs
-   BFS
-   adjacency lists

Administrivia
=============

Reminders
---------

Assignment 11 is due next Thursday at 8:30am; it should be posted soon.

Errata
------

The code given in DJW to balance a binary search tree works well in the
absence of duplicates. When given a tree with duplicates it does not
work so well.

Clicker question: Tree balancing
--------------------------------

``` {.java}
  public void balance() {
    T[] values = (T[]) new Comparable[size()];
    Iterator<T> iterator = inOrderIterator();
    int index = 0;
    while (iterator.hasNext()) {
      values[index] = iterator.next();
      index += 1;
    }
    root = bldTree(values, 0, index - 1);
  }

  private BSTNode<T> bldTree(T[]vals,int l,int h)
  {
    if (l == h) {
      return new BSTNode<T>(vals[l], null, null);
    } else if (l + 1 == h) {
      BSTNode<T> lNode = new BSTNode<T>(vals[l],
                                      null, null);
      return new BSTNode<T>(vals[h], lNode, null);
    } else {
      int mid = (l + h)/2;
      BSTNode<T> left = bldTree(vals,l,mid-1);
      BSTNode<T> right = bldTree(vals,mid+1,h);
      return new BSTNode<T>(vals[mid],left,right);
    }
  }
```

Graphs
======

DFS
---

Last class we discussed the "path problem": is there a path from vertex
A to vertex B in the graph?

We used depth-first-search implemented in a recursive manner to find
paths. We needed to mark vertices when we saw them, otherwise we could
get into loops.

![football-graph](football-graph.png)

Today we'll look at an iterative version of DFS:

``` {.java}
  public boolean iterativeIsPath(
      GraphInterface<V> graph, V from, V to)
  {
    Stack<V> stack = new Stack<V>();
    GraphMarker<V> marker = graph.getMarker();
    stack.push(from);
    while (! stack.isEmpty()) {
      V v = stack.pop();
      if (v.equals(to)) { return true; }
      if (! marker.isMarked(v)) {
        marker.mark(v);
        for (V neighbor : graph.getNeighbors(v)) {
          stack.push(neighbor);
        }
      }
    }
    return false;
  }
```

Clicker question: DFS optimality
--------------------------------

Clicker question: DFS space
---------------------------

The path itself
---------------

What if we want to know not just whether a path exists, but what the
path is?

Several options exist. If we're doing recursive DFS, we can add a
`List<V> path` variable to the parameter list of the recursive method,
and keep track of the path we've found so far during the recursion. We
need to:

-   `add` an element to the `path` each time we visit a vertex (the
    recursive counterpart of a `pop()` from the stack), and
-   `remove` an element from the `path` each time the recursion returns
    `false`, since that element wasn't on a valid path

(There's a completed example of this in A11.)
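
A minimal sketch of the recursive bookkeeping described above. It is not the A11 code: the graph is a plain `Map<String, List<String>>` of adjacency lists and the marker is a `Set<String>`, both assumptions made so that the example is self-contained. `RecursivePathDfs` and `findPath` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RecursivePathDfs {
    // Returns true if a path from 'current' to 'goal' exists. On success,
    // 'path' holds the vertices from the start vertex to the goal.
    static boolean findPath(Map<String, List<String>> graph, String current,
                            String goal, Set<String> marked, List<String> path) {
        marked.add(current);
        path.add(current);             // tentatively put this vertex on the path
        if (current.equals(goal)) {
            return true;
        }
        for (String neighbor : graph.getOrDefault(current, List.of())) {
            if (!marked.contains(neighbor)
                    && findPath(graph, neighbor, goal, marked, path)) {
                return true;
            }
        }
        path.remove(path.size() - 1);  // dead end: take this vertex off the path
        return false;
    }

    public static void main(String[] args) {
        // Sample directed graph (an assumption, not the football graph).
        Map<String, List<String>> graph = Map.of(
            "A", List.of("B", "C"),
            "B", List.of("D"),
            "C", List.of("E"),
            "D", List.of(),
            "E", List.of());
        List<String> path = new ArrayList<>();
        boolean found = findPath(graph, "A", "E", new HashSet<>(), path);
        System.out.println(found + " " + path);  // true [A, C, E]
    }
}
```

On this sample graph the search first tries A, B, D, backs out of that dead end (removing D and then B from `path`), and then finds A, C, E.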

In the iterative version of DFS, you can't do this, since there's no
call stack. There are a couple more options:

-   The stack could store not just objects of type `V`, but a pair (like
    the `Entry` in A10) of `<V, List<V>>`; update the paths as you push
    onto the stack; or
-   In addition to marking visited vertices, you can maintain a second
    list, of their predecessors. As you push each neighbor onto the
    stack, you note that its predecessor is the current node. When
    you're done and have found the goal, you reconstruct the path by
    looking up its predecessor, then its predecessor's predecessor,
    etc., to build the path (in reverse order). Then reverse it.
    (There's an example of this in A11.)
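
The predecessor option can be sketched as follows. This is an illustration under assumptions, not the A11 code: the graph is a `Map<String, List<String>>`, the predecessors live in a `HashMap`, and `PredecessorDfs`, `findPath`, and `reconstruct` are hypothetical names. Unlike `iterativeIsPath` above, this sketch skips neighbors that are already marked, so a recorded predecessor is never overwritten after its vertex has been visited.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PredecessorDfs {
    // Returns the path from 'from' to 'to', or an empty list if there is none.
    static List<String> findPath(Map<String, List<String>> graph,
                                 String from, String to) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> marked = new HashSet<>();
        Map<String, String> predecessor = new HashMap<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            String v = stack.pop();
            if (v.equals(to)) {
                return reconstruct(predecessor, to);
            }
            if (marked.add(v)) {  // add() returns false if v was already marked
                for (String neighbor : graph.getOrDefault(v, List.of())) {
                    if (!marked.contains(neighbor)) {
                        predecessor.put(neighbor, v);  // we reached neighbor via v
                        stack.push(neighbor);
                    }
                }
            }
        }
        return new ArrayList<>();
    }

    // Follows predecessors back from the goal, then reverses the result.
    static List<String> reconstruct(Map<String, String> predecessor, String to) {
        List<String> path = new ArrayList<>();
        for (String v = to; v != null; v = predecessor.get(v)) {
            path.add(v);  // the start vertex has no predecessor, which ends the loop
        }
        Collections.reverse(path);
        return path;
    }
}
```

Compared with storing a whole `List` alongside every stack entry, this keeps one predecessor per vertex, at the price of a reconstruction pass at the end.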

Weighted graphs
---------------

Last lecture a question was raised about how to implement weighted
edges. Let's look at that more closely today. We modify the interface:
`addEdge` now takes a weight parameter, and `getEdgeWeight` returns it:

    public interface WeightedGraphInterface<V> {
        int getNumVertices();

        boolean isEmpty();

        void addVertex(V vertex);

        boolean hasVertex(V vertex);

        List<V> getVertices();

        Iterator<V> vertexIterator();

        boolean hasEdge(V from, V to);

        List<V> getNeighbors(V vertex);

        GraphMarker<V> getMarker();

        void addEdge(V from, V to, double weight);

        double getEdgeWeight(V from, V to);
    }

How do we implement it? We need to change a few things in
`UnweightedDenseGraph`:

-   we need to store edge weights
-   the constructor is different
-   we need to write the two new methods
-   `hasEdge` also needs changes

``` {.java}
public class WeightedDenseGraph<V> implements WeightedGraphInterface<V> {
    protected ArrayList<V> vertices;
    public static final double NULL_EDGE = Double.POSITIVE_INFINITY;
    private double[][] edges;

    public WeightedDenseGraph(int maxV) {
        vertices = new ArrayList<V>(maxV);
        edges = new double[maxV][maxV];
        for (double[] array : edges) {
            Arrays.fill(array, NULL_EDGE);
        }
    }

    protected boolean hasEdge(int from, int to) {
        return !Double.isInfinite(edges[from][to]);
    }

    public void addEdge(V from, V to, double w) {
        int fromIndex = getIndexOf(from);
        int toIndex = getIndexOf(to);
        edges[fromIndex][toIndex] = w;
    }

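    public double getEdgeWeight(V from, V to) {
        return edges[getIndexOf(from)][getIndexOf(to)];
    }

    // Vertex bookkeeping (addVertex, getIndexOf, getNeighbors, ...) carries
    // over from UnweightedDenseGraph and is omitted here.
}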
```
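
To see the weight matrix in action, here is a small self-contained sketch. It is a simplification, not the course's class: vertices are plain `String`s, `indexOf` on an `ArrayList` stands in for `getIndexOf`, and `TinyWeightedMatrix` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TinyWeightedMatrix {
    static final double NULL_EDGE = Double.POSITIVE_INFINITY;
    private final List<String> vertices = new ArrayList<>();
    private final double[][] edges;

    TinyWeightedMatrix(int maxV) {
        edges = new double[maxV][maxV];
        for (double[] row : edges) {
            Arrays.fill(row, NULL_EDGE);  // start with no edges at all
        }
    }

    // No capacity or duplicate checks in this sketch.
    void addVertex(String v) {
        vertices.add(v);
    }

    void addEdge(String from, String to, double w) {
        edges[vertices.indexOf(from)][vertices.indexOf(to)] = w;
    }

    double getEdgeWeight(String from, String to) {
        return edges[vertices.indexOf(from)][vertices.indexOf(to)];
    }

    boolean hasEdge(String from, String to) {
        return !Double.isInfinite(getEdgeWeight(from, to));
    }

    public static void main(String[] args) {
        TinyWeightedMatrix g = new TinyWeightedMatrix(3);
        g.addVertex("S");
        g.addVertex("A");
        g.addVertex("G");
        g.addEdge("S", "A", 1.0);
        g.addEdge("S", "G", 3.0);
        System.out.println(g.getEdgeWeight("S", "A"));  // 1.0
        System.out.println(g.hasEdge("A", "S"));        // false: edges are directed
        System.out.println(g.getEdgeWeight("A", "G"));  // Infinity
    }
}
```

Using infinity rather than 0 as the "no edge" marker means an edge of weight 0 can still be stored and told apart from a missing edge.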

Extracting commonalities
------------------------

Implementing weighted graphs this way, alongside last class's
unweighted graphs, duplicates a lot of code between the two. We can
extract the common functionality into one base interface
(`GraphInterface`) and one base implementation (`BaseGraph`), and put
the specialized parts into two sub-interfaces and two subclasses:

``` {.java}
public interface GraphInterface<V>
{
  int getNumVertices();
  boolean isEmpty();
  void addVertex(V vertex);
  boolean hasVertex(V vertex);
  List<V> getVertices();
  Iterator<V> vertexIterator();
  boolean hasEdge(V from, V to);
  List<V> getNeighbors(V vertex);
  GraphMarker<V> getMarker();
}

public interface UnweightedGraphInterface<V>
  extends GraphInterface<V>
{
  void addEdge(V fromVertex, V toVertex);
}

public interface WeightedGraphInterface<V>
  extends GraphInterface<V>
{
  void addEdge(V from, V to, double weight);
  double getEdgeWeight(V from, V to);
}
```

(Separating out the code for the base class and two subclasses is left
as an exercise for you, though some of it is shown in A11.)

Breadth-first search
--------------------

What if we replace the stack in DFS with a queue? We'd get breadth-first
search (BFS):

``` {.java}
public boolean breadthFirstIsPath(GraphInterface<V> graph, V from, V to) {
    Queue<V> queue = new LinkedList<V>();
    GraphMarker<V> marker = graph.getMarker();
    queue.add(from);
    while (!queue.isEmpty()) {
        V v = queue.remove();
        if (v.equals(to)) {
            return true;
        }
        if (!marker.isMarked(v)) {
            marker.mark(v);
            for (V neighbor : graph.getNeighbors(v)) {
                queue.add(neighbor);
            }
        }
    } 
    return false;
}
```

Clicker question: BFS optimality
--------------------------------

Clicker question: BFS space
---------------------------

Comparing BFS and DFS
---------------------

Both will visit all edges in the worst case.

In BFS the queue can be as large as the graph. DFS (at least in its
recursive form) only keeps as many elements on the stack as there are
along the current path. BFS's memory use is usually much worse than
DFS's, since at a given distance d from the initial node there are
usually more than d nodes (on board).

BFS will always find a shortest path (if one exists), assuming all edges
are of uniform weight. But sometimes DFS can find a solution (maybe not
the optimal one) faster than BFS.
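
A small sketch of the path-length difference. Everything here is an assumption made for illustration: the graph is a `Map<String, List<String>>`, vertices are marked when they are added to the frontier (slightly different from the two methods above, which mark on removal), and `FrontierSearch` is a hypothetical name. The only difference between the two searches is which end of the `Deque` we remove from.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FrontierSearch {
    // breadthFirst == true: treat the frontier as a queue (BFS).
    // breadthFirst == false: treat the frontier as a stack (DFS).
    static List<String> search(Map<String, List<String>> graph,
                               String from, String to, boolean breadthFirst) {
        Deque<String> frontier = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        Map<String, String> predecessor = new HashMap<>();
        frontier.addLast(from);
        seen.add(from);
        while (!frontier.isEmpty()) {
            String v = breadthFirst ? frontier.pollFirst() : frontier.pollLast();
            if (v.equals(to)) {
                List<String> path = new ArrayList<>();
                for (String u = to; u != null; u = predecessor.get(u)) {
                    path.add(u);
                }
                Collections.reverse(path);
                return path;
            }
            for (String neighbor : graph.getOrDefault(v, List.of())) {
                if (seen.add(neighbor)) {  // true only the first time we see it
                    predecessor.put(neighbor, v);
                    frontier.addLast(neighbor);
                }
            }
        }
        return new ArrayList<>();  // no path
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "S", List.of("B", "A"),
            "B", List.of("G"),
            "A", List.of("C"),
            "C", List.of("G"));
        System.out.println(search(graph, "S", "G", true));   // [S, B, G]
        System.out.println(search(graph, "S", "G", false));  // [S, A, C, G]
    }
}
```

Both searches succeed, but only the breadth-first one is guaranteed to return a path with the fewest edges.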

There is a version of DFS called iterative deepening DFS that has all
the advantages of BFS and DFS. In short, you DFS only on paths of length
1, then only on paths of length 2, then only on paths of length 3, etc.
The memory is bounded by DFS's bounds; the path found is optimal; the
runtime is asymptotically no worse than BFS. You'll see this in
CMPSCI 311 or 383.
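
A rough sketch of the idea, not the version you will see in 311 or 383. The assumptions: the graph is a `Map<String, List<String>>`, the caller supplies a `maxDepth` cutoff (for example, the number of vertices minus one), and cycles are avoided by checking the current path rather than by marking. `IterativeDeepening`, `depthLimited`, and `search` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class IterativeDeepening {
    // Depth-limited DFS: looks for 'goal' using at most 'limit' more edges.
    // 'path' holds the vertices on the current path; checking it also
    // keeps us from walking around a cycle.
    static boolean depthLimited(Map<String, List<String>> graph, String current,
                                String goal, int limit, List<String> path) {
        path.add(current);
        if (current.equals(goal)) {
            return true;
        }
        if (limit > 0) {
            for (String neighbor : graph.getOrDefault(current, List.of())) {
                if (!path.contains(neighbor)
                        && depthLimited(graph, neighbor, goal, limit - 1, path)) {
                    return true;
                }
            }
        }
        path.remove(path.size() - 1);  // not reachable this way within the limit
        return false;
    }

    // Tries limits 0, 1, 2, ..., maxDepth and returns the first path found,
    // or an empty list if the goal is not reachable within maxDepth edges.
    static List<String> search(Map<String, List<String>> graph,
                               String from, String to, int maxDepth) {
        for (int limit = 0; limit <= maxDepth; limit++) {
            List<String> path = new ArrayList<>();
            if (depthLimited(graph, from, to, limit, path)) {
                return path;
            }
        }
        return new ArrayList<>();
    }
}
```

Because every limit is exhausted before the next one is tried, the first path found uses the fewest possible edges, while each individual pass only ever stores the current path.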

Uniform cost search
-------------------

What if edges are weighted? Then the "shortest" path might not be the
"least cost" path, and often the latter is what we want to find. We can
use *uniform cost search* (sometimes called *min cost search*) to find
the path of least cost. In essence, we breadth-first search, but we
dequeue not the next vertex, but the next vertex of least cost.

Rather than using a queue, we use a priority queue. The entries in the
queue are of the form:

    <cost, <vertex, path-to-vertex>>

Consider a simple example:

    S
    |\
    | A
    |/
    G

where the edges have the following weights:

-   (S,A): 1
-   (S,G): 3
-   (A,G): 1

To perform UCS from S to G, we'd first enqueue the start node in a
priority queue:

    <0, <S, [S]>>

We'd dequeue it, then enqueue its neighbors. For A, the cost is the
current cost (0) plus the cost of the edge from the current node (S) to
A (1), and the path is [S, A]. For G, the cost is again 0 plus the cost
of the edge (3), and the path is [S, G]. So now our queue contains:

    <1, <A, [S,A]>>
    <3, <G, [S,G]>>

We dequeue by priority and get `<1, <A, [S,A]>>`, so now we enqueue A's
unvisited neighbor, G. The cost is the current cost (1) plus the cost of
the edge from the current node (A) to G (1). Our queue is now:

    <3, <G, [S,G]>>
    <2, <G, [S,A,G]>>

We dequeue `<2, <G, [S,A,G]>>`, see that it is the goal, and report the
path [S,A,G] and cost (2).
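
The walkthrough above can be sketched with `java.util.PriorityQueue`. This is an illustration under assumptions rather than a reference implementation: the graph is a `Map<String, Map<String, Double>>` from each vertex to its outgoing edges and their weights, edges are treated as directed, weights are assumed to be non-negative, and `UniformCostSearch` and `QueueEntry` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

class UniformCostSearch {
    // One queue entry: <cost, <vertex, path-to-vertex>>.
    static class QueueEntry implements Comparable<QueueEntry> {
        final double cost;
        final String vertex;
        final List<String> path;

        QueueEntry(double cost, String vertex, List<String> path) {
            this.cost = cost;
            this.vertex = vertex;
            this.path = path;
        }

        @Override
        public int compareTo(QueueEntry other) {
            return Double.compare(cost, other.cost);  // cheapest entry first
        }
    }

    // Returns the entry for the goal (cost and path), or null if unreachable.
    static QueueEntry search(Map<String, Map<String, Double>> graph,
                             String from, String to) {
        PriorityQueue<QueueEntry> queue = new PriorityQueue<>();
        Set<String> visited = new HashSet<>();
        queue.add(new QueueEntry(0.0, from, List.of(from)));
        while (!queue.isEmpty()) {
            QueueEntry current = queue.remove();
            if (current.vertex.equals(to)) {
                return current;
            }
            if (!visited.add(current.vertex)) {
                continue;  // already expanded via a path that was no more expensive
            }
            Map<String, Double> out = graph.getOrDefault(current.vertex, Map.of());
            for (Map.Entry<String, Double> edge : out.entrySet()) {
                if (!visited.contains(edge.getKey())) {
                    List<String> path = new ArrayList<>(current.path);
                    path.add(edge.getKey());
                    queue.add(new QueueEntry(current.cost + edge.getValue(),
                                             edge.getKey(), path));
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> graph = Map.of(
            "S", Map.of("A", 1.0, "G", 3.0),
            "A", Map.of("G", 1.0));
        QueueEntry result = search(graph, "S", "G");
        System.out.println(result.cost + " " + result.path);  // 2.0 [S, A, G]
    }
}
```

Note that the goal test happens when an entry is dequeued, not when it is enqueued; that is why the more expensive entry for G that is enqueued first never gets reported.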

Implementing graphs with adjacency lists
----------------------------------------

The approach to representing edges we've seen so far is called an
*adjacency matrix*, and requires n\^2 values (where n is the number of
vertices in the graph). But most graphs are *sparse*, that is, most
vertices are connected to only a few (not nearly n) other vertices.

An alternative implementation is to store a *list* of neighbors for each
vertex. This is called an *adjacency list* for that vertex. In 311 most
algorithms you'll see will assume this representation (though it is
always a good idea to think about which you should use and why).

``` {.java}
public class WeightedSparseGraph<V>
  extends BaseGraph<V>
  implements WeightedGraphInterface<V> {
  private GraphNode[] edges;

  public WeightedSparseGraph(int maxVertices) {
    super(maxVertices);
    edges = new GraphNode[maxVertices];
  }
  
  protected boolean hasEdge(int fromI, int toI) {
    return findGraphNode(fromI, toI) != null;
  }

  protected GraphNode findGraphNode(V from, V to) {
    return findGraphNode(getIndexOf(from),
                         getIndexOf(to)); }

  protected GraphNode findGraphNode(int fromI,
                                    int toI) {
    for (GraphNode node = edges[fromI];
         node != null; node = node.getNext()) {
      if (node.getData()==toI) { return node; }
    }
    return null; }

  public void addEdge(V from, V to, double weight) {
    int fromI = getIndexOf(from);
    int toI = getIndexOf(to);
    edges[fromI] = new GraphNode(toI, weight,
                                 edges[fromI]); }

  public double getEdgeWeight(V from, V to) {
    GraphNode node = findGraphNode(from, to);
    if (node == null) {
      return Double.POSITIVE_INFINITY;
    } else {
      return node.getWeight();
    }
  }

  public List<V> getNeighbors(V from) {
    int fromI = getIndexOf(from);
    List<V> neighbors = new ArrayList<V>();
    for (GraphNode v = edges[fromI];
         v != null; v = v.getNext()) {
      neighbors.add(vertices.get(v.getData()));
    }
    return neighbors;
  }
}
```