17: More on Graph Search

Announcements

Advising tonight after class!

Finding the path

isPath doesn’t actually find the path; it just checks to see if there is one.

One way to find the path is to change visited slightly.

Instead of keeping track only of whether or not a vertex has been visited, we can keep track of where we “came from” to get to that vertex. In other words, we can track the “predecessor” of that vertex. (on board)

Here’s the updated code:

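static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal) {
  // Vertices discovered but not yet expanded, in first-in, first-out order.
  Queue<V> frontier = new LinkedList<>();
  frontier.add(start);

  // Maps each discovered vertex to the vertex we came from.
  // It doubles as the visited set: a vertex is visited iff it is a key here.
  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);  // the start has no predecessor

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      // Walk the predecessor chain back to the start, prepending each vertex.
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }
  return path;  // empty if the goal was never reached
}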

As before, we could do the goal check inside the inner for loop to save a few frontier expansions; I broke it out here to make it more clear, but either way works.
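To try this out without the course's UndirectedGraph class, here is a minimal sketch that assumes a stand-in graph stored as a Map from each vertex to its list of neighbors. The class name BfsPathDemo and the sample graph are made up for illustration; the search loop is the same as above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class BfsPathDemo {
  // Assumption: the graph is a plain Map (vertex -> neighbors), standing in
  // for the course's UndirectedGraph and its neighborsOf method.
  static <V> List<V> findPath(Map<V, List<V>> graph, V start, V goal) {
    Queue<V> frontier = new LinkedList<>();
    frontier.add(start);

    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);

    List<V> path = new ArrayList<>();

    while (!frontier.isEmpty()) {
      V current = frontier.remove();
      for (V next : graph.getOrDefault(current, List.of())) {
        if (!predecessor.containsKey(next)) {
          frontier.add(next);
          predecessor.put(next, current);
        }
      }
      if (current.equals(goal)) {
        // Follow predecessors back to the start, prepending as we go.
        for (V v = current; v != null; v = predecessor.get(v)) {
          path.add(0, v);
        }
        break;
      }
    }
    return path;
  }

  // Hypothetical sample graph with edges A-B, A-C, B-D, C-E, E-F, F-D.
  static Map<String, List<String>> sampleGraph() {
    Map<String, List<String>> g = new HashMap<>();
    g.put("A", List.of("B", "C"));
    g.put("B", List.of("A", "D"));
    g.put("C", List.of("A", "E"));
    g.put("D", List.of("B", "F"));
    g.put("E", List.of("C", "F"));
    g.put("F", List.of("E", "D"));
    return g;
  }

  public static void main(String[] args) {
    System.out.println(findPath(sampleGraph(), "A", "D")); // prints [A, B, D]
  }
}
```

If the goal is unreachable (or not in the graph at all), the loop runs until the frontier is empty and the method returns an empty list.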

OK, great! What does this look like generally? Again, we search each vertex one hop away before we get to any of the vertices two hops away, and so on. This behavior, the choice of which vertices to search, is entirely a function of how we store and return vertices from the frontier. When it’s a queue, we get this “breadth-first”, ripples-in-a-pond behavior. You can imagine the form of the search as a tree, where each level of the tree is the distance, in hops, from the start node. We search this tree level-by-level in a breadth-first search. (on board)

The other way to search a graph is “depth-first” search, where we fully explore one branch before backtracking to the next.

BFS code

static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal) {
  Queue<V> frontier = new LinkedList<>();
  frontier.add(start);


  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }     
  return path;
}

Example runs

How does this work on real graphs? Let’s do a few examples.

  • Simple (three node)

  • More complex example

In-class exercise

On efficiency

BFS always finds (one of) the paths to the goal with the fewest hops, since it always searches all paths of n hops before searching any path of n+1 hops. But it requires that you remember the entire search tree!

DFS, interestingly, does not need to remember the entire tree; it needs only the vertices along the current path, along with their neighbors. Once a vertex has been visited it can be forgotten, if you’re willing to do some minor bookkeeping (less than is required in BFS, which requires tracking every previously visited node). But DFS might not find the shortest path. Remember that in a depth-first search, we search the tree as far as possible down one path before backtracking. (on board) To get this behavior, all we need to do is switch the queue to a stack. (Correctly “forgetting” nodes to keep space requirements low is more complex to implement than just switching to a stack, though.)
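Here is a minimal sketch of the queue-to-stack swap, again assuming a stand-in graph stored as a Map from each vertex to its neighbor list (the class name DfsPathDemo and the sample graph are made up for illustration). It keeps the full predecessor map, so it does not do the space-saving “forgetting” described above; it only shows the change in search order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DfsPathDemo {
  // Assumption: the graph is a plain Map (vertex -> neighbors), standing in
  // for the course's UndirectedGraph and its neighborsOf method.
  static <V> List<V> findPathDfs(Map<V, List<V>> graph, V start, V goal) {
    // The only real change from BFS: the frontier is a stack (last-in, first-out).
    Deque<V> frontier = new ArrayDeque<>();
    frontier.push(start);

    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);

    List<V> path = new ArrayList<>();

    while (!frontier.isEmpty()) {
      V current = frontier.pop();
      for (V next : graph.getOrDefault(current, List.of())) {
        if (!predecessor.containsKey(next)) {
          frontier.push(next);
          predecessor.put(next, current);
        }
      }
      if (current.equals(goal)) {
        // Follow predecessors back to the start, prepending as we go.
        for (V v = current; v != null; v = predecessor.get(v)) {
          path.add(0, v);
        }
        break;
      }
    }
    return path;
  }

  // Hypothetical sample graph with edges A-B, A-C, B-D, C-E, E-F, F-D.
  static Map<String, List<String>> sampleGraph() {
    Map<String, List<String>> g = new HashMap<>();
    g.put("A", List.of("B", "C"));
    g.put("B", List.of("A", "D"));
    g.put("C", List.of("A", "E"));
    g.put("D", List.of("B", "F"));
    g.put("E", List.of("C", "F"));
    g.put("F", List.of("E", "D"));
    return g;
  }

  public static void main(String[] args) {
    System.out.println(findPathDfs(sampleGraph(), "A", "D")); // prints [A, C, E, F, D]
  }
}
```

On this graph the stack version dives down the C branch first and reaches D in four hops, even though a two-hop path through B exists. A queue-based search on the same graph would return that two-hop path.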

Both breadth- and depth-first search are said to be “uninformed” searches. That is, they explore the frontier systematically, but with no knowledge of where the goal is relative to the current position. There’s only so much you can do to optimize them (there’s a hybrid algorithm called “iterative deepening DFS” that sorta gets you the best of both worlds; you might analyze this in more depth in 311 or 383).

But if you know something about the problem domain, you can do better.

For example, suppose you have a graph where the vertices represent places (say, on campus), and the edges represent paths between those places. Each edge has a cost associated with it (say, the distance), and you’re trying to find a least-cost (aka shortest) path between a start and a goal.

If you just pick edges according to BFS, you’ll find the shortest path hop-wise but not cost-wise. How might we do better?
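To make the hop-wise versus cost-wise distinction concrete, here is a small sketch that adds up edge costs along a path. The class name PathCostDemo, the nested-Map cost table, and the numbers are all made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PathCostDemo {
  // Sums the edge costs along a path; cost.get(a).get(b) is the cost of edge a-b.
  static double pathCost(Map<String, Map<String, Double>> cost, List<String> path) {
    double total = 0.0;
    for (int i = 0; i + 1 < path.size(); i++) {
      total += cost.get(path.get(i)).get(path.get(i + 1));
    }
    return total;
  }

  // Hypothetical costs: A-D is a long direct edge; A-B and B-D are short.
  static Map<String, Map<String, Double>> sampleCosts() {
    Map<String, Map<String, Double>> cost = new HashMap<>();
    cost.put("A", Map.of("B", 2.0, "D", 10.0));
    cost.put("B", Map.of("A", 2.0, "D", 3.0));
    cost.put("D", Map.of("A", 10.0, "B", 3.0));
    return cost;
  }

  public static void main(String[] args) {
    Map<String, Map<String, Double>> cost = sampleCosts();
    System.out.println(pathCost(cost, List.of("A", "D")));      // prints 10.0
    System.out.println(pathCost(cost, List.of("A", "B", "D"))); // prints 5.0
  }
}
```

BFS from A to D would return the one-hop path A, D, which costs 10.0 here, while the two-hop path through B costs only 5.0.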

Well, one way would be to order the frontier, from least-cost to highest-cost, and examine vertices in order from least-to-highest cost. Of course, we are probably working with estimates, since if we really knew the true cost, it wouldn’t be a search: we’d just follow the least-cost path like a homing missile.

How do we estimate their costs? We say that each vertex’s cost is defined by a function f(x).

One definition for f(x) is a heuristic, say h(x). A heuristic is an “approximation” or guess of a true value. In our campus graph example, a heuristic might be the straight-line distance from the vertex in question to the goal; usually, paths are roughly straight lines, though of course buildings or the campus pond might make this heuristic not-quite-correct. We can compute this by looking at the map.

So, one approach is to do a “greedy” search, where we always choose the closest node. How can we do this? We could sort the frontier after each iteration of the loop, which would require time proportional to about (n log n), where n is the number of vertices, if we used an efficient sorting algorithm. And that’s fine. It would produce a “priority queue,” that is, a queue that returns things not in first-in, first-out order, but “best-out” order.

It turns out Java implements this for us.

Priority queues

The PriorityQueue will act exactly as we want, allowing items to be added in arbitrary order, and returning them in lowest-cost order. Priority queues internally are not a List, but instead a heap. We won’t implement heaps in this course (wait for it… you will in 187!). Heaps are “not-quite-sorted”; they maintain another property (the “heap property”) which lets them remove and return the current smallest item in (log n) time, and add new items in (log n) time.
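A quick sketch of that behavior, using java.util.PriorityQueue with plain Integers standing in for costs (so the natural ordering of Integer is used; the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueDemo {
  // Adds the costs in the order given, then removes them one at a time.
  static List<Integer> drain(List<Integer> costs) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int c : costs) {
      pq.add(c);             // arbitrary insertion order
    }
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      out.add(pq.remove());  // always the current smallest item
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(drain(List.of(7, 2, 9, 4))); // prints [2, 4, 7, 9]
  }
}
```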

In any event, we need to define an ordering on items to create a useful priority queue. Just like we’ve seen several times before, when we need additional context to compare two items (for example, in our campus navigation example, we’d need to know about the map, not just the location), we can define a Comparator to hold this additional state and use it in its compare method. This Comparator gets passed to the PriorityQueue constructor, and then we have a “greedy” search. This is basically a one-line change to the method, just like going from BFS to DFS.

static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal,
    Comparator<V> comp) {
  Queue<V> frontier = new PriorityQueue<>(comp);
  frontier.add(start);

  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }
  return path;
}

Defining the comparator is problem-specific; we can assume it’s been passed in as above.
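For the campus example, one possible comparator might look like the sketch below. It assumes each place is a String name with (x, y) map coordinates stored in a Map; the class name StraightLineComparator, the coordinate table, and the place names are all made up for illustration. The comparator holds the map and the goal as state and orders places by straight-line distance to the goal.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

class StraightLineComparator implements Comparator<String> {
  private final Map<String, double[]> coords; // place name -> {x, y} (assumed layout)
  private final String goal;

  StraightLineComparator(Map<String, double[]> coords, String goal) {
    this.coords = coords;
    this.goal = goal;
  }

  // The heuristic h(x): straight-line distance from a place to the goal.
  private double distanceToGoal(String place) {
    double[] p = coords.get(place);
    double[] g = coords.get(goal);
    return Math.hypot(p[0] - g[0], p[1] - g[1]);
  }

  @Override
  public int compare(String a, String b) {
    // Negative when a is closer to the goal, so a comes out of the queue first.
    return Double.compare(distanceToGoal(a), distanceToGoal(b));
  }

  public static void main(String[] args) {
    Map<String, double[]> coords = new HashMap<>();
    coords.put("Library", new double[] {0, 0});
    coords.put("Pond", new double[] {3, 4});
    coords.put("Gym", new double[] {6, 8});

    PriorityQueue<String> frontier =
        new PriorityQueue<>(new StraightLineComparator(coords, "Gym"));
    frontier.add("Library"); // 10 units from the Gym
    frontier.add("Pond");    // 5 units from the Gym
    System.out.println(frontier.remove()); // prints Pond
  }
}
```

A comparator like this could be passed as the comp argument of findPath above, with V being String.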

Greedy search can “get it wrong” and find a sub-optimal path, especially if the heuristic is inaccurate. As you’ll learn in later courses, an optimal informed search algorithm is called “A*”, and its f(x) = g(x) + h(x). g(x) is just the lowest known cost to get to vertex x so far; h(x) is a heuristic that must obey certain conditions – an “admissible” heuristic. Again, you’ll see this in future courses.

Review of Graph Representations

Recall that we can represent the Graph ADT in any implementation we want: in this course, we’ll (briefly) sketch the Adjacency Matrix and Adjacency List implementations. Today we’ll present them at a high level, and next lecture we’ll go over them in more detail.

Consider a (very) simplified graph, where V = {0, 1, 2, … n-1}.

The adjacency matrix representation just creates an n x n 2D array of booleans, representing the from-to edge relationship. The entry at row i, column j is true iff there exists an edge from vertex i to vertex j.
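A minimal sketch of the matrix idea, for vertices numbered 0 to n-1 (the class and method names here are made up for illustration, not the course's implementation):

```java
class AdjacencyMatrixGraph {
  private final boolean[][] edge; // edge[from][to] is true iff there is an edge from -> to

  AdjacencyMatrixGraph(int n) {
    edge = new boolean[n][n];     // all entries start out false: no edges yet
  }

  void addEdge(int from, int to) {
    edge[from][to] = true;
  }

  boolean hasEdge(int from, int to) {
    return edge[from][to];
  }

  public static void main(String[] args) {
    AdjacencyMatrixGraph g = new AdjacencyMatrixGraph(3);
    g.addEdge(0, 2);
    System.out.println(g.hasEdge(0, 2)); // prints true
    System.out.println(g.hasEdge(2, 0)); // prints false
  }
}
```

For an undirected graph, you would set both edge[from][to] and edge[to][from] when adding an edge.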

The adjacency list representation is an array of lists. The array is n elements long; each element points to a list of the destinations of that vertex’s outgoing edges (or an empty list, if it has no outgoing edges).
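A minimal sketch of the list idea, for vertices numbered 0 to n-1. The class and method names are made up for illustration, and it uses a List of Lists rather than a raw array, since arrays of generic lists are awkward in Java.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacencyListGraph {
  // outgoing.get(v) holds the destinations of v's outgoing edges.
  private final List<List<Integer>> outgoing;

  AdjacencyListGraph(int n) {
    outgoing = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      outgoing.add(new ArrayList<>()); // every vertex starts with an empty list
    }
  }

  void addEdge(int from, int to) {
    outgoing.get(from).add(to);
  }

  List<Integer> neighborsOf(int v) {
    return outgoing.get(v);
  }

  boolean hasEdge(int from, int to) {
    // Scans v's list, so this takes time proportional to its length.
    return outgoing.get(from).contains(to);
  }

  public static void main(String[] args) {
    AdjacencyListGraph g = new AdjacencyListGraph(3);
    g.addEdge(0, 1);
    g.addEdge(0, 2);
    System.out.println(g.neighborsOf(0)); // prints [1, 2]
    System.out.println(g.neighborsOf(1)); // prints []
  }
}
```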

(on board)

More next class!