17: More on Search

Announcements

A10 is posted and due a week from today, since you’ll need today’s lecture content to complete it.

Searching a Graph

So, recall that last class we left off with a quick introduction to graph search. Let’s go over it again in a little more detail, then talk about how to recover the actual path that a graph search finds.

Let’s work through an example: (on board, graph S,1,2,3,4,G where 1, 2, and 3 are all connected to one another and 4 is connected only to 3).

The idea behind searching a graph is that we want to systematically examine it, starting at one point, looking for a path to another point. We do so by keeping track of a list of places to be explored (the “frontier”). We repeat the following steps until the frontier is empty or our goal is found:

  • Pick and remove a location from the frontier.
  • Mark the location as explored (visited) so we don’t “expand” it again.
  • “Expand” the location by looking at its neighbors. Any neighbor we haven’t seen yet (not visited, not already on the frontier) is added to the frontier.

What might this look like in code?

static <V> boolean isPath(UndirectedGraph<V> graph, V start, V goal) {
  Queue<V> frontier = new LinkedList<>();
  frontier.add(start);

  Set<V> visited = new HashSet<>();
  visited.add(start);

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    if (current.equals(goal)) return true;
    for (V next : graph.neighborsOf(current)) {
      // note: could put check for goal here instead
      if (!visited.contains(next)) {
        frontier.add(next);
        visited.add(next);
      }
    }
  }
  return false;
}
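If you want to run this outside the course codebase, here is a minimal sketch of the same breadth-first isPath. It assumes a plain Map<V, List<V>> adjacency list in place of the course’s UndirectedGraph, so graph.neighborsOf(v) becomes graph.getOrDefault(v, List.of()). It uses ArrayDeque, which also implements Queue. The class name BfsSearch is our own.

```java
import java.util.*;

class BfsSearch {
  // Breadth-first reachability check. ASSUMPTION: the graph is a plain adjacency
  // map (vertex -> neighbors) standing in for the course's UndirectedGraph.
  static <V> boolean isPath(Map<V, List<V>> graph, V start, V goal) {
    Queue<V> frontier = new ArrayDeque<>(); // FIFO frontier => breadth-first order
    frontier.add(start);

    Set<V> visited = new HashSet<>();
    visited.add(start);

    while (!frontier.isEmpty()) {
      V current = frontier.remove();
      if (current.equals(goal)) return true;
      for (V next : graph.getOrDefault(current, List.of())) {
        // Set.add returns false if next was already seen, so each vertex is queued once.
        if (visited.add(next)) {
          frontier.add(next);
        }
      }
    }
    return false;
  }
}
```

Because visited.add returns a boolean, the contains check and the insertion collapse into one call. The behavior is the same as the two-step version in the lecture code.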

Note I used a Queue here; its first-in, first-out behavior enforces a breadth-first search. (on board) Queues are lists, but you can only add at one end and only remove from the other, like waiting in line at Disney or some such. (You could totally use a List if you wanted to, but how you add and remove vertices from the frontier controls how the search runs.)
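Here is a tiny sketch of the first-in, first-out behavior just described. Like the lecture code, it uses java.util.LinkedList through the Queue interface. The class and method names are illustrative only.

```java
import java.util.*;

class QueueOrder {
  // Adds the items to a FIFO queue, then removes them all, recording the order.
  static List<String> fifo(String... items) {
    Queue<String> line = new LinkedList<>();
    for (String item : items) {
      line.add(item);            // join at the back of the line
    }
    List<String> served = new ArrayList<>();
    while (!line.isEmpty()) {
      served.add(line.remove()); // leave from the front of the line
    }
    return served;
  }
}
```

For example, QueueOrder.fifo("a", "b", "c") returns [a, b, c]: items leave in exactly the order they arrived.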

Using a Queue, this search will visit all vertices adjacent to the start (that is, one hop away from the start) before it visits their neighbors (two hops away from the start), and so on, like ripples in a pond. This is called a “breadth-first” search.

Depending upon the order in which the frontier returns vertices, the search will progress in different ways. Most notably, it becomes a depth-first search when the frontier is a stack (last in, first out). You’ll see this in more detail in 187.
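Here is a sketch of the stack-based variant. It assumes the same plain Map<V, List<V>> adjacency list in place of UndirectedGraph. The essential change is the frontier: ArrayDeque’s push and pop give last-in, first-out behavior. This version marks a vertex visited when it is popped rather than when it is pushed. That keeps the expansion order genuinely depth-first, at the cost of occasionally pushing a vertex more than once. The class name DfsSearch is our own.

```java
import java.util.*;

class DfsSearch {
  // Depth-first reachability check using an explicit stack as the frontier.
  // ASSUMPTION: adjacency map (vertex -> neighbors) in place of UndirectedGraph.
  static <V> boolean isPath(Map<V, List<V>> graph, V start, V goal) {
    Deque<V> frontier = new ArrayDeque<>();
    frontier.push(start);
    Set<V> visited = new HashSet<>();

    while (!frontier.isEmpty()) {
      V current = frontier.pop();          // most recently added vertex comes out first
      if (!visited.add(current)) continue; // already expanded via another route
      if (current.equals(goal)) return true;
      for (V next : graph.getOrDefault(current, List.of())) {
        if (!visited.contains(next)) {
          frontier.push(next);
        }
      }
    }
    return false;
  }
}
```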

Finding the path

isPath doesn’t actually find the path. One way to find the path is to change visited slightly.

Instead of keeping track only of whether or not a vertex has been visited, we can keep track of where we “came from” to get to that vertex. In other words, we can track the “predecessor” of that vertex. (on board)

Here’s the updated code:

static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal) {
  Queue<V> frontier = new LinkedList<>();
  frontier.add(start);


  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }     
  return path;
}
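Below is a self-contained sketch of findPath, again assuming a Map<V, List<V>> adjacency list in place of UndirectedGraph. It makes two small design changes you may or may not prefer. First, it checks for the goal as soon as a vertex is removed, before expanding it. Second, it builds the path from goal back to start and reverses it once. The lecture version instead calls path.add(0, ...), which shifts every element of the ArrayList on each insert.

```java
import java.util.*;

class BfsFindPath {
  // Breadth-first search that records each vertex's predecessor so the path can be rebuilt.
  // ASSUMPTION: adjacency map (vertex -> neighbors) in place of UndirectedGraph.
  static <V> List<V> findPath(Map<V, List<V>> graph, V start, V goal) {
    Queue<V> frontier = new ArrayDeque<>();
    frontier.add(start);

    Map<V, V> predecessor = new HashMap<>(); // vertex -> vertex we came from
    predecessor.put(start, null);            // start has no predecessor (HashMap allows null values)

    while (!frontier.isEmpty()) {
      V current = frontier.remove();
      if (current.equals(goal)) {
        // Walk predecessors back to start, then reverse into start-to-goal order.
        List<V> path = new ArrayList<>();
        for (V v = current; v != null; v = predecessor.get(v)) {
          path.add(v);
        }
        Collections.reverse(path);
        return path;
      }
      for (V next : graph.getOrDefault(current, List.of())) {
        if (!predecessor.containsKey(next)) { // containsKey, not get: start maps to null
          predecessor.put(next, current);
          frontier.add(next);
        }
      }
    }
    return new ArrayList<>(); // goal unreachable: empty path, as in the lecture version
  }
}
```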

As before, we could do the goal check inside the inner for loop to save a few frontier expansions; I broke it out here to make it more clear, but either way works.

OK, great! What does this look like in general? Again, we search each vertex one hop away before we get to any of the vertices two hops away, and so on. This behavior, the choice of which vertex to search next, is entirely a function of how we store and return vertices from the frontier. When it’s a queue, we get this “breadth-first”, ripples-in-a-pond behavior. You can picture the search as a tree, where each level of the tree is the distance, in hops, from the start node. A breadth-first search explores this tree level by level. (on board)
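To see the level-by-level structure concretely, here is a small variation on the same BFS loop. Instead of testing for a goal, it records each vertex’s hop count, which is its level in the search tree. This is a sketch, again assuming a Map<V, List<V>> adjacency list. The name hopCounts is hypothetical.

```java
import java.util.*;

class HopCounts {
  // Returns each reachable vertex's distance in hops from start, i.e. its BFS tree level.
  static <V> Map<V, Integer> hopCounts(Map<V, List<V>> graph, V start) {
    Map<V, Integer> hops = new HashMap<>();
    Queue<V> frontier = new ArrayDeque<>();
    hops.put(start, 0);
    frontier.add(start);
    while (!frontier.isEmpty()) {
      V current = frontier.remove();
      for (V next : graph.getOrDefault(current, List.of())) {
        if (!hops.containsKey(next)) {
          // First time seen: one hop farther than the vertex that discovered it.
          hops.put(next, hops.get(current) + 1);
          frontier.add(next);
        }
      }
    }
    return hops;
  }
}
```

Because the queue is FIFO, every vertex at level k is removed before any vertex at level k + 1. That is exactly the ripples-in-a-pond order.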

The other way to search a graph is “depth-first” search, where we fully explore one branch before backtracking to the next.

On efficiency

BFS always finds (one of) the paths to the goal with the fewest hops, since it searches all paths of n hops before searching any path of n+1 hops. But it requires that you remember the entire search tree!

DFS, interestingly, does not need to remember the entire tree, only the vertices on the current path plus their neighbors. Once a vertex has been visited it can be forgotten, if you’re willing to do some minor bookkeeping (less than is required in BFS, which requires tracking every previously visited node). But DFS might not find the shortest path. Remember that in a depth-first search, we search the tree as far as possible down one path before backtracking. (on board) To get this behavior, all we need to do is switch the queue to a stack. (Correctly “forgetting” nodes to keep space requirements low is more complex to implement than just switching to a stack, though.)
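To see that DFS may return a longer path than BFS, here is a recursive depth-first sketch. The Java call stack plays the role of the frontier stack. It assumes a Map<V, List<V>> adjacency list and tries neighbors in list order. This version still keeps a full visited set, so it does not attempt the memory-saving “forgetting” described above. The name dfsPath is our own.

```java
import java.util.*;

class RecursiveDfs {
  // Returns the first path DFS finds from current to goal, or null if none.
  // ASSUMPTION: adjacency map (vertex -> neighbors); neighbors are tried in list order.
  static <V> List<V> dfsPath(Map<V, List<V>> graph, V current, V goal, Set<V> visited) {
    visited.add(current);
    if (current.equals(goal)) {
      List<V> path = new ArrayList<>();
      path.add(current);
      return path;
    }
    for (V next : graph.getOrDefault(current, List.of())) {
      if (!visited.contains(next)) {
        List<V> rest = dfsPath(graph, next, goal, visited); // go deep before trying siblings
        if (rest != null) {
          rest.add(0, current); // prepend this vertex on the way back up
          return rest;
        }
      }
    }
    return null; // dead end: backtrack
  }
}
```

Consider a hypothetical graph where S connects to 1, vertices 1, 2, and 3 are mutually connected, and both 4 and G hang off 3. If 1’s neighbors are listed as S, 2, 3, this returns S, 1, 2, 3, G (four hops), while BFS returns S, 1, 3, G (three hops).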

Informed search

Both breadth- and depth-first search are said to be “uninformed” searches. That is, they explore the frontier systematically, but with no knowledge of where the goal is relative to the current position. There’s only so much you can do to optimize them (there’s a hybrid algorithm called “iterative deepening DFS” that sorta gets you the best of both worlds; you might analyze it in more depth in 311 or 383).
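Since iterative deepening came up, here is a rough sketch of the idea, not an analysis of it. It runs a depth-limited DFS with limits 0, 1, 2, and so on. The first path found therefore has the fewest hops, as in BFS. Each pass, however, keeps only the current path in memory, as in DFS. It assumes a Map<V, List<V>> adjacency list; the method names and the maxDepth cutoff are our own. To stay simple, it only avoids cycles along the current path, so it may re-explore vertices across branches and passes.

```java
import java.util.*;

class Iddfs {
  // Depth-limited DFS: tries to extend path (which ends at the current vertex)
  // to reach goal using at most `limit` more hops. Leaves the found path in `path`.
  static <V> boolean depthLimited(Map<V, List<V>> graph, List<V> path, V goal, int limit) {
    V current = path.get(path.size() - 1);
    if (current.equals(goal)) return true;
    if (limit == 0) return false;
    for (V next : graph.getOrDefault(current, List.of())) {
      if (path.contains(next)) continue; // only avoid cycles along the current path
      path.add(next);
      if (depthLimited(graph, path, goal, limit - 1)) return true;
      path.remove(path.size() - 1);      // backtrack
    }
    return false;
  }

  // Runs depth-limited DFS with limits 0, 1, ..., maxDepth; returns the first path found, or null.
  static <V> List<V> iddfs(Map<V, List<V>> graph, V start, V goal, int maxDepth) {
    for (int limit = 0; limit <= maxDepth; limit++) {
      List<V> path = new ArrayList<>();
      path.add(start);
      if (depthLimited(graph, path, goal, limit)) return path;
    }
    return null;
  }
}
```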

But if you know something about the problem domain, you can do better.

For example, suppose you have a graph where the vertices represent places (say, on campus), and the edges represent paths between those places. Each edge has a cost associated with it (say, the distance), and you’re trying to find a least-cost (aka shortest) path between a start and a goal.

If you just pick edges according to BFS, you’ll find the shortest path hop-wise but not cost-wise. How might we do better?

Well, one way would be to order the frontier from least to highest cost, and examine vertices in that order. Of course, we are probably working with estimates, since if we really knew the true cost, it wouldn’t be a search: we’d just follow the least-cost path like a homing missile.

How do we estimate these costs? We say that each vertex’s cost is given by a function f(x).

One definition for f(x) is a heuristic, say h(x). A heuristic is an “approximation” or guess of a true value. In our campus graph example, a heuristic might be the straight-line distance from the vertex in question to the goal; usually, paths are roughly straight lines, though of course buildings or the campus pond might make this heuristic not-quite-correct. We can compute this by looking at the map.
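Here is a sketch of that heuristic, assuming each place has hypothetical map coordinates (x, y). With that assumption, h(x) is the Euclidean distance to the goal, computed with Math.hypot. The second method measures an actual route as the sum of its straight segments. It shows why h(x) never exceeds the true walking distance when a building or the pond forces a detour. All names here are our own.

```java
import java.util.*;

class StraightLine {
  // h(x): straight-line distance from place x to the goal.
  // ASSUMPTION: coords maps each place to hypothetical {x, y} map coordinates.
  static <V> double h(Map<V, double[]> coords, V x, V goal) {
    double[] p = coords.get(x);
    double[] q = coords.get(goal);
    return Math.hypot(p[0] - q[0], p[1] - q[1]);
  }

  // Length of a route that walks straight between consecutive places
  // (h with the next stop as the "goal" is just that segment's length).
  static <V> double routeLength(Map<V, double[]> coords, List<V> route) {
    double total = 0;
    for (int i = 0; i + 1 < route.size(); i++) {
      total += h(coords, route.get(i), route.get(i + 1));
    }
    return total;
  }
}
```

For example, with hypothetical coordinates A = (0, 0), C = (0, 4), and G = (3, 4), h(A) is 5 while the detour A, C, G has length 7.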

So, one approach is to do a “greedy” search, where we always choose the frontier vertex with the smallest h(x), that is, the one that appears closest to the goal. How can we do this? We could sort the frontier after each iteration of the loop, which would take time proportional to about (n log n), where n is the number of vertices on the frontier, if we used an efficient sorting algorithm. And that’s fine. It would give us a “priority queue,” that is, a queue that returns things not in first-in, first-out order, but in “best-out” order.
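Here is a minimal sketch of the sort-every-iteration approach. It assumes the frontier is an ArrayList and that the caller supplies a Comparator ordering vertices by h(x). List.sort is a stable O(n log n) sort, so after sorting, removing index 0 yields the best vertex. The helper name removeBest is hypothetical.

```java
import java.util.*;

class SortedFrontier {
  // Sorts the whole frontier by the comparator, then removes and returns its best (first) item.
  static <V> V removeBest(List<V> frontier, Comparator<? super V> comp) {
    frontier.sort(comp);       // O(n log n) on every call
    return frontier.remove(0); // index 0 holds the lowest-cost item; removing it shifts the rest
  }
}
```

Calling removeBest instead of frontier.remove() each time around the loop gives the “best-out” order. The drawback is that it re-sorts the whole list on every iteration, even when only a few vertices were added.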

It turns out Java implements this for us.

Priority queues

The PriorityQueue will act exactly as we want, allowing items to be added in arbitrary order, and returning them in lowest-cost order. Priority queues internally are not a List, but instead a heap. We won’t implement heaps in this course (wait for it… you will in 187!). Heaps are “not-quite-sorted”; they maintain another property (the “heap property”) which lets them remove and return the current smallest item in (log n) time, and add new items in (log n) time.
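Here is a small sketch of java.util.PriorityQueue with its default natural ordering. Values go in arbitrary order, and poll() hands them back smallest first. The method name drain is our own. One detail worth knowing: iterating over a PriorityQueue (or printing it) does not visit elements in priority order, because the underlying heap is only partially sorted. Only repeated poll() calls give sorted output.

```java
import java.util.*;

class PqDemo {
  // Adds the values in the given order, then polls until empty, recording what comes out.
  static List<Integer> drain(int... values) {
    PriorityQueue<Integer> pq = new PriorityQueue<>(); // natural ordering: smallest first
    for (int v : values) {
      pq.add(v);          // O(log n): sift the new value up into heap position
    }
    List<Integer> out = new ArrayList<>();
    while (!pq.isEmpty()) {
      out.add(pq.poll()); // O(log n): remove the current smallest, restore the heap property
    }
    return out;
  }
}
```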

In any event, we need to define an ordering on items to create a useful priority queue. Just like we’ve seen several times before, when there’s additional context we need to compare two items (for example, in our campus navigation example, we’d need to know about the map, not just the location), we can define a Comparator to hold this additional state and use it in its compare method. This Comparator gets passed to the PriorityQueue constructor, and then we have a “greedy” search. This is basically a one-line change to the method, just like going from BFS to DFS.
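Here is a sketch of such a Comparator for the campus example. It assumes the extra context is a map from each place to hypothetical (x, y) coordinates, plus the goal. The comparator stores both as fields and compares two places by their straight-line distance to the goal, h(x). The class name StraightLineComparator and the coordinate representation are our own assumptions.

```java
import java.util.*;

class StraightLineComparator<V> implements Comparator<V> {
  private final Map<V, double[]> coords; // ASSUMPTION: place -> hypothetical {x, y} coordinates
  private final V goal;                  // the extra state compare() needs besides the two places

  StraightLineComparator(Map<V, double[]> coords, V goal) {
    this.coords = coords;
    this.goal = goal;
  }

  // h(x): straight-line distance from x to the goal.
  double h(V x) {
    double[] p = coords.get(x);
    double[] q = coords.get(goal);
    return Math.hypot(p[0] - q[0], p[1] - q[1]);
  }

  // Negative if a looks closer to the goal than b, positive if farther, zero if tied.
  @Override
  public int compare(V a, V b) {
    return Double.compare(h(a), h(b));
  }
}
```

Passing an instance to new PriorityQueue<>(comp) is then all the frontier needs.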

static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal,
    Comparator<V> comp) {
  Queue<V> frontier = new PriorityQueue<>(comp);
  frontier.add(start);

  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }
  return path;
}

Defining the comparator is problem-specific; we can assume it’s been passed in as above.
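To see greedy search run end to end, here is a self-contained sketch that mirrors the lecture’s method under two assumptions. The graph is a Map<V, List<V>> adjacency list, and the comparator is built from a table of precomputed, hypothetical heuristic values h. The names greedyPath and h are our own. As in the lecture code, the goal test happens when a vertex is removed from the frontier.

```java
import java.util.*;

class GreedySearch {
  // Greedy best-first search: the frontier always yields the vertex with the smallest h(x).
  // ASSUMPTIONS: adjacency map (vertex -> neighbors); h holds precomputed heuristic values.
  static <V> List<V> greedyPath(Map<V, List<V>> graph, V start, V goal, Map<V, Double> h) {
    Comparator<V> comp = (a, b) -> Double.compare(h.get(a), h.get(b));
    Queue<V> frontier = new PriorityQueue<>(comp);
    frontier.add(start);

    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);

    while (!frontier.isEmpty()) {
      V current = frontier.remove(); // lowest h(x) first, not first-in
      if (current.equals(goal)) {
        List<V> path = new ArrayList<>();
        for (V v = current; v != null; v = predecessor.get(v)) {
          path.add(v);
        }
        Collections.reverse(path);
        return path;
      }
      for (V next : graph.getOrDefault(current, List.of())) {
        if (!predecessor.containsKey(next)) {
          predecessor.put(next, current);
          frontier.add(next);
        }
      }
    }
    return new ArrayList<>(); // no path found
  }
}
```

Suppose S is adjacent to A and B, both of which are adjacent to G, with h(S) = 3, h(A) = 1, h(B) = 2, and h(G) = 0. The search expands S, then A (the smaller estimate), and returns S, A, G without ever considering whether the route through B might be cheaper.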

Greedy search can “get it wrong” and find a sub-optimal path, especially if the heuristic is inaccurate. As you’ll learn in later courses, an optimal informed search algorithm is called “A*”, and its f(x) = g(x) + h(x). g(x) is the least cost found so far to get from the start to vertex x; h(x) is a heuristic that must obey certain conditions (an “admissible” heuristic, which never overestimates the true cost from x to the goal). Again, you’ll see this in future courses.
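As a preview of the difference, here is a compact A* sketch. It assumes edge costs are stored as a Map<V, Map<V, Double>> (vertex to neighbor to cost) and that h is a table of heuristic values. Because g(x) can improve after a vertex is first seen, the sketch pushes a new (vertex, f) entry instead of updating one in place, and skips stale entries when they are popped. It never reopens a finished vertex. That is safe when h is consistent, meaning h(x) is at most cost(x, y) + h(y) for every edge, a slightly stronger condition than admissibility. All names here are our own.

```java
import java.util.*;

class AStarSearch {
  // A* search: always expands the frontier vertex with the smallest f(x) = g(x) + h(x).
  // ASSUMPTIONS: graph maps vertex -> (neighbor -> edge cost); h is a consistent heuristic table.
  static <V> List<V> aStar(Map<V, Map<V, Double>> graph, V start, V goal, Map<V, Double> h) {
    Map<V, Double> g = new HashMap<>();      // g(x): cheapest known cost from start to x
    Map<V, V> predecessor = new HashMap<>();
    Set<V> closed = new HashSet<>();         // vertices already expanded (cost is final)
    // Frontier entries pair a vertex with the f value it had when pushed.
    PriorityQueue<Map.Entry<V, Double>> frontier =
        new PriorityQueue<>((a, b) -> Double.compare(a.getValue(), b.getValue()));

    g.put(start, 0.0);
    predecessor.put(start, null);
    frontier.add(new AbstractMap.SimpleEntry<>(start, h.get(start)));

    while (!frontier.isEmpty()) {
      V current = frontier.poll().getKey();
      if (!closed.add(current)) continue;    // stale entry: already expanded via a cheaper route
      if (current.equals(goal)) {
        List<V> path = new ArrayList<>();
        for (V v = current; v != null; v = predecessor.get(v)) {
          path.add(v);
        }
        Collections.reverse(path);
        return path;
      }
      for (Map.Entry<V, Double> edge : graph.getOrDefault(current, Map.of()).entrySet()) {
        V next = edge.getKey();
        double newG = g.get(current) + edge.getValue();
        // Record next only if this route is cheaper than any seen before.
        if (!closed.contains(next) && newG < g.getOrDefault(next, Double.POSITIVE_INFINITY)) {
          g.put(next, newG);
          predecessor.put(next, current);
          frontier.add(new AbstractMap.SimpleEntry<>(next, newG + h.get(next))); // f = g + h
        }
      }
    }
    return new ArrayList<>(); // goal unreachable
  }
}
```

Take edge costs S-A 3, A-G 6, S-B 1, and B-G 2, with h(S) = 3, h(A) = 1, h(B) = 2, and h(G) = 0. This sketch returns S, B, G (cost 3). A greedy search using h alone would head for A first and return S, A, G (cost 9).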