Lecture 18: More Search

Announcements

Quiz grades should be posted soon.

Quiz question

The thing people had most trouble with: handling the MultiMap semantics. Here are two examples:

static <K, V> void addToMultiMapOneWay(Map<K, List<V>> map, K key, V value) {
  List<V> list = map.getOrDefault(key, new ArrayList<>());
  map.put(key, list);
  list.add(value);      
}

static <K, V> void addToMultiMapAnotherWay(Map<K, List<V>> map, K key, V value) {
  if (!map.containsKey(key)) {
    map.put(key, new ArrayList<>());
  }
  List<V> list = map.get(key);
  list.add(value);      
}

Note that in either case, we need to make sure there is a list associated with the key in the map before we append the value to that list. In the first example, we create the list if we need to using getOrDefault, then make sure it’s in the map by putting it. In the second, we check if the list is in the map (using containsKey), creating it and putting it if necessary, then add the value to it.
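A third way, offered here as a sketch rather than something from the quiz, uses Map.computeIfAbsent, which creates and stores the list only when the key is missing and returns the list either way. The class name MultiMapSketch and the method name addToMultiMapThirdWay are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MultiMapSketch {
  // Hypothetical third variant: computeIfAbsent returns the list already
  // stored under key, or creates one, stores it, and returns it.
  static <K, V> void addToMultiMapThirdWay(Map<K, List<V>> map, K key, V value) {
    map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
  }
}
```

Unlike the getOrDefault version, this one does not allocate a throwaway list when the key is already present.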

Search and pathfinding

Last class we wrote isPath, to check if a path exists from a start to a goal.

isPath doesn’t actually find the path. One way to find the path is to change visited slightly.

Instead of keeping track only of whether or not a vertex has been visited, we can keep track of where we “came from” to get to that vertex. In other words, we can track the “predecessor” of that vertex. (on board)

Here’s the updated code:

static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal) {
  // Vertices discovered but not yet expanded, in first-in, first-out order.
  Queue<V> frontier = new LinkedList<>();
  frontier.add(start);

  // Maps each discovered vertex to the vertex we came from; doubles as "visited".
  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      // Walk the predecessors back to start, prepending so the path reads start-to-goal.
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }
  // Empty if the goal was never reached.
  return path;
}

(Note that we can do the goal check inside the inner for loop to save a few frontier expansions; I broke it out here to make it more clear, but either way works.)

Let’s work through an example: (on board, graph 1,2,3,4 where 1,2,3 are all connected to one another and 4 is only connected to 3).
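For anyone reviewing without the board, here is a sketch of that example in code. Since the course’s UndirectedGraph class isn’t reproduced in these notes, this version assumes a plain adjacency map (vertex to list of neighbors) as a stand-in; the class name BfsExample and the helper exampleGraph are made up for illustration. The search logic is the same as findPath above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class BfsExample {
  // Assumed stand-in for UndirectedGraph: vertex -> neighbors.
  // 1, 2, 3 are all connected to one another; 4 is connected only to 3.
  static Map<Integer, List<Integer>> exampleGraph() {
    Map<Integer, List<Integer>> adj = new HashMap<>();
    adj.put(1, Arrays.asList(2, 3));
    adj.put(2, Arrays.asList(1, 3));
    adj.put(3, Arrays.asList(1, 2, 4));
    adj.put(4, Arrays.asList(3));
    return adj;
  }

  static List<Integer> findPath(Map<Integer, List<Integer>> adj, int start, int goal) {
    Queue<Integer> frontier = new LinkedList<>();
    frontier.add(start);

    Map<Integer, Integer> predecessor = new HashMap<>();
    predecessor.put(start, null);

    List<Integer> path = new ArrayList<>();

    while (!frontier.isEmpty()) {
      int current = frontier.remove();
      for (int next : adj.get(current)) {
        if (!predecessor.containsKey(next)) {
          frontier.add(next);
          predecessor.put(next, current);
        }
      }
      if (current == goal) {
        // Follow predecessors back to start, prepending each one.
        path.add(current);
        Integer previous = predecessor.get(current);
        while (previous != null) {
          path.add(0, previous);
          previous = predecessor.get(previous);
        }
        break;
      }
    }
    return path;
  }
}
```

Searching from 1 to 4, the frontier goes [1], then [2, 3], then [3], then [4]; the predecessor map ends up as 1→null, 2→1, 3→1, 4→3, so BfsExample.findPath(BfsExample.exampleGraph(), 1, 4) returns [1, 3, 4].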

OK, great! What does this look like generally? Again, we search each vertex one hop away before we get to any of the vertices two hops away, and so on. This behavior, the choice of which vertices to search, is entirely a function of how we store and return vertices from the frontier. When it’s a queue, we get this “breadth-first”, ripples-in-a-pond behavior. You can imagine the form of the search as a tree, where each level of the tree is the distance, in hops, from the start node. We search this tree level-by-level in a breadth-first search. (on board)

On efficiency

BFS always finds (one of) the paths to the goal that has the fewest hops, since it always searches all paths of n hops before searching any path of n+1 hops. But it requires that you remember the entire search tree!

DFS, interestingly, does not need to remember the entire tree; only the vertices along the current path, along with their neighbors. Once a vertex has been visited it can be forgotten, if you’re willing to do some minor bookkeeping (less than is required in BFS, which requires tracking every previously visited node). But DFS might not find the shortest path. Remember that if this were a depth-first search, we’d search the tree as far as possible down one path before backtracking. (on board) To get this behavior, all we need to do is switch the queue to a stack. (Correctly “forgetting” nodes to keep space requirements low is more complex code-wise to implement than just switching to a stack, though.)
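To see the queue-versus-stack difference concretely, here is a sketch that runs the same search both ways on a small graph. It assumes an adjacency map in place of the course’s UndirectedGraph, and the class name StackVersusQueue, the depthFirst flag, and the five-vertex graph are all made up for illustration. It only swaps the frontier’s removal order; it does not attempt the memory-saving “forgetting” described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StackVersusQueue {
  // Hypothetical graph: 1-2, 1-3, 2-5, 3-4, 4-5.
  static Map<Integer, List<Integer>> exampleGraph() {
    Map<Integer, List<Integer>> adj = new HashMap<>();
    adj.put(1, Arrays.asList(2, 3));
    adj.put(2, Arrays.asList(1, 5));
    adj.put(3, Arrays.asList(1, 4));
    adj.put(4, Arrays.asList(3, 5));
    adj.put(5, Arrays.asList(2, 4));
    return adj;
  }

  static List<Integer> findPath(Map<Integer, List<Integer>> adj, int start, int goal,
      boolean depthFirst) {
    // One Deque serves as both: we always add at the back, then remove
    // from the back (stack, DFS) or from the front (queue, BFS).
    Deque<Integer> frontier = new ArrayDeque<>();
    frontier.addLast(start);

    Map<Integer, Integer> predecessor = new HashMap<>();
    predecessor.put(start, null);

    List<Integer> path = new ArrayList<>();

    while (!frontier.isEmpty()) {
      int current = depthFirst ? frontier.removeLast() : frontier.removeFirst();
      for (int next : adj.get(current)) {
        if (!predecessor.containsKey(next)) {
          frontier.addLast(next);
          predecessor.put(next, current);
        }
      }
      if (current == goal) {
        path.add(current);
        Integer previous = predecessor.get(current);
        while (previous != null) {
          path.add(0, previous);
          previous = predecessor.get(previous);
        }
        break;
      }
    }
    return path;
  }
}
```

On this graph, searching from 1 to 5 with depthFirst set to false returns [1, 2, 5] (two hops), while depthFirst set to true returns [1, 3, 4, 5] (three hops): the stack commits to the 3 branch and reaches the goal the long way around.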

Informed search

Both breadth- and depth-first search are said to be “uninformed” search. That is, they explore the frontier systematically, but with no knowledge of where the goal is relative to the current position. There’s only so much you can do to optimize them (there’s a hybrid algorithm called “iterative deepening DFS” that sorta gets you the best of both worlds; you might analyze this in more depth in 311 or 383).

But if you know something about the problem domain, you can do better.

For example, suppose you have a graph where the vertices represent places (say, on campus), and the edges represent paths between those places. Each edge has a cost associated with it (say, the distance), and you’re trying to find a least-cost (aka shortest) path between a start and a goal.

If you just pick edges according to BFS, you’ll find the shortest path hop-wise but not cost-wise. How might we do better?

Well, one way would be to order the frontier, from least-cost to highest-cost, and examine vertices in order from least-to-highest cost. Of course, we are probably working with estimates, since if we really knew the true cost, it wouldn’t be a search: we’d just follow the least-cost path like a homing missile.

How do we estimate their costs? We say that each vertex’s cost is defined by a function f(x).

One definition for f(x) is a heuristic, say h(x). A heuristic is an “approximation” or guess of a true value. In our campus graph example, a heuristic might be the straight-line distance from the vertex in question to the goal; usually, paths are roughly straight lines, though of course buildings or the campus pond might make this heuristic not-quite-correct. We can compute this by looking at the map.

So, one approach is to do a “greedy” search, where we always choose the node that the heuristic says is closest to the goal. How can we do this? We could sort the frontier after each iteration of the loop, which would require time proportional to about (n log n), where n is the number of vertices, if we used an efficient sorting algorithm. And that’s fine. It would produce a “priority queue,” that is, a queue that returns things not in first-in, first-out order, but “best-out” order.

It turns out Java implements this for us.

Priority queues

The PriorityQueue will act exactly as we want, allowing items to be added in arbitrary order, and returning them in lowest-cost order. Priority queues internally are not a List, but instead a heap. We won’t implement heaps in this course (wait for it… you will in 187!). Heaps are “not-quite-sorted”; they maintain another property (the “heap property”) which lets them remove and return the current smallest item in (log n) time, and add new items in (log n) time.
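A quick sketch of that behavior, using java.util.PriorityQueue with plain integers standing in for costs. The class name PriorityQueueDemo and the method drainInPriorityOrder are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PriorityQueueDemo {
  // Adds the costs in whatever order they arrive, then removes them all.
  // With no Comparator supplied, PriorityQueue uses the natural ordering,
  // so remove() always hands back the smallest remaining value.
  static List<Integer> drainInPriorityOrder(List<Integer> costs) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();
    for (int cost : costs) {
      pq.add(cost);
    }
    List<Integer> removed = new ArrayList<>();
    while (!pq.isEmpty()) {
      removed.add(pq.remove());
    }
    return removed;
  }
}
```

Adding 7, 2, 9, 4 in that order and then draining the queue yields [2, 4, 7, 9].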

In any event, we need to define an ordering on items to create a useful priority queue. Just like we’ve seen several times before, when there’s additional context we need to compare two items (for example, in our campus navigation example, we’d need to know about the map, not just the location), we can define a Comparator to hold this additional state and use it in its compare method. This Comparator gets passed to the PriorityQueue constructor, and then we have a “greedy” search. This is basically a one-line change to the method, just like going from BFS to DFS.

static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal,
    Comparator<V> comp) {
  Queue<V> frontier = new PriorityQueue<>(comp);
  frontier.add(start);

  Map<V, V> predecessor = new HashMap<>();
  predecessor.put(start, null);

  List<V> path = new ArrayList<>();

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    for (V next : graph.neighborsOf(current)) {
      if (!predecessor.containsKey(next)) {
        frontier.add(next);
        predecessor.put(next, current);
      }
    }
    if (current.equals(goal)) {
      path.add(current);
      V previous = predecessor.get(current);
      while (previous != null) {
        path.add(0, previous);
        previous = predecessor.get(previous);
      }
      break;
    }
  }
  return path;
}

Defining the comparator is problem-specific; we can assume it’s been passed in as above.
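As one illustration of what such a comparator might look like for the campus example, here is a sketch that orders places by straight-line distance to the goal. Everything in it is an assumption made for the example: the class name StraightLineComparator, the use of String place names as vertices, and the coords map from place name to {x, y} position.

```java
import java.util.Comparator;
import java.util.Map;

class StraightLineComparator implements Comparator<String> {
  // Hypothetical extra state: where each place sits on the map, and the goal.
  private final Map<String, double[]> coords;
  private final String goal;

  StraightLineComparator(Map<String, double[]> coords, String goal) {
    this.coords = coords;
    this.goal = goal;
  }

  // The heuristic h(x): straight-line distance from place to the goal.
  double h(String place) {
    double[] p = coords.get(place);
    double[] g = coords.get(goal);
    return Math.hypot(p[0] - g[0], p[1] - g[1]);
  }

  // Places with a smaller estimated distance to the goal come out first.
  @Override
  public int compare(String a, String b) {
    return Double.compare(h(a), h(b));
  }
}
```

With made-up positions Library at (0, 0), Pond at (3, 4), and Gym at (6, 8), and Gym as the goal, a PriorityQueue built with this comparator returns Gym, then Pond, then Library, no matter what order they were added in.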

Greedy search can “get it wrong” and find a sub-optimal path, especially if the heuristic is inaccurate. As you’ll learn in later courses, an optimal informed search algorithm is called “A*”, and its f(x) = g(x) + h(x). g(x) is just the lowest cost found so far to get to vertex x; h(x) is a heuristic that must obey certain conditions (an “admissible” heuristic). Again, you’ll see this in future courses.