Welcome
Announcements
Recall
The idea behind searching a graph is that we want to systematically examine it, starting at one point, looking for a path to another point. We do so by keeping track of a list of places to be explored (the “frontier”).
We start by marking the start location as seen and adding it to the frontier.
We then repeat the following steps until the frontier is empty or our goal is found:
- Pick and remove a location from the frontier.
- “Expand” the location by looking at its neighbors. Any neighbor we haven’t seen yet is added to the frontier; mark those locations as seen as you add them to the frontier.
What might this look like in code?
static <V> boolean isPath(UndirectedGraph<V> graph, V start, V goal) {
    Queue<V> frontier = new LinkedList<>();
    frontier.add(start);
    Set<V> seen = new HashSet<>();
    seen.add(start);
    while (!frontier.isEmpty()) {
        V current = frontier.remove();
        if (current.equals(goal)) return true;
        for (V next : graph.neighborsOf(current)) {
            // note: could put check for goal here instead
            if (!seen.contains(next)) {
                frontier.add(next);
                seen.add(next);
            }
        }
    }
    return false;
}
Note I used a Queue here; this first-in, first-out behavior enforces a breadth-first search. (on board) Queues are lists, but you can only add on one end, and only remove from the other; like waiting in line at Disney or some such. (You could also use a List if you wanted to, but how you add and remove vertices from the frontier controls how the search runs.)
Using a Queue, this search will visit all vertices adjacent to the start (that is, one hop away from the start) before it visits their neighbors (two hops away from the start), and so on, like ripples in a pond. This is called a “breadth-first” search.
Depending upon the order in which vertices are returned from our frontier, the search will progress in different ways; most notably, the search becomes depth-first when the frontier is a stack (last in, first out). You’ll see this in more detail in 187.
In-class exercise
Suppose we have the following graph:
  1--2
 /    \
S      G
 \    /
  3--4
Vertices are added to the frontier in numerical order as a node is explored, and explored in the order they were added.
Finding the path
isPath doesn’t actually find the path; it just checks to see if there is one.
One way to find the path is to change seen slightly.
Instead of keeping track only of whether or not a vertex has been seen, we can keep track of where we “came from” to get to that vertex. In other words, we can track the “predecessor” of that vertex. (on board)
Here’s the updated code:
static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal) {
    Queue<V> frontier = new LinkedList<>();
    frontier.add(start);
    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);
    List<V> path = new ArrayList<>();
    while (!frontier.isEmpty()) {
        V current = frontier.remove();
        for (V next : graph.neighborsOf(current)) {
            if (!predecessor.containsKey(next)) {
                frontier.add(next);
                predecessor.put(next, current);
            }
        }
        if (current.equals(goal)) {
            path.add(current);
            V previous = predecessor.get(current);
            while (previous != null) {
                path.add(0, previous);
                previous = predecessor.get(previous);
            }
            break;
        }
    }
    return path;
}
As before, we could do the goal check inside the inner for loop to save a few frontier expansions; I broke it out here to make it more clear, but either way works.
OK, great! What does this look like generally? Again, we search each vertex one hop away before we get to any of the vertices two hops away, and so on. This behavior, the choice of which vertices to search, is entirely a function of how we store and return vertices from the frontier. When it’s a queue, we get this “breadth-first”, ripples-in-a-pond behavior. You can imagine the form of the search as a tree, where each level of the tree is the distance, in hops, from the start node. We search this tree level by level in a breadth-first search. (on board)
The other way to search a graph is “depth-first” search, where we fully explore one branch before backtracking to the next.
BFS code
static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal) {
    Queue<V> frontier = new LinkedList<>();
    frontier.add(start);
    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);
    List<V> path = new ArrayList<>();
    while (!frontier.isEmpty()) {
        V current = frontier.remove();
        for (V next : graph.neighborsOf(current)) {
            if (!predecessor.containsKey(next)) {
                frontier.add(next);
                predecessor.put(next, current);
            }
        }
        if (current.equals(goal)) {
            path.add(current);
            V previous = predecessor.get(current);
            while (previous != null) {
                path.add(0, previous);
                previous = predecessor.get(previous);
            }
            break;
        }
    }
    return path;
}
Example runs
How does this work on real graphs? Let’s do a few examples.
Simple (three node)
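Here’s a rough sketch of how the simple three-node run might look in code. It assumes our UndirectedGraph is a concrete class we can construct and populate with addVertex and addEdge; those names are assumptions for illustration, since only neighborsOf appears in the code above. The graph is S connected to A connected to G.

// Hypothetical setup for a three-node graph: S -- A -- G.
// addVertex/addEdge are assumed methods on UndirectedGraph.
UndirectedGraph<String> g = new UndirectedGraph<>();
g.addVertex("S");
g.addVertex("A");
g.addVertex("G");
g.addEdge("S", "A");
g.addEdge("A", "G");

List<String> path = findPath(g, "S", "G");
System.out.println(path); // expected: [S, A, G]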
More complex example
In-class exercise
On efficiency
BFS always finds (one of) the paths to the goal with the smallest number of hops, since it always searches all paths of n hops before searching any path of n+1 hops. But it requires that you remember the entire search tree!
DFS, interestingly, does not need to remember the entire tree; only the vertices along the current path, along with their neighbors. Once a vertex has been seen it can be forgotten, if you’re willing to do some minor bookkeeping (less than is required in BFS, which requires tracking every previously seen node). But DFS might not find the shortest path. Remember that if this were a depth-first search, we’d search the tree as far as possible down one path before backtracking. (on board) To get this behavior, all we need to do is switch the queue to a stack. (Correctly “forgetting” nodes to keep space requirements low is more complex code-wise to implement than just switching to a stack, though.)
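Here’s a minimal sketch of what that change might look like: the only difference from findPath above is that the frontier is a Deque used as a stack (ArrayDeque’s push/pop give last-in, first-out behavior). Note this sketch still remembers every vertex it has seen, so it does not include the extra bookkeeping needed to get DFS’s memory savings.

static <V> List<V> findPathDFS(UndirectedGraph<V> graph, V start, V goal) {
    Deque<V> frontier = new ArrayDeque<>(); // used as a stack: push and pop at the same end
    frontier.push(start);
    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);
    List<V> path = new ArrayList<>();
    while (!frontier.isEmpty()) {
        V current = frontier.pop(); // the most recently added vertex comes out first
        for (V next : graph.neighborsOf(current)) {
            if (!predecessor.containsKey(next)) {
                frontier.push(next);
                predecessor.put(next, current);
            }
        }
        if (current.equals(goal)) {
            path.add(current);
            V previous = predecessor.get(current);
            while (previous != null) {
                path.add(0, previous);
                previous = predecessor.get(previous);
            }
            break;
        }
    }
    return path;
}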
Informed search
Both breadth- and depth-first search are said to be “uninformed” search. That is, they explore the frontier systematically, but with no knowledge of where the goal is relative to the current position. There’s only so much you can do to optimize them (there’s a hybrid algorithm called “iterative deepening DFS” that sorta gets you the best of both worlds; you might analyze this in more depth in 311 or 383).
But if you know something about the problem domain, you can do better.
For example, suppose you have a graph where the vertices represent places (say, on campus), and the edges represent paths between those places. Each edge has a cost associated with it (say, the distance), and you’re trying to find a least-cost (aka shortest) path between a start and a goal.
If you just pick edges according to BFS, you’ll find the shortest path hop-wise but not cost-wise. How might we do better?
Well, one way would be to order the frontier from least to highest cost, and examine vertices in that order. Of course, we are probably working with estimates, since if we really knew the true cost, it wouldn’t be a search: we’d just follow the least-cost path like a homing missile.
How do we estimate these costs? We say that each vertex’s cost is defined by a function f(x).
One definition for f(x) is a heuristic, say h(x). A heuristic is an “approximation” or guess of a true value. In our campus graph example, a heuristic might be the straight-line distance from the vertex in question to the goal; usually, paths are roughly straight lines, though of course buildings or the campus pond might make this heuristic not-quite-correct. We can compute this by looking at the map.
So, one approach is to do a “greedy” search, where we always choose the node that appears closest to the goal. How can we do this? We could sort the frontier after each iteration of the loop, which would require time proportional to about (n log n), where n is the number of vertices, if we used an efficient sorting algorithm. And that’s fine. In effect, this turns the frontier into a “priority queue,” that is, a queue that returns things not in first-in, first-out order, but in “best-out” order.
It turns out Java implements this for us.
Priority queues
The PriorityQueue will act exactly as we want, allowing items to be added in arbitrary order, and returning them in lowest-cost order. Priority queues internally are not a List, but instead a heap. We won’t implement heaps in this course (wait for it… you will in 187!). Heaps are “not-quite-sorted”; they maintain another property (the “heap property”) which lets them remove and return the current smallest item in (log n) time, and add new items in (log n) time.
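For example, a quick snippet (assuming java.util is imported; these Integers are compared by their natural ordering, smallest first):

Queue<Integer> pq = new PriorityQueue<>();
pq.add(5);
pq.add(1);
pq.add(3);
System.out.println(pq.remove()); // 1: the smallest item comes out first
System.out.println(pq.remove()); // 3
System.out.println(pq.remove()); // 5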
In any event, we need to define an ordering on items to create a useful priority queue. Just like we’ve seen several times before, when there’s additional context we need to compare two items (for example, in our campus navigation example, we’d need to know about the map, not just the location), we can define a Comparator to hold this additional state and use it in its compare method. This Comparator gets passed to the PriorityQueue constructor, and then we have a “greedy” search. This is basically a one-line change to the method, just like going from BFS to DFS.
static <V> List<V> findPath(UndirectedGraph<V> graph, V start, V goal,
                            Comparator<V> comp) {
    Queue<V> frontier = new PriorityQueue<>(comp);
    frontier.add(start);
    Map<V, V> predecessor = new HashMap<>();
    predecessor.put(start, null);
    List<V> path = new ArrayList<>();
    while (!frontier.isEmpty()) {
        V current = frontier.remove();
        for (V next : graph.neighborsOf(current)) {
            if (!predecessor.containsKey(next)) {
                frontier.add(next);
                predecessor.put(next, current);
            }
        }
        if (current.equals(goal)) {
            path.add(current);
            V previous = predecessor.get(current);
            while (previous != null) {
                path.add(0, previous);
                previous = predecessor.get(previous);
            }
            break;
        }
    }
    return path;
}
Defining the comparator is problem-specific; we can assume it’s been passed in as above.
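To make that concrete, here’s a sketch of what a comparator for the campus example might look like. It assumes each vertex is a Point with x and y coordinates and that h(x) is straight-line distance to the goal; Point and its fields are illustrative assumptions, not part of the course code, and java.util.Comparator is assumed to be imported.

// Hypothetical vertex type for the campus example (illustrative only).
record Point(double x, double y) {}

// Orders vertices by h(x): estimated (straight-line) distance to the goal.
// The goal is the extra context this Comparator needs to hold.
class StraightLineComparator implements Comparator<Point> {
    private final Point goal;

    StraightLineComparator(Point goal) {
        this.goal = goal;
    }

    private double h(Point p) {
        return Math.hypot(p.x() - goal.x(), p.y() - goal.y());
    }

    @Override
    public int compare(Point a, Point b) {
        return Double.compare(h(a), h(b)); // smaller estimate: removed from the frontier first
    }
}

Passing new StraightLineComparator(goal) as comp in the method above would then explore the frontier in appears-closest-to-the-goal order.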
Greedy search can “get it wrong” and find a sub-optimal path, especially if the heuristic is inaccurate. As you’ll learn in later courses, an optimal informed search algorithm is called “A*”, and its f(x) = g(x) + h(x). g(x) is just the lowest known cost to get to vertex x so far; h(x) is a heuristic that must obey certain conditions (it must be “admissible”). Again, you’ll see this in future courses.
Review of Graph Representations
Recall that we can represent the Graph ADT in any implementation we want: in this course, we’ll (briefly) sketch the Adjacency Matrix and Adjacency List implementations. Today we’ll present them at a high level, and next lecture we’ll go over them in more detail.
Consider a (very) simplified graph, where V = {0, 1, 2, … n-1}.
The adjacency matrix representation just creates an n x n 2D array of booleans, representing the edge from-to relationship. A given entry in the array is true iff there exists an edge from the vertex indexing the row to the vertex indexing the column.
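A minimal sketch, assuming n, u, and v are already in scope:

boolean[][] adjacent = new boolean[n][n]; // all entries start out false (no edges)

// record an edge from u to v (and v to u, if the graph is undirected):
adjacent[u][v] = true;
adjacent[v][u] = true;

// is there an edge from u to v?
boolean connected = adjacent[u][v];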
The adjacency list representation is an array of lists. The array is n elements long; each element points to a list of the destinations of that vertex’s outgoing edges (or an empty list, if it has no outgoing edges).
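And a minimal sketch of the adjacency list, again assuming n, u, and v are in scope (a List of Lists stands in here for the array of lists, since Java is awkward about arrays of generic types):

List<List<Integer>> neighbors = new ArrayList<>();
for (int i = 0; i < n; i++) {
    neighbors.add(new ArrayList<>()); // each vertex starts with no outgoing edges
}

// record an edge from u to v (and v to u, if the graph is undirected):
neighbors.get(u).add(v);
neighbors.get(v).add(u);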
(on board)
More next class!