Lecture 17: Graph Implementation and Search

Announcements

Gaming the autograder

(Apologies for spending lecture time on this, but it’s important, even though it only applies to a few people.)

“Gaming the autograder,” as discussed numerous times in lecture, is writing code that does not implement the required behavior of an assignment, but instead only works with the examples used by the autograder to test required behavior. It typically appears as a series of if/then/else statements that are tailored to the tests.

Don’t do this. The tests show you (some) errors in your code, but passing the tests is not indicative that your code is correct, only that it works on a finite set of examples in a problem space that is likely vast. Correct code works on most (or all) possible inputs, not just exactly the inputs we provide. We provide the test cases to help you learn, not to help you game the grading system.

For example, if you were required to write an addition method, and you knew the test cases were 2+3=5 and 4+5=9, a method that gamed the autograder would look like:

int addition(int x, int y) {
 if (x == 2 && y == 3) return 5;
 if (x == 4 && y == 5) return 9;
 return 0;
}

This method would work for the examples, but clearly does not correctly implement addition. It is incorrect, despite handling the two examples correctly. Further, if it were the final submission a student made, I would start the academic honesty process with that student.
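A quick way to see how little two passing tests say about correctness is to try the gamed method on inputs the tests never used. The sketch below is only an illustration; the class name AdditionCheck and the helper countWrong are made up for this example.

```java
// Hypothetical illustration: class and method names are made up for this example.
class AdditionCheck {

    // The gamed version from above: passes the two known tests, but is not addition.
    static int gamedAddition(int x, int y) {
        if (x == 2 && y == 3) return 5;
        if (x == 4 && y == 5) return 9;
        return 0;
    }

    // Counts how many pairs (x, y) with 0 <= x, y < limit the gamed version gets wrong.
    static int countWrong(int limit) {
        int wrong = 0;
        for (int x = 0; x < limit; x++) {
            for (int y = 0; y < limit; y++) {
                if (gamedAddition(x, y) != x + y) {
                    wrong++;
                }
            }
        }
        return wrong;
    }
}
```

Here countWrong(10) returns 97: of the 100 pairs of single-digit inputs, only (0, 0), (2, 3), and (4, 5) come out right.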

I stopped checking for this on a previous assignment, as doing so is laborious and time-consuming, and it appears some students took that to mean that it was OK. It is not OK. I have told you repeatedly not to submit this sort of code, because it doesn’t satisfy the assignment directions, and because it is dishonest. Your final submission for this and all future assignments should not be of this form; if it is, we will start the academic honesty process with you.

Quizzes

Any questions? If so I’ll go over them.

Programming Assignment 08

Any questions? If so I’ll go over them.

Graphs

Implementing the abstraction

Remember from last class: there are two basic ways to implement the graph abstraction. One is based upon arrays and is known as the “adjacency matrix” representation; the other is based upon lists and is known as the “adjacency list” representation.

(on board)
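As a small illustration of the two representations, here is one tiny undirected graph, A - B - C, stored both ways. The vertex names and the choice of A = 0, B = 1, C = 2 as matrix indices are arbitrary assumptions for the example.

```java
import java.util.List;
import java.util.Map;

// One small undirected graph, A - B - C, stored both ways.
// Vertex names and the index assignment are made up for this example.
class TwoRepresentations {

    // Adjacency matrix: vertex i is connected to vertex j iff MATRIX[i][j] is true.
    // Index 0 = A, 1 = B, 2 = C. Each undirected edge appears twice (symmetric matrix).
    static final boolean[][] MATRIX = {
        //        A      B      C
        /* A */ {false, true,  false},
        /* B */ {true,  false, true },
        /* C */ {false, true,  false},
    };

    // Adjacency list: each vertex maps to the list of its neighbors.
    static final Map<String, List<String>> LIST = Map.of(
        "A", List.of("B"),
        "B", List.of("A", "C"),
        "C", List.of("B"));
}
```

The matrix has 9 cells no matter how many edges exist; the lists hold only the 4 neighbor entries that actually correspond to edges.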

Here’s a naive implementation:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AdjacencyMatrixUndirectedGraph<V> implements UndirectedGraph<V> {

    private List<V> vertices;
    private final boolean[][] edges;

    public AdjacencyMatrixUndirectedGraph(int maxVertices) {
        vertices = new ArrayList<>();
        edges = new boolean[maxVertices][maxVertices];
    }


    @Override
    public void addVertex(V v) {
        // what if the vertex is already in the graph?
        vertices.add(v);
    }

    @Override
    public boolean hasVertex(V v) {     
        return vertices.contains(v);
    }

    @Override
    public Set<V> vertices() {
        return new HashSet<>(vertices);
    }

    @Override
    public void addEdge(V u, V v) {
        // order of edges?
        // u,v in graph?
        edges[vertices.indexOf(u)][vertices.indexOf(v)] = true;
    }

    @Override
    public boolean hasEdge(V u, V v) {
        // order of edges?
        // u,v in graph?
        return edges[vertices.indexOf(u)][vertices.indexOf(v)];
    }

    @Override
    public Set<V> neighborsOf(V v) {
        // order of edges?
        // v in graph?
        Set<V> neighbors = new HashSet<>();
        int index = vertices.indexOf(v);
        for (int i = 0; i < vertices.size(); i++) {
            if (edges[index][i]) {
                neighbors.add(vertices.get(i));
            }
        }
        return neighbors;
    }
}

Note that upon reflection, there are some problems here (repeated vertices! order of vertices in edges! are vertices even in the graph?). Some of this we can fix in code (by having, say, a canonical ordering, or being sure to set both spots in the matrix); some of this implies we need to add to our API (methods that take arbitrary vertices as parameters can throw an exception).
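Here is one possible sketch of those fixes. It is standalone rather than implementing the course’s UndirectedGraph interface (which isn’t shown here), though the method names mirror it. The choices to ignore duplicate vertices, to throw NoSuchElementException for unknown vertices, and to keep a Map from vertex to index (so lookups no longer need List.indexOf) are assumptions, not the required API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// A sketch addressing the problems noted above; exception choices are assumptions.
class SaferAdjacencyMatrixGraph<V> {

    private final List<V> vertices = new ArrayList<>();      // index -> vertex
    private final Map<V, Integer> indices = new HashMap<>(); // vertex -> index
    private final boolean[][] edges;

    SaferAdjacencyMatrixGraph(int maxVertices) {
        edges = new boolean[maxVertices][maxVertices];
    }

    // Ignores duplicates instead of storing the same vertex twice.
    void addVertex(V v) {
        if (indices.containsKey(v)) return;
        if (vertices.size() == edges.length) {
            throw new IllegalStateException("graph is full");
        }
        indices.put(v, vertices.size());
        vertices.add(v);
    }

    boolean hasVertex(V v) {
        return indices.containsKey(v);
    }

    // Sets both cells, so the order of u and v no longer matters.
    void addEdge(V u, V v) {
        int i = index(u), j = index(v);
        edges[i][j] = true;
        edges[j][i] = true;
    }

    boolean hasEdge(V u, V v) {
        return edges[index(u)][index(v)];
    }

    Set<V> neighborsOf(V v) {
        int i = index(v);
        Set<V> neighbors = new HashSet<>();
        for (int j = 0; j < vertices.size(); j++) {
            if (edges[i][j]) neighbors.add(vertices.get(j));
        }
        return neighbors;
    }

    // Finds a vertex's row/column, failing loudly if it is not in the graph.
    private int index(V v) {
        Integer i = indices.get(v);
        if (i == null) throw new NoSuchElementException("not in graph: " + v);
        return i;
    }
}
```

Setting both cells in addEdge is the simpler of the two fixes mentioned above; a canonical ordering (always storing the smaller index first) would also work but makes every lookup reorder its arguments.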

In-class exercise 2

What is the running time of hasEdge?

How much space does the above implementation require, in terms of vertices or edges?

Remember, the main advantage of adjacency matrices is that they’re lightning fast at checking whether an edge is in the graph; it’s not just constant time, it’s constant time with a very low constant. Except our crappy implementation above calls List.indexOf first, and indexOf scans the list, so hasEdge is actually linear in the number of vertices. A highly optimized adjacency matrix would not do this: it would represent vertices as plain ints (that is, as the array indices themselves), and would be “supah-fast”.
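A minimal sketch of the “vertices are just ints” idea, assuming the vertices are numbered 0 through n - 1. The class name IntAdjacencyMatrix is made up for this example.

```java
// Vertices are the ints 0..n-1, so they index the matrix directly.
// Out-of-range vertices throw ArrayIndexOutOfBoundsException.
class IntAdjacencyMatrix {
    private final boolean[][] edges;

    IntAdjacencyMatrix(int n) {
        edges = new boolean[n][n];
    }

    void addEdge(int u, int v) {
        edges[u][v] = true; // undirected: keep the matrix symmetric
        edges[v][u] = true;
    }

    boolean hasEdge(int u, int v) {
        return edges[u][v]; // two array indexings: no indexOf, no hashing
    }
}
```

If the vertices start out as some other type, a program would typically translate them to ints once, up front, and then work only with the ints.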

The main downside to adjacency matrices is that they consume a lot of space: the implementation above allocates a maxVertices × maxVertices array of booleans, that is, space quadratic in the number of vertices. In the worst case a graph really does need this much space; a graph in which almost every pair of vertices is connected is called a “dense” graph. But if most vertices are not connected to most other vertices, that is, if we have a “sparse” graph, the adjacency list is a more space-efficient implementation.
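A rough worked comparison, using made-up but plausible numbers: a sparse undirected graph with n = 10,000 vertices in which every vertex has exactly d = 10 neighbors. This counts entries, assuming an adjacency list stores each undirected edge in both endpoints’ lists:

```latex
\begin{aligned}
|E| &= \frac{n \cdot d}{2} = \frac{10{,}000 \cdot 10}{2} = 50{,}000 \\
\text{matrix cells} &= n^2 = 10{,}000^2 = 10^8 \\
\text{list entries} &= n + 2|E| = 10{,}000 + 100{,}000 = 110{,}000
\end{aligned}
```

So the matrix has roughly 900 times as many cells as the lists have entries. Bytes per entry differ between the two (a boolean versus an object reference), so this compares counts, not exact memory use.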

Let’s write one now using our by-now old friend the Map:

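import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AdjacencyListUndirectedGraph<V> implements UndirectedGraph<V> {
    private final Map<V, List<V>> adjacencyList;

    public AdjacencyListUndirectedGraph() {
        adjacencyList = new HashMap<>();
    }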

    @Override
    public void addVertex(V v) {
        // duplicate vertex?
        adjacencyList.put(v, new ArrayList<>());
    }

    @Override
    public boolean hasVertex(V v) {
        return adjacencyList.containsKey(v);
    }

    @Override
    public Set<V> vertices() {
        // modification?
        return adjacencyList.keySet();
    }

    @Override
    public void addEdge(V u, V v) {
        // order?
        // u, v in adjacencyList?
        adjacencyList.get(u).add(v);
    }

    @Override
    public boolean hasEdge(V u, V v) {
        return adjacencyList.get(u).contains(v);
    }

    @Override
    public Set<V> neighborsOf(V v) {
        return new HashSet<>(adjacencyList.get(v));
    }
}

Again, there are some problems here. In particular, we need to be careful about returning Sets that share structure with the graph, as vertices() does by returning adjacencyList.keySet() directly. The caller might mutate the Set, and thus change the graph! If that’s not what we want (and it usually isn’t), then we should return copies of the structures that represent parts of the graph, not the original structures themselves.
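To see the sharing problem concretely: HashMap.keySet() returns a live view of the map’s keys, so removing an element from that Set removes the corresponding vertex (and its edge list) from the graph. The sketch below uses a bare Map in place of the full graph class; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Demonstrates why handing out adjacencyList.keySet() is risky.
class SharedStructureDemo {

    // Builds a tiny adjacency map with two vertices and no edges.
    static Map<String, List<String>> tinyGraph() {
        Map<String, List<String>> adjacencyList = new HashMap<>();
        adjacencyList.put("a", new ArrayList<>());
        adjacencyList.put("b", new ArrayList<>());
        return adjacencyList;
    }

    // Like vertices() above: returns the live key-set view.
    static Set<String> verticesShared(Map<String, List<String>> adjacencyList) {
        return adjacencyList.keySet();
    }

    // Defensive copy: the caller gets its own independent Set.
    static Set<String> verticesCopy(Map<String, List<String>> adjacencyList) {
        return new HashSet<>(adjacencyList.keySet());
    }

    // Read-only view: no copy is made, but any mutation attempt throws
    // UnsupportedOperationException.
    static Set<String> verticesReadOnly(Map<String, List<String>> adjacencyList) {
        return Collections.unmodifiableSet(adjacencyList.keySet());
    }
}
```

Calling verticesShared(g).remove("a") deletes vertex "a" from g, while the same call on verticesCopy(g) leaves g unchanged. The read-only view avoids copying, but it still reflects later changes to the graph, which may or may not be what the caller expects.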

In-class exercise 3

What is the running time of hasEdge?

How much space does the above implementation require, in terms of vertices and/or edges?

Is this “slower” than an adjacency matrix? Yes. In particular, any time we need to iterate over a list (as contains does), we are, in the worst case, linear in the number of vertices. But we only need space proportional to the number of vertices plus the number of edges. In the worst case this is quadratic in the number of vertices, so we’re no better off than an adjacency matrix. But in a sparse graph, we come out ahead space-wise. And saying a graph is sparse is roughly equivalent to saying that each vertex has a small, constant number of edges, so contains is usually OK in this case. (You’ll explore this more in 311.)

“But Marc,” you might be thinking, “why not make it a Map<V, Set<V>> and get the best of both worlds?” You can! And you will (mostly!). But while hash lookups are constant time, their constant is not quite as small as that of array lookups. If you’re really, really worried about speed, and space is not an issue, you may end up using the adjacency matrix representation anyway. But enough about that.
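Here is a sketch of that Map<V, Set<V>> variant, which also addresses the problems noted above: duplicate vertices, edge order, unknown vertices, and shared structure. It is standalone; the method names mirror the course’s UndirectedGraph interface (not shown here), and throwing NoSuchElementException for unknown vertices is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Adjacency "list" backed by hash sets; exception choice is an assumption.
class AdjacencySetUndirectedGraph<V> {
    private final Map<V, Set<V>> adjacency = new HashMap<>();

    // putIfAbsent avoids wiping out an existing vertex's edges.
    void addVertex(V v) {
        adjacency.putIfAbsent(v, new HashSet<>());
    }

    boolean hasVertex(V v) {
        return adjacency.containsKey(v);
    }

    // Returns a copy so callers cannot change the graph through it.
    Set<V> vertices() {
        return new HashSet<>(adjacency.keySet());
    }

    // Looks up both endpoints before changing anything, then records the
    // edge in both directions, so argument order does not matter.
    void addEdge(V u, V v) {
        Set<V> uNeighbors = neighborSet(u);
        Set<V> vNeighbors = neighborSet(v);
        uNeighbors.add(v);
        vNeighbors.add(u);
    }

    // Expected constant time: one hash lookup for u, one in u's neighbor set.
    boolean hasEdge(V u, V v) {
        Set<V> uNeighbors = neighborSet(u);
        neighborSet(v); // called only to throw if v is not in the graph
        return uNeighbors.contains(v);
    }

    Set<V> neighborsOf(V v) {
        return new HashSet<>(neighborSet(v));
    }

    private Set<V> neighborSet(V v) {
        Set<V> neighbors = adjacency.get(v);
        if (neighbors == null) throw new NoSuchElementException("not in graph: " + v);
        return neighbors;
    }
}
```

Checking both endpoints before adding anything in addEdge keeps the graph consistent: a call with one missing vertex throws without leaving a half-recorded edge behind.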

More on searching graphs

Several people asked me to go over this again, so here goes, in a slightly different format.

The idea behind searching a graph is that we want to systematically examine it, starting at one point, looking for a path to another point. We do so by keeping track of a list of places to be explored (the “frontier”). We repeat the following steps until the frontier is empty or our goal is found:

  • Pick and remove a location from the frontier.
  • Mark the location as explored (visited) so we don’t “expand” it again.
  • “Expand” the location by looking at its neighbors. Any neighbor we haven’t seen yet (not visited, not already on the frontier) is added to the frontier.

What might this look like in code?

static <V> boolean isPath(UndirectedGraph<V> graph, V start, V goal) {
  Queue<V> frontier = new LinkedList<>();
  frontier.add(start);

  Set<V> visited = new HashSet<>();
  visited.add(start);

  while (!frontier.isEmpty()) {
    V current = frontier.remove();
    if (current.equals(goal)) return true;
    for (V next : graph.neighborsOf(current)) {
      // note: could put check for goal here instead
      if (!visited.contains(next)) {
        frontier.add(next);
        visited.add(next);
      }
    }
  }
  return false;
}

Note I used a Queue here; its first-in, first-out behavior enforces a breadth-first search. (on board) Queues are lists, but you can only add at one end and only remove from the other, like waiting in line at Disney or some such.
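To see the breadth-first order concretely, here is a variant of the same loop that records how many hops each reachable vertex is from the start. To keep it self-contained, a plain Map from each vertex to its neighbor set stands in for graph.neighborsOf, and ArrayDeque replaces LinkedList (both implement Queue). The class name BreadthFirst is made up for this example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Breadth-first search recording each reachable vertex's hop count from start.
class BreadthFirst {

    static <V> Map<V, Integer> hopsFrom(Map<V, Set<V>> graph, V start) {
        Map<V, Integer> hops = new HashMap<>(); // doubles as the "visited" set
        Queue<V> frontier = new ArrayDeque<>();
        hops.put(start, 0);
        frontier.add(start);

        while (!frontier.isEmpty()) {
            V current = frontier.remove(); // oldest entry first (FIFO)
            for (V next : graph.getOrDefault(current, Set.of())) {
                if (!hops.containsKey(next)) { // not yet seen
                    hops.put(next, hops.get(current) + 1);
                    frontier.add(next);
                }
            }
        }
        return hops;
    }
}
```

Because the queue is first-in, first-out, every vertex at distance 1 is removed before any vertex at distance 2 is, so each recorded hop count is the length of a shortest path from the start.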

In other words, this search will visit all vertices adjacent to the start (that is, one hop away from the start) before it visits their neighbors (two hops away from the start), and so on, like ripples in a pond. The order in which vertices come off the frontier determines how the search progresses; most notably, if the frontier is a stack (last-in, first-out), the search becomes depth-first.
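A sketch of the stack-based version, again using a plain Map from vertex to neighbor set in place of the graph interface. The class and method names are made up for this example. One deliberate change from isPath: vertices are marked visited when they are popped rather than when they are pushed, and stale duplicates on the stack are skipped. Marking on push would still answer the path question correctly, but it can visit vertices in a different order than a textbook (recursive) depth-first search.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Same skeleton as isPath, but the frontier is a stack (LIFO), so the search is depth-first.
class DepthFirst {

    static <V> boolean isPathDfs(Map<V, Set<V>> graph, V start, V goal) {
        Deque<V> frontier = new ArrayDeque<>();
        Set<V> visited = new HashSet<>();
        frontier.push(start);

        while (!frontier.isEmpty()) {
            V current = frontier.pop();          // newest entry first (LIFO)
            if (!visited.add(current)) continue; // already expanded; skip duplicate
            if (current.equals(goal)) return true;
            for (V next : graph.getOrDefault(current, Set.of())) {
                if (!visited.contains(next)) frontier.push(next);
            }
        }
        return false;
    }
}
```

Swapping the Deque’s push/pop for add/remove at opposite ends would turn this back into a breadth-first search; the rest of the loop stays the same.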