18: Graph representation

DRAFT

Announcements

Talk TODAY on Forensic Investigation of Internet-based Crimes Against Children

4:30, ILC N255, pizza will be served. See Piazza for more details.

Graphs

Recall that we can represent the Graph ADT in any implementation we want: in this course, we’ll (briefly) sketch the Adjacency Matrix and Adjacency List implementations.

Consider a (very) simplified graph, where V = {0, 1, 2, … n-1}.

The adjacency matrix representation creates an n x n 2D array of booleans that records which edges exist. The entry at [i][j] is true iff there is an edge from vertex i to vertex j.

The adjacency list representation is an array of lists. The array is n elements long; the element at index i points to a list of the destinations of vertex i’s outgoing edges (or to an empty list, if vertex i has no outgoing edges).

(on board)
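To make the two pictures concrete, here is a small sketch that builds both representations for one made-up directed graph on V = {0, 1, 2, 3} with edges 0 to 1, 0 to 2, and 2 to 3. The class name, the example edges, and the use of a List of Lists (rather than a raw array of lists, which is awkward with Java generics) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class TinyGraphRepresentations {
    // Hypothetical example graph: n = 4 vertices, directed edges as {from, to} pairs.
    static final int N = 4;
    static final int[][] EDGES = { {0, 1}, {0, 2}, {2, 3} };

    // Adjacency matrix: matrix[from][to] is true iff there is an edge from -> to.
    static boolean[][] buildMatrix() {
        boolean[][] matrix = new boolean[N][N];
        for (int[] e : EDGES) {
            matrix[e[0]][e[1]] = true;
        }
        return matrix;
    }

    // Adjacency list: lists.get(from) holds the destinations of from's outgoing edges.
    static List<List<Integer>> buildLists() {
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < N; i++) {
            lists.add(new ArrayList<>()); // stays empty if the vertex has no outgoing edges
        }
        for (int[] e : EDGES) {
            lists.get(e[0]).add(e[1]);
        }
        return lists;
    }
}
```

The matrix always has 16 entries for this graph, three of them true; the lists hold only three entries in total, with empty lists for vertices 1 and 3.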

Here’s a naive implementation:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AdjacencyMatrixUndirectedGraph<V> implements UndirectedGraph<V> {

    private List<V> vertices;
    private final boolean[][] edges;

    public AdjacencyMatrixUndirectedGraph(int maxVertices) {
        vertices = new ArrayList<>();
        edges = new boolean[maxVertices][maxVertices];
    }


    @Override
    public void addVertex(V v) {
        // what if the vertex is already in the graph?
        vertices.add(v);
    }

    @Override
    public boolean hasVertex(V v) {
        return vertices.contains(v);
    }

    @Override
    public Set<V> vertices() {
        return new HashSet<>(vertices);
    }

    @Override
    public void addEdge(V u, V v) {
        // order of edges?
        // u,v in graph?
        edges[vertices.indexOf(u)][vertices.indexOf(v)] = true;
    }

    @Override
    public boolean hasEdge(V u, V v) {
        // order of edges?
        // u,v in graph?
        return edges[vertices.indexOf(u)][vertices.indexOf(v)];
    }

    @Override
    public Set<V> neighborsOf(V v) {
        // order of edges?
        // v in graph?
        Set<V> neighbors = new HashSet<>();
        int index = vertices.indexOf(v);
        for (int i = 0; i < vertices.size(); i++) {
            if (edges[index][i]) {
                neighbors.add(vertices.get(i));
            }
        }
        return neighbors;
    }
}

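Note that, upon reflection, there are some problems here (repeated vertices! order of vertices in edges! are vertices even in the graph?). As written, List.indexOf returns -1 for a vertex that is not in the graph, so addEdge, hasEdge, and neighborsOf fail with an ArrayIndexOutOfBoundsException. Some of this we can fix in code (by having, say, a canonical ordering, or being sure to set both spots in the matrix); some of this implies we need to add to our API (methods that take arbitrary vertices as parameters should throw an exception when given a vertex that is not in the graph).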
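Here is one way those fixes might look. This is a sketch under assumptions, not the course's reference solution: the class name CheckedMatrixGraph is made up, it does not implement UndirectedGraph (whose exact contract isn't shown here), and the choices to ignore duplicate vertices and to throw IllegalArgumentException for unknown vertices are just one reasonable option.

```java
import java.util.ArrayList;
import java.util.List;

class CheckedMatrixGraph<V> {
    private final List<V> vertices = new ArrayList<>();
    private final boolean[][] edges;

    CheckedMatrixGraph(int maxVertices) {
        edges = new boolean[maxVertices][maxVertices];
    }

    // Adding a vertex twice is a no-op, so each vertex keeps a single index.
    void addVertex(V v) {
        if (vertices.contains(v)) {
            return;
        }
        if (vertices.size() == edges.length) {
            throw new IllegalStateException("graph is full");
        }
        vertices.add(v);
    }

    // Look up a vertex's row/column, rejecting vertices that are not in the graph.
    private int indexOf(V v) {
        int index = vertices.indexOf(v);
        if (index < 0) {
            throw new IllegalArgumentException("vertex not in graph: " + v);
        }
        return index;
    }

    // Undirected: set both spots so the order of u and v does not matter.
    void addEdge(V u, V v) {
        int i = indexOf(u);
        int j = indexOf(v);
        edges[i][j] = true;
        edges[j][i] = true;
    }

    boolean hasEdge(V u, V v) {
        return edges[indexOf(u)][indexOf(v)];
    }
}
```

Setting both spots costs nothing extra at lookup time; the alternative mentioned above (a canonical ordering) would instead store each edge once and sort the two indices before every access.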

In class exercise

What is the running time of hasEdge?

How much space does the above implementation require, in terms of vertices or edges?

Remember, the main advantage of adjacency matrices is that they’re lightning fast at checking whether an edge is in the graph; it’s not just constant time, it’s constant time with a very low constant. Except our crappy implementation above requires two calls to List.indexOf first, so hasEdge is actually linear in the number of vertices. A highly optimized adjacency matrix representation would not do this (it would instead use plain ints for vertices) and would be “supah-fast”.
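A minimal sketch of the int-only version, assuming vertices are exactly the ints 0 through n-1 (the class name IntMatrixGraph is made up for this example). With no List.indexOf in the way, hasEdge is a single array read.

```java
class IntMatrixGraph {
    private final boolean[][] edges;

    // Vertices are implicitly 0, 1, ..., n-1.
    IntMatrixGraph(int n) {
        edges = new boolean[n][n];
    }

    // Undirected: record the edge in both directions.
    void addEdge(int u, int v) {
        edges[u][v] = true;
        edges[v][u] = true;
    }

    // Constant time: one array lookup, no search.
    boolean hasEdge(int u, int v) {
        return edges[u][v];
    }
}
```

An out-of-range vertex here still fails with an ArrayIndexOutOfBoundsException; a more careful version would check the range and throw something more descriptive.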

Adjacency List

The main downside to adjacency matrices is that they consume a lot of space: the implementation above uses (maxVertices)^2 space, that is, space quadratic in the number of vertices. In the worst case, a graph actually needs this much space – an “almost-complete” graph is called a “dense” graph. But if most vertices are not connected to most other vertices, that is, if we have a “sparse” graph, a more efficient implementation is the adjacency list.

Let’s write one now using our by-now old friend the Map:

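import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AdjacencyListUndirectedGraph<V> implements UndirectedGraph<V> {
    private final Map<V, List<V>> adjacencyList;

    public AdjacencyListUndirectedGraph() {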
        adjacencyList = new HashMap<>();
    }

    @Override
    public void addVertex(V v) {
        // duplicate vertex?
        adjacencyList.put(v, new ArrayList<>());
    }

    @Override
    public boolean hasVertex(V v) {
        return adjacencyList.containsKey(v);
    }

    @Override
    public Set<V> vertices() {
        // modification?
        return adjacencyList.keySet();
    }

    @Override
    public void addEdge(V u, V v) {
        // order?
        // u, v in adjacencyList?
        adjacencyList.get(u).add(v);
    }

    @Override
    public boolean hasEdge(V u, V v) {
        return adjacencyList.get(u).contains(v);
    }

    @Override
    public Set<V> neighborsOf(V v) {
        return new HashSet<>(adjacencyList.get(v));
    }
}

Again some problems here, including that we need to be careful of returning Sets that share structure with the graph. The caller might mutate the Set, and thus change the graph! If that’s not what we want (and it usually isn’t), then we should return copies of the structures that represent parts of the graph, not the original structures themselves.
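The sketch below illustrates the sharing problem with Map.keySet, which returns a live view backed by the map. The class DefensiveCopyDemo and its String vertices are made up for the example; only the keySet behavior comes from the Java library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DefensiveCopyDemo {
    private final Map<String, List<String>> adjacencyList = new HashMap<>();

    void addVertex(String v) {
        adjacencyList.putIfAbsent(v, new ArrayList<>());
    }

    // Leaky: the returned Set is a view; removing from it removes from the map.
    Set<String> verticesView() {
        return adjacencyList.keySet();
    }

    // Safe: the caller gets an independent copy and cannot change the graph through it.
    Set<String> verticesCopy() {
        return new HashSet<>(adjacencyList.keySet());
    }

    int vertexCount() {
        return adjacencyList.size();
    }
}
```

Calling verticesCopy().remove("a") leaves the graph untouched, while verticesView().remove("a") silently deletes vertex a and its list. Wrapping the view with Collections.unmodifiableSet is another option when copying is too expensive.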

In class exercise 2

What is the running time of hasEdge?

How much space does the above implementation require, in terms of vertices and/or edges?

Is this “slower” than an adjacency matrix? Yes. In particular, any time we need to iterate over a vertex’s list (contains), we are linear in the length of that list, which in the worst case is the number of vertices. But we only need as much space as is required to store each vertex and each edge. In the worst case this is quadratic in the number of vertices, so we’re no better off than an adjacency matrix. But in a sparse graph, we come out ahead space-wise. And saying a graph is sparse is roughly equivalent to saying that each vertex has a small constant number of edges, so contains is usually OK in this case. (You’ll explore this more in 311.)
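As a rough back-of-the-envelope check of the space claim, the sketch below counts stored entries for each representation. It assumes an undirected graph with n vertices and m edges where the adjacency list stores each edge twice (once in each endpoint's list); the class and method names are made up, and real memory use also depends on object overhead that this ignores.

```java
class SpaceEstimate {
    // The matrix stores one boolean per (from, to) pair, no matter how many edges exist.
    static long matrixCells(long n) {
        return n * n;
    }

    // The adjacency list stores one list per vertex plus two entries per undirected edge.
    static long listEntries(long n, long m) {
        return n + 2 * m;
    }
}
```

For a sparse graph such as a path on 1000 vertices (999 edges), that is 1,000,000 matrix cells versus 2,998 list entries. For a complete graph on 1000 vertices (499,500 edges), the list count is also 1,000,000, matching the quadratic worst case.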

“But Marc,” you might be thinking, “why not make it a Map<V, Set<V>> and get the best of both worlds?” You can! And you would (mostly!). But while hash lookups are constant time, they’re not quite as small a constant as array lookups. If you’re really, really worried about speed, and space is not an issue, you may end up using the adjacency matrix representation anyway. But enough about that.
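For the curious, a Map<V, Set<V>> version might look something like the sketch below. The class name AdjacencySetGraph, the helper require, and the decision to throw IllegalArgumentException for unknown vertices are assumptions for this example; it does not implement UndirectedGraph.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AdjacencySetGraph<V> {
    private final Map<V, Set<V>> adjacency = new HashMap<>();

    // Re-adding an existing vertex keeps its current neighbors.
    void addVertex(V v) {
        adjacency.putIfAbsent(v, new HashSet<>());
    }

    // Fetch a vertex's neighbor set, rejecting vertices that are not in the graph.
    private Set<V> require(V v) {
        Set<V> neighbors = adjacency.get(v);
        if (neighbors == null) {
            throw new IllegalArgumentException("vertex not in graph: " + v);
        }
        return neighbors;
    }

    // Undirected: check both endpoints first, then record the edge in both sets.
    void addEdge(V u, V v) {
        Set<V> neighborsOfU = require(u);
        Set<V> neighborsOfV = require(v);
        neighborsOfU.add(v);
        neighborsOfV.add(u);
    }

    // Expected constant time: one hash lookup for u, one for v within u's set.
    boolean hasEdge(V u, V v) {
        require(v);
        return require(u).contains(v);
    }

    // Return a copy so callers cannot mutate the graph through the result.
    Set<V> neighborsOf(V v) {
        return new HashSet<>(require(v));
    }
}
```

Space stays proportional to the number of vertices plus edges, and hasEdge no longer scans a list, at the cost of the larger hashing constant mentioned above.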