CMPSCI 187: Programming With Data Structures
============================================

Today's topics
--------------

- administrivia
- heaps
- search

Administrivia
=============

Reminders
---------

Assignment 10 is due next Thursday at 8:30am.

Heaps
=====

Overview
--------

A *heap* is a data structure that keeps some comparable elements in a semi-sorted state. The largest element is at the root of the tree, and larger elements tend to be nearer the top of the tree. We’ll use a heap to implement a *priority queue*, where the important operations are to insert a new element and to remove the largest element. (In fact we'll conflate the two somewhat here; see DJW for a separate description of the PQ interface.)

Formally, a heap (in this case a max-heap) is a *complete tree* of comparable elements that satisfies the *heap property*: the element at every node is larger than (or equal to) the elements at each of its children (if there are any). Hence that element must also be larger than (or equal to) the elements at all of its descendants. In particular, the largest element must be at the root. (In a min-heap, each node’s element is smaller than (or equal to) the elements at its children.)

Remember that a complete binary tree has its leaves on one level or on two adjacent levels. In the latter case every level above the bottom one is full, and the nodes on the bottom level are left-justified.

As we said, a heap is "somewhat sorted". We can find the maximum element quickly, but the farther down we go, the less we know about the relative order of elements.

Clicker question: Where in the heap?
------------------------------------

As we mentioned last lecture, we can implement a tree structure with an array by using implicit pointers: the left child of node i is node 2i + 1, the right child is node 2i + 2, and the parent of any node i > 0 is node (i - 1) / 2 (rounding down). With this convention, the first n slots of an array correspond exactly to an n-node complete binary tree.
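
The index arithmetic above is small enough to write out directly. This is a minimal sketch assuming 0-based indices and a max-heap of `int`s; the class `HeapIndex` and its methods are hypothetical helpers, not part of the course code. The `isMaxHeap` check uses the parent formula to test the heap property one node at a time.

```java
/** Hypothetical helpers for the implicit-pointer layout of a complete binary tree (0-based). */
class HeapIndex {
    // Left child of node i lives at index 2i + 1.
    static int left(int i) { return 2 * i + 1; }

    // Right child of node i lives at index 2i + 2.
    static int right(int i) { return 2 * i + 2; }

    // Parent of node i (for i > 0); integer division undoes both child formulas.
    static int parent(int i) { return (i - 1) / 2; }

    // Checks the max-heap property on the first n slots of a[]:
    // every non-root node is no larger than its parent.
    static boolean isMaxHeap(int[] a, int n) {
        for (int i = 1; i < n; i++) {
            if (a[i] > a[parent(i)]) {
                return false;
            }
        }
        return true;
    }
}
```

For example, in the array {9, 7, 8, 3, 5, 6} node 1 (holding 7) has children at indices 3 and 4, and the whole array satisfies the max-heap property; {9, 7, 8, 3, 10} does not, because 10 sits below 7.
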

Starting the code
-----------------

``` {.java}
public class Heap<T extends Comparable<T>> implements PriorityQueueInterface<T> {
    private T[] elements;   // elements[0 .. size-1] hold the heap
    private int size;

    @SuppressWarnings("unchecked")
    public Heap(int maxSize) {
        // Java can't create a generic array directly, so create a Comparable[] and cast it.
        elements = (T[]) (new Comparable[maxSize]);
        size = 0;
    }

    public boolean isEmpty() {
        return size == 0;
    }

    public boolean isFull() {
        return (size >= elements.length);
    }
```

Adding elements
---------------

When we add an element to the heap, the heap grows by one element. We know exactly where the new element must go to keep the tree complete: the next available array slot. So we can add it there, but this could destroy the heap property.

We will fix the property by doing what DJW calls *reheaping up* and the rest of the world calls *bubbling up*. We think of the place for the new element as a *hole*. If putting the new element in the hole would violate the heap property, we move the parent element down into the hole and repeat the process with a new hole in the parent’s position. We continue this way until the new element may go in the hole, which might not happen until we reach the root. Because a complete tree with n nodes has height O(log n), this means shifting at most O(log n) elements, so it takes O(log n) time in the worst case.

The process of moving the hole upward is a good candidate for recursion (though the loop version is about as simple). We have a problem as long as the hole’s parent holds an element smaller than the element we are trying to place, unless we’ve moved the hole to the root, where any element may go. So we use the basic programming paradigm of “while there is a problem, do something about it”. Once we have a suitable place for the element, we just put it there.
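
The hole-moving loop just described shifts parents down instead of swapping, and writes the new element only once, at the end. Here is a minimal sketch of that idea on a plain `int` array, assuming a max-heap stored in the first `size` slots; the class `IntHeapSketch` and its methods are hypothetical, and the swap-based course code does the same job.

```java
import java.util.Arrays;

/** Hypothetical fixed-capacity max-heap of ints, illustrating the "hole" version of reheapUp. */
class IntHeapSketch {
    private final int[] elements;
    private int size = 0;

    IntHeapSketch(int maxSize) {
        elements = new int[maxSize];
    }

    void enqueue(int element) {
        if (size >= elements.length) {
            throw new IllegalStateException("heap is full");
        }
        int hole = size;            // the next free slot keeps the tree complete
        size += 1;
        // While the hole has a parent smaller than the new element,
        // move that parent down into the hole and move the hole up.
        while (hole > 0 && elements[(hole - 1) / 2] < element) {
            elements[hole] = elements[(hole - 1) / 2];
            hole = (hole - 1) / 2;
        }
        elements[hole] = element;   // the new element is written exactly once
    }

    // Copy of the occupied part of the array, in heap (level) order.
    int[] toArray() {
        return Arrays.copyOf(elements, size);
    }
}
```

Enqueuing 5, 3, 8, 1, 9 in that order leaves the array as [9, 8, 5, 1, 3]: the final 9 starts in slot 4, and its parents 3 and then 8 each move down one level before 9 lands at the root.
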

``` {.java}
    public void enqueue(T element) throws PriorityQueueOverflowException {
        if (isFull()) {
            throw new PriorityQueueOverflowException();
        }
        elements[size] = element;   // next free slot keeps the tree complete
        size += 1;
        reheapUp(size - 1);
    }

    // Recursive version: swap the new element upward while it is larger than its parent.
    private void reheapUp(int hole) {
        if (hole <= 0) return;
        int parent = (hole - 1) / 2;
        if (elements[hole].compareTo(elements[parent]) > 0) {
            swap(hole, parent);
            reheapUp(parent);
        }
    }

    private void swap(int i1, int i2) {
        T temp = elements[i1];
        elements[i1] = elements[i2];
        elements[i2] = temp;
    }

    // Loop version of reheapUp: same effect, no recursion.
    private void reheapUpIterative() {
        int hole = size - 1;
        int parent = (hole - 1) / 2;
        T element = elements[hole];
        while ((hole > 0) && (element.compareTo(elements[parent]) > 0)) {
            swap(hole, parent);
            hole = parent;
            parent = (hole - 1) / 2;
        }
    }
```

Clicker question: Adding to a heap
----------------------------------

Removing elements
-----------------

Remember, we're using heaps to build a priority queue, so let's think about the `dequeue` operation. We know that we want to return the value at index zero of the heap, but we need to fill the hole we create. We'll do so by moving the *last* element in the heap into it, which keeps the tree complete. But this probably results in a tree that violates the heap property, so, as before, we fix the mess we've made. We do so by *reheaping down* (or, as the rest of the world calls it, *trickling down*). While the element in the hole is smaller than the *larger* of its children, we swap it with that larger child and move the hole down. (Swapping with the larger child guarantees that the new parent is at least as large as both children.) We stop when the element is at least as large as its children, or when the hole has no children.

``` {.java}
    public T dequeue() throws PriorityQueueUnderflowException {
        if (isEmpty()) {
            throw new PriorityQueueUnderflowException();
        }
        T value = elements[0];
        size -= 1;
        elements[0] = elements[size];   // move the last element into the root's hole
        if (!isEmpty()) {
            reheapDown(0);
        }
        return value;
    }

    private void reheapDown(int i) {
        int l = 2 * i + 1;
        int r = 2 * i + 2;
        if (l >= size) {
            return;                     // no children: nothing to fix
        }
        int big = l;
        if (r < size && elements[r].compareTo(elements[l]) > 0) {
            big = r;                    // the right child exists and is larger
        }
        if (elements[i].compareTo(elements[big]) < 0) {
            swap(big, i);
            reheapDown(big);
        }
    }
```

Note the various special cases we check for: an empty heap in `dequeue`, and no children or just one child in `reheapDown`.

Clicker question: Removing from a heap
--------------------------------------

Priority queues and their uses
------------------------------

A priority queue is a collection whose elements come from an ordered type, and whose removal operation gives us not the newest element (as in a stack) or the oldest element (as in a queue) but the largest element according to the order. We might call this BIFO for “best-in-first-out”, as opposed to FIFO for a queue and LIFO for a stack.

There are many situations where the next item to tackle is the most important one according to some measure. Operating systems assign relative priorities to processes and favor the ones with highest priority in timesharing. Event-based simulations (including many games) need to know when the next event is going to occur. To-do lists work the same way, and so on.

We can make a priority queue of items using a heap that stores objects that each have a number attached for their priority. The `compareTo` operation on these objects calls an item “larger” if it has higher priority.

Why use a heap for a priority queue?
------------------------------------

If we used an unsorted list to implement a priority queue, we could enqueue in O(1) time but we would need O(N) time to dequeue from a queue of size N. With an array-based sorted list, we could dequeue in O(1) time but would need O(N) time to enqueue, because in the worst case we must move O(N) elements.
With a reference-based sorted list, dequeuing is again O(1) time, but now enqueueing requires O(N) time in the worst case to find the place to insert. In a binary search tree, both enqueueing and dequeueing require a trip from the root of the tree down to a leaf in the worst case. This is O(log N) time if the tree is balanced, but could be as bad as O(N) if it is not.

With a heap, however, we get a guaranteed O(log N) time for both insertion and removal, because a heap is a complete tree, and therefore balanced, after every operation.

Clicker question: Searching a heap
----------------------------------

Search
======

The natural setting for general search is *graphs*, which we’ll start looking at next week. But we can see a lot of the general principles of searching by looking at searches on a grid.

Recall the rectangular grid of squares on which we marked blobs. We began with a recursive depth-first search, using the method stack to find all the squares in the same blob as our initial square. We then considered placing the squares being considered on a queue, for a breadth-first search. This had the advantage that when we found a path, that path was as short as possible, counting each move north, east, south, or west as having length 1.

What if moves to adjacent squares are not all equally costly? For example, if the grid represents a map of the real world, we might have lowland squares that cost 1 to enter, hills that cost 2, mountains that cost 3, and oceans that are impassable (cost infinity).

Consider the following map: (projector)

In addition to lowlands, hills, mountains, and oceans, we also have three cities, X, Y, and Z. How can we find the shortest path from X to each of Y and Z?

One simple approach is to find the distance from X to every other square on the map (you'll learn more sophisticated approaches in 311 or 383). We'll do a breadth-first search, but instead of a simple queue, we'll use a priority queue, where the priority is the distance from X so far, and a smaller distance counts as a higher priority.
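
The search just described can be sketched in a few dozen lines. This is a minimal sketch, not the course's implementation: it uses the standard `java.util.PriorityQueue`, which is a min-heap on the natural ordering, rather than our max-heap `Heap` class. Assumptions: the map is an `int[][]` of entry costs with a negative number marking impassable ocean, moves are north/east/south/west, and a square is marked visited when it is *dequeued*, so stale duplicate entries are simply skipped. The names `MinCostSearch`, `Entry`, and `distances` are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical min-cost search over a grid of entry costs; a negative cost marks an impassable square. */
class MinCostSearch {
    /** A queue entry (priority, row, col), like the triples in the trace below. */
    static final class Entry implements Comparable<Entry> {
        final int priority, row, col;

        Entry(int priority, int row, int col) {
            this.priority = priority;
            this.row = row;
            this.col = col;
        }

        // Smaller total cost comes out first, since java.util.PriorityQueue is a min-heap.
        @Override
        public int compareTo(Entry other) {
            return Integer.compare(priority, other.priority);
        }
    }

    /** Cheapest cost to reach every square from (startRow, startCol); -1 means unreachable. */
    static int[][] distances(int[][] cost, int startRow, int startCol) {
        int rows = cost.length, cols = cost[0].length;
        int[][] dist = new int[rows][cols];
        for (int[] row : dist) {
            Arrays.fill(row, -1);                       // -1 means "not yet visited"
        }
        int[][] moves = {{-1, 0}, {0, 1}, {1, 0}, {0, -1}}; // north, east, south, west
        PriorityQueue<Entry> queue = new PriorityQueue<>();
        queue.add(new Entry(0, startRow, startCol));    // the start square costs nothing
        while (!queue.isEmpty()) {
            Entry e = queue.poll();
            if (dist[e.row][e.col] != -1) {
                continue;                               // already reached by a cheaper path
            }
            dist[e.row][e.col] = e.priority;            // first time off the queue = cheapest path
            for (int[] m : moves) {
                int r = e.row + m[0], c = e.col + m[1];
                boolean inside = r >= 0 && r < rows && c >= 0 && c < cols;
                if (inside && cost[r][c] >= 0 && dist[r][c] == -1) {
                    queue.add(new Entry(e.priority + cost[r][c], r, c));
                }
            }
        }
        return dist;
    }
}
```

On the 3-by-3 grid {{1, 3, 1}, {1, -1, 1}, {1, 1, 1}} starting at (0, 0), the result is {{0, 3, 4}, {1, -1, 5}, {2, 3, 4}}: the ocean square stays at -1, and reaching (1, 2) costs 5 either way around it. To reuse our max-heap `Heap` instead, `Entry.compareTo` would return the reversed comparison, so that a smaller distance counts as "larger".
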

We enqueue the location of X, (0, 6), with its priority 0, as the triple (0, 0, 6), written as (priority, row, column). We mark it as visited. Now we dequeue it and add each of its neighbors to the queue. Each neighbor's priority is the path length so far (the priority of the node we just dequeued) plus the neighbor's own cost:

- (3, 0, 5): mountains to the left
- (1, 0, 7): lowlands to the right
- (2, 1, 6): hills below

Now we repeat the cycle of dequeue, visit, and enqueue unvisited neighbors until we know the cost of the shortest path to each node. This is called a "min-cost search", a variant of breadth-first search.

- dequeue (1, 0, 7) and mark it as visited; enqueue (2, 0, 8) and (3, 1, 7). The queue now contains (3, 0, 5), (2, 1, 6), (2, 0, 8), (3, 1, 7).
- dequeue (2, 1, 6) (ties are broken based upon the queue implementation); enqueue (4, 1, 5), (3, 2, 6), (4, 1, 7).
- dequeue (2, 0, 8); enqueue (1, 8, 3).
- and so on.

In this way we get an expanding area of nodes for which we know the distance from X. When this area includes our targets Y and Z, we are done. If there are multiple paths from X to a node, the entry for the shortest path will come off the priority queue first.

(on board: rectangle from 0,6 to 2,7)

Clicker question: min-cost search
=================================