CMPSCI 187: Programming With Data Structures
============================================

Today's topics
--------------

-   administrivia
-   heaps
-   search

Administrivia
=============

Reminders
---------

Assignment 10 is due next Thursday at 8:30am.

Heaps
=====

Overview
--------

A *heap* is a data structure that keeps some comparable elements in a
semi-sorted state. The largest element is at the root of the tree, and
larger elements will tend to be nearer the top of the tree.

We’ll use a heap to implement a *priority queue*, where the important
operations are to insert a new element and remove the largest element.
(In fact we'll conflate the two somewhat here; see DJW for a separate
description of the PQ interface.)

Formally, a heap (in this case a max-heap) is a *complete binary tree* of
comparable elements that satisfies the *heap property*: the element at
every node is larger than (or equal to) both the elements at its
children (if there are any).

Hence that element must also be larger than the elements at any of its
descendants. In particular, the largest element must be at the root. (In
a min-heap, each node’s element is smaller than (or equal to) the
elements at its children.)

Remember that a complete binary tree has its leaves on one level or on
two adjacent levels -- in the latter case the leaves on the upper level
all exist and those on the lower level are left-justified.

As we said, a heap is "somewhat sorted". We can find the maximum element
quickly, but the farther down we go the less we know about the relative
order of elements.

Clicker question: Where in the heap?
------------------------------------

As we mentioned last lecture, we can implement a tree structure with an
array by using implicit pointers: the left child of node i is node 2i +
1, and the right child is node 2i + 2. Going the other way, the parent
of node i (for i > 0) is node (i - 1) / 2, rounded down.

With this convention, an array of length n corresponds exactly to an
n-node complete binary tree.
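
As a rough illustration of the implicit-pointer convention, here is a small sketch of the index arithmetic together with a check of the max-heap property on a plain `int` array. The class name `HeapIndexDemo` and its method names are made up for this example and are not part of DJW's code.

```java
// Sketch only: HeapIndexDemo and its methods are illustrative names.
class HeapIndexDemo {
  static int left(int i)   { return 2 * i + 1; }
  static int right(int i)  { return 2 * i + 2; }
  static int parent(int i) { return (i - 1) / 2; }  // meaningful for i > 0

  // Returns true if every non-root node is no larger than its parent,
  // which is the max-heap property stated from the child's side.
  static boolean isMaxHeap(int[] a) {
    for (int i = 1; i < a.length; i++) {
      if (a[parent(i)] < a[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int[] heap = {9, 7, 8, 3, 5, 6};
    int[] notHeap = {9, 7, 8, 3, 10, 6};   // 10 sits below its parent 7
    System.out.println(isMaxHeap(heap));     // true
    System.out.println(isMaxHeap(notHeap));  // false
  }
}
```

Checking each node against its parent is equivalent to checking each node against its children, and it avoids the special cases for nodes with zero or one child.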

Starting the code
-----------------

``` {.java}
public class Heap<T extends Comparable<T>>
    implements PriorityQueueInterface<T>
{
  private T[] elements;
  private int size;

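  @SuppressWarnings("unchecked")
  public Heap(int maxSize) {
    // Java cannot create a generic array directly, so we allocate an
    // array of Comparable and cast it.
    elements = (T[])(new Comparable[maxSize]);
    size = 0;
  }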

  public boolean isEmpty() {
    return size == 0;
  }

  public boolean isFull() {
    return (size >= elements.length);
  }
```

Adding elements
---------------

When we add an element to the heap, it becomes larger by one element.

We know exactly where the new element must go: the next available
array slot. Putting it there, however, could destroy the heap property.
We will fix the property by doing what DJW calls *reheaping up* and the
rest of the world calls *bubbling up*.

We think of the place for the new element as a *hole*. If putting the
new element in the hole would violate the heap property, we move the
parent element down into the hole and repeat the process with a new hole
in the parent’s position.

We continue this way until the new element may go in the hole, which
might not happen until we reach the root. This means shifting up to
O(log n) elements and takes O(log n) time in the worst case.

The process of moving the hole upward is a good candidate for recursion
(though the loop version is about as simple).

We have a problem as long as the hole’s parent has an element smaller
than the element we are trying to place, unless we’ve moved the hole to
the root, where no element is too big.

So we use the basic programming paradigm of “while there is a problem,
do something about it”. Once we have a suitable place for the element,
we just put it there.

``` {.java}
  public void enqueue(T element) throws PriorityQueueOverflowException {
    if (isFull()) {
      throw new PriorityQueueOverflowException();
    }
    elements[size] = element;
    size += 1;
    reheapUp(size-1);
  }

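  // Restore the heap property by moving the element at index hole
  // upward until its parent is at least as large.
  private void reheapUp(int hole) {
    if (hole <= 0) return;            // reached the root: nothing above
    int parent = (hole - 1) / 2;
    if (elements[hole].compareTo(elements[parent]) > 0) {
      swap(hole, parent);
      reheapUp(parent);
    }
  }

  // Exchange the elements at indices i1 and i2.
  private void swap(int i1, int i2) {
    T temp = elements[i1];
    elements[i1] = elements[i2];
    elements[i2] = temp;
  }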
  
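  // Loop version of reheapUp, starting from the last element.
  private void reheapUpIterative() {
    int hole = size - 1;
    int parent = (hole - 1) / 2;
    // While there is a problem (the parent is smaller), fix it.
    while ((hole > 0) &&
           (elements[hole].compareTo(elements[parent]) > 0)) {
      swap(hole, parent);
      hole = parent;
      parent = (hole - 1) / 2;
    }
  }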
```
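
The code above swaps at each step, whereas the description talks about moving the parent down into a hole and placing the new element only once. A minimal sketch of that hole-based variant, on a plain `int` array, might look like the following. The class `HoleReheapDemo` and the `insert` signature are assumptions made for this example, not DJW's interface.

```java
import java.util.Arrays;

// Sketch only: a hole-based insert into a max-heap stored in an int array.
class HoleReheapDemo {
  // Inserts value into the max-heap stored in a[0..size-1]. The array must
  // have room for one more element. Returns the new size.
  static int insert(int[] a, int size, int value) {
    int hole = size;                  // the next free slot is the first hole
    while (hole > 0) {
      int parent = (hole - 1) / 2;
      if (a[parent] >= value) {
        break;                        // value may legally sit in the hole
      }
      a[hole] = a[parent];            // move the parent down into the hole
      hole = parent;                  // the hole moves up one level
    }
    a[hole] = value;                  // place the new element exactly once
    return size + 1;
  }

  public static void main(String[] args) {
    int[] a = new int[8];
    int size = 0;
    for (int v : new int[] {3, 9, 5, 7}) {
      size = insert(a, size, v);
    }
    System.out.println(Arrays.toString(Arrays.copyOf(a, size)));  // [9, 7, 5, 3]
  }
}
```

Each level costs one assignment instead of the three inside a swap. The asymptotic cost is still O(log n).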

Clicker question: Adding to a heap
----------------------------------

Removing elements
-----------------

Remember, we're using heaps to build a priority queue, so let's think
about the `dequeue` operation. We know that we want to return the value
at index 0 of the heap, but we need to fill the hole we create.

We'll do so by replacing it with the *last* element in the heap. But
this probably results in a tree that violates the heap property, so,
like before, we fix the mess we've made. We do so by *reheaping down*
(or as the rest of the world calls it, *trickling down*). We swap the
value in the hole with the *larger* of its two children, and move the
hole down, until the value in the hole is at least as large as both of
its children (or the hole has no children).

``` {.java}

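  public T dequeue()
      throws PriorityQueueUnderflowException {
    if (isEmpty()) {
      throw new PriorityQueueUnderflowException();
    }
    T value = elements[0];          // the largest element is at the root
    size -= 1;
    elements[0] = elements[size];   // move the last element into the hole
    elements[size] = null;          // drop the stale reference
    if (! isEmpty()) {
      reheapDown(0);
    }
    return value;
  }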

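  // Restore the heap property by moving the element at index i
  // downward until it is at least as large as both of its children.
  private void reheapDown(int i) {
    int l = 2 * i + 1;              // left child
    int r = 2 * i + 2;              // right child
    if (l >= size) { return; }      // no children: i is a leaf
    int big = l;                    // index of the larger child
    if (r < size &&
        elements[r].compareTo(elements[l]) > 0) {
      big = r;
    }
    if (elements[i].compareTo(elements[big]) < 0) {
      swap(big, i);
      reheapDown(big);
    }
  }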
```

Note the various special cases we check for: an empty heap in `dequeue`,
no children or just one child in `reheapDown`.
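
To see insertion and removal working together, here is a compact, self-contained sketch that mirrors the logic above. It departs from the lecture code in a few ways, all of them assumptions made to keep it short: it stores `int` values rather than a generic `T`, it uses loops rather than recursion, and it throws `IllegalStateException` instead of the DJW exception classes. The name `IntMaxHeapDemo` and the helper `drain` are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: an int-valued max-heap with loop-based reheaping.
class IntMaxHeapDemo {
  private final int[] elements;
  private int size = 0;

  IntMaxHeapDemo(int maxSize) { elements = new int[maxSize]; }

  boolean isEmpty() { return size == 0; }

  void enqueue(int value) {
    if (size >= elements.length) throw new IllegalStateException("heap is full");
    elements[size] = value;
    size += 1;
    int hole = size - 1;
    // Reheap up: swap with the parent while the parent is smaller.
    while (hole > 0 && elements[hole] > elements[(hole - 1) / 2]) {
      swap(hole, (hole - 1) / 2);
      hole = (hole - 1) / 2;
    }
  }

  int dequeue() {
    if (isEmpty()) throw new IllegalStateException("heap is empty");
    int value = elements[0];
    size -= 1;
    elements[0] = elements[size];
    int i = 0;
    // Reheap down: swap with the larger child while that child is bigger.
    while (2 * i + 1 < size) {
      int big = 2 * i + 1;
      if (big + 1 < size && elements[big + 1] > elements[big]) big = big + 1;
      if (elements[i] >= elements[big]) break;
      swap(i, big);
      i = big;
    }
    return value;
  }

  private void swap(int i, int j) {
    int t = elements[i];
    elements[i] = elements[j];
    elements[j] = t;
  }

  // Enqueues every value, then dequeues until empty, recording the order.
  static List<Integer> drain(int... values) {
    IntMaxHeapDemo heap = new IntMaxHeapDemo(values.length);
    for (int v : values) heap.enqueue(v);
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) out.add(heap.dequeue());
    return out;
  }

  public static void main(String[] args) {
    System.out.println(drain(4, 1, 7, 3, 9, 2));  // [9, 7, 4, 3, 2, 1]
  }
}
```

Because `dequeue` always returns the current maximum, draining the heap yields the values in descending order.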

Clicker question: Removing from a heap
--------------------------------------

Priority queues and their uses
------------------------------

A priority queue is a collection where the elements come from an ordered
type, and where the removal operation gives us not the newest element
(as in a stack) or the oldest element (as in a queue) but the largest
element according to the order.

We might call this BIFO for “best-in-first-out”, as opposed to FIFO for
a queue and LIFO for a stack.

There are many situations where the next item to tackle is the most
important one according to some measure. Operating systems assign
relative priorities to processes and favor those with the highest
priority in timesharing. Event-based simulations (including many games)
need to know when the next event is going to occur. To-do lists are
another everyday example.

We can make a priority queue of items using a heap that stores objects
that each have a number attached for their priority. The `compareTo`
operation on these objects calls an item “larger” if it has higher
priority.
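
One way such prioritized items might look is sketched below. The `Task` class, its fields, and the `serviceOrder` helper are hypothetical names chosen for this example. To keep the sketch self-contained it uses the standard library's `java.util.PriorityQueue` rather than our `Heap` class. That library class removes the *smallest* element first, so the sketch passes `Comparator.reverseOrder()` to get largest-first behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch only: items carrying a numeric priority, compared by that number.
class PrioritizedTaskDemo {
  // Hypothetical item type: a name plus a numeric priority.
  static class Task implements Comparable<Task> {
    final String name;
    final int priority;

    Task(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }

    // A task is "larger" if it has higher priority.
    @Override
    public int compareTo(Task other) {
      return Integer.compare(this.priority, other.priority);
    }
  }

  // Returns the task names in the order a max-priority queue serves them.
  static List<String> serviceOrder(List<Task> tasks) {
    // java.util.PriorityQueue is a min-heap, so reverse the natural order.
    PriorityQueue<Task> queue = new PriorityQueue<>(Comparator.reverseOrder());
    queue.addAll(tasks);
    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().name);
    }
    return order;
  }

  public static void main(String[] args) {
    List<Task> tasks = Arrays.asList(
        new Task("write notes", 2),
        new Task("grade", 5),
        new Task("email", 1));
    System.out.println(serviceOrder(tasks));  // [grade, write notes, email]
  }
}
```

The priorities in the example are distinct on purpose: when two items tie, the order in which they come off depends on the queue implementation.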

Why use a heap for a priority queue?
------------------------------------

If we used an unsorted list to implement a priority queue, we could
enqueue in O(1) time but we would need O(N) to dequeue from a queue of
size N.

With an array-based sorted list, we could dequeue in O(1) time but would
need O(N) to enqueue, because in the worst case we must move O(N)
elements.

With a reference-based sorted list, dequeuing is again O(1) time but now
enqueueing requires O(N) time in the worst case to find the place to
insert.

In a binary search tree, both enqueueing and dequeueing require a trip
from the root of the tree down to a leaf in the worst case. This is
O(log N) time if the tree is balanced, but could be as bad as O(N) if it
is not.

A heap, by contrast, guarantees O(log N) time for both insertion and
removal, because it is a complete tree and therefore stays balanced
after every operation.

Clicker question: Searching a heap
----------------------------------

Search
======

The natural setting for general search is *graphs*, which we’ll start
looking at next week. But we can see a lot of the general principles of
searching by looking at searches on a grid.

Recall the rectangular grid of squares on which we marked blobs. We
began with a recursive depth-first search, using the method stack to
find all the squares in the same blob as our initial square.

We then considered placing the squares being considered on a queue, for
a breadth-first search. This had the advantage that when we found a
path, that path was as short as possible, counting each move north,
east, south, or west as being length 1.

What if moves to adjacent squares are not all equally costly? For
example, if the grid represents a map of the real world, we might have
lowland squares that cost 1 to enter, hills that cost 2, mountains that
cost 3, and oceans that are impassable (cost infinity).

Consider the following map: (projector)

In addition to lowlands, hills, mountains, and oceans, we also have
three cities, X, Y, and Z. How can we find the shortest path from X to
each of Y and Z?

One simple approach is to find the distance from X to each other square
on the map (you'll learn more sophisticated approaches in 311 or 383).

We'll do a breadth-first search, but instead of a simple queue, we'll
use a priority queue, where the priority is the distance from X so far.

We enqueue the location of X (0, 6) and its priority (0) as the triple
(0, 0, 6): the priority first, then the location.

We mark it as visited.

Now we dequeue it, and add each of its neighbors to the queue. Each has
priority of the path length so far (the priority of the node we just
dequeued) plus its own cost:

    (3, 0, 5)  <- mountains to left
    (1, 0, 7)  <- lowlands to right
    (2, 1, 6)  <- hills below

Now we repeat the same cycle (dequeue, mark as visited, enqueue the
unvisited neighbors) until we know the cost of the shortest path to each
node. This is called a "min-cost search", a variant of "breadth first
search".

dequeue (1,0,7), mark as visited

enqueue (2,0,8), (3,1,7)

queue now contains:

    (3, 0, 5)
    (2, 1, 6)
    (2, 0, 8)
    (3, 1, 7)

dequeue (2,1,6) (ties broken based upon queue implementation)

enqueue (4,1,5), (3,2,6), (4,1,7)

dequeue (2,0,8)

enqueue (3,1,8)

and so on. In this way we get an expanding area of nodes for which we
know the distance from X. When this area includes our targets Y and Z,
we will be done.

If there are multiple paths from X to a node, the item for the shortest
path will come off the priority queue first.

(on board: rectangle from 0,6 to 2,7)
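
The whole procedure can be sketched in code as follows. The lecture map is not reproduced in these notes, so the 3-by-3 grid here is invented, as are the class name `MinCostSearchDemo`, the `OCEAN` marker, and the method `minCosts`. The sketch uses `java.util.PriorityQueue` (which removes the smallest item first, exactly what we want here) and marks a square as visited when it is dequeued, so a square may sit in the queue more than once with different priorities.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Sketch only: min-cost search on a small grid of entry costs.
class MinCostSearchDemo {
  static final int OCEAN = -1;  // hypothetical marker for impassable squares

  // cost[r][c] is the cost to enter square (r, c). Returns the cheapest
  // total cost from the start to every square, or Integer.MAX_VALUE for
  // squares that cannot be reached.
  static int[][] minCosts(int[][] cost, int startRow, int startCol) {
    int rows = cost.length;
    int cols = cost[0].length;
    int[][] dist = new int[rows][cols];
    for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
    boolean[][] visited = new boolean[rows][cols];

    // Queue items are {priority, row, col}; the smallest priority is first.
    PriorityQueue<int[]> queue =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    queue.add(new int[] {0, startRow, startCol});

    int[][] moves = {{-1, 0}, {0, 1}, {1, 0}, {0, -1}};  // N, E, S, W
    while (!queue.isEmpty()) {
      int[] item = queue.poll();
      int d = item[0], r = item[1], c = item[2];
      if (visited[r][c]) continue;  // a cheaper path got here already
      visited[r][c] = true;
      dist[r][c] = d;
      for (int[] m : moves) {
        int nr = r + m[0];
        int nc = c + m[1];
        if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
        if (cost[nr][nc] == OCEAN || visited[nr][nc]) continue;
        queue.add(new int[] {d + cost[nr][nc], nr, nc});
      }
    }
    return dist;
  }

  public static void main(String[] args) {
    int[][] cost = {
      {1, 3, 1},
      {1, OCEAN, 1},
      {1, 2, 1},
    };
    int[][] dist = minCosts(cost, 0, 0);
    System.out.println(dist[0][2]);  // 4: across the mountain, 3 + 1
    System.out.println(dist[2][2]);  // 5: down the left side, 1 + 1 + 2 + 1
  }
}
```

Skipping already-visited squares at dequeue time is what makes the duplicates harmless: the first copy of a square to come off the queue carries its cheapest cost, and later copies are ignored.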

Clicker question: min-cost search
---------------------------------
