Week 10: Implementing ADTs (stacks and queues)

Specialized linear ADTs

The “standard” linear ADTs (in Java) are the array and the (generic) List. Arrays are a simple type, with very fast random access but the limitation of a fixed size. Lists are more flexible, and their underlying implementations are generally written in terms of (resizing) arrays or (sometimes) linked lists.

But as we’ve mentioned, there are other linear data structures that one might use; they are similar to lists but restrict themselves in various ways. We’re going to revisit them so you’re ready when you see them again (“for the first time”) in 187. We’ll start with behavior, then do implementations.

Stacks

Stacks are a last-in, first-out data structure. They are like lists, but instead of allowing for random access (via get, remove, add), they restrict a user to adding items on one end (the “top”) and removing from that same position. These operations are sometimes called push (add an item), pop (remove an item), and peek (look at but do not remove the top item).

Modern Java suggests we use the Deque interface (short for “double-ended queue”) rather than the legacy Stack class, calling its addFirst, removeFirst, and peekFirst methods (or the equivalent push, pop, and peek convenience methods it also provides). Either way, the behavior is the same: LIFO.

Deque<String> s = new ArrayDeque<>(); // one possible concrete stack
s.push("a");  // stack: a
s.push("b");  // stack: a b
s.pop();      // returns "b"; stack: a
s.push("c");  // stack: a c
s.push("d");  // stack: a c d
s.peek();     // returns "d"; stack unchanged: a c d

(top on right)

Queues

Queues are a first-in, first-out data structure. Java has a Queue interface you can use, or you can use Deque, as described in its documentation. In a Queue, we typically talk about add (always at one end), remove (always from the other end), and sometimes peek (just like a stack’s peek, it returns but does not remove the next element that remove would return).

Queue<String> q = new ArrayDeque<>(); // one possible concrete queue
q.add("a");   // queue: a
q.add("b");   // queue: a b
q.remove();   // returns "a"; queue: b
q.add("c");   // queue: b c
q.add("d");   // queue: b c d
q.peek();     // returns "b"; queue unchanged: b c d

(front on left, rear on right)

A side note: over/underflow

Stacks and queues can underflow. If you call pop or remove on an empty stack/queue, this will generate an exception.

Some stacks and queues are bounded, which means they have an explicit capacity. If you try to push or add to a stack/queue that is already at capacity, then you will overflow the structure and generate an exception.
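
For example, with the standard library (ArrayDeque is unbounded, so to see overflow we need a capacity-bounded queue; ArrayBlockingQueue from java.util.concurrent is one):

Deque<String> s = new ArrayDeque<>();
s.pop(); // underflow: throws NoSuchElementException

Queue<String> bounded = new ArrayBlockingQueue<>(1); // capacity of 1
bounded.add("a");
bounded.add("b"); // overflow: throws IllegalStateException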

Priority queues

A priority queue is like a queue, but it returns the next “smallest” (in Java) thing, rather than the first-in thing, when remove or peek is called.

It’s important to note that the exact order of the items stored in the priority queue is not visible to the user; you can only see the “next” / “top” item (the one that will be returned by peek or remove). Internally, priority queues are typically implemented as “heaps”, which are a tree-based structure similar to, but different from, the binary search trees we talked about briefly earlier this semester. Heaps allow for efficient (log n) insertion and removal of the smallest item.

How do we define “smallest”? The usual way, by either depending upon the “natural” ordering of the elements stored in the PriorityQueue<E> (that is, they must implement Comparable) or by passing in a Comparator when constructing the PriorityQueue.
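
For example (variable names here are just for illustration; the reverse-order comparator is an arbitrary choice to show the mechanism):

// natural ordering: String implements Comparable, so "smallest" means alphabetically first
PriorityQueue<String> byNaturalOrder = new PriorityQueue<>();

// custom ordering: pass a Comparator; this one reverses the natural order,
// so remove()/peek() would return the alphabetically last string instead
PriorityQueue<String> byComparator = new PriorityQueue<>(Comparator.reverseOrder());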

Suppose then we do the following with a priority queue:

PriorityQueue<String> pq = new PriorityQueue<>(); // natural (alphabetical) ordering
pq.add("b");   // contains: b
pq.add("a");   // contains: a b (internal order not specified)
pq.remove();   // returns "a", the smallest; contains: b
pq.add("c");   // contains: b c
pq.add("d");   // contains: b c d
pq.peek();     // returns "b", the current smallest; queue unchanged

Implementing a stack

The stack, a last-in, first-out data structure. How might we go about building one? Today we’ll build two implementations of a stack, based upon the ADT specified by a Java interface.

As we go through this, see how well you can follow along. This is the first of several “barometer” tasks we’ll be doing: if you find them easy (or at least doable), then you should feel reasonably optimistic about 187 next semester. If you find them difficult or confusing, again, maybe have a fallback plan (or plan to study hard over the break, perhaps practicing on 186 assignments you had trouble with).

The interface

package linearadts;

public interface Stack<E> {
	void push(E e) throws StackOverflowException;
	E pop() throws StackUnderflowException;
	E peek() throws StackUnderflowException;
	boolean isEmpty();
	boolean isFull();
	int size();
}

This is a minimal interface; of course you can add more if you like. Note we declare thrown exceptions; doing so is required for “checked” exceptions and optional for “unchecked” ones (the former inherit from Exception, the latter from RuntimeException). See https://docs.oracle.com/javase/tutorial/essential/exceptions/runtime.html for more details.

We’ll trivially define (empty) classes for each exception ourselves.
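
For example, something like this is enough (each class in its own file; making them checked exceptions, i.e., subclasses of Exception, matches the throws clauses above):

package linearadts;

public class StackOverflowException extends Exception { }

and the same for StackUnderflowException.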

Array-based stack

Now let’s look at an array-based implementation of a stack. We’ll do a “bounded” stack, that is, a stack that has a fixed maximum capacity. (You can imagine an unbounded, growable array, like the one we talked about for lists earlier in the semester, but we’ll skip that for now.)

What other state, that is, instance variables, does the stack require, other than an array? An index into that array, that indicates where the top of the stack “lives”. You have some discretion here – it could index the current top of the stack, or it could index the spot where the next top will be. This is a stylistic choice, but I find pointing it to the current top (or -1, if the stack is empty) to be more natural. Up to you.

(on board)

package linearadts;

public class BoundedArrayStack<E> implements Stack<E> {
	private E[] array;
	private int top;	
	
	public BoundedArrayStack(int capacity) {
		top = -1;
		array = (E[]) new Object[capacity]; // references to this array must not escape this class
	}
	
	@Override
	public void push(E e) throws StackOverflowException {
		if (isFull()) {
			throw new StackOverflowException();
		}
		top += 1;
		array[top] = e;
	}

	@Override
	public E pop() throws StackUnderflowException {
		if (isEmpty()) {
			throw new StackUnderflowException();
		}
		E temp = array[top];
		array[top] = null;
		top--;
		return temp;
	}

	@Override
	public E peek() throws StackUnderflowException {
		if (isEmpty()) {
			throw new StackUnderflowException();
		}
		return array[top];
	}

	@Override
	public boolean isEmpty() {
		return top == -1;
	}

	@Override
	public int size() {
		return top + 1;
	}

	@Override
	public boolean isFull() {
		return top == array.length - 1;
	}

}

Note 1: You can’t instantiate generic, typesafe arrays in Java, for historical and technical reasons that are beyond the scope of the course. Note the workaround: array = (E[]) new Object[capacity]; works, but generates a warning. As long as no references to the array escape the enclosing object (that is, as long as we never return the E[] array to a caller) we’ll be OK.
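
If the warning bothers you, one common option is to suppress it exactly where we’ve convinced ourselves the cast is safe, right on the constructor:

	@SuppressWarnings("unchecked") // safe: the E[] never escapes this class
	public BoundedArrayStack(int capacity) {
		top = -1;
		array = (E[]) new Object[capacity];
	}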

Note 2: When we pop, we explicitly set the array cell to null. Why? (on board) We are removing the stack’s reference to the object so that it can be garbage collected later. Holding onto that stale reference would be a minor potential memory leak, but it’s worth plugging.

Linked-list based stack

Recall there are two basic ways to agglomerate data in Java: you can use arrays, or you can use references. For this second implementation, let’s consider using linked lists. Again, as we showed earlier in the semester, we’ll need a simple Node structure to link together into the list.

package linearadts;

// note this is a super-discount Node class; a "real" one would probably have at
// least a constructor (that took an `E data` argument)
public class Node<E> {
	public Node<E> next;
	public E data;
}

And since we’re only interested in the “top” of the stack, it’s pretty easy to do.

package linearadts;

public class UnboundedLinkedStack<E> implements Stack<E> {
	private Node<E> head;
	private int size;
	
	@Override
	public void push(E e) throws StackOverflowException {
		Node<E> node = new Node<>();
		node.data = e;
		node.next = head;
		head = node;
		size++;
	}

	@Override
	public E pop() throws StackUnderflowException {
		if (isEmpty()) {
			throw new StackUnderflowException();
		}
		E temp = head.data;
		head = head.next;
		size--;
		return temp;
	}

	@Override
	public E peek() throws StackUnderflowException {
		if (isEmpty()) {
			throw new StackUnderflowException();
		}
		return head.data;
	}

	@Override
	public boolean isEmpty() {
		return head == null;
	}

	@Override
	public boolean isFull() {
		return false;
	}

	@Override
	public int size() {		
		return size;
	}

}

Implementing queues using linked lists

Recall to implement a stack with linked lists, we used just a head pointer. This made sense, because we always manipulated the same end of the list.

To review, to add to the head of the list (like we just did for stacks), we did the following:
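
(Repeating the core of push from the linked stack above, where node is a freshly built Node holding the new value:)

	node.next = head; // new node points at the old head
	head = node;      // and becomes the new head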

To remove the head node, we did the following:
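
(And the core of pop, ignoring the size bookkeeping and the empty check:)

	E temp = head.data; // remember the value to return
	head = head.next;   // unlink the old head node
	return temp;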

Queues have to manipulate both ends of the list, so we’ll need a head and tail pointer. Should we enqueue at the head or the tail? And where should we dequeue?

We know that both adding and removing at the head are pretty straightforward, since that’s what we did with push/pop for stacks. What about adding or removing at the tail?

(On board). Suppose we keep a “tail” pointer which always points at the last node in the list. To add, we:
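
(A sketch, again assuming node holds the new value, and ignoring the empty-list case for the moment:)

	tail.next = node; // append after the current last node
	tail = node;      // the new node is now the tail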

How do we remove? We need a pointer to the node before the tail…we’d either have to have both forward and backward pointers in the list (which then is a different data structure, a doubly-linked list), or traverse the entire list to find the node before the tail:
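
(A sketch of that traversal, assuming the list has at least two nodes, just to see the cost:)

	// walk from head to the node just before tail; this visits every node
	Node<E> current = head;
	while (current.next != tail) {
		current = current.next;
	}
	current.next = null; // unlink the old tail
	tail = current;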

That’s linear-time behavior, which is terrible if we don’t need to do it.

So, what have we learned? Adding at the tail is trivial, but removing the tail in a singly-linked list is not (linear to seek to the node before the tail). So let’s enqueue onto the tail, and dequeue from the head.

What does the code look like? It’s similar to the stack, but the tail reference brings a few extra points of complication.

package linearadts;

public interface Queue<E> {
    void add(E e); 
    E remove(); // note: we would probably declare over/underflow in a "real" implementation
    E peek();
    boolean isEmpty();
    boolean isFull();
    int size();
}

public class LinkedQueue<E> implements Queue<E> {
	private Node<E> head;
	private Node<E> tail;
	private int size;
	
	public LinkedQueue() {
		// optional to initialize head and tail, as their default values are null
		// same for size = 0
	}
...

When we enqueue, we make a new node and add it to the rear of the queue. Again, how do we do this? By setting the tail’s next node equal to the new node, and then setting the tail equal to the new node.

But there’s a special case: if the queue is empty, our new node will be both the head and the tail. (on board)

	@Override
	public void add(E e) {
		Node<E> node = new Node<>();
		node.data = e;
		
		if (isEmpty()) { // it's OK to use isEmpty here, because we're not yet changing the list.
			tail = node;		
			head = node;
		} else {
			tail.next = node;
			tail = node;		
		}
		size++;
	}

Alternatively, a slightly shorter arrangement (though I think it’s maybe less clear):

	@Override
	public void add(E e) {
		Node<E> node = new Node<>();
		node.data = e;
		
		if (isEmpty()) {
			head = node;
		} else {
			tail.next = node;
		}
		tail = node;		
		size++;
	}

Dequeuing is similar. We remove the head node from the list and return its value, just like popping from a stack.

But just like add(), there’s a special case: if the node we’re removing is the last node, we need to remember to also set tail to null. (on board)

@Override
public E remove() {
	E value = head.data;
	head = head.next;

	if (size == 1) { // is it OK to use isEmpty() here? We just changed the list.
		// (Answer: it depends on your implementation of isEmpty().
		// If it's just head == null, it's fine. But if it's size == 0,
		// then maybe not; it depends on when you size--!)
		tail = null;
	}
	size--;
	return value;
}

I’m leaving out the check for underflow (and I didn’t define a QueueUnderflowException) but I hope you see how to do that – it’s the same as for the stack.
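
Concretely, assuming we defined a QueueUnderflowException just like the stack exceptions (and either made it unchecked or added it to the interface’s throws clauses), remove() would start like this:

	public E remove() {
		if (isEmpty()) {
			throw new QueueUnderflowException();
		}
		// ... the rest of remove() as above
	}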

The rest is pretty short:

@Override
public E peek() {
	return head.data;
}

@Override
public boolean isEmpty() {        
	return head == null; // also consider size == 0
}

@Override
public boolean isFull() {
	return false;
}

@Override
public int size() {        
	return size;
}

You’d need to finish the above off with some other details to have a full and correct implementation (throwing over/underflow exceptions, for example), but it should be enough to convey the idea.

Implementing queues using arrays

Now, suppose we want to use arrays to back our queue, like we did with stacks. With stacks, we fixed the element at index 0 as the “bottom” of the stack, and maintained a top pointer. What if we do this with queues?

(on board)

We can easily enqueue elements by keeping a pointer to the last element enqueued. But what about dequeuing? We’d have to remove the element at position zero, then move the element from position 1 into 0, 2 into 1, and so on. That’s linear in the size of the queue (and thus terrible).

The right thing to do parallels the linked list implementation. Instead of a head and tail pointer, we’ll maintain two indices into the array, and allow both the front and rear of the queue to move through the array.

There is a trick, though: we now need to think of the array as a circle that “wraps around”, rather than as a straight line. This is kind of like a clock: (on board)

Now, instead of simply incrementing our front/rear indices, we use front = (front + 1) % capacity (and similarly for rear). This is modular arithmetic, and it “wraps around” exactly as we want.

We’ll define rear as the index of the last occupied slot (and initialize it to what? The “virtual” position -1). Thus the next place to enqueue is rear + 1 (mod the capacity). We’ll define front as the index of the next element to dequeue, and initialize it to 0.

We have to keep track of the size, since both an empty queue and a full queue have rear just before front.

package linearadts;

public class CircularQueue<E> implements Queue<E> {
    private E[] array;
    private int front;
    private int rear;
    private int size;

    public CircularQueue(int capacity) {
        array = (E[]) new Object[capacity];
        size = 0;
        front = 0;
        rear = -1;
    }

    @Override
    public void add(E e) {
        // check for overflow!

        rear = (rear + 1) % array.length;
        array[rear] = e;
        size++;
    }

    @Override
    public E remove() {
        // check for underflow!

        E value = array[front];
        array[front] = null;
        front = (front + 1) % array.length;
        size--;
        return value;
    }

    @Override
    public E peek() {
        // check for underflow!
        return array[front];
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }

    @Override
    public boolean isFull() {        
        return size == array.length;
    }

    @Override
    public int size() {        
        return size;
    }   
}

How you might implement priority queues using heaps

So finally, we’re going to look at how you might implement a priority queue.

It usually comes up in lecture that you could do a simple implementation by just using an array or list or the like, and re-sort()ing it after each call to add or remove – and that’s true! It would be functionally correct! The problem is performance. The comparison-based sorts we’ve seen so far are quadratic in the length of the list – that’s terrible work to do just to add or remove a single element! And even the best comparison-based sorts are log-linear, that is, (n log n), which while not quadratic is still more-than-linear; so touching a single element would still require, in the worst case, more than a linear number of operations. Ugh.
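
Here’s a sketch of that naive approach (hypothetical names; it keeps a List sorted so that the smallest element is always at index 0):

List<String> items = new ArrayList<>();

void naiveAdd(String e) {
	items.add(e);
	Collections.sort(items); // re-sorting: (n log n) work just to insert one element
}

String naiveRemove() {
	return items.remove(0); // returns the smallest; shifting everything left is another (n)
}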

So it turns out we can do better by using a particular kind of data structure – a heap. Now, just like the implementation details of stacks and queues, you’re unlikely to need to build heaps yourself in day-to-day applications programming (though you will definitely appreciate this if you go to 187, and/or if you go on to other lower-level systems programming in the future).

A heap is, conceptually, another kind of binary tree, kinda like a binary search tree.

You’ll remember that BSTs are used to implement certain kinds of maps and sets, and are binary trees with a very specific property: each node N’s left-descendants (that is, its children, and its children’s children, etc.) contain values less than N’s value, and its right-descendants contain values greater than N’s value.

A BST is, then, a very organized sort of tree. A heap is, as you might guess from the name, a somewhat less organized sort of tree. Instead of obeying the BST property, it obeys a weaker property, called, not coincidentally, the heap property. The heap property is as follows: in a min-heap, each node N’s value is strictly less than its children’s values. That’s it. (In a max-heap, it’s strictly greater-than. And you can relax this to “less-than-or-equal”, but I’m going to skip that as it makes the code slightly more complicated.)

The idea here is that, for a priority queue, you use a heap, and the “next” thing is always at the root. That’s it, that’s the idea.

Let’s show an example:

(draw a min-heap with the numbers 1-7 on the board, something like)

        1
    2       4
  6   3   7   5

Every node’s value is less than that of its children, so this is a heap.

Now, suppose we want to add a node to the heap. How might we do so? Well, we could try to guess the right spot and hope for the best. But the “right way” is to add it at the first open space at the “bottom” of the tree, here, as a child of 6. Let’s add the value 8. If we do, then great! It’s greater than 6 (or in other words, 6 is less than the new node 8), and we’re done. We know, because of the heap property, that nothing above six is greater than six.

Let’s add another node, zero, also as a child of 6. Uh-oh! 0 is less than 6. So the heap is currently broken – it doesn’t obey the “heap property.” So whenever we add a node, we have to check and “fix” the heap. How? By comparing the new node with its parent and, if needed, swapping them – this is called “sifting up” (sometimes also called “bubbling up” or “trickling up”). If we don’t need to sift up, like when we added 8, then we’re done. If we do need to sift up, then, importantly, we have to keep sifting up from the node’s new position, too! We keep doing it until we no longer need to, or we reach the root.

So here, we’d sift 0 up to 6. Then we’d sift up to 2, then up to 1. The new tree would look like this:

        0
    1       4
  2   3   7   5
 8 6

So that’s how we add things, by appending to the bottom and then sifting up.

How do we remove things? We can’t just take the top node, because then it’s not clear what happens. Do we move an entire limb of the tree up? What if that breaks the heap property? It turns out, again, that there’s a right answer here. The right answer is to remove and return the top (root) element, but before you return, do a little bit more. What is that little bit more? You move the “bottom” element of the tree to the root, and then “sift down” the element into the right place.

Here, the comparison is slightly more complicated: you look at a node N’s value and its two children’s values, C1 and C2. If node N is the smallest (in a min-heap), you’re done! If not, then you swap N with the smaller of C1 and C2 – so the smallest of the three is the new parent of this group of three. Then, you repeat the process with the child position you swapped N down into – you sift down there again. Just like sifting up, you repeat until you can stop (because N is smallest) or must stop (because N has no children!).

Now, why is this better than re-sorting an entire list each time? Because you touch, at most, one node on each level of the tree – and the tree only has (log n) levels. In other words, it’s strictly sub-linear in the number of nodes in the tree! So this is way better than (n log n)!

How might you actually build this? I’m not going to spend the time on that in class – take 187 or read basically any data structures textbook – but I’ll give you the sketch here. The naive way to do it is with a Node-like structure, as in linked lists. Each node needs not just a single next reference, but three in total: left and right children (for sifting down), and a parent (for sifting up).

But in practice, this is often implemented using expandable arrays. In particular, you number the nodes of a tree like this:

    0
  1   2
 3 4 5 6

etc. And use those numbers as indices into an array. Then, a node’s parent is always at (index - 1) / 2, its left child at index * 2 + 1, and its right child at index * 2 + 2. A size field counts the elements in the heap (just like an ArrayList), so the last element lives at index size - 1, and you add by placing the next element at index size, growing the array when needed, just like ArrayLists do. In fact, this general construction works for storing any complete binary tree in an array, and you’ll definitely see it again in 187.
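
To make that concrete, here is one possible array-backed min-heap (a sketch, not a drop-in PriorityQueue: fixed capacity, no over/underflow checks, and plain ints rather than generics, to keep the comparisons simple):

package linearadts;

public class MinHeap {
	private int[] array;
	private int size; // number of elements currently in the heap

	public MinHeap(int capacity) {
		array = new int[capacity];
		size = 0;
	}

	public void add(int value) {
		array[size] = value; // place it in the first open spot at the "bottom"
		siftUp(size);
		size++;
	}

	public int remove() {
		int smallest = array[0];    // the root is always the minimum
		array[0] = array[size - 1]; // move the "bottom" element to the root...
		size--;
		siftDown(0);                // ...and sift it down into place
		return smallest;
	}

	public int peek() {
		return array[0];
	}

	private void siftUp(int index) {
		while (index > 0) {
			int parent = (index - 1) / 2;
			if (array[index] < array[parent]) {
				swap(index, parent);
				index = parent; // keep sifting up from the new position
			} else {
				return; // heap property holds; we can stop
			}
		}
	}

	private void siftDown(int index) {
		while (index * 2 + 1 < size) { // while there is at least a left child
			int left = index * 2 + 1;
			int right = index * 2 + 2;
			int smaller = left;
			if (right < size && array[right] < array[left]) {
				smaller = right;
			}
			if (array[smaller] < array[index]) {
				swap(index, smaller);
				index = smaller; // keep sifting down from the new position
			} else {
				return; // heap property holds; we can stop
			}
		}
	}

	private void swap(int i, int j) {
		int temp = array[i];
		array[i] = array[j];
		array[j] = temp;
	}
}

A real implementation (java.util.PriorityQueue works essentially this way) would also grow the array as needed and compare elements with Comparable or a Comparator, but the sifting logic above is the heart of it.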