Week 04: The List abstract data type

Welcome

Announcements

Check your grades in Moodle please! That’s the authoritative source I will be using to determine your grade in this class.

Questions from the last two weeks?

A reminder to Marc to ask you in lecture: Do you have questions about what we covered online? Here’s your chance to ask them interactively.

Reminder on equals

As you saw last week, writing a “proper” equals method is kind of a pain in the ass. First you have to check if the other object is non-null. Optionally you check if it’s == to the current object. Then you check if it’s of the same type as the current object. Then you compare each relevant field of the other object to the current one, using == or .equals as appropriate.

A04 requires that your code be able to compare fragments for semantic equality. You, the programmer, must decide if some “other” object is equal to the current Fragment – the “hint” here is that you must do exactly as above, and compare the relevant instance variable. There’s really only going to be one – a String – so use .equals on it. Ask for help on Campuswire or in office hours if it’s not clear how to do it.
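To make the four steps concrete, here is a minimal sketch on a made-up class. Pet, its fields, and its constructor are hypothetical and have nothing to do with A04's Fragment. It also overrides hashCode, which is standard Java practice whenever equals is overridden, even though the steps above don't mention it.

```java
import java.util.Objects;

// Hypothetical example class, only here to illustrate the equals recipe.
class Pet {
    private final String name; // assumed never null in this sketch
    private final int age;

    Pet(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object other) {
        if (other == null) return false;                   // 1. non-null
        if (other == this) return true;                    // 2. optional shortcut
        if (getClass() != other.getClass()) return false;  // 3. same type
        Pet that = (Pet) other;
        // 4. compare fields: == for primitives, .equals for objects
        return age == that.age && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        // Equal objects must report equal hash codes.
        return Objects.hash(name, age);
    }
}
```

With this in place, new Pet("Rex", 3).equals(new Pet("Rex", 3)) is true even though the two objects are not ==.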

Review: the StringList

So last week we started talking about this interface:

public interface StringList {
	public void add(String s);
	public void add(int i, String s) throws IndexOutOfBoundsException;
	public String remove(int i) throws IndexOutOfBoundsException;
	public String get(int i) throws IndexOutOfBoundsException;
	public int size();
}

Today, let’s finish it, and then we’ll move onto an alternative.

Writing the StringArrayList

Recall we were talking about enlarge. It should allocate a new, larger array, copy the current array into it, then set the array instance variable to point to this new array.

void enlarge() {
  String[] larger = new String[array.length * 2];
  for (int i = 0; i < array.length; i++) {
    larger[i] = array[i];
  }
  array = larger;
}

Why double, and not, say, just + 10? The full answer is beyond the scope of this course, but in short: when you don’t know anything else, doubling is an efficient way to dynamically grow an array, because the total copying work stays proportional to the number of elements added. Growing by a fixed amount means copying more and more often, and the total work grows much faster. If you do know other things, you might expose ways to grow (or shrink) the underlying array, but that has its own problems (like: now users of your code are tied to your specific implementation, even if a better one comes along later).
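If you want to see the difference without the math, here is a rough sketch that counts how many element copies enlarge would perform while appending n items one at a time. The class name GrowthCost, the starting capacity of 10, and the copies method are all assumptions made for illustration. It only simulates the bookkeeping and never allocates a real array.

```java
class GrowthCost {
    // Count the element copies done by enlarge() while appending n items,
    // either doubling the capacity or growing it by 10 each time.
    static long copies(int n, boolean doubling) {
        int capacity = 10; // assumed starting capacity
        int size = 0;
        long copied = 0;
        for (int k = 0; k < n; k++) {
            if (size == capacity) {
                copied += size; // enlarge copies every element in use
                capacity = doubling ? capacity * 2 : capacity + 10;
            }
            size++;
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(copies(100_000, true));  // 163830
        System.out.println(copies(100_000, false)); // 499950000
    }
}
```

For 100,000 appends, doubling copies fewer than two elements per append on average, while growing by 10 copies thousands of times more in total.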

What about if we want to add in a particular place, rather than just at the end of the array? We need to move each element out of the way.

(On board) we have to move the last element forward one, then the previous element into the last element’s place, and so on, to “make space” for the item we’re inserting. We also need to make sure the index is valid, and that there’s space.

In code:

public void add(int i, String s) throws IndexOutOfBoundsException {
  // i == size is allowed: it means "insert at the end"
  if (i > size || i < 0) {
    throw new IndexOutOfBoundsException();
  }
  
  if (size == array.length) {
    enlarge();
  }
  for (int j = size; j > i; j--) {
    array[j] = array[j-1];
  }
  array[i] = s;
  size++;
}

Notice that we never worry about “checking” what’s already in an array cell. We know that all cells at indexes less than size are in use; and all cells at indexes greater than or equal to size are not.
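One detail of the shifting loop is worth a closer look: it runs from the end of the list back toward i. Here is a small sketch on a bare array showing why. ShiftDemo and both method names are made up for this example. The second method is the tempting left-to-right version, which overwrites each value before it has been moved.

```java
import java.util.Arrays;

class ShiftDemo {
    // Make room at index i by shifting cells i..size-1 one step right.
    // Assumes size < array.length, so there is a free cell at the end.
    static void shiftRight(String[] array, int size, int i) {
        for (int j = size; j > i; j--) {
            array[j] = array[j - 1];
        }
    }

    // Left-to-right version: each copy clobbers the value that the
    // next iteration needs, so one element gets smeared across the rest.
    static void shiftRightWrong(String[] array, int size, int i) {
        for (int j = i + 1; j <= size; j++) {
            array[j] = array[j - 1];
        }
    }

    public static void main(String[] args) {
        String[] good = {"a", "b", "c", "d", null};
        shiftRight(good, 4, 1);
        System.out.println(Arrays.toString(good)); // [a, b, b, c, d]

        String[] bad = {"a", "b", "c", "d", null};
        shiftRightWrong(bad, 4, 1);
        System.out.println(Arrays.toString(bad));  // [a, b, b, b, b]
    }
}
```

After the correct shift, index 1 still holds a stale "b", which is exactly the cell that add then overwrites with the new string.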

Finally, let’s write the code to remove an element at index i. Similar to the above, we’ll need to “move” any elements into the space we leave behind (on board). And by convention, return the value we removed.

public String remove(int i) throws IndexOutOfBoundsException {
  if (i >= size || i < 0) {
    throw new IndexOutOfBoundsException();
  }
  // read the element only after we know the index is valid
  final String removed = array[i];

  for (int j = i; j < size - 1 ; j++) {
    array[j] = array[j+1];
  }

  // optional
  array[size-1] = null;

  size--;
  return removed;
}

Setting the unused space to null lets the garbage collector free the memory, but that reference will also be overwritten the next time we add to the list, so it’s not strictly necessary.
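For reference, here is one way the pieces might fit together into a single class. Treat it as a sketch rather than the official solution: the initial capacity of 10 and the bodies of add(String), get, and size are assumptions, since those came from last week. This version lets add(i, s) accept i == size so that appending can reuse it. The interface is repeated (without the throws clauses, which are optional for unchecked exceptions) so the block compiles on its own.

```java
interface StringList {
    void add(String s);
    void add(int i, String s);
    String remove(int i);
    String get(int i);
    int size();
}

class StringArrayList implements StringList {
    private String[] array = new String[10]; // initial capacity is an assumption
    private int size = 0;                    // cells 0..size-1 are in use

    private void enlarge() {
        String[] larger = new String[array.length * 2];
        for (int i = 0; i < size; i++) {
            larger[i] = array[i];
        }
        array = larger;
    }

    @Override
    public void add(String s) {
        add(size, s); // appending is just inserting at index size
    }

    @Override
    public void add(int i, String s) {
        if (i < 0 || i > size) throw new IndexOutOfBoundsException();
        if (size == array.length) enlarge();
        for (int j = size; j > i; j--) {
            array[j] = array[j - 1]; // shift right, starting from the end
        }
        array[i] = s;
        size++;
    }

    @Override
    public String remove(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        String removed = array[i];
        for (int j = i; j < size - 1; j++) {
            array[j] = array[j + 1]; // shift left to close the gap
        }
        array[size - 1] = null; // optional: drop the stale reference
        size--;
        return removed;
    }

    @Override
    public String get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException();
        return array[i];
    }

    @Override
    public int size() {
        return size;
    }
}
```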

Testing the implementation

Let’s write a short, silly program demonstrating how our current code works. Let’s write a method to print out the contents of a StringList:

public static void print(StringList l) {
  for (int i = 0; i < l.size(); i++) {
    System.out.print(l.get(i) + " ");
  }
  System.out.println();
}

Notice this is in terms of StringList: it knows nothing about the StringArrayList and only depends upon methods in the interface.

Next, a silly test method:

public static void main(String[] args) {
	StringList l = new StringArrayList();

	l.add("Marc");
	l.add("not");
	l.add("so-good.");
	print(l);

	l.add(1, "is");
	print(l);

	l.remove(2);
	print(l);
}

OK, so our StringList works. What about if we build it a different way?

Next – Linked lists!

Next, we’re going to spend a little more time on an alternate implementation for StringList. This will be especially useful for those of you intending to go on to 187, but it’s important regardless, since in some sense seeing-is-believing that you can have two very different ways of doing something that obey the same contract (but that might differ in, say, time or space efficiency). And it will give some context to the box-and-arrow diagrams that we’ll be using later in the class to explain why various data structures work the way they do.

Lions, tigers, bears, and linked lists

Remember, we built our StringArrayList using an array as our underlying container. We hid this detail from users of the class, so it’s entirely possible we could have used a different way to build the list. Java’s builtins (that is, language constructs, not library classes or methods) that let you “connect” or collect multiple variables are limited to arrays and references, but it turns out (not coincidentally) that these are all you need to build basically every other data structure. We (definitely!) won’t do them all, but we will do one more today: the linked list. You’ll also benefit from this when we talk about how some other data structures are built later (though again, not in this level of detail, except for very simple structures).

Arrays are a fixed-size container of a sequence of cells; linked lists circumvent this restriction by making the cells (or “nodes” as they’re called in linked lists) explicit objects rather than an implicit property of an array. Here’s a simple container for Strings:

public class StringNode {
    private final String contents;

    public StringNode(String contents) {
        this.contents = contents;
    }

    public String getContents() {
        return contents;
    }
}

(On board). A simple object to hold a String and a method to get it. So far so good, except there’s no concept here of more than one StringNode being part of the same group of StringNodes. What do we do to fix this? Add a reference to a next StringNode to the current one!

public class StringNode {
    private final String contents;
    private StringNode next;

    public StringNode(String contents) {
        this.contents = contents;
        next = null;
    }

    public String getContents() {
        return contents;
    }

    public StringNode getNext() {
        return next;
    }

    public void setNext(StringNode n) {
        next = n;
    }
}

(Again on board) you can see if we want to add a node to the end of the list, it’s pretty straightforward. Inserting into the middle (or removing) is a little more complicated, but we’ll get there in a bit.
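Before wrapping nodes in a list class, it can help to wire a few together by hand. This sketch repeats a compact StringNode so it compiles on its own. NodeChainDemo and its describe helper are made up for the example. It builds a three-node chain, walks it, and then splices a node into the middle using only getNext and setNext.

```java
class StringNode {
    private final String contents;
    private StringNode next; // null means "no node after this one"

    StringNode(String contents) { this.contents = contents; }
    String getContents() { return contents; }
    StringNode getNext() { return next; }
    void setNext(StringNode n) { next = n; }
}

class NodeChainDemo {
    // Follow next references from start, joining the contents with spaces.
    static String describe(StringNode start) {
        StringBuilder sb = new StringBuilder();
        for (StringNode cur = start; cur != null; cur = cur.getNext()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(cur.getContents());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringNode a = new StringNode("lions");
        StringNode b = new StringNode("tigers");
        StringNode c = new StringNode("bears");
        a.setNext(b);
        b.setNext(c);
        System.out.println(describe(a)); // lions tigers bears

        // Splice a new node in right after a. Order matters: point the
        // new node at a's old successor first, then point a at the new node.
        StringNode x = new StringNode("linked lists");
        x.setNext(a.getNext());
        a.setNext(x);
        System.out.println(describe(a)); // lions linked lists tigers bears
    }
}
```

If the two splice steps were swapped, a would point at x and x would end up pointing at itself, losing the rest of the chain.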

Let’s use this StringNode to build an alternate implementation of StringList.

Building StringLinkedList

We’ll need to keep a reference to the underlying list of nodes as an instance variable, and it turns out that’s the only instance variable we need. By convention, this is called head, and it’s initialized to null (an empty list) when we start:

private StringNode head;

public StringLinkedList() {
	head = null;
}

Let’s add the “simple” methods from last lecture, starting with size. What do we need to do? There’s no underlying array, so we can’t just examine the length attribute and return it. Instead, we need to traverse the list. In other words, (on board) we will start at the head element (if it exists), and follow the next references until we reach the end, signaled by a null value. Don’t forget that head itself is null when the list is empty, so your code should account for this:

public int size() {
	int size = 0;
	StringNode current = head;
	while (current != null) {
		size++;
		current = current.getNext();
	}
	return size;
}

Unlike arrays, where we can just jump to the node we want, a linked-list (almost) always requires that we traverse its elements until we get to the one we want.

Note that we don’t have to traverse the whole list for size; we could maintain a private size instance variable and just return it instead if we wanted to (and then, just like in StringArrayList, we need to remember to update it whenever adding or removing). Let’s do so instead of the above:

	private int size;
	private StringNode head;
	
	public StringLinkedList() {
		size = 0;
		head = null;
	}
	
	@Override
	public int size() {
		return size;
	}

Now let’s do get. The exceptions are the same as before, but now we must traverse the list to find the ith element:

public String get(int i) throws IndexOutOfBoundsException {
	if (i < 0 || i >= size) {
		throw new IndexOutOfBoundsException();
	}
	int j = 0;
	StringNode current = head;
	while (true) {
		if (i == j) {
			return current.getContents();
		}
		current = current.getNext();
		j++;
	}
}

More on StringLinkedList

Now let’s look at add. If we want to add at the end of the list we can again traverse it to get to the end, and adjust the last element’s next reference appropriately. Note that we have to stop our traversal just before we get to the end, not after we go past it, which also means we have to special-case the head (on board first):

public void add(String s) {
	size++;
	StringNode n = new StringNode(s);

	// Case 1: empty list
	if (head == null) {
		head = n;
		return;
	}

	// Case 2: non-empty list
	StringNode current = head;
	while (current.getNext() != null) {
		current = current.getNext();
	}
	// now reached the node at the end of the list
	current.setNext(n);
}

Similarly, if we want to add somewhere in the middle of the list, we have to stop just before we get there, and do surgery on both the previous and current (added) item to make the links in the list line up, again with a special case for the first spot:

public void add(int i, String s) throws IndexOutOfBoundsException {
	if (i < 0 || i > size) {
		throw new IndexOutOfBoundsException();
	}

	size++;
	StringNode n = new StringNode(s);

	// Case 1: insert at head of list, position 0
	if (i == 0) {
		// Step 1, set n's next pointer to head
		n.setNext(head);
		// Step 2: set head to n
		head = n;
		return;
	}

	// Case 2: inserting elsewhere in the list

	// Step 1: find the node before
	StringNode nodeBefore = head;
	for (int j = 1; j < i; j++) { // equivalently: j = 0; j < i - 1
		nodeBefore = nodeBefore.getNext();
	}
	
	// Step 2: update n's next pointer to the nodeBefore's next pointer
	n.setNext(nodeBefore.getNext());

	// Step 3: set nodeBefore's next pointer to n
	nodeBefore.setNext(n);
}

Notice our bounds check deliberately permits insertion one step “after” the list (i > size, not i >= size), so add(size, s) behaves just like add(s).

Finally the remove method, which again has to do some surgery on the previous node (if it exists). Let’s do some examples on the board first (end of list; front of list; middle of list)

Then here’s the code for remove():

public String remove(int i) throws IndexOutOfBoundsException {
	if (i < 0 || i >= size) {
		throw new IndexOutOfBoundsException();
	}

	size--;
	final String result;

	if (i == 0) {
		result = head.getContents();
		head = head.getNext();
		return result;
	}

	StringNode nodeBefore = head;
	for (int j = 1; j < i; j++) {
		nodeBefore = nodeBefore.getNext();
	}
	StringNode nodeToDelete = nodeBefore.getNext();
	result = nodeToDelete.getContents();
	nodeBefore.setNext(nodeToDelete.getNext());

	return result;
}

And we’ll run it in our toy program, switching StringArrayList to StringLinkedList. Notice the behavior doesn’t change. (StringList l = new StringLinkedList(); is the only change.)
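The same trick works with Java's own library classes. As a side sketch (not part of our StringList code), here is the toy program written against java.util.List and run once with java.util.ArrayList and once with java.util.LinkedList. SwapImplementationDemo and run are names made up for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class SwapImplementationDemo {
    // Same steps as the toy program, written only against the List interface.
    static String run(List<String> l) {
        l.add("Marc");
        l.add("not");
        l.add("so-good.");
        l.add(1, "is");   // insert at index 1
        l.remove(2);      // remove by index: drops "not"
        return String.join(" ", l);
    }

    public static void main(String[] args) {
        System.out.println(run(new ArrayList<>()));  // Marc is so-good.
        System.out.println(run(new LinkedList<>())); // Marc is so-good.
    }
}
```

The run method never learns which implementation it was handed, which is the whole point of programming to the interface.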

Implications

Now we’ve seen two different ways to implement a simple abstract data type (the List), using arrays, and using references. All other abstract data types can be implemented in terms of one or both of these mechanisms.

We’ve also seen that both implementations have the same results. Though, if you think about it, one or the other is more efficient (or at least different) in some ways.

For example, the array-based list can reserve up to twice as many cells as its current size() (right after a doubling). The linked list allocates only as many nodes as it has elements, but each node also needs space for its link. It actually turns out they’re about equivalent.

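Array indexing (and assignment) is fast; about as fast as following a single link in a linked list. But some operations need only one array access where the linked list must follow many links (and vice versa), so some things are going to be faster in one implementation than the other.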

For example, would you expect a call to get(someLargeIndex) in StringArrayList or StringLinkedList to be faster?

For random access, the array-based implementation is better, since the get method is really a thin wrapper around array indexing, which is quite fast.
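To put a number on it, here is a small sketch that counts how many next references get followed when we read every element of a linked list using get(i). HopCounter, its Node class, and the int payload are all made up for illustration. An array would need exactly one index operation per element.

```java
class HopCounter {
    // Minimal singly linked node; the int payload is just for illustration.
    static class Node {
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    static long hops = 0; // number of next references followed so far

    // Linked-list style get: walk i links from the head.
    // Assumes 0 <= i < length of the list.
    static int get(Node head, int i) {
        Node cur = head;
        for (int j = 0; j < i; j++) {
            cur = cur.next;
            hops++;
        }
        return cur.value;
    }

    // Build the list 0 -> 1 -> ... -> n-1 by inserting at the head.
    static Node build(int n) {
        Node head = null;
        for (int v = n - 1; v >= 0; v--) {
            Node node = new Node(v);
            node.next = head;
            head = node;
        }
        return head;
    }

    public static void main(String[] args) {
        int n = 1000;
        Node head = build(n);
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += get(head, i);
        }
        System.out.println("hops = " + hops); // hops = 499500
    }
}
```

Reading 1,000 elements this way follows 0 + 1 + ... + 999 = 499,500 links, versus 1,000 array accesses.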

For addition or removal, it kind of depends. Adding something to the head of the linked list is really fast (make a node, set its next to head, set head to it), whereas adding a node to the front of an array-based list requires moving every single element in the array, and possibly enlarging the array, too.

On the other hand, adding an element to the end of the linked list is slow, since you have to traverse the entire list first. You could also keep a reference to the end of the list (called a tail pointer), but then you also have to update it and handle it in every method that modifies the list. On the third hand, only the implementor of the linked list needs to do this, not the user – as in our demo, both have the same interface, and neither exposes its inner workings in terms of results (though runtime and memory usage might differ).
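Here is a rough sketch of what a tail pointer might look like. TailedStringList, removeFirst, and getLast are hypothetical names, and this is deliberately not a full StringList implementation. It shows only enough to see the payoff (append without traversal) and the cost (every method that changes the list has to keep tail correct).

```java
class TailedStringList {
    private static class Node {
        final String contents;
        Node next;
        Node(String contents) { this.contents = contents; }
    }

    private Node head; // first node, or null if the list is empty
    private Node tail; // last node, or null if the list is empty
    private int size;

    // Append without traversing: the tail reference takes us straight there.
    public void add(String s) {
        Node n = new Node(s);
        if (head == null) {
            head = n;      // empty list: n is both first and last
        } else {
            tail.next = n; // link the old last node to n
        }
        tail = n;
        size++;
    }

    public String removeFirst() {
        if (head == null) throw new IndexOutOfBoundsException();
        String result = head.contents;
        head = head.next;
        if (head == null) {
            tail = null;   // list became empty: tail must not dangle
        }
        size--;
        return result;
    }

    public String getLast() {
        if (tail == null) throw new IndexOutOfBoundsException();
        return tail.contents;
    }

    public int size() {
        return size;
    }
}
```

Forgetting the tail = null line in removeFirst is the kind of bug the extra bookkeeping invites: the next add would link the new node onto a node that is no longer in the list.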

We won’t go into this comprehensive level of detail for (most) data structure implementation again in this course, though we will definitely revisit the idea of using an array or references to build more complicated data structures. But that’s about as far as we’ll go, with diagrams rather than code. We’ll be more interested in the interface of the abstract data types and what behaviors they provide, rather than the fine details of the implementations – that comes in 187 and later classes.