05: Arrays and Lists

Announcements

ExSEL (http://www.umass.edu/lrc/exsel.html) in LRC-1085 (WEB DuBois Library, 10th floor) schedule:

  • Mon: 7–8:15
  • Tue: 8:30–9:45
  • Wed: 5:30–6:45
  • Thu: 7–8:15

If you need a makeup for the quiz, please contact us ASAP.

If you have a grading question or regrade request: Start with the TAs (by email or on Piazza; either way, include your question). If you are unsatisfied with their answer, then ask me, but the first thing I’m going to do is check with them to see what their reasoning was.

We’ll go over the quiz once all makeups are done.

Assignment difficulty, asking for help: This is a new course. It’s possible I will occasionally miscalibrate assignments to be too easy or too hard. I need you to trust me that I’m not going to fail the entire class if an assignment goes bad. But I also need you to meet me halfway and really try to do assignments, even if at first you have trouble. Please ask for help if you run into trouble, and please start early enough that we all (instructors and your fellow students) have an opportunity to see how things are going.

(Speaking of) starting assignments early: We got several questions last night about Lab 3 Eclipse-related errors. These were the kind of errors that would show up in the first few minutes of starting the lab. There’s very limited time for us to help you if you wait until the night before an assignment is due to find out you’re having a weird problem.

Related: Please let us know if the solution we suggested worked. Or, if you figured it out yourself, tell us what you did! Either way, leave any public questions up and unmodified so everyone can see them. Piazza is a way for all students to benefit from one another’s questions and answers, so please don’t be selfish.

Julia Evans: Her blog at http://jvns.ca/ is a fun read. She’s a programmer at Stripe and a general evangelist for “being delighted about programming.” She’s a great speaker with a wonderfully approachable style, and this shines through in her writing. You might find her blog interesting to read, especially a recent entry titled “Getting things done.”

Review: arrays

In our review of 121 material so far, we’ve exclusively used a single container type, the array. Container types are types that “hold” other types, and the array is probably the most basic: a fixed-size sequence of values (that are themselves either primitive types or references to objects) that can be read or written in approximately “constant time”. We call these values “cells” or “elements” in the array.

Constant time means that no matter how big the array is, (to a first approximation) it takes the same amount of time to access (read or write) an element. We expect each of these statements to execute in about the same amount of time:

array[0] = 5;
System.out.println(array[1]);

array[1000000] = 12;
System.out.println(array[123456789]);

(modulo some caching effects, which are COMPSCI 230/335 material).

But arrays have some downsides, as well. For example, they’re fixed size: you need to know how many elements you want in advance. You can cheat here by allocating a giant array, but that’s wasteful for small inputs, and potentially won’t work anyway if your data is of size (giant array + 1).

Instead, we might use a higher-level abstraction; that is, a more general “container type” than an array. This week we’ll describe the List:

  • which is like (but not the same as) an array, and
  • which can be implemented in terms of an array.
  • we’ll do an array-based implementation of the List in lecture (though this is the only data structure where we’ll do this in detail)
  • we’ll briefly discuss but not implement a linked list, an alternative to the array-based list, and talk about its pros and cons

We’ll show how our List compares to the Java API’s List, and this will lead into generics and container types, two topics that will come up again and again in material this semester.

Operations on arrays

To recap, what can you do with an array? You can declare it, allocate it, read or write individual elements, and determine its length at runtime.

Declare an array of Strings called strings.

Allocate a String array of size 50 and assign it to strings.

Set the zeroth element of strings to “Hi”.

Print the zeroth element of strings.

Print the length of strings.

String[] strings;
strings = new String[50];
strings[0] = "Hi";
System.out.println(strings[0]);
System.out.println(strings.length);

clicker question

That’s it for builtins of the array. If you want to do much else, you gotta build it yourself. (Note that there is a java.util.Arrays that has some helpful methods you can call on arrays, in particular the static Arrays.toString method is helpful when caveman debugging arrays of primitive types; otherwise the debugger can be helpful.)
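As a quick illustration of the Arrays.toString tip, here is a minimal sketch. The class name ArraysToStringDemo and the helper render are made up for this example; only java.util.Arrays.toString comes from the Java API.

```java
import java.util.Arrays;

class ArraysToStringDemo {
    // Hypothetical helper: returns a readable rendering of an int array.
    static String render(int[] values) {
        return Arrays.toString(values);
    }

    public static void main(String[] args) {
        int[] values = {3, 1, 4};
        // Printing the array directly shows a type tag and hash code, not the contents.
        System.out.println(values);
        // Arrays.toString shows the elements themselves.
        System.out.println(render(values)); // prints [3, 1, 4]
    }
}
```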

So let’s talk about the List abstract data type.

Lists

Note I said “abstract data type.” First we’ll talk about the properties and assumptions we might expect from a List, in the abstract. Then we’ll do an actual, concrete implementation of the data type and see how it measures up.

“List” is a very overloaded term; we’ll simplify this by choosing a specific set of assumptions that implicitly define an abstraction:

  • lists are unbounded, that is, they don’t have a fixed size (if implemented with arrays, the arrays dynamically resize)
  • duplicate elements are allowed (when searching for an element, one of several equal elements is as good as any other)
  • lists can contain null elements (I hope you like NullPointerExceptions! though note some implementations might forbid null elements)
  • lists support an add operation, either to the end of the list or to a specific place (sometimes called an “insert”)
  • lists support a remove operation, either of a specific value or of an element at a particular index
  • lists support a get operation to return the element at a specific index
  • lists support a size operation to determine how many elements are currently in the list
  • lists can be in sorted order, but by default are not (you can imagine a SortedList that enforces this property)
  • and many more, but we’ll get to them later when we look at the full API that Java supplies.
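To make these assumptions concrete, here is a small sketch that uses the Java API’s java.util.ArrayList as a stand-in for our abstract List. That choice is an assumption for illustration (we haven’t written our own implementation yet), and the class name ListAssumptionsDemo is made up.

```java
import java.util.ArrayList;
import java.util.List;

class ListAssumptionsDemo {
    // Builds a small list while exercising the operations listed above.
    static List<String> build() {
        List<String> list = new ArrayList<>(); // no size chosen up front
        list.add("b");       // add to the end            -> [b]
        list.add("b");       // duplicates are allowed    -> [b, b]
        list.add(0, "a");    // add at a specific index   -> [a, b, b]
        list.add(null);      // ArrayList permits null    -> [a, b, b, null]
        list.remove(1);      // remove by index           -> [a, b, null]
        list.remove("b");    // remove by value           -> [a, null]
        return list;
    }

    public static void main(String[] args) {
        List<String> list = build();
        System.out.println(list.size()); // prints 2
        System.out.println(list.get(0)); // prints a
    }
}
```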

A ListInterface

As much as possible, it’s a good idea to have the computer help us check our assumptions as we go. One way to do this is to leverage the type system. We’ll declare a StringListInterface for a list of Strings as described above, then we’ll implement it using arrays.

public interface StringListInterface {
    public void add(String s);
    public void add(int i, String s);
    public String remove(int i);
    public String get(int i);
    public int size();
}

What happens if a user of this interface does something “bad,” like attempting to remove an element that’s not present, or to add an element past the end of the list? The type doesn’t help us. We can add (optional) unchecked exception declarations to hint at this:

public interface StringListInterface {
    public void add(String s);
    public void add(int i, String s) throws IndexOutOfBoundsException;
    public String remove(int i) throws IndexOutOfBoundsException;
    public String get(int i) throws IndexOutOfBoundsException;
    public int size();
}

...though note that these just give a programmer using your StringListInterface a heads-up. The documentation comments for the class and the method define the actual contract that the class (or interface) offers. Note that the compiler can’t enforce all parts of this contract; it’s up to the programmer implementing the StringListInterface to do the right thing.
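Here is a small sketch of that last point, using made-up names (ContractDemo, IndexedStrings, SloppyStrings) rather than our real interface. The implementation ignores the documented contract entirely, and the compiler accepts it anyway, because IndexOutOfBoundsException is unchecked.

```java
class ContractDemo {
    public static void main(String[] args) {
        IndexedStrings strings = new SloppyStrings();
        // The contract says this should throw; the sloppy implementation doesn't.
        System.out.println(strings.get(-1)); // prints oops
    }
}

// Hypothetical one-method interface, standing in for StringListInterface.
interface IndexedStrings {
    /**
     * Returns the string at index i.
     * Contract (stated only in this comment): throws IndexOutOfBoundsException
     * if i is negative or not less than the number of stored strings.
     */
    String get(int i) throws IndexOutOfBoundsException;
}

// Compiles without complaint, even though it never honors the contract.
class SloppyStrings implements IndexedStrings {
    @Override
    public String get(int i) {
        return "oops"; // never throws, whatever i is
    }
}
```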

Also note that a full StringListInterface would have many more methods (set, analogous to array assignment, removeAll, equals, and so on). We’ll cover some of these in detail later once we start using the Java API, but I don’t want to write all of them in lecture! (Though it should be straightforward to write most of them.)

Writing the ArrayStringList

(If you want to implement an interface, use the Eclipse “new class” wizard to note you want to do so, and it will write skeletons of each method for you.)

Let’s think about what we need in our implementation. What instance variables do we need? Certainly an array of Strings to hold the list elements. Anything else? The number of elements actually stored in the array. Remember, one of the reasons we’re writing a List is that arrays are of fixed size, but a List can grow and shrink arbitrarily. (We’ll see how soon.)

So let’s declare an array String[] array and an int size to hold these values.

(On board) Conceptually, strings will be added to, gotten, and removed from this array; it’s the implementation’s job to make sure they go in the right place, and that if a user of ArrayStringList tries to access an element of the List (that is, of the underlying array) that is invalid, an exception is raised.

Let’s start with a constructor that starts with a small empty array:

public ArrayStringList() {
  array = new String[10];
  size = 0;
}

Now let’s do the simple methods:

public int size() {
  return size;
}


@Override
public String get(int i) throws IndexOutOfBoundsException {
  return array[i];
}

But remember, while the array might be of size 10, there might be fewer than 10 (even no!) strings stored in the array. So a correct definition would instead read:

public String get(int i) throws IndexOutOfBoundsException {
  // only indices 0..size-1 hold list elements, whatever the array's length
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException();
  }

  return array[i];
}

This is important to understand: the List acts like a list of elements with a size equal to the number that have been added (less the number removed). Even though there’s an underlying array of a different size, the user of the List interface cannot see it! The details are said to be encapsulated. This is a very powerful concept that lets you use data structures (and generally any API) by reading their contract – you don’t need to fully understand every detail of the implementation (though it can be helpful to do so!).
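A minimal sketch of the difference between the array’s capacity and the list’s size. The class CapacityVersusSize and its helper rejects are hypothetical and stripped down to just size and get. The underlying array has ten cells, but the list is empty, so even index 0 is rejected.

```java
class CapacityVersusSize {
    private final String[] array = new String[10]; // capacity: ten cells
    private int size = 0;                          // logical size: no elements yet

    int size() {
        return size;
    }

    String get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException();
        }
        return array[i];
    }

    // Hypothetical helper: true if get(i) is rejected with an exception.
    static boolean rejects(CapacityVersusSize list, int i) {
        try {
            list.get(i);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CapacityVersusSize list = new CapacityVersusSize();
        System.out.println(list.size());     // prints 0
        System.out.println(rejects(list, 0)); // prints true
    }
}
```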

Now let’s turn to some of the more complicated methods, like add:

public void add(String s) {
  array[size] = s;
  size++;
}

This sorta works, but what happens once we add the eleventh element? We’ll overflow the array bounds, which we don’t want to do – our list is supposed to be unbounded. Instead, we’ll check to see if the array needs to be expanded, and do so:

public void add(String s) {
  if (size == array.length) {
    enlarge();
  }
  array[size] = s;
  size++;
}

What should enlarge do? It should allocate a new, larger array, copy the current array into it, then set the array instance variable to point to this new array.

void enlarge() {
  String[] larger = new String[array.length * 2];
  for (int i = 0; i < array.length; i++) {
    larger[i] = array[i];
  }
  array = larger;
}
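As an aside, the copy loop can also be written with java.util.Arrays.copyOf, which allocates the new array and copies the old elements in one call. This is only an alternative sketch, not what we wrote on the board; the class EnlargeWithCopyOf and the method enlarged are made up, and the method returns the new array rather than assigning an instance variable.

```java
import java.util.Arrays;

class EnlargeWithCopyOf {
    // Returns a new array twice as long, with the old elements in the same
    // positions and null in the new cells. Assumes array.length > 0.
    static String[] enlarged(String[] array) {
        return Arrays.copyOf(array, array.length * 2);
    }

    public static void main(String[] args) {
        String[] small = {"a", "b"};
        String[] larger = enlarged(small);
        System.out.println(larger.length);            // prints 4
        System.out.println(Arrays.toString(larger));  // prints [a, b, null, null]
    }
}
```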

Why double, and not, say, just + 10? The full answer is beyond the scope of this course, but in short: when you don’t know anything else, doubling is a very efficient way to dynamically grow an array, because as the list grows each element ends up being copied only a small, constant number of times on average. If you do know other things, you might expose ways to grow (or shrink) the underlying array, but that has its own problems (like: now users of your code are tied to your specific implementation, even if a better one comes along later).
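To get a feel for the difference, here is a rough sketch that just counts how many element copies resizing would perform while appending n elements, starting from a capacity of 10 as in our constructor. The class GrowthCostDemo and the method copies are made up for this illustration; it counts copies only and says nothing about wall-clock time.

```java
class GrowthCostDemo {
    // Counts the element copies caused by resizing while appending n elements.
    // If doubling is false, the array grows by 10 cells each time instead.
    static long copies(int n, boolean doubling) {
        int capacity = 10;
        long copied = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size; // every existing element moves to the new array
                capacity = doubling ? capacity * 2 : capacity + 10;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(copies(1000, true));  // prints 1270
        System.out.println(copies(1000, false)); // prints 49500
    }
}
```

With doubling, the total number of copies stays within a small multiple of n; growing by a fixed amount makes it grow roughly with the square of n.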

What about if we want to add in a particular place, rather than just at the end of the array? We need to move each element out of the way. (On board) we have to move the last element forward one, then the previous element into the last element’s place, and so on, to “make space” for the item we’re inserting. We also need to make sure the index is valid, and that there’s space. In code:

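public void add(int i, String s) throws IndexOutOfBoundsException {
  // valid positions are 0..size; i == size is the same as adding at the end
  if (i < 0 || i > size) {
    throw new IndexOutOfBoundsException();
  }
  if (size == array.length) {
    enlarge();
  }
  // shift elements i..size-1 one cell to the right, starting from the end
  for (int j = size; j > i; j--) {
    array[j] = array[j-1];
  }
  array[i] = s;
  size++;
}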
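If you missed the board work, here is the shifting step on its own, as a sketch on a plain array. The class ShiftRightDemo and the method insertAt are made up, and the sketch assumes the array already has a spare cell at index size.

```java
import java.util.Arrays;

class ShiftRightDemo {
    // Inserts s at index i among the first `size` cells of array.
    // Assumes 0 <= i <= size and that array[size] is a spare cell.
    static void insertAt(String[] array, int size, int i, String s) {
        // Work from the end so no element is overwritten before it is moved.
        for (int j = size; j > i; j--) {
            array[j] = array[j - 1];
        }
        array[i] = s;
    }

    public static void main(String[] args) {
        String[] array = {"a", "b", "d", null}; // three elements, one spare cell
        insertAt(array, 3, 2, "c");
        System.out.println(Arrays.toString(array)); // prints [a, b, c, d]
    }
}
```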

Finally, let’s write the code to remove an element at index i. Similar to the above, we’ll need to “move” any elements into the space we leave behind (on board). And by convention, return the value we removed.

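public String remove(int i) throws IndexOutOfBoundsException {
  // check the index before touching the array
  if (i < 0 || i >= size) {
    throw new IndexOutOfBoundsException();
  }
  final String removed = array[i];

  // shift elements i+1..size-1 one cell to the left
  for (int j = i; j < size - 1; j++) {
    array[j] = array[j+1];
  }

  size--;
  array[size] = null; // clear the now-unused last cell
  return removed;
}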
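And the mirror image of the insert: a sketch of the left shift on a plain array. The class ShiftLeftDemo and the method removeAt are made up for illustration. The loop stops at size - 1 so that it never reads past the last stored element, even when the array is completely full.

```java
import java.util.Arrays;

class ShiftLeftDemo {
    // Removes and returns the element at index i among the first `size` cells.
    // Assumes 0 <= i < size.
    static String removeAt(String[] array, int size, int i) {
        String removed = array[i];
        // Each later element moves one cell toward the front.
        for (int j = i; j < size - 1; j++) {
            array[j] = array[j + 1];
        }
        array[size - 1] = null; // the last cell is no longer part of the list
        return removed;
    }

    public static void main(String[] args) {
        String[] array = {"a", "b", "c", "d"}; // full: no spare cells
        String removed = removeAt(array, 4, 1);
        System.out.println(removed);                // prints b
        System.out.println(Arrays.toString(array)); // prints [a, c, d, null]
    }
}
```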

As I mentioned earlier, there are many other things you could do. For example, you could write a removeAll method that completely empties the List.

In-class exercise

public void removeAll() {
  ...
}

At least two solutions are possible. One is to repeatedly call remove() until the list is empty; another is to directly manipulate the underlying instance variables.

public void removeAll1() {
  while (size() > 0) {
    remove(0);
  }
}

public void removeAll2() {
  size = 0;
  array = new String[10]; // this line is optional; do you see why?
}