Lecture 09: More on using lists; Comparators

Welcome

Announcements

Moodle grades appear to be displaying now. Maybe check ‘em and see how you’re doing in the course.

Fixing LinkedList

Last class we had an error in our code. Now we’ll fix it.

First, note that our add(s) method only ever set the head; it never appended the node to the end of the list when the list was non-empty. Oops!

Second, our traversals to find the “node-before” in add(s, i) and remove(i) went one step too far; the loop condition is off by one.

Fix these two mistakes, and the implementation works.
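We don’t have the in-class code here, so what follows is only a minimal sketch of what the two fixes might look like. It assumes a singly linked list with a head reference and a size counter; the names StringLinkedList and Node are hypothetical. The key points are that add(s) walks to the last node before linking, and that the “node-before” loops take exactly i - 1 steps.

```java
// Hypothetical sketch (not the in-class code): a singly linked list of Strings
// with the two fixes applied.
class StringLinkedList {
    // One node: a String plus a link to the next node (null at the end).
    private static class Node {
        final String contents;
        Node next;
        Node(String contents) { this.contents = contents; }
    }

    private Node head;
    private int size;

    // Fix #1: append at the END of a non-empty list, not just at the head.
    public void add(String s) {
        Node n = new Node(s);
        if (head == null) {
            head = n;                      // empty list: new node becomes the head
        } else {
            Node current = head;
            while (current.next != null) { // walk to the last node
                current = current.next;
            }
            current.next = n;              // link the new node after it
        }
        size++;
    }

    // Fix #2: stop on the node BEFORE index i (i - 1 steps from head).
    public void add(String s, int i) {
        if (i < 0 || i > size) throw new IndexOutOfBoundsException("index: " + i);
        Node n = new Node(s);
        if (i == 0) {
            n.next = head;
            head = n;
        } else {
            Node before = head;
            for (int j = 0; j < i - 1; j++) {
                before = before.next;
            }
            n.next = before.next;
            before.next = n;
        }
        size++;
    }

    public String remove(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index: " + i);
        String removed;
        if (i == 0) {
            removed = head.contents;
            head = head.next;
        } else {
            Node before = head;
            for (int j = 0; j < i - 1; j++) { // same fixed bound as add(s, i)
                before = before.next;
            }
            removed = before.next.contents;
            before.next = before.next.next;   // unlink the node at index i
        }
        size--;
        return removed;
    }

    public String get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index: " + i);
        Node current = head;
        for (int j = 0; j < i; j++) {
            current = current.next;
        }
        return current.contents;
    }

    public int size() { return size; }
}
```

Taking i - 1 steps from head lands on index i - 1, the node whose next link has to change; taking i steps (the off-by-one) lands on the node at index i itself.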

Implications

Now we’ve seen two different ways to implement a simple abstract data type (the List): using arrays and using references. All other abstract data types can be implemented in terms of one or both of these mechanisms.

We’ve also seen that both implementations produce the same results. But if you think about it, each one is more efficient (or at least different) in some ways.

For example, the array-based list can use up to twice the memory its current size() requires. The linked list uses only what it needs, plus space for a link (reference) in each node. In terms of memory, it actually turns out they’re about equivalent.

Array indexing (and assignment) is fast: about as fast as following a single link in a linked list. This means some operations are going to be faster in one implementation than in the other.

In-class exercise

Would you expect a call to get(someLargeIndex) in ArrayList or LinkedList to be faster?

For random access, the array-based implementation is better, since the get method is really a thin wrapper around array indexing, which is quite fast.

For addition or removal, it kind of depends. Adding something to the head of the linked list is really fast (make a node, set its next to head, set head to it), whereas adding a node to the front of an array-based list requires moving every single element in the array, and possibly enlarging the array, too.
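Here’s a rough sketch of the array side of that trade-off, using hypothetical names (StringArrayList, data, size). get is a single array index, while add(s, 0) has to shift every element one slot to the right with System.arraycopy, and sometimes copy everything into a bigger array first.

```java
import java.util.Arrays;

// Hypothetical sketch of an array-backed list of Strings.
class StringArrayList {
    private String[] data = new String[2];
    private int size = 0;

    public void add(String s, int i) {
        if (i < 0 || i > size) throw new IndexOutOfBoundsException("index: " + i);
        if (size == data.length) {
            // Full: copy everything into an array twice as big.
            data = Arrays.copyOf(data, data.length * 2);
        }
        // Shift data[i..size-1] one slot right; when i == 0 that's every element.
        System.arraycopy(data, i, data, i + 1, size - i);
        data[i] = s;
        size++;
    }

    public String get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index: " + i);
        return data[i]; // random access: one array index, no traversal
    }

    public int size() { return size; }
}
```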

On the other hand, adding an element to the end of the linked list is slow, since you have to traverse the entire list first. You could also keep a reference to the end of the list (called a tail pointer), but then you also have to update it and handle it in every method that modifies the list. On the third hand, only the implementor of the linked list needs to do this, not the user – as in our demo, both have the same interface, and neither exposes its inner workings in terms of results (though runtime and memory usage might differ).
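A minimal sketch of the tail-pointer idea. The method names addFirst, addLast, and removeFirst are hypothetical; they echo java.util.LinkedList, but this is not that class. Notice that every method that can change the first or last node also has to keep tail correct, which is the extra bookkeeping described above.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch: a singly linked list that also tracks its last node.
class TailLinkedList {
    private static class Node {
        final String contents;
        Node next;
        Node(String contents) { this.contents = contents; }
    }

    private Node head;
    private Node tail; // last node, or null when the list is empty
    private int size;

    public void addFirst(String s) {
        Node n = new Node(s);
        n.next = head;
        head = n;
        if (tail == null) tail = n; // first node is both head and tail
        size++;
    }

    public void addLast(String s) {
        Node n = new Node(s);
        if (tail == null) {
            head = n;
            tail = n;
        } else {
            tail.next = n; // no traversal needed
            tail = n;
        }
        size++;
    }

    public String removeFirst() {
        if (head == null) throw new NoSuchElementException();
        String s = head.contents;
        head = head.next;
        if (head == null) tail = null; // list is now empty: keep tail in sync
        size--;
        return s;
    }

    public int size() { return size; }
}
```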

We won’t go into this comprehensive level of detail for (most) data structure implementation again in this course, though we will definitely revisit the idea of using an array or references to build more complicated data structures, like say this tree (on board). But that’s about as far as we’ll go, with diagrams rather than code. We’ll be more interested in the interface of the abstract data types and what behaviors they provide, rather than the fine details of the implementations.

Generics

Methods take parameters: rather than hardcoding all data and values into a method, we can make some parts of the data variable and parameterizable so that the method can be reused with different data and values. This makes a lot of sense. Let’s say we want to write a method that adds five to its argument:

int add5(int i) {
	return i + 5;
}

That’s useful. But someday we want to also write a method that adds six:

int add6(int i) {
	return i + 6;
}

OK. Then add7, etc. Getting silly. We don’t want to write a different method each time, since (1) there’s an infinite number of them! and (2) the operation of adding is mechanically the same each time. That is, there’s a generalized algorithm we write once and then can use many times.

int add(int i, int j) {
	return i + j;
}

The insight behind “generics” is that we can do the same thing with types – we can parameterize classes (and methods) with a type, too, and use them on different types of things. Our crappy StringListInterface, for example, while it let us hold any list of Strings we wanted, was still limited to String data. But it turns out that instead of writing:

public interface StringListInterface {
	public void add(String s);
	public void add(String s, int i);
	public String remove(int i);
	public String get(int i);
	public int size();
}

you can parameterize the interface on a type using angle brackets:

public interface ListInterface<E> {
	public void add(E e);
	public void add(E e, int i);
	public E remove(int i);
	public E get(int i);
	public int size();
}

The ListInterface now sports a generic type name in angle brackets. We’ve defined a family of possible types here; note that each method that used to operate on strings now operates on this mysterious E.

We can also type parameterize a class:

public class Node<E> {
    private final E contents;
    private Node<E> next;
// more
}

…and the two together let us write generic code that operates on generic types, based upon the type parameter.
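To see the type parameter in action, here’s a sketch that implements ListInterface<E> once and then uses it with two different element types. DelegatingList is a hypothetical name, and it cheats by delegating to java.util.ArrayList so the generics stay in focus. Note that java.util.List’s add takes the index first, unlike our add(e, i).

```java
import java.util.ArrayList;

interface ListInterface<E> {
    public void add(E e);
    public void add(E e, int i);
    public E remove(int i);
    public E get(int i);
    public int size();
}

// Hypothetical class: written once, usable for any element type E.
class DelegatingList<E> implements ListInterface<E> {
    private final ArrayList<E> items = new ArrayList<>();

    public void add(E e) { items.add(e); }
    public void add(E e, int i) { items.add(i, e); } // java.util.List: index first
    public E remove(int i) { return items.remove(i); }
    public E get(int i) { return items.get(i); }
    public int size() { return items.size(); }
}
```

The same class then serves as a ListInterface<String> and a ListInterface<Integer>, and the compiler keeps the two apart.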

Type parameters

The E is a type parameter – it says that the programmer who declares a variable of type ListInterface must also choose a particular type that the declared ListInterface will handle. ListInterfaces of different type parameters are of different types. For example, you cannot assign one to another unless they have the same type parameter, any more than you can assign a boolean to a String:

boolean x = "banana"; // not allowed, fails at compile time

List<String> x;
List<Integer> y;

... // some code ...

x = y; // not allowed, fails at compile time

In-class exercise

Will the following code compile (that is, are the types valid)? (x4)

More on parameters

Type parameters are usually written as a single uppercase letter, and often that letter is an abbreviation. E stands for Element of a collection; we’ll also see K for Key and V for Value later in the course.

Type arguments (the actual types you plug in when you declare something of a generic type) must be non-primitive (reference) types: List<int> won’t compile, but List<Integer> will. But Java does something called auto-boxing, which converts automatically between primitives and their wrapper types (like int and Integer), so you can generally mix them freely. (Integer and friends also have many useful static methods.)
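A small sketch of auto-boxing (BoxingDemo and sumOf are made-up names): ints go into a List<Integer> and come back out as ints without any explicit conversion, and Integer.parseInt is one of those static helpers.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingDemo {
    static int sumOf(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) { // each Integer is auto-unboxed to an int
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // List<int> would not compile; the type argument must be a reference type.
        List<Integer> numbers = new ArrayList<>();
        numbers.add(3);                     // int 3 is auto-boxed to an Integer
        numbers.add(Integer.parseInt("4")); // static helper on the wrapper class
        int first = numbers.get(0);         // auto-unboxed back to an int
        System.out.println(first + " " + sumOf(numbers));
    }
}
```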

The final fact for today about type parameters: Usually we think of them as being declared on classes (and indeed, that’s usually where they are declared). But if you write a particular method that would benefit from type parameterization, you can do so:

public class Util {
    public static <K, V> boolean compare(Pair<K, V> p1, Pair<K, V> p2) {
        return p1.getKey().equals(p2.getKey()) &&
               p1.getValue().equals(p2.getValue());
    }
}

Note the type parameters come immediately before the return type.
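The notes don’t define Pair (it is not part of java.util), so this sketch supplies a minimal hypothetical Pair<K, V> just to make Util.compare compile and run. When you call a generic method, Java usually infers the type arguments, but you can also write them explicitly.

```java
// Hypothetical minimal Pair, only so that Util.compare has something to work on.
class Pair<K, V> {
    private final K key;
    private final V value;

    Pair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public K getKey() { return key; }
    public V getValue() { return value; }
}

class Util {
    // <K, V> declares the method's own type parameters, just before the return type.
    public static <K, V> boolean compare(Pair<K, V> p1, Pair<K, V> p2) {
        return p1.getKey().equals(p2.getKey()) &&
               p1.getValue().equals(p2.getValue());
    }
}
```

For example, Util.compare(p1, p2) on two Pair<Integer, String> values infers K = Integer and V = String, while Util.<Integer, String>compare(p1, p2) spells them out.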

Why generics matter

They matter for the reasons listed above (generic re-usable code)! But also, in Java 5, the entire Collections library was re-written to use generics. Before then, all container types (List, etc.) only held things of type Object, and you, the programmer, had to laboriously cast them each time you used them:

List list = new ArrayList();
list.add("hello");
String s = (String) list.get(0);

Not only was this a pain, if you made a mistake:

List l = new ArrayList();
l.add("Zero");

String s = (String)l.get(0);

//...

Integer i = (Integer)l.get(0); // throws ClassCastException at run time!
System.out.print(i);

you’d find out at run-time, not at compile-time. And while I know you hate compiler errors right now, you’ll learn to love them when writing big programs — every error that the compiler catches is one you can fix at your leisure, while run-time errors are erratic, not always reproducible, and generally result in a (much bigger) headache for you.
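A sketch contrasting the two styles (RawVsGeneric is a hypothetical name). With the raw List, the bad cast compiles and only fails as a ClassCastException at run time; with List<String>, no cast is needed and the equivalent mistake is rejected by the compiler, so it is left commented out.

```java
import java.util.ArrayList;
import java.util.List;

class RawVsGeneric {
    // Raw type (pre-Java 5 style): the wrong cast compiles, then fails at run time.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawMistakeThrows() {
        List l = new ArrayList();
        l.add("Zero");
        try {
            Integer i = (Integer) l.get(0); // a String is not an Integer
            return false;
        } catch (ClassCastException e) {
            return true;                    // the mistake only surfaces here
        }
    }

    // Generic type: no cast needed, and the same mistake would not compile.
    static String genericGet() {
        List<String> l = new ArrayList<>();
        l.add("Zero");
        // Integer i = l.get(0); // compile-time error: incompatible types
        return l.get(0);
    }
}
```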

Lists and sorting

Suppose you want to add items to a list (say, a storeNumbers attribute of type List<Integer> in a store management system) and you want to keep it sorted. How would you write the public void addStore(int newNumber) method?

public void addStore(int newNumber) {
	if (storeNumbers.isEmpty()) {
		storeNumbers.add(newNumber);
		return;
	}
	int i = 0;
	for (Integer storeNumber: storeNumbers) {
		if (storeNumber.compareTo(newNumber) >= 0) {
			storeNumbers.add(i, newNumber);
			return;
		}
		i++;
	}
	storeNumbers.add(newNumber);
}

This could throw a ConcurrentModificationException.

WTF? It turns out that some (most) implementations of collections are very particular about allowing you to modify them while you are iterating. Creating an iterator, then modifying the collection, then trying to iterate is generally not allowed. See, for example, the ArrayList docs: http://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html and note that “…if the list is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove or add methods, the iterator will throw a ConcurrentModificationException.”

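The exception is only thrown if the iterator is advanced again (that is, control gets back to the top of the for loop and asks for the next element) after the list is modified. Since addStore returns immediately after the add, this particular version happens to escape; but it’s fragile, because any change that lets the loop keep going after the add would trigger the exception.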
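Here’s a small experiment (ComodificationDemo is a hypothetical name) showing both sides of that rule with an ArrayList: modifying the list and then letting the loop continue throws, while modifying it and immediately returning, as addStore does, does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

class ComodificationDemo {
    // Modifies the list inside a for-each loop, then keeps looping.
    static boolean keepLoopingThrows() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 3, 5));
        try {
            for (Integer n : numbers) {
                if (n == 3) {
                    numbers.add(4); // structural modification while iterating
                }
            } // the iterator's next() on the following pass detects it and throws
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Same kind of modification, but we return right away, like addStore.
    static boolean returnImmediatelyThrows() {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 3, 5));
        try {
            for (Integer n : numbers) {
                if (n == 3) {
                    numbers.add(1, 2); // insert before the 3
                    return false;      // iterator never advances again: no exception
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```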

So instead we could work with indices directly:

public void addStore(int newNumber) {
	if (storeNumbers.isEmpty()) {
		storeNumbers.add(newNumber);
		return;
	}
	for (int i = 0; i < storeNumbers.size(); i++) {
		if (storeNumbers.get(i).compareTo(newNumber) >= 0) {
			storeNumbers.add(i, newNumber);
			return;
		}
	}
	storeNumbers.add(newNumber);
}
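The ArrayList docs quoted above also allow modification through the iterator’s own add method, so a ListIterator is another option. This is a sketch of the same sorted insert (SortedInsert and addSorted are hypothetical names). ListIterator.add inserts just before the element that next() would return, so once we find the insertion point we step back with previous().

```java
import java.util.List;
import java.util.ListIterator;

class SortedInsert {
    // Inserts newNumber before the first element >= newNumber.
    static void addSorted(List<Integer> storeNumbers, int newNumber) {
        ListIterator<Integer> it = storeNumbers.listIterator();
        while (it.hasNext()) {
            if (it.next() >= newNumber) {
                it.previous();      // step back so add() lands before that element
                it.add(newNumber);  // allowed: modification through the iterator
                return;
            }
        }
        it.add(newNumber); // empty list, or every element was smaller: append
    }
}
```

One design note: on a LinkedList, get(i) has to walk i links, so the index-based loop above can do quadratic work in the worst case, while the iterator version is a single pass.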

In-class exercise

storeNumbers.sort(null);

What is sort(null) doing?

What’s up with the sort(null) call? Look at the API: http://docs.oracle.com/javase/8/docs/api/java/util/List.html#sort-java.util.Comparator-

The parameter’s type, Comparator, is an interface that describes how to compare two arbitrary objects (see also the closely related Comparable interface, which an object implements to compare itself with another object). You might implement Comparator if you wanted to do something odd with the sort, for example, put all odd numbers before all even numbers (which maybe sounds nonsensical, but think about mail delivery up and down each side of the street).
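A sketch of that odd-before-even idea as a Comparator<Integer> written as a lambda (OddFirst and ODD_BEFORE_EVEN are hypothetical names). Within each group it simply sorts in ascending order.

```java
import java.util.Comparator;

class OddFirst {
    // Hypothetical comparator: odd numbers before even ones, each group ascending.
    static final Comparator<Integer> ODD_BEFORE_EVEN = (a, b) -> {
        boolean aOdd = a % 2 != 0; // != 0 also treats negative odd numbers as odd
        boolean bOdd = b % 2 != 0;
        if (aOdd && !bOdd) return -1; // a is odd, b is even: a first
        if (!aOdd && bOdd) return 1;  // a is even, b is odd: b first
        return Integer.compare(a, b); // same parity: ascending
    };
}
```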

But you don’t actually have to implement its abstract methods to use it here. Reading the documentation:

If the specified comparator is null then all elements in this list must implement the Comparable interface and the elements’ natural ordering should be used.

Do Integers implement Comparable? Let’s check: http://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html

Yup. In general, you should start doing your best to read and follow links in the Java API if you see things in code you don’t understand. You might not understand everything you read, but only by trying are you going to learn, and at some point you can’t expect your instructors to spoon-feed you everything (though I will definitely cover the highlights).
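As a quick check of that reading of the docs, here’s a sketch (NaturalOrderDemo is a hypothetical name) showing that sort(null) on a List<Integer> gives the same result as passing Comparator.naturalOrder(), which explicitly uses each element’s compareTo. If the elements didn’t implement Comparable, sort(null) would typically fail with a ClassCastException at run time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NaturalOrderDemo {
    // sort(null): the list falls back to the elements' own compareTo.
    static List<Integer> sortedWithNull(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        copy.sort(null);
        return copy;
    }

    // An explicit Comparator that also uses the natural ordering.
    static List<Integer> sortedWithNaturalOrder(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }
}
```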

So anyway, we can exercise option 2:

Option 2: Append the number then sort the list!

public void addStore(int newNumber) {
	storeNumbers.add(newNumber);
	storeNumbers.sort(null);
}

Another example

So we’ve been talking about Lists and generics and reviewing 121 stuff like boolean conditions and flow control. Let’s do another worked example.

We’re going to design and write a class to represent simplified postal addresses (street numbers and names). Then we’re going to dump several instances of it into a list. Finally, we’re going to define a custom Comparator on a postal address that will let us sort the list in a special way.

Let’s get going.

Given the problem statement, we know we’re going to want to define a class that defines objects containing a street number and name:

public class PostalAddress {
	public final int number;
	public final String streetName;
}

What is up with Marc and his use of public final? Here’s what’s up: when you know a data type is going to remain fixed, and that any particular object isn’t going to have its data change, there’s no reason to deal with the pain of writing private instance variables and then turning around to write public accessors (get methods). “But Marc, what if you change how the value is stored, or how it is determined?” Then you’ll give it a different name (or write an accessor, etc.), and your IDE will flag every occurrence for you to fix. Or, better yet, it will let you refactor them all automatically.

(See “Effective Java, 2nd edition,” which, though about a decade old, is still one of the best bits of reading you can do once you’re an intermediate Java programmer.)

Anyway, now let’s add a few constructors. First the obvious one:

public PostalAddress(int number, String streetName) {
	this.number = number;
	this.streetName = streetName;
}

Now one that does some simple parsing:

public PostalAddress(String textAddress) {
	// trim first: a leading space would otherwise produce an empty first token
	String[] matches = textAddress.trim().split("\\s+", 2);
	this.number = Integer.parseInt(matches[0]);
	this.streetName = matches[1];
}
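Here’s a small sketch (SplitDemo and splitAddress are hypothetical names) of what split("\\s+", 2) does. The regex \\s+ matches a run of whitespace, and the limit 2 stops after the first split, so “Maple St” stays in one piece. It also shows why trimming first is a good idea: a leading space produces an empty first token, which Integer.parseInt would reject with a NumberFormatException.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper mirroring the parsing constructor: trim, then split
    // on the first run of whitespace only.
    static String[] splitAddress(String textAddress) {
        return textAddress.trim().split("\\s+", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAddress("6 Maple St")));     // [6, Maple St]
        System.out.println(Arrays.toString("6 Maple St".split("\\s+")));    // [6, Maple, St]
        System.out.println(Arrays.toString(" 6 Maple St".split("\\s+", 2))); // [, 6 Maple St]
    }
}
```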

OK! Now we’ve got a simple class. Let’s try making a few of them and adding them to a list:

List<PostalAddress> addresses = new ArrayList<PostalAddress>();

for (int i = 1; i <= 10; i++) {
	addresses.add(new PostalAddress(i, "Maple St"));
}

and maybe printing them out:

System.out.println(addresses);

Ugh, what’s this PostalAddress@677327b6 nonsense? Remember, when you print an object, Java converts it to a String using its toString method (and printing a list calls toString on each element). We haven’t written one, so we get the Object default, which is what you see: the class name, an @, and the object’s hashCode() in hexadecimal. Since we haven’t defined hashCode either, we get its default too: an identity-based number that is often, but not necessarily, related to the object’s memory address. Yuck. Let’s make it better:

@Override
public String toString() {
	return streetName + ", " + number;
}

Now it’s a little better. Hey, let’s see if 6 Maple Street is in our list:

System.out.println(addresses.contains(new PostalAddress("6 Maple St")));

In-class exercise

What will this print?

false? What? Maybe our constructor is broken, let’s try the other one:

System.out.println(addresses.contains(new PostalAddress(6, "Maple St")));

Nope, still false. Why? Let’s look at the List.contains method javadoc. Effing ternary operator. What does (o==null ? e==null : o.equals(e)) mean? This is a very terse way to express an if/else that produces a value.

Break it on the ? and the :. If the thing before the ? evaluates to true, the whole expression is the thing between the ? and the :; otherwise, it’s the thing after the :. Here, it means: if o is null, match only a null element; otherwise, match if o.equals(e) returns true. Ahh, but we haven’t written an equals, which means we’re using the default one from Object, which only checks whether the two references point to the very same object (just like ==). A freshly constructed PostalAddress is never the same object as one already in the list, so equals returns false. (Look it up in the Javadoc.) Let’s fix this problem:

public boolean equals(PostalAddress o) {
	return number == o.number && streetName.equals(o.streetName);
}

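But it still doesn’t work? Oh man, you’re gonna love this. Note that contains calls equals(Object), not equals(PostalAddress). Our method has a different parameter type, so it overloads equals instead of overriding it, and contains never calls it. (Putting @Override on it would have made the compiler point this out.) So we need to change the signature to take an Object, and then check the type: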

public boolean equals(Object o) {
	if (!(o instanceof PostalAddress)) return false;
	PostalAddress p = (PostalAddress)o;
	return number == p.number && streetName.equals(p.streetName);
}

OK, better. But I will note we are omitting some details that matter in practice: we don’t handle a null streetName (instanceof already returns false when o is null), and we’re violating the contract for hashCode:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

We’ll come back to hashCode in more detail when we get to Sets later, but for now we’ll do what Java programmers always do: use the code generator in Eclipse (or your IDE of choice) to write it. (Demo.)
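For comparison with the IDE demo, here’s a hand-written sketch using java.util.Objects (IDE-generated code will look different). hashCode is computed from exactly the fields that equals compares, so equal addresses get equal hash codes, and Objects.equals also covers the null streetName case we skipped. Only the constructor and methods needed for the check are included.

```java
import java.util.Objects;

class PostalAddress {
    public final int number;
    public final String streetName;

    public PostalAddress(int number, String streetName) {
        this.number = number;
        this.streetName = streetName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PostalAddress)) return false; // also false when o is null
        PostalAddress p = (PostalAddress) o;
        return number == p.number && Objects.equals(streetName, p.streetName); // null-safe
    }

    @Override
    public int hashCode() {
        // Uses the same fields as equals(), so equal objects get equal hash codes.
        return Objects.hash(number, streetName);
    }

    @Override
    public String toString() {
        return streetName + ", " + number;
    }
}
```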

OK, our last task. Let’s say we want to be able to sort our address list. We use the sort method on the List, which, if passed null, uses the natural order, or, if passed a Comparator, uses it to sort the list. But what’s the natural order of a PostalAddress? It’s only defined if PostalAddress implements Comparable. So it looks like we need to do one or the other.

Generally, you want to implement Comparable if you expect values of the data type to be compared and you want there to be a canonical way to compare them. You usually define custom Comparators for special-purpose orderings, like the special sort we promised at the start of this example. Let’s do both. First, let’s impose a natural order on PostalAddress so that they sort first by street name, then by number. Add implements Comparable<PostalAddress> to the class signature, and Eclipse can helpfully add the missing method:

@Override
public int compareTo(PostalAddress o) {
	// TODO Auto-generated method stub
	return 0;
}

Well that won’t do.

What to do

Implement the method. (JavaDoc for compareTo on projector.) See: http://docs.oracle.com/javase/8/docs/api/java/lang/Comparable.html

The memory aid is that if you want x < y, then you need to write compareTo such that x.compareTo(y) < 0.

Similarly, for x > y, then you need to write compareTo such that x.compareTo(y) > 0.

In-class exercise

@Override
public int compareTo(PostalAddress o) {
	if (streetName.compareTo(o.streetName) < 0) return -1;
	if (streetName.compareTo(o.streetName) > 0) return 1;
	if (number < o.number) return -1;
	if (number > o.number) return 1;
	return 0;
}

or slightly more concisely:

@Override
public int compareTo(PostalAddress o) {
	if (streetName.compareTo(o.streetName) < 0) return -1;
	if (streetName.compareTo(o.streetName) > 0) return 1;
	return Integer.compare(number, o.number);
}

Remember, you can look up Integer.compare in the Java API (or just Google it).
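Since Java 8, Comparator also has factory methods that can build the same ordering declaratively. This is a sketch of an alternative, not what we wrote in class: Comparator.comparing extracts the street name, and thenComparingInt breaks ties on the number. The PostalAddress here is a trimmed-down copy so the block compiles on its own.

```java
import java.util.Comparator;

// Trimmed-down PostalAddress for this sketch only.
class PostalAddress implements Comparable<PostalAddress> {
    // Street name first, then number: the same natural order as compareTo above.
    private static final Comparator<PostalAddress> NATURAL_ORDER =
            Comparator.comparing((PostalAddress p) -> p.streetName)
                      .thenComparingInt(p -> p.number);

    public final int number;
    public final String streetName;

    public PostalAddress(int number, String streetName) {
        this.number = number;
        this.streetName = streetName;
    }

    @Override
    public int compareTo(PostalAddress o) {
        return NATURAL_ORDER.compare(this, o);
    }

    @Override
    public String toString() {
        return streetName + ", " + number;
    }
}
```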

Let’s create the list out of order, print it, sort it, then print it:

for (int i = 10; i >=1; i -= 2) {
	addresses.add(new PostalAddress(i, "Maple St"));
}		
for (int i = 1; i < 10; i += 2) {
	addresses.add(new PostalAddress(i, "Birch St"));
}
System.out.println(addresses);
addresses.sort(null);
System.out.println(addresses);

Hey, it works!

Now let’s define a custom comparator for use in doing a “postal sort”. That is, we still want to sort such that street names are alphabetical, but we want the numbers sorted as all odd first (in ascending order), then all even (in descending order). This is how the truck might go up and down the street (on board).

What does that look like? Let’s declare a new Comparator:

public class PostalOrderComparator implements Comparator<PostalAddress> { ... }

Again, Eclipse helpfully fills it out with the method we need to implement, so let’s do it.

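It will be similar to, but more complicated than, the compareTo method we just wrote. A tip: x % 2 == 0 if and only if x is even. For non-negative x, x % 2 == 1 if and only if x is odd; for negative odd x, Java gives x % 2 == -1, so x % 2 != 0 is the safer oddness test in general. Street numbers are positive, so either works here.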

public int compare(PostalAddress o1, PostalAddress o2) {
	if (o1.streetName.compareTo(o2.streetName) < 0) return -1;
	if (o1.streetName.compareTo(o2.streetName) > 0) return 1;
	if (o1.number % 2 == 1 && o2.number % 2 == 0) return -1;
	if (o1.number % 2 == 0 && o2.number % 2 == 1) return 1;
	if (o1.number % 2 == 1) return Integer.compare(o1.number, o2.number);
	if (o1.number % 2 == 0) return Integer.compare(o2.number, o1.number);
	return 0;
}

And let’s check it out:

for (int i = 6; i >=1; i -= 2) {
	addresses.add(new PostalAddress(i, "Maple St"));
}		
for (int i = 1; i < 6; i += 2) {
	addresses.add(new PostalAddress(i, "Birch St"));
}
for (int i = 6; i >=1; i -= 2) {
	addresses.add(new PostalAddress(i, "Birch St"));
}		
for (int i = 1; i < 6; i += 2) {
	addresses.add(new PostalAddress(i, "Maple St"));
}
System.out.println(addresses);
addresses.sort(null);
System.out.println(addresses);		
addresses.sort(new PostalOrderComparator());
System.out.println(addresses);

Things we might do to improve this? Add an isOdd and/or isEven method for readability, perhaps? Pull out o1.number and o2.number into local variables? Both are debatable. Here’s what we ended up with in class:

@Override
public int compare(PostalAddress o1, PostalAddress o2) {
	if (o1.streetName.compareTo(o2.streetName) != 0)
		return o1.streetName.compareTo(o2.streetName); // sort by street name first
	// then break ties on street name
	if (o1.number % 2 == 1 && o2.number % 2 == 0) return -1; // if o1 is odd (and o2 even), o1 comes first
	if (o2.number % 2 == 1 && o1.number % 2 == 0) return 1;  // if o2 is odd (and o1 even), o2 comes first
	// then break ties again, on number: odd ascending, even descending
	if (o1.number % 2 == 1) return Integer.compare(o1.number, o2.number);
	return -Integer.compare(o1.number, o2.number);
}
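For reference, here’s a self-contained sketch of the postal sort with the suggestions from class applied: parity pulled into local variables, and an oddness test of % 2 != 0 so negative numbers would also behave. The trimmed-down PostalAddress is repeated so the block compiles on its own. Swapping the arguments to Integer.compare for the descending even side is equivalent to negating it here, since Integer.compare only returns -1, 0, or 1.

```java
import java.util.Comparator;

// Trimmed-down PostalAddress for this sketch only.
class PostalAddress {
    public final int number;
    public final String streetName;

    PostalAddress(int number, String streetName) {
        this.number = number;
        this.streetName = streetName;
    }

    @Override
    public String toString() {
        return streetName + ", " + number;
    }
}

class PostalOrderComparator implements Comparator<PostalAddress> {
    @Override
    public int compare(PostalAddress o1, PostalAddress o2) {
        int byStreet = o1.streetName.compareTo(o2.streetName);
        if (byStreet != 0) return byStreet;     // street name first
        boolean odd1 = o1.number % 2 != 0;
        boolean odd2 = o2.number % 2 != 0;
        if (odd1 && !odd2) return -1;           // odd side of the street first
        if (!odd1 && odd2) return 1;
        if (odd1) return Integer.compare(o1.number, o2.number); // odd: ascending
        return Integer.compare(o2.number, o1.number);           // even: descending
    }
}
```

With Birch St and Maple St numbers 1 through 6, this sorts to Birch St 1, 3, 5, 6, 4, 2, then Maple St in the same pattern.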