Week 05: Generics and comparators; introduction to sets

Aliasing and references in Lists (and other container types)

You probably noticed that this week’s assignment forced you to make a copy of a list that’s passed into a constructor. Some of you did it manually:

public class Assembler {
	private final List<Fragment> fragments;

	public Assembler(List<Fragment> fragments) {
		this.fragments = new ArrayList<Fragment>();
		for (Fragment f: fragments) {
			this.fragments.add(f);
		}
	}

But you can also do it in one line, using the ArrayList copy constructor, which copies one collection into another when that collection is passed as an argument:

		this.fragments = new ArrayList<Fragment>(fragments);

But it does leave the question: Why do I make you do this? Because it’s a good idea! But why?

The real answer here is that if you don’t, then the list you’ve “passed into” the Assembler is an alias of the list stored in the Assembler’s instance variable. So changing one changes the other, even though it “looks like” they are different lists!

This can be a huge source of bugs for programmers who haven’t internalized that different references can refer to the same thing. When this bug bites, it’s not usually when you use two references in the same scope, like we did in the Bus question on the problem set a week or so ago. It’s because you made a copy of a reference and used both as though they were separate, independent things. But they’re not! They’re the same object! And that leads to problems if you don’t realize they are the same object with different names in different scopes.
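Here is a minimal sketch of the difference, assuming two hypothetical holder classes (AliasedHolder and CopiedHolder are made up for illustration and are not part of the assignment). One stores the caller's list directly; the other copies it in the constructor, as the Assembler above does.

```java
import java.util.ArrayList;
import java.util.List;

class AliasDemo {
    // Hypothetical holder that keeps the caller's list as-is (an alias).
    static class AliasedHolder {
        private final List<String> items;
        AliasedHolder(List<String> items) { this.items = items; }
        int count() { return items.size(); }
    }

    // Hypothetical holder that copies the list in its constructor.
    static class CopiedHolder {
        private final List<String> items;
        CopiedHolder(List<String> items) { this.items = new ArrayList<String>(items); }
        int count() { return items.size(); }
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<String>();
        original.add("a");

        AliasedHolder aliased = new AliasedHolder(original);
        CopiedHolder copied = new CopiedHolder(original);

        original.add("b"); // the caller changes its own list later

        System.out.println(aliased.count()); // 2: the holder sees the caller's change
        System.out.println(copied.count());  // 1: the copy is unaffected
    }
}
```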

Immutability

Astute students may note that even if you copy the lists, the objects stored in the lists are still aliases of one another. This is true! And something you need to be careful of! But you’ll notice that some objects are basically “immutable” – you can’t change their state once they’re created.

For example, instances of String can’t be changed once you make them. “But Marc, yes you can! You can call, for example, someString.toUpperCase() and it becomes an upper-case string!”

Not exactly, my fine friend. toUpperCase returns a new string but leaves the original unchanged. There’s no way to modify the original String object. You can change what the reference refers to, for example, someString = "A different string";, but that’s not changing a String object, it’s changing the reference stored in a String variable.
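A quick way to convince yourself is to keep a second reference to the original String and look at it afterwards. This is only a small sketch; the class and variable names are made up for illustration.

```java
class StringImmutabilityDemo {
    public static void main(String[] args) {
        String someString = "banana";
        String alias = someString;               // second reference to the same object
        String upper = someString.toUpperCase(); // returns a new String

        System.out.println(someString); // banana: the original is untouched
        System.out.println(upper);      // BANANA

        someString = "A different string"; // changes the reference, not the object
        System.out.println(alias);         // banana: the old object is unchanged
    }
}
```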

You’ll notice that Fragment in A04 was similar – once it’s constructed, no method in its API should change its structure. mergedWith is like toUpperCase – it returns a new Fragment but doesn’t change the current one.

Generally, if you can make objects immutable in this way, you should. It prevents an enormous set of hard-to-track-down errors.

The other option is to perform a “deep copy,” copying not just the List’s references to items, but all the items, as well. This is typically unnecessary, and sometimes a “code smell” indicating your program’s design is unnecessarily complex.
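To make the distinction concrete, here is a sketch that contrasts a shallow copy with a deep copy. Counter is a hypothetical mutable class invented for this example; with immutable elements the two kinds of copy would behave the same.

```java
import java.util.ArrayList;
import java.util.List;

class DeepCopyDemo {
    // Hypothetical mutable element type, used only for illustration.
    static class Counter {
        int value;
        Counter(int value) { this.value = value; }
    }

    // Shallow copy: a new list that holds the same Counter objects.
    static List<Counter> shallowCopy(List<Counter> source) {
        return new ArrayList<Counter>(source);
    }

    // Deep copy: a new list that holds new Counter objects with the same values.
    static List<Counter> deepCopy(List<Counter> source) {
        List<Counter> result = new ArrayList<Counter>();
        for (Counter c : source) {
            result.add(new Counter(c.value));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Counter> original = new ArrayList<Counter>();
        original.add(new Counter(1));

        List<Counter> shallow = shallowCopy(original);
        List<Counter> deep = deepCopy(original);

        original.get(0).value = 99; // mutate the element through the original list

        System.out.println(shallow.get(0).value); // 99: same Counter object
        System.out.println(deep.get(0).value);    // 1: independent Counter object
    }
}
```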

Generics

Motivation

Methods take parameters: rather than hardcoding all data and values into a method, we can make some parts of the data variable so that the method can be reused with different data and values. This makes a lot of sense: Let’s say we want to write a method that adds five to its argument:

int add5(int i) {
	return i + 5;
}

That’s useful. But someday we want to also write a method that adds six:

int add6(int i) {
	return i + 6;
}

OK. Then add7, etc. Getting silly. We don’t want to write a different method each time, since (1) there’s an infinite number of them! and (2) the operation of adding is mechanically the same each time. That is, there’s a generalized algorithm we write, once, and then can use many times.

int add(int i, int j) {
	return i + j;
}

Example

The insight behind “generics” is that we can do the same thing with types – we can parameterize classes (and methods) with a type, too, and use it on different types of things. Our crappy StringList, for example, while it lets us hold any list of Strings we want, was still limited to String data. But it turns out that instead of writing:

public interface StringList {
	public void add(String s);
	public void add(int i, String s);
	public String remove(int i);
	public String get(int i);
	public int size();
}

you can parameterize the interface on a type using angle brackets:

public interface List<E> {
	public void add(E e);
	public void add(int i, E e);
	public E remove(int i);
	public E get(int i);
	public int size();
}

The List now sports a generic type name in angle brackets. We’ve defined a family of possible types here; note that each method that used to operate on strings now operates on this mysterious E.

We can also type parameterize a class:

public class Node<E> {
    private final E contents;
    private Node<E> next;
// more
}

…and the two together let us write generic code, that operates on generic types, based upon the type parameter.

Here’s the fully converted Node and LinkedList:

public class Node<E> {
    private E contents;
    private Node<E> next;

    public Node(E contents) {
        this.contents = contents;        
    }
    
    public E getContents() {
        return contents;
    }

    public Node<E> getNext() {
        return next;
    }

    public void setNext(Node<E> n) {
        next = n;
    }
}

public class LinkedList<E> implements List<E> {
    private Node<E> head;
    private int size;

    public LinkedList() {
        head = null;
        size = 0;
    }

    @Override
    public void add(E s) {
        size++;
        Node<E> n = new Node<E>(s);

        // Case 1: empty list
        if (head == null) {
            head = n;
            return;
        }

        // Case 2: non-empty list
        Node<E> current = head;
        while (current.getNext() != null) {
            current = current.getNext();
        }
        // now reached the node at the end of the list
        current.setNext(n);
    }

    @Override
    public void add(int i, E s) throws IndexOutOfBoundsException {
        if (i < 0 || i > size) {
            throw new IndexOutOfBoundsException();
        }

        size++;
        Node<E> n = new Node<E>(s);

        // Case 1: insert at head of list, position 0
        if (i == 0) {
            // Step 1, set n's next pointer to head
            n.setNext(head);
            // Step 2: set head to n
            head = n;
            return;
        }

        // Case 2: inserting elsewhere in the list

        // Step 1: find the node before
        Node<E> nodeBefore = head;
        for (int j = 1; j < i; j++) { // j = 0; j < i - 1
            nodeBefore = nodeBefore.getNext();
        }

        // Step 2: update n's next pointer to the nodeBefore's next pointer
        n.setNext(nodeBefore.getNext());

        // Step 3: set nodeBefore's next pointer to n
        nodeBefore.setNext(n);
    }

    @Override
    public E remove(int i) throws IndexOutOfBoundsException {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException();
        }

        size--;
        final E result;
        // Case 1: remove first node in the list
        if (i == 0) {
            result = head.getContents();
            head = head.getNext();
            return result;
        }

        // Case 2: remove other node from list

        // Step 1: find the node before the node we want to remove
        Node<E> nodeBefore = head;
        for (int j = 1; j < i; j++) {
            nodeBefore = nodeBefore.getNext();
        }

        final Node<E> nodeToDelete = nodeBefore.getNext();
        result = nodeToDelete.getContents();

        // Step 2: set nodeBefore's next pointer to the node after
        // the node we are deleting
        nodeBefore.setNext(nodeToDelete.getNext());

        return result;
    }

    @Override
    public E get(int i) throws IndexOutOfBoundsException {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException();
        }
        int j = 0;
        Node<E> current = head;
        while (true) {
            if (i == j) {
                return current.getContents();
            }
            current = current.getNext();
            j++;
        }
    }

    @Override
    public int size() {
        return size;
    }

}

Type parameters

The E is a type parameter – it says that the programmer who declares a variable of type List must also choose a particular type that the declared List will handle. Lists with different type parameters are different types. For example, you cannot assign one to another unless they have the same type parameter, any more than you can assign a String to a boolean:

boolean x = "banana"; // not allowed, fails at compile time

List<String> x;
List<Integer> y;

... // some code ...

x = y; // not allowed, fails at compile time

More on parameters

Type parameters are usually written as a single uppercase letter, and often that letter is an abbreviation. E stands for Element of a collection; we’ll also see K (Key) and V (Value) later in the course.

Type parameters, when instantiated (that is, when a variable of a generic type is declared), must be a non-primitive type. But Java does something called auto-boxing, so you can generally mix primitives and non-primitives freely using the associated wrapper types, like Integer. (Integer and friends also have many useful static methods.)
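As a small illustration of auto-boxing, the sketch below stores int values in a List<Integer> and reads them back as ints. The class and method names are made up for this example; Integer.parseInt and Integer.max are two of the static methods mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

class AutoboxingDemo {
    static int sum(List<Integer> numbers) {
        int total = 0;
        for (int n : numbers) { // each Integer is unboxed to an int
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // List<int> would not compile: the type argument must be a non-primitive type.
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(3); // the int 3 is boxed into an Integer
        numbers.add(4);

        System.out.println(sum(numbers));           // 7
        System.out.println(Integer.parseInt("42")); // 42
        System.out.println(Integer.max(3, 4));      // 4
    }
}
```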

Another fun fact for today about type parameters: Usually we think of them as being declared on classes (and indeed, that’s usually where they are declared). But if you write a particular method that would benefit from type parameterization, you can do so:

public class Pair<K, V> {
    public final K k;
    public final V v;

    public Pair(K k, V v) {
        this.k = k;
        this.v = v;
    }
}

public class Util {

    public static <K, V> boolean compare(Pair<K, V> p1, Pair<K, V> p2) {
        return (p1.k.equals(p2.k) && p1.v.equals(p2.v));
    }

}

Note the type parameters come immediately before the return type. This is important, as static methods that operate on arguments (or have a return type) that are generic come up all the time, and you need to be able to tell Java what the generic type(s) in the method are. They’re independent of the generic types, if any, that the class declares. For example, if the above class definition were public class Util<K>, the K in the compare method would not necessarily be the same as the K in the class’s declaration.
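Calling such a method usually needs no extra syntax, because Java infers the type arguments from the method's arguments. The sketch below repeats Pair and compare inside one hypothetical wrapper class (GenericMethodDemo) so that it is self-contained; it also shows the less common explicit form.

```java
class GenericMethodDemo {
    static class Pair<K, V> {
        final K k;
        final V v;

        Pair(K k, V v) {
            this.k = k;
            this.v = v;
        }
    }

    // The <K, V> here belongs to the method, not to the enclosing class.
    static <K, V> boolean compare(Pair<K, V> p1, Pair<K, V> p2) {
        return p1.k.equals(p2.k) && p1.v.equals(p2.v);
    }

    public static void main(String[] args) {
        Pair<String, Integer> a = new Pair<String, Integer>("apple", 1);
        Pair<String, Integer> b = new Pair<String, Integer>("apple", 1);
        Pair<String, Integer> c = new Pair<String, Integer>("pear", 2);

        // Java infers K = String and V = Integer from the arguments.
        System.out.println(compare(a, b)); // true

        // The type arguments can also be written out explicitly.
        System.out.println(GenericMethodDemo.<String, Integer>compare(a, c)); // false
    }
}
```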

Why generics matter

They matter for the reasons listed above (generic re-usable code)! But also, in Java 5, the entire Collections library was re-written to use generics. Before then, all container types (List, etc.) only held things of type Object, and you, the programmer, had to laboriously cast them each time you used them:

List list = new ArrayList();
list.add("hello");
String s = (String) list.get(0);

Not only was this a pain, if you made a mistake:

		List l = new ArrayList();
		l.add("Zero");
		
		String s = (String)l.get(0);
		
		//...
		
		Integer i = (Integer)l.get(0); // throws exception at run-time!
		System.out.print(i);

you’d find out at run-time, not at compile-time. And while I know you hate compiler errors right now, you’ll learn to love them when writing big programs — every error that the compiler catches is one you can fix at your leisure, while run-time errors are erratic, not always reproducible, and generally result in a (much bigger) headache for you.
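For contrast, here is a sketch of the same idea written with generics (the class and method names are made up for illustration). The cast disappears, and the mistaken lines, left commented out, would be rejected by the compiler instead of failing at run-time.

```java
import java.util.ArrayList;
import java.util.List;

class TypedListDemo {
    static String firstOf(List<String> items) {
        return items.get(0); // no cast needed: the compiler knows the element type
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<String>();
        l.add("Zero");

        // l.add(0);             // would not compile: an int is not a String
        // Integer i = l.get(0); // would not compile: incompatible types

        System.out.println(firstOf(l)); // Zero
    }
}
```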

Lists and sorting

Suppose you want to add items to a list (say, a stores attribute of type List<Integer> in a store management system) and you want to keep it sorted. An aside: “sorted” is a funny word here. We really mean “in ascending (or descending) order”, but it’s a historical bit of vocabulary to say “sorted” that’s stuck around.

How would you write the public void addStore(int store) method?

public void addStore(int store) {
	if (stores.isEmpty()) {
		stores.add(store);
		return;
	}
	int i = 0;
	for (Integer storeNumber: stores) {
		if (storeNumber.compareTo(store) >= 0) {
			stores.add(i, store);
			return;
		}
		i++;
	}
	stores.add(store);
}

Code that modifies a list inside a for-each loop, like this, can throw a ConcurrentModificationException.

WTF? It turns out that some (most) implementations of collections are very particular about allowing you to modify them while you are iterating. Creating an iterator, then modifying the collection, then trying to iterate is generally not allowed. See, for example, the ArrayList docs: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ArrayList.html and note that “…if the list is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove or add methods, the iterator will throw a ConcurrentModificationException.”

The exception will only be thrown if the iterator (the top of the for loop) is reached again after the list is modified. That doesn’t happen here, but it’s something to watch out for in your own code in upcoming projects.
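To see the exception actually happen, here is a sketch of a loop that modifies an ArrayList and then keeps iterating. The class and method names are invented for this example. The ArrayList documentation describes this check as best-effort, so treat the result as what you can expect in a simple single-threaded case like this one rather than as a guarantee.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class ComodificationDemo {
    // Returns true if modifying the list mid-loop, then continuing to iterate, throws.
    static boolean addWhileIterating() {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        try {
            for (Integer n : numbers) {
                if (n == 1) {
                    numbers.add(0, 0); // structural modification, with no return afterwards
                }
            }
        } catch (ConcurrentModificationException e) {
            return true; // thrown when the loop asks the iterator for the next element
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(addWhileIterating()); // true
    }
}
```

The addStore method above avoids this only because it returns immediately after the call to add.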

So instead we could work with indices directly:

public void addStore(int store) {
	if (stores.isEmpty()) {
		stores.add(store);
		return;
	}
	for (int i = 0; i < stores.size(); i++) {
		if (stores.get(i).compareTo(store) >= 0) {
			stores.add(i, store);
			return;
		}
	}
	stores.add(store);
}

Automatic sorting

Here’s an alternative:

stores.sort(null);

What’s up with the sort(null) call? Let’s look at the API: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/List.html#sort(java.util.Comparator)-

The sort method’s parameter is of type Comparator, an interface that describes how to compare two arbitrary objects (see also its simpler predecessor, the Comparable interface). You might implement it if you wanted to do something unusual with the sort, for example, put all odd numbers before all even numbers (which maybe sounds nonsensical, but think about mail delivery up and down each side of the street).
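As a rough sketch of the odd-before-even idea on plain store numbers, here is a hypothetical Comparator<Integer> (OddsFirst is a made-up name). It assumes non-negative numbers and sorts in ascending order within each group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class OddsFirstDemo {
    // Hypothetical comparator: odd numbers before even numbers,
    // ascending within each group. Assumes non-negative values.
    static class OddsFirst implements Comparator<Integer> {
        @Override
        public int compare(Integer a, Integer b) {
            boolean aOdd = a % 2 == 1;
            boolean bOdd = b % 2 == 1;
            if (aOdd && !bOdd) return -1; // odd sorts before even
            if (!aOdd && bOdd) return 1;  // even sorts after odd
            return Integer.compare(a, b); // same parity: ascending
        }
    }

    // Returns a sorted copy so the caller's list is left alone.
    static List<Integer> sorted(List<Integer> input) {
        List<Integer> copy = new ArrayList<Integer>(input);
        copy.sort(new OddsFirst());
        return copy;
    }

    public static void main(String[] args) {
        List<Integer> stores = new ArrayList<Integer>();
        for (int i = 1; i <= 6; i++) {
            stores.add(i);
        }
        System.out.println(sorted(stores)); // [1, 3, 5, 2, 4, 6]
    }
}
```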

But you don’t actually have to implement its abstract methods to use it here. Reading the documentation:

If the specified comparator is null then all elements in this list must implement the Comparable interface and the elements’ natural ordering should be used.

Do Integers implement Comparable? Let’s check: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Integer.html

Yup. In general, you should start doing your best to read and follow links in the Java API if you see things in code you don’t understand. You might not understand everything you read, but only by trying are you going to learn, and at some point you can’t expect your instructors to spoon-feed you everything (though I will definitely cover the highlights).

So anyway, we can exercise option 2:

Option 2: Append the number then sort the list!

public void addStore(int store) {
	stores.add(store);
	stores.sort(null);
}

This may not be as efficient as inserting into the correct spot – we’ll learn more about this later in the semester – but it’s much easier conceptually and to write.

A worked example for `Comparator`

So we’ve been talking about Lists and generics and reviewing 121 stuff like boolean conditions and flow control. Let’s do another worked example.

We’re going to design and write a class to represent simplified postal addresses (street numbers and names). Then we’re going to dump several instances of it into a list. Finally, we’re going to define a custom Comparator on a postal address that will let us sort the list in a special way.

Let’s get going.

Given the problem statement, we know we’re going to want to define a class that defines objects containing a street number and name:

public class PostalAddress {
	public final int number;
	public final String name;
}

What is up with Marc and his use of public final? Here’s what’s up: when you know a data type is going to remain fixed, and that any particular object isn’t going to have its data change, there’s no reason to deal with the pain of writing private instance variables and then turning around to write public accessors (get methods). “But Marc, what if you change how the value is stored, or how it is determined?” Then you’ll be giving it a different name (or writing an accessor, etc.) and your IDE will flag all occurrences for you to fix. Or better yet, it will let you refactor them all at once.

(See “Effective Java, 2nd edition” which though over a decade old is still one of the best bits of reading you can do once you’re an intermediate Java programmer.)

First steps

Anyway, now let’s add a few constructors. First the obvious one:

public PostalAddress(int number, String name) {
	this.number = number;
	this.name = name;
}

Now one that does some simple parsing:

public PostalAddress(String textAddress) {
	String[] matches = textAddress.split("\\s+", 2);
	this.number = Integer.parseInt(matches[0]);
	this.name = matches[1];
}

OK! Now we’ve got a simple class. Let’s try making a few of them and adding them to a list:

List<PostalAddress> addresses = new ArrayList<PostalAddress>();

for (int i = 1; i <= 10; i++) {
	addresses.add(new PostalAddress(i, "Maple St"));
}

and maybe printing them out:

System.out.println(addresses);

Ugh, what’s this PostalAddress@677327b6 nonsense? Remember, when you print an object, Java converts it to a String using its toString method. We haven’t written one, so we get the Object default, which is what you see. It is derived from the class name and the hashCode() method. Since we haven’t defined the latter, we get its default, which may or may not be based on the object’s memory address. Yuck. Let’s make it better:

public String toString() {
	return name + ", " + number;
}

Now it’s a little better. Hey, let’s see if 6 Maple Street is in our list:

System.out.println(addresses.contains(new PostalAddress("6 Maple St")));

What will this print?

false? What? Maybe our constructor is broken, let’s try the other one:

System.out.println(addresses.contains(new PostalAddress(6, "Maple St")));

Nope, still false. Why? Let’s look at the List.contains method javadoc. It uses equals(). Ahh, but we haven’t written an equals, which means we’re using the default one from Object, which checks whether the two references point to the very same object (that is, ==). Two separately constructed instances are never the same object, even if their fields match. (Look it up in the javadoc.) Let’s fix this problem:

public boolean equals(PostalAddress o) {
	return number == o.number && name.equals(o.name);
}

But it still doesn’t work! Note that contains calls equals(Object), not equals(PostalAddress), so the method we just wrote is a separate overload that contains never calls. So we need to change the signature to take an Object, and then check the type:

public boolean equals(Object o) {
	if (!(o instanceof PostalAddress)) return false;
	PostalAddress p = (PostalAddress)o;
	return number == p.number && name.equals(p.name);
}

OK, better. But I will note we are omitting some important-in-practice details: we’re ignoring the possibility that name is null (instanceof already returns false for a null argument), and we’re violating the contract for hashCode:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

We’ll come back to hashCode in more detail when we get to Sets later, but for now we’ll do what Java programmers often do: use your IDE’s code generator to do it, or hack up a “good enough” one. Java makes this easier than in the past (since Java 7) with a bunch of utility methods in java.util.Objects:

public int hashCode() {
	return Objects.hash(number, name);
}

Given a list of instance variables (that already have hash codes defined), Objects.hash constructs a new, valid hash code on the basis of these hashes. More on this later.
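Putting the pieces together, here is a self-contained sketch that checks the contract for two equal addresses. It nests a trimmed-down copy of PostalAddress inside a hypothetical wrapper class (EqualsHashCodeDemo) and uses a HashSet, which we will look at properly later.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Trimmed-down copy of PostalAddress with the equals and hashCode from above.
    static class PostalAddress {
        final int number;
        final String name;

        PostalAddress(int number, String name) {
            this.number = number;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof PostalAddress)) return false;
            PostalAddress p = (PostalAddress) o;
            return number == p.number && name.equals(p.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(number, name);
        }
    }

    public static void main(String[] args) {
        PostalAddress a = new PostalAddress(6, "Maple St");
        PostalAddress b = new PostalAddress(6, "Maple St");

        System.out.println(a == b);                       // false: two distinct objects
        System.out.println(a.equals(b));                  // true: same number and name
        System.out.println(a.hashCode() == b.hashCode()); // true: required by the contract

        Set<PostalAddress> seen = new HashSet<PostalAddress>();
        seen.add(a);
        System.out.println(seen.contains(b));             // true
    }
}
```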

Comparators

OK, on to our actual task. Let’s say we want to be able to sort our address list. We use the sort function on the List, which if passed null uses the natural order, or if passed a Comparator uses it to sort the list. But what’s the natural order of a PostalAddress? It’s only defined if the PostalAddress implements Comparable. So looks like we’re doing one or the other.

Generally, you want to define Comparable if you expect values of the data type to be compared and you want there to be a canonical way to compare them – a “natural ordering.” You usually define custom Comparators for special-purpose orderings, like the custom sort we’re building toward. Let’s do both. First, let’s impose a natural order on PostalAddress so that they sort first by street name, then by number. Add implements Comparable<PostalAddress> to the class declaration, and Code can helpfully add the missing method:

@Override
public int compareTo(PostalAddress o) {
	// TODO Auto-generated method stub
	return 0;
}

Well that won’t do.

What to do

Implement the method. (JavaDoc for compareTo on projector.) See: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Comparable.html

The memory aid is that if you want x < y, then you need to write compareTo such that x.compareTo(y) < 0.

Similarly, for x > y, then you need to write compareTo such that x.compareTo(y) > 0.

@Override
public int compareTo(PostalAddress o) {
	// this street name comes before the other's
	if (name.compareTo(o.name) < 0) return -1;

	// this street name comes after the other's
	if (name.compareTo(o.name) > 0) return 1;

	// same street name!

	// check by number
	// if (number < o.number) return -1;
	// if (number > o.number) return 1;
	return Integer.compare(number, o.number);
}

Remember, you can look up Integer.compare in the Java API (or just Google it).
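If you would rather poke at it than read about it, this small sketch (with a made-up class name) prints the sign conventions that compareTo above relies on.

```java
class CompareSignDemo {
    public static void main(String[] args) {
        // Integer.compare(x, y) is negative, zero, or positive
        // when x is less than, equal to, or greater than y.
        System.out.println(Integer.compare(3, 10) < 0); // true
        System.out.println(Integer.compare(10, 3) > 0); // true
        System.out.println(Integer.compare(7, 7) == 0); // true

        // String.compareTo follows the same sign convention.
        System.out.println("Birch St".compareTo("Maple St") < 0); // true
    }
}
```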

Let’s create the list out of order, print it, sort it, then print it:

for (int i = 10; i >=1; i -= 2) {
	addresses.add(new PostalAddress(i, "Maple St"));
}		
for (int i = 1; i < 10; i += 2) {
	addresses.add(new PostalAddress(i, "Birch St"));
}
System.out.println(addresses);
addresses.sort(null);
System.out.println(addresses);

Hey, it works!

Now let’s define a custom comparator for use in doing a “postal sort”. That is, we still want to sort such that street names are alphabetical, but we want the numbers sorted as all odd first (in ascending order), then all even (in descending order). This is how the truck might go up and down the street (on board).

What does that look like? Let’s declare a new Comparator:

public class PostalOrderComparator implements Comparator<PostalAddress> { ... }

Again, Code can helpfully fill it out with the method we need to implement, so let’s do it.

It will be similar to but more complicated than the compareTo method we just wrote. A tip: x % 2 == 0 if and only if x is even. For non-negative x (like street numbers), x % 2 == 1 iff x is odd; in Java, a negative odd x gives x % 2 == -1.

public int compare(PostalAddress o1, PostalAddress o2) {
	// if names are not equal, just sort by name!
	if (!o1.name.equals(o2.name)) {
		return o1.name.compareTo(o2.name);
	}

	// else, do a postal ordering by number
	// first, odds come before evens
	if (o1.number % 2 == 1 && o2.number % 2 == 0) return -1;
	if (o1.number % 2 == 0 && o2.number % 2 == 1) return 1;

	// both odd
	if (o1.number % 2 == 1 && o2.number % 2 == 1) return Integer.compare(o1.number, o2.number);
	// both even
	return -Integer.compare(o1.number, o2.number);

}

And let’s check it out:

for (int i = 6; i >=1; i -= 2) {
	addresses.add(new PostalAddress(i, "Maple St"));
}		
for (int i = 1; i < 6; i += 2) {
	addresses.add(new PostalAddress(i, "Birch St"));
}
for (int i = 6; i >=1; i -= 2) {
	addresses.add(new PostalAddress(i, "Birch St"));
}		
for (int i = 1; i < 6; i += 2) {
	addresses.add(new PostalAddress(i, "Maple St"));
}
System.out.println(addresses);
addresses.sort(null);
System.out.println(addresses);		
addresses.sort(new PostalOrderComparator());
System.out.println(addresses);

Things we might do to improve this? Add an isOdd and/or isEven method for readability, perhaps? Pull out o1.number and o2.number into local variables? Both are debatable. Here’s what we ended up with in class:

Sets: an introduction

Next, we’re going to move on in our list of top-n abstract data types from the List to the Set. In order to give you some grounding in what a set is and how it differs from a list, we’re going to turn to an arcane and little-known subject: Mathematics. To be clear, we’re going to do a very gentle introduction to “simple” set theory; if you have already taken a discrete math course this will be review for you, and if you stay in CS, you’ll see this again in a lot more detail in COMPSCI 250.

Simply put: a set is a collection of distinct objects.

The objects can be anything: people, numbers, shapes, colors, (or perhaps most topically, instances of Java objects). These objects are generally referred to as members or elements of a set.

In set theory, sets are named by a single uppercase letter: A or B, for example.

For our purposes, we’ll usually write sets as a list of the elements. The list will be comma separated, and will be enclosed in curly braces. For example: A = {1, 3, -6} describes a set called “A” that has three integers as elements.

Sets contain a collection of unique items. That is, sets cannot have duplicate items.

While the items in a set might have an implicit, natural order (like the integers), the set itself doesn’t define an order. So {1, 3, -6} = {-6, 1, 3}, that is, they’re the same set. Order doesn’t matter when comparing sets. (This is very different from our intuition with lists, where different orderings do matter.)

There are a few bits of notation I want you to have seen, so now I’m going to write them down for you.

First, how do we say a set contains an element? There’s a symbol that looks like this: ∈ (kind of a funny “E”) which is used to denote set membership. For example, “3 ∈ A” is pronounced “3 is an element of A.” You can think of the funny E as standing for “Element of” to help remember it. Likewise, ∉ means “not an element of” or “does not contain”, as in “10 ∉ A”. How do we say a set is empty? We call it the empty set and write it as ∅ (or {}).

Next, sometimes we might want to talk about one set being “contained within another”. For example, if B = {1, 3, -6, 10}, we might say A is a “subset” of B. This is written with the set containment symbol, which looks kinda like a curvy less-than-or-equal-to (or greater-than-or-equal-to) “A ⊆ B” and it “opens toward” the bigger set.

Sets are sometimes represented abstractly as “Venn diagrams”. Here’s what the above two sets might look like as a Venn diagram:

(on whiteboard)

Finally, there are a few operations on sets you should know about.

First is “union”, written as a little ∪. The union of two sets contains all their elements. So if we have A = {1, 3, -6} and C = {9, 3, 4}, then A ∪ C = {1, 3, -6, 9, 4}. As a mnemonic, think of the “United States” as a union of many things into one.

(on whiteboard)

Next is the “intersection”. The intersection of two sets contains only the elements they have in common and is written as an upside-down u: ∩. Continuing our example from above, A ∩ C = {3}.

(on whiteboard)

Think of the “intersection of two roads”: the intersection is just the part they share, not all of both roads.

Two more operations (I promise) then we’re done with math and notation. First is set difference, sometimes called “relative complement” or “set-theoretic difference.” It’s written with a backslash \ and refers to all the elements in one set that aren’t in another. For example, A \ C = {1, -6}. Note that in set difference, which set you write first matters (unlike union and intersection). Finally, there’s symmetric difference of two sets, which is all the things in the union that aren’t in the intersection. It can be written as 𝝙. A 𝝙 C = {1, -6, 9, 4}.
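Jumping ahead slightly to java.util.Set (covered in the next section), here is one possible sketch of these four operations in Java. The helper method names are made up for this example; addAll, retainAll, and removeAll are real Set methods. Each helper copies its first argument so that neither input set is modified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SetOperationsDemo {
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<Integer>(a);
        result.addAll(b);
        return result;
    }

    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<Integer>(a);
        result.retainAll(b); // keep only the elements that are also in b
        return result;
    }

    static Set<Integer> difference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<Integer>(a);
        result.removeAll(b);
        return result;
    }

    // Everything in the union that is not in the intersection.
    static Set<Integer> symmetricDifference(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = union(a, b);
        result.removeAll(intersection(a, b));
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<Integer>(Arrays.asList(1, 3, -6));
        Set<Integer> c = new HashSet<Integer>(Arrays.asList(9, 3, 4));

        // A HashSet's printing order is not specified, so compare with equals.
        System.out.println(union(a, c).equals(new HashSet<Integer>(Arrays.asList(1, 3, -6, 9, 4)))); // true
        System.out.println(intersection(a, c).equals(new HashSet<Integer>(Arrays.asList(3))));       // true
        System.out.println(difference(a, c).equals(new HashSet<Integer>(Arrays.asList(1, -6))));     // true
        System.out.println(symmetricDifference(a, c).equals(new HashSet<Integer>(Arrays.asList(1, -6, 9, 4)))); // true
    }
}
```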

Sets in Java

How does Java represent a set? As an abstract data type, specified by the Set interface. First we’ll talk about the properties and assumptions we might expect from a Set, in the abstract. Then we’ll talk about two concrete implementations of the data type provided by the Java API and see how they work.

sets, like lists, are unbounded, that is they don’t have a fixed size
duplicate elements are not allowed (only new elements are added; attempts to re-add existing elements are ignored – no error, just ignored)
sets are unordered (usually – though there is a subtype, SortedSet, that keeps its elements in order)
sets, like lists, can contain a null element (I hope you like NullPointerExceptions! though note some implementations might forbid null elements)
sets support an add (and an addAll) operation, which can modify the current set
sets support a remove operation of a specific value
sets support a size operation to determine how many elements are currently in the set
sets support a contains (and a containsAll) operation to check membership
and more, but we’ll get to them later when we look at the full API that Java supplies.

Let’s take a look at the interface: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Set.html

Not too different from List, though you’ll note some things (like remove at an index, or get) are not present, as those operations don’t make sense in the context of sets – they’re unordered, so there is no index!

Pay special attention to a few things:

“…sets contain no pair of elements e1 and e2 such that e1.equals(e2)” – the equals method is very important to sets, and if you stick objects in that don’t have an equals method, they’ll use Object’s equals method. Make sure that’s what you want if so!

Also note that “great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.” In other words, if you have a setter that changes an instance variable in an object, and that instance variable is considered by the object’s equals method, Set will have undefined (read: bad) behavior.

So putting relatively immutable things into sets is OK. Like Integers or Strings. Putting arbitrary objects that can be changed is not so good. Putting things that can be changed, but that you won’t change is OK but dangerous – what if you accidentally do end up changing the object? The Set will almost certainly misbehave in a weird way.
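Here is a sketch of how that misbehavior can look. Box is a hypothetical mutable class invented for this example, with equals and hashCode both based on a field that we then change. The Set documentation leaves the outcome unspecified; the result shown is what OpenJDK's HashSet does, because it looks for the element under its new hash code.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class MutableElementDemo {
    // Hypothetical mutable class whose equals and hashCode depend on a changeable field.
    static class Box {
        int value;

        Box(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Box && ((Box) o).value == value;
        }

        @Override
        public int hashCode() {
            return Objects.hash(value);
        }
    }

    // Adds a Box to a set, mutates it, then asks the set whether it contains it.
    static boolean foundAfterMutation() {
        Set<Box> boxes = new HashSet<Box>();
        Box b = new Box(1);
        boxes.add(b);
        b.value = 2; // changes the result of both equals and hashCode
        return boxes.contains(b);
    }

    public static void main(String[] args) {
        // The very same object is still stored in the set, but it can no longer be found.
        System.out.println(foundAfterMutation()); // false on OpenJDK
    }
}
```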

Other than those two restrictions, you can use Sets almost like Lists. Let’s do some examples:

Set<Integer> s = new HashSet<Integer>();

s.add(1);
s.add(2);
System.out.println(s); // like lists, you can print them and their contents are printed

Set<Integer> t = new HashSet<Integer>();

t.add(2);
t.add(3);
t.add(4);

for (Integer i : t) {
	System.out.println(i); // like lists, you can iterate over them
}

s.addAll(t); // all elements in t are added to s; t is unchanged but s is not!
System.out.println(s);
System.out.println(t);

s.removeAll(t); // all elements in t are removed from s, as above
System.out.println(s);
System.out.println(t);

And you generally do want to use Sets when the set properties (of uniqueness and lack-of-intrinsic-order) apply to your data set, especially if your data set is going to be large.

Why? (you might ask.) Because sets have much, much better general performance for insertion, removal, and containment-testing than lists. How? (you might ask.) Well, now we have to talk a little about how the two most common implementations of Sets work: HashSets and TreeSets.

On `HashSet`s

One possible implementation of the set is the HashSet, which depends upon a correct hashCode method. Why? “This class implements the Set interface, backed by a hash table (actually a HashMap instance).” Now let’s look at the documentation for hashCode: “This method is supported for the benefit of hash tables such as those provided by HashMap.”

Wow, hash tables are so important that every object in Java must supply a hashCode method – it’s built into Object.

hashCode returns an integer, and must obey the contract in its documentation. Let’s look at each piece:

It provides something like (but not exactly like!) equality: If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

This implies that if you use a field in an equals method, you should also use it in the hashCode method.

It is consistent: Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
It is not an equality check, though: It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

So you could have a hashCode method that always returned the same integer, like 1, and it would technically obey the contract. But usually, the hashCode of objects is not 1, but instead a large integer. Going back to our old example of (not) aliasing:

      String s = new String("x");
      String t = new String("x");

      System.out.println(s == t);
      System.out.println(s.equals(t));
      System.out.println(s.hashCode());
      System.out.println(t.hashCode());

Do you expect them to be ==? No. Do you expect them to be equals? Yes. Do you expect them to have the same hash code? Yes, because of the first property above.

Why does this weird integer result in fast (“constant time”) lookups?

Because you can use it as an index into an array.

In short, “hash tables” are arrays that store objects based upon their “hash code”. If you want to put an element into the array, you figure out the right place to put it by computing its hash code and reducing it to a valid index (for example, by taking the remainder after dividing by the array’s length). And if you want to see if an element is in the array, you compute its hash code, then jump to the right spot in the array.

In a perfect world, the array would be big enough to hold everything, and the hash codes would always be unique per-object, and this would all just work. In practice, sometimes there are collisions – more than one object ends up in the same spot in the array. We resolve these collisions in different ways (one way: each element of the array might be a short linked list of elements with the hash code corresponding to that element’s index), and things usually work out with near-constant-time performance.
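To make that concrete, here is a deliberately tiny teaching sketch of a hash set that resolves collisions with a short list per slot. TinyHashSet is a made-up class, not how java.util.HashSet is actually written: it never grows its table, and it does not accept null.

```java
import java.util.ArrayList;
import java.util.List;

class TinyHashSet<E> {
    // One short list ("bucket") per slot; colliding elements share a bucket.
    private final List<List<E>> buckets = new ArrayList<List<E>>();
    private int size = 0;

    TinyHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) {
            buckets.add(new ArrayList<E>());
        }
    }

    // Reduce the hash code to a valid index; floorMod copes with negative hash codes.
    private int indexFor(E e) {
        return Math.floorMod(e.hashCode(), buckets.size());
    }

    boolean add(E e) {
        List<E> bucket = buckets.get(indexFor(e));
        if (bucket.contains(e)) {
            return false; // already present (according to equals): ignore
        }
        bucket.add(e);
        size++;
        return true;
    }

    boolean contains(E e) {
        // Only one bucket is searched, not the whole table.
        return buckets.get(indexFor(e)).contains(e);
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<String>(4);
        System.out.println(set.add("x"));             // true
        System.out.println(set.add(new String("x"))); // false: equal to an existing element
        System.out.println(set.contains("x"));        // true
        System.out.println(set.contains("y"));        // false
        System.out.println(set.size());               // 1
    }
}
```

Lookups stay fast only while the buckets stay short, which is why a hashCode that spreads unequal objects across many different values matters for performance.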
