Aliasing and references in Lists (and other container types)
You probably noticed that this week’s assignment forced you to make a copy of a list that’s passed into a constructor. Some of you did it manually:
public class Assembler {
private final List<Fragment> fragments;
public Assembler(List<Fragment> fragments) {
this.fragments = new ArrayList<Fragment>();
for (Fragment f: fragments) {
this.fragments.add(f);
}
}
But you can also do it in one line, using the ArrayList copy constructor, which copies one collection into another when that collection is passed as an argument:
this.fragments = new ArrayList<Fragment>(fragments);
But it does leave the question: Why do I make you do this? Because it’s a good idea! But why?
The real answer here is that if you don’t, then the list you’ve “passed into” the Assembler is an alias of the list stored in the Assembler’s instance variable. So changing one changes the other, even though it “looks like” they are different lists!
This can be a huge source of bugs for programmers who haven’t internalized that different references can refer to the same thing. When this bug bites, it’s not usually when you use two references in the same scope, like we did in the Bus question on the problem set a week or so ago. It’s because you made a copy of a reference and used both as though they were separate, independent things. But they’re not! They’re the same object! And that leads to problems if you don’t realize they are the same object with different names in different scopes.
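To make the aliasing concrete, here is a minimal sketch. The two holder classes are hypothetical stand-ins for Assembler (they are not from the assignment): one stores the caller's list directly, the other copies it the way the constructor above does.

```java
import java.util.ArrayList;
import java.util.List;

class AliasingDemo {
    // Hypothetical class: keeps the caller's list itself, so both names refer to one object.
    static class AliasedHolder {
        final List<String> items;

        AliasedHolder(List<String> items) {
            this.items = items;
        }
    }

    // Hypothetical class: copies the list, as the Assembler constructor does.
    static class CopyingHolder {
        final List<String> items;

        CopyingHolder(List<String> items) {
            this.items = new ArrayList<>(items);
        }
    }

    public static void main(String[] args) {
        List<String> original = new ArrayList<>();
        original.add("first");

        AliasedHolder aliased = new AliasedHolder(original);
        CopyingHolder copied = new CopyingHolder(original);

        // The caller keeps using "its" list after handing it over.
        original.add("second");

        System.out.println(aliased.items.size()); // 2: the holder's list changed too
        System.out.println(copied.items.size());  // 1: the copy is independent
    }
}
```

Nothing in AliasedHolder's own code ever calls add, yet its list grew. That is the kind of action-at-a-distance the copy protects you from.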
Immutability
Astute students may note that even if you copy the lists, the objects stored in the lists are still aliases of one another. This is true! And something you need to be careful of! But you’ll notice that some objects are basically “immutable” – you can’t change their state once they’re created.
For example, instances of String can’t be changed once you make them. “But Marc, yes you can! You can call, for example, someString.toUpperCase() and it becomes an upper-case string!”
Not exactly, my fine friend. toUpperCase returns a new string but leaves the original unchanged. There’s no way to modify the original String object. You can change what the reference refers to, for example, someString = "A different string";, but that’s not changing a String object, it’s changing the reference stored in a String variable.
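If you want to convince yourself, a few lines are enough. This is just a sketch; the variable names are made up for illustration.

```java
class StringImmutabilityDemo {
    public static void main(String[] args) {
        String someString = "banana";
        String alias = someString;               // second reference to the same String object

        String upper = someString.toUpperCase(); // a brand-new String
        System.out.println(someString);          // banana (the original is untouched)
        System.out.println(upper);               // BANANA

        someString = "A different string";       // rebinds the variable only
        System.out.println(alias);               // banana (the old object still exists, unchanged)
    }
}
```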
You’ll notice that Fragment in A04 was similar – once it’s constructed, no method in its API should change its structure. mergedWith is like toUpperCase – it returns a new Fragment but doesn’t change the current one.
Generally, if you can make objects immutable in this way, you should. It prevents an enormous set of hard-to-track-down errors.
The other option is to perform a “deep copy,” copying not just the List’s references to items, but all the items, as well. This is typically unnecessary, and sometimes a “code smell” indicating your program’s design is unnecessarily complex.
Generics
Motivation
Methods take parameters: rather than hardcoding all data and values into a method, we can make some parts of the data variable and parameterizable so that the method can be reused with different data and values. This makes a lot of sense: Let’s say we want to write a method that adds five to its argument:
int add5(int i) {
return i + 5;
}
That’s useful. But someday we want to also write a method that adds six:
int add6(int i) {
return i + 6;
}
OK. Then add7, etc. Getting silly. We don’t want to write a different method each time, since (1) there’s an infinite number of them! and (2) the operation of adding is mechanically the same each time. That is, there’s a generalized algorithm we write, once, and then can use many times.
int add(int i, int j) {
return i + j;
}
Example
The insight behind “generics” is that we can do the same thing with types – we can parameterize classes (and methods) with a type, too, and use it on different types of things. Our crappy StringList, for example, while it lets us hold any list of Strings we want, was still limited to String data. But it turns out that instead of writing:
public interface StringList {
    public void add(String s);
    public void add(int i, String s);
    public String remove(int i);
    public String get(int i);
    public int size();
}
you can parameterize the interface on a type using angle brackets:
public interface List<E> {
    public void add(E e);
    public void add(int i, E e);
    public E remove(int i);
    public E get(int i);
    public int size();
}
The List now sports a generic type name in angle brackets. We’ve defined a family of possible types here; note that each method that used to operate on strings now operates on this mysterious E.
We can also type parameterize a class:
public class Node<E> {
private final E contents;
private Node<E> next;
// more
}
…and the two together let us write generic code, that operates on generic types, based upon the type parameter.
Here’s the fully converted Node and LinkedList:
public class Node<E> {
private E contents;
private Node<E> next;
public Node(E contents) {
this.contents = contents;
}
public E getContents() {
return contents;
}
public Node<E> getNext() {
return next;
}
public void setNext(Node<E> n) {
next = n;
}
}
public class LinkedList<E> implements List<E> {
private Node<E> head;
private int size;
public LinkedList() {
head = null;
size = 0;
}
@Override
public void add(E s) {
size++;
Node<E> n = new Node<E>(s);
// Case 1: empty list
if (head == null) {
head = n;
return;
}
// Case 2: non-empty list
Node<E> current = head;
while (current.getNext() != null) {
current = current.getNext();
}
// now reached the node at the end of the list
current.setNext(n);
}
@Override
public void add(int i, E s) throws IndexOutOfBoundsException {
if (i < 0 || i > size) {
throw new IndexOutOfBoundsException();
}
size++;
Node<E> n = new Node<E>(s);
// Case 1: insert at head of list, position 0
if (i == 0) {
// Step 1, set n's next pointer to head
n.setNext(head);
// Step 2: set head to n
head = n;
return;
}
// Case 2: inserting elsewhere in the list
// Step 1: find the node before
Node<E> nodeBefore = head;
for (int j = 1; j < i; j++) { // j = 0; j < i - 1
nodeBefore = nodeBefore.getNext();
}
// Step 2: update n's next pointer to the nodeBefore's next pointer
n.setNext(nodeBefore.getNext());
// Step 3: set nodeBefore's next pointer to n
nodeBefore.setNext(n);
}
@Override
public E remove(int i) throws IndexOutOfBoundsException {
if (i < 0 || i >= size) {
throw new IndexOutOfBoundsException();
}
size--;
final E result;
// Case 1: remove first node in the list
if (i == 0) {
result = head.getContents();
head = head.getNext();
return result;
}
// Case 2: remove other node from list
// Step 1: find the node before the node we want to remove
Node<E> nodeBefore = head;
for (int j = 1 ; j < i; j++) {
nodeBefore = nodeBefore.getNext();
}
final Node<E> nodeToDelete = nodeBefore.getNext();
result = nodeToDelete.getContents();
// Step 2: set nodeBefore's next pointer to the node after
// the node we are deleting
nodeBefore.setNext(nodeToDelete.getNext());
return result;
}
@Override
public E get(int i) throws IndexOutOfBoundsException {
if (i < 0 || i >= size) {
throw new IndexOutOfBoundsException();
}
int j = 0;
Node<E> current = head;
while (true) {
if (i == j) {
return current.getContents();
}
current = current.getNext();
j++;
}
}
@Override
public int size() {
return size;
}
}
Type parameters
The E is a type parameter – it says that the programmer who declares a variable of type List must also choose a particular type that the declared List will handle. Lists with different type parameters are of different types. For example, you cannot assign one to another unless they have the same type parameter, any more than you can assign a String to a boolean:
boolean x = "banana"; // not allowed, fails at compile time
List<String> x;
List<Integer> y;
... // some code ...
x = y; // not allowed, fails at compile time
More on parameters
Type parameters are usually written as a single uppercase letter, and often that letter is an abbreviation. E stands for Element of a collection; we’ll also see K for Key and V for Value later in the course.
Type parameters, when instantiated (that is, when a variable of a generic type is declared), must be a non-primitive type. But Java does something called auto-boxing, so you can generally mix primitives and non-primitives freely using the associated wrapper types, like Integer. (Integer and friends also have many useful static methods.)
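Here is a small sketch of auto-boxing at work. The class and method names are invented for this example; the java.util calls are the standard ones.

```java
import java.util.ArrayList;
import java.util.List;

class AutoboxingDemo {
    // Sums a list of Integer objects; each element is unboxed to an int in the loop.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // List<int> would not compile: type arguments must be reference types.
        List<Integer> values = new ArrayList<>();
        values.add(3); // the int 3 is boxed into an Integer
        values.add(4);
        System.out.println(sum(values));                // 7
        System.out.println(Integer.parseInt("42") + 1); // 43, using a static helper on Integer
    }
}
```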
Another fun fact for today about type parameters: Usually we think of them as being declared on classes (and indeed, that’s usually where they are declared). But if you write a particular method that would benefit from type parameterization, you can do so:
public class Pair<K, V> {
public final K k;
public final V v;
public Pair(K k, V v) {
this.k = k;
this.v = v;
}
}
public class Util {
    public static <K, V> boolean compare(Pair<K, V> p1, Pair<K, V> p2) {
        return (p1.k.equals(p2.k) && p1.v.equals(p2.v));
    }
}
Note the type parameters come immediately before the return type. This is important, as static methods whose arguments (or return types) are generic come up all the time, and you need to be able to tell Java what the generic type(s) in the method are. They’re independent of the generic types, if any, that the class declares. For example, if the above class definition read public class Util<K>, the K in the compare method would not necessarily be the same as the K in the class’s declaration.
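Calling a generic method looks like calling any other static method. The sketch below repeats Pair and a Util without class-level type parameters so that it compiles on its own; GenericMethodDemo is a made-up name for the driver.

```java
class Pair<K, V> {
    public final K k;
    public final V v;

    public Pair(K k, V v) {
        this.k = k;
        this.v = v;
    }
}

class Util {
    // <K, V> here belongs to the method, not to any class.
    public static <K, V> boolean compare(Pair<K, V> p1, Pair<K, V> p2) {
        return p1.k.equals(p2.k) && p1.v.equals(p2.v);
    }
}

class GenericMethodDemo {
    public static void main(String[] args) {
        Pair<String, Integer> a = new Pair<>("apple", 1);
        Pair<String, Integer> b = new Pair<>("apple", 1);
        Pair<String, Integer> c = new Pair<>("pear", 2);

        System.out.println(Util.compare(a, b));                  // true, K and V inferred
        System.out.println(Util.<String, Integer>compare(a, c)); // false, types spelled out
    }
}
```

Most of the time you let the compiler infer K and V from the arguments. The explicit Util.<String, Integer>compare form is there for the rare case where inference needs help.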
Why generics matter
They matter for the reasons listed above (generic re-usable code)! But also, in Java 5, the entire Collections library was re-written to use generics. Before then, all container types (List, etc.) only held things of type Object, and you, the programmer, had to laboriously cast them each time you used them:
List list = new ArrayList();
list.add("hello");
String s = (String) list.get(0);
Not only was this a pain, if you made a mistake:
List l = new ArrayList();
l.add("Zero");
String s = (String)l.get(0);
//...
Integer i = (Integer)l.get(0); // throws exception at run-time!
System.out.print(i);
you’d find out at run-time, not at compile-time. And while I know you hate compiler errors right now, you’ll learn to love them when writing big programs — every error that the compiler catches is one you can fix at your leisure, while run-time errors are erratic, not always reproducible, and generally result in a (much bigger) headache for you.
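For contrast, here is a sketch that puts the two styles side by side. The class and method names are hypothetical. The raw-typed version compiles (with warnings, suppressed here) and blows up when run; in the generic version the bad line is commented out because the compiler would reject it.

```java
import java.util.ArrayList;
import java.util.List;

class RawVersusGenericDemo {
    // Pre-generics style: compiles, fails only when run.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean rawListFailsAtRunTime() {
        List raw = new ArrayList();
        raw.add("Zero");
        try {
            Integer i = (Integer) raw.get(0); // wrong cast, but the compiler cannot tell
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Generic style: no cast needed, and the wrong assignment is rejected by the compiler.
    static String genericListNeedsNoCast() {
        List<String> typed = new ArrayList<>();
        typed.add("Zero");
        // Integer i = typed.get(0); // does not compile: incompatible types
        return typed.get(0);
    }

    public static void main(String[] args) {
        System.out.println(rawListFailsAtRunTime());  // true
        System.out.println(genericListNeedsNoCast()); // Zero
    }
}
```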
Lists and sorting
Suppose you want to add items to a list (say, a stores attribute of type List<Integer> in a store management system) and you want to keep it sorted. An aside: “sorted” is a funny word here. We really mean “in ascending (or descending) order”, but it’s a historical bit of vocabulary to say “sorted” that’s stuck around.
How would you write the public void addStore(int store) method?
public void addStore(int store) {
if (stores.isEmpty()) {
stores.add(store);
return;
}
int i = 0;
for (Integer storeNumber: stores) {
if (storeNumber.compareTo(store) >= 0) {
stores.add(i, store);
return;
}
i++;
}
stores.add(store);
}
Modifying a list while a for-each loop is iterating over it, as this code does, can throw a ConcurrentModificationException.
WTF? It turns out that some (most) implementations of collections are very particular about allowing you to modify them while you are iterating. Creating an iterator, then modifying the collection, then trying to iterate is generally not allowed. See, for example, the ArrayList docs: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ArrayList.html and note that “…if the list is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove or add methods, the iterator will throw a ConcurrentModificationException.”
The exception will only be thrown if the iterator (the top of the for loop) is reached again after the list is modified. That doesn’t happen here, since we return right after the add, but it’s something to watch out for in your own code in upcoming projects.
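To see the exception actually fire, the loop has to keep going after the modification. A minimal sketch follows; the class and method names are made up, and the only difference from addStore is that there is no return after the add.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class ConcurrentModificationDemo {
    // Returns true if modifying the list mid-iteration triggered the exception.
    static boolean addWhileIterating(List<Integer> stores) {
        try {
            for (Integer storeNumber : stores) {
                if (storeNumber == 1) {
                    stores.add(99); // structural change, and we keep looping (no return)
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> stores = new ArrayList<>(List.of(1, 2, 3));
        System.out.println(addWhileIterating(stores)); // true
    }
}
```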
So instead we could work with indices directly:
public void addStore(int store) {
if (stores.isEmpty()) {
stores.add(store);
return;
}
for (int i = 0; i < stores.size(); i++) {
if (stores.get(i).compareTo(store) >= 0) {
stores.add(i, store);
return;
}
}
stores.add(store);
}
Automatic sorting
Here’s an alternative?
stores.sort(null);
What’s up with the sort(null) call? Let’s look at the API: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/List.html#sort(java.util.Comparator)-
The parameter’s type, Comparator, is an interface that describes how to compare two arbitrary objects (see also its simpler predecessor, the Comparable interface). You might implement it if you wanted to do something unusual with the sort, for example, put all odd numbers before all even numbers (which maybe sounds nonsensical, but think about mail delivery up and down each side of the street).
But you don’t actually have to implement its abstract methods to use it here. Reading the documentation:
If the specified comparator is null then all elements in this list must implement the Comparable interface and the elements’ natural ordering should be used.
Do Integers implement Comparable? Let’s check: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Integer.html
Yup. In general, you should start doing your best to read and follow links in the Java API if you see things in code you don’t understand. You might not understand everything you read, but only by trying are you going to learn, and at some point you can’t expect your instructors to spoon-feed you everything (though I will definitely cover the highlights).
So anyway, we can exercise option 2:
Option 2: Append the number then sort the list!
public void addStore(int store) {
stores.add(store);
stores.sort(null);
}
This may not be as efficient as inserting into the correct spot – we’ll learn more about this later in the semester – but it’s much easier conceptually and to write.
A worked example for Comparator
So we’ve been talking about List
s and generics and reviewing 121 stuff like boolean conditions and flow control. Let’s do another worked example.
We’re going to design and write a class to represent simplified postal addresses (street numbers and names). Then we’re going to dump several instances of it into a list. Finally, we’re going to define a custom Comparator
on a postal address that will let us sort the list in a special way.
Let’s get going.
Given the problem statement, we know we’re going to want to define a class that defines objects containing a street number and name:
public class PostalAddress {
public final int number;
public final String name;
}
What is up with Marc and his use of public final
? Here’s what’s up: when you know a data type is going to remain fixed, and that any particular object isn’t going to have its data change, there’s no reason to deal with the pain of writing private instance variables and then turning around to write public accessors (get methods). “But Marc, what if you change how the value is stored, or how it is determined?” Then you’ll be giving a different name (or writing an accessor, etc.) and your IDE will flag all occurrences for you to fix. Or better yet, will allow you to refactor them all yourself.
(See “Effective Java, 2nd edition” which though over a decade old is still one of the best bits of reading you can do once you’re an intermediate Java programmer.)
First steps
Anyway, now let’s add a few constructors. First the obvious one:
public PostalAddress(int number, String name) {
this.number = number;
this.name = name;
}
Now one that does some simple parsing:
public PostalAddress(String textAddress) {
String[] matches = textAddress.split("\\s+", 2);
this.number = Integer.parseInt(matches[0]);
this.name = matches[1];
}
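The second argument to split is a limit: with 2, the string is cut at most once, so the street name keeps its internal spaces. A quick sketch (AddressParsingDemo and splitAddress are names invented for this example):

```java
class AddressParsingDemo {
    // Splits "number name" on the first run of whitespace only.
    static String[] splitAddress(String textAddress) {
        return textAddress.split("\\s+", 2);
    }

    public static void main(String[] args) {
        String[] matches = splitAddress("6 Maple St");
        System.out.println(Integer.parseInt(matches[0])); // 6
        System.out.println(matches[1]);                   // Maple St
    }
}
```

Note that this parsing is deliberately simple: input with no whitespace leaves matches with a single element, and a non-numeric first token makes Integer.parseInt throw a NumberFormatException.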
OK! Now we’ve got a simple class. Let’s try making a few of them and adding them to a list:
List<PostalAddress> addresses = new ArrayList<PostalAddress>();
for (int i = 1; i <= 10; i++) {
addresses.add(new PostalAddress(i, "Maple St"));
}
and maybe printing them out:
System.out.println(addresses);
Ugh, what’s this PostalAddress@677327b6 nonsense? Remember, when you print an object, Java tries to coerce it to a String using its toString method. We haven’t written one, so we get the Object default, which is what you see. It is derived from the class name and the hashCode() method. Since we haven’t defined the latter, we get its default, which is usually but not always the object’s memory address. Yuck. Let’s make it better:
public String toString() {
return name + ", " + number;
}
Now it’s a little better. Hey, let’s see if 6 Maple Street is in our list:
System.out.println(addresses.contains(new PostalAddress("6 Maple St")));
What will this print?
false? What? Maybe our constructor is broken, let’s try the other one:
System.out.println(addresses.contains(new PostalAddress(6, "Maple St")));
Nope, still false. Why? Let’s look at the List.contains method javadoc. It uses equals(). Ahh, but we haven’t written an equals, which means we’re using the default one from Object, which checks whether the two references point to the very same object (that is, ==). That will be false for two different instances, even if their fields match. (Look it up in the javadoc.) Let’s fix this problem:
public boolean equals(PostalAddress o) {
return number == o.number && name.equals(o.name);
}
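But it still doesn’t work? Note that contains (and equals) work on Objects, not PostalAddresses. Our equals(PostalAddress) is an overload sitting next to Object’s equals(Object), not an override of it, so contains never calls it. So we need to change the signature. And then check the type: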
public boolean equals(Object o) {
if (!(o instanceof PostalAddress)) return false;
PostalAddress p = (PostalAddress)o;
return number == p.number && name.equals(p.name);
}
OK, better. But I will note we are omitting some important-in-practice details, as we’re ignoring null and we’re violating the contract for hashCode:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
We’ll come back to hashCode in more detail when we get to Sets later, but for now we’ll do what Java programmers often do: use your IDE’s code generator to do it, or hack up a “good enough” one. Java 11 makes this easier than in the past with a bunch of utility methods in java.util.Objects:
public int hashCode() {
return Objects.hash(number, name);
}
Given a list of instance variables (that already have hash codes defined), Objects.hash constructs a new, valid hash code on the basis of these hashes. More on this later.
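Putting the pieces so far together, here is a self-contained sketch of the class with both methods in place, plus the contains check that failed earlier. ContainsDemo is a hypothetical driver, and the parsing constructor and toString are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class PostalAddress {
    public final int number;
    public final String name;

    public PostalAddress(int number, String name) {
        this.number = number;
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PostalAddress)) return false; // also false when o is null
        PostalAddress p = (PostalAddress) o;
        return number == p.number && name.equals(p.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, name); // same fields as equals
    }
}

class ContainsDemo {
    public static void main(String[] args) {
        List<PostalAddress> addresses = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            addresses.add(new PostalAddress(i, "Maple St"));
        }
        PostalAddress six = new PostalAddress(6, "Maple St");
        System.out.println(addresses.contains(six));                       // true
        System.out.println(six.hashCode() == addresses.get(5).hashCode()); // true
    }
}
```

The @Override annotations are worth the keystrokes: had we written them on the earlier equals(PostalAddress) attempt, the compiler would have told us it overrides nothing.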
Comparators
OK, on to our actual task. Let’s say we want to be able to sort our address list. We use the sort function on the List, which if passed null uses the natural order, or if passed a Comparator uses it to sort the list. But what’s the natural order of a PostalAddress? It’s only defined if the PostalAddress implements Comparable. So looks like we’re doing one or the other.
Generally, you want to define Comparable if you expect values of the data type to be compared and you want there to be a canonical way to compare them – a “natural ordering.” You usually define custom Comparators for things like the custom sort. Let’s do both. First, let’s impose a natural order on PostalAddress so that they sort first by street name, then by number. Add implements Comparable<PostalAddress> to the class signature, and Code can helpfully add the missing method:
@Override
public int compareTo(PostalAddress o) {
// TODO Auto-generated method stub
return 0;
}
Well that won’t do.
What to do
Implement the method. (JavaDoc for compareTo on projector.) See: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Comparable.html
The memory aid is that if you want x < y, then you need to write compareTo such that x.compareTo(y) < 0.
Similarly, for x > y, then you need to write compareTo such that x.compareTo(y) > 0.
@Override
public int compareTo(PostalAddress o) {
    // this street name sorts before the other's
    if (name.compareTo(o.name) < 0) return -1;
    // this street name sorts after the other's
    if (name.compareTo(o.name) > 0) return 1;
    // same street name! check by number
    // (same idea as: if (number < o.number) return -1; if (number > o.number) return 1; return 0;)
    return Integer.compare(number, o.number);
}
Remember, you can look up Integer.compare
in the Java API (or just Google it).
Let’s create the list out of order, print it, sort it, then print it:
for (int i = 10; i >=1; i -= 2) {
addresses.add(new PostalAddress(i, "Maple St"));
}
for (int i = 1; i < 10; i += 2) {
addresses.add(new PostalAddress(i, "Birch St"));
}
System.out.println(addresses);
addresses.sort(null);
System.out.println(addresses);
Hey, it works!
Now let’s define a custom comparator for use in doing a “postal sort”. That is, we still want to sort such that street names are alphabetical, but we want the numbers sorted as all odd first (in ascending order), then all even (in descending order). This is how the truck might go up and down the street (on board).
What does that look like? Let’s declare a new Comparator
:
public class PostalOrderComparator implements Comparator<PostalAddress> { ... }
Again, Code can helpfully fill it out with the method we need to implement, so let’s do it.
It will be similar to but more complicated than the compareTo method we just wrote. A tip: x % 2 == 0 if and only if x is even. For non-negative x, x % 2 == 1 iff x is odd.
public int compare(PostalAddress o1, PostalAddress o2) {
// if names are not equal, just sort by name!
if (!o1.name.equals(o2.name)) {
return o1.name.compareTo(o2.name);
}
// else, do a postal ordering by number
// first, odds come before evens
if (o1.number % 2 == 1 && o2.number % 2 == 0) return -1;
if (o1.number % 2 == 0 && o2.number % 2 == 1) return 1;
// both odd
if (o1.number % 2 == 1 && o2.number % 2 == 1) return Integer.compare(o1.number, o2.number);
// both even
return -Integer.compare(o1.number, o2.number);
}
And let’s check it out:
for (int i = 6; i >=1; i -= 2) {
addresses.add(new PostalAddress(i, "Maple St"));
}
for (int i = 1; i < 6; i += 2) {
addresses.add(new PostalAddress(i, "Birch St"));
}
for (int i = 6; i >=1; i -= 2) {
addresses.add(new PostalAddress(i, "Birch St"));
}
for (int i = 1; i < 6; i += 2) {
addresses.add(new PostalAddress(i, "Maple St"));
}
System.out.println(addresses);
addresses.sort(null);
System.out.println(addresses);
addresses.sort(new PostalOrderComparator());
System.out.println(addresses);
Things we might do to improve this? Add an isOdd and/or isEven method for readability, perhaps? Pull out o1.number and o2.number into local variables? Both are debatable. Here’s what we ended up with in class:
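The listing from class isn’t reproduced in these notes. As a stand-in, here is one way the suggested cleanups might look. Treat it as a sketch, not the version written in class: the isOdd helper and the n1/n2 locals are hypothetical names, and PostalAddress is cut down to what the comparator needs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PostalAddress {
    public final int number;
    public final String name;

    public PostalAddress(int number, String name) {
        this.number = number;
        this.name = name;
    }
}

class PostalOrderComparator implements Comparator<PostalAddress> {
    // Hypothetical helper; "!= 0" also treats negative odd numbers as odd.
    private static boolean isOdd(int n) {
        return n % 2 != 0;
    }

    @Override
    public int compare(PostalAddress o1, PostalAddress o2) {
        // Different streets: alphabetical by name.
        int byName = o1.name.compareTo(o2.name);
        if (byName != 0) return byName;

        int n1 = o1.number;
        int n2 = o2.number;
        // Odds come before evens.
        if (isOdd(n1) && !isOdd(n2)) return -1;
        if (!isOdd(n1) && isOdd(n2)) return 1;
        // Same parity from here on: odds ascending, evens descending.
        if (isOdd(n1)) return Integer.compare(n1, n2);
        return Integer.compare(n2, n1);
    }
}

class PostalSortDemo {
    public static void main(String[] args) {
        List<PostalAddress> addresses = new ArrayList<>();
        for (int i = 1; i <= 6; i++) {
            addresses.add(new PostalAddress(i, "Maple St"));
        }
        addresses.sort(new PostalOrderComparator());
        for (PostalAddress a : addresses) {
            System.out.print(a.number + " "); // 1 3 5 6 4 2
        }
        System.out.println();
    }
}
```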
Sets: an introduction
Next, we’re going to move on in our list of top-n abstract data types from the List
to the Set
. In order to give you some grounding in what a set is and how it differs from a list, we’re going to turn to an arcane and little-known subject: Mathematics. To be clear, we’re going to do a very gentle introduction to “simple” set theory; if you have already taken a discrete math course this will be review for you, and if you stay in CS, you’ll see this again in a lot more detail in COMPSCI 250.
Simply put: a set is a collection of distinct objects.
The objects can be anything: people, numbers, shapes, colors, (or perhaps most topically, instances of Java objects). These objects are generally referred to as members or elements of a set.
In set theory, sets are named by a single uppercase letter: A or B, for example.
For our purposes, we’ll usually write sets as a list of the elements. The list will be comma separated, and will be enclosed in curly braces. For example: A = {1, 3, -6} describes a set called “A” that has three integers as elements.
Sets contain a collection of unique items. That is, sets cannot have duplicate items.
While the items in a set might have an implicit, natural order (like the integers), the set itself doesn’t define an order. So {1, 3, -6} = {-6, 1, 3}, that is, they’re the same set. Order doesn’t matter when comparing sets. (This is very different from our intuition with lists, where different orderings do matter.)
There are a few bits of notation I want you to have seen, so now I’m going to write them down for you.
First, how do we say a set contains an element? There’s a symbol that looks like this: ∈ (kind of a funny “E”) which is used to denote set membership. For example, “3 ∈ A” is pronounced “3 is an element of A.” You can think of the funny E as standing for “Element of” to help remember it. Likewise, ∉ means “not an element of” or “does not contain”, as in “10 ∉ A”. How do we say a set is empty? We call it the empty set and write it as ∅ (or {}).
Next, sometimes we might want to talk about one set being “contained within another”. For example, if B = {1, 3, -6, 10}, we might say A is a “subset” of B. This is written with the set containment symbol, which looks kinda like a curvy less-than-or-equal-to (or greater-than-or-equal-to) “A ⊆ B” and it “opens toward” the bigger set.
Sets are sometimes represented abstractly as “Venn diagrams”. Here’s what the above two sets might look like as a Venn diagram:
(on whiteboard)
Finally, there are a few operations on sets you should know about.
First is “union”, written as a little ∪. The union of two sets contains all their elements. So if we have A = {1, 3, -6} and C = {9, 3, 4}, then A ∪ C = {1, 3, -6, 9, 4}. As a mnemonic, think of the “United States” as union of many things into one.
(on whiteboard)
Next is the “intersection”. The intersection of two sets contains only the elements they have in common and is written as an upside-down u: ∩. Continuing our example from above, A ∩ C = {3}.
(on whiteboard)
Think of the “intersection of two roads”: the intersection is just the part they share, not all of both roads.
Two more operations (I promise) then we’re done with math and notation. First is set difference, sometimes called “relative complement” or “set-theoretic difference.” It’s written with a backslash \ and refers to all the elements in one set that aren’t in another. For example, A \ C = {1, -6}. Note that in set difference, which set you write first matters (unlike union and intersection). Finally, there’s symmetric difference of two sets, which is all the things in the union that aren’t in the intersection. It can be written as Δ. A Δ C = {1, -6, 9, 4}.
Sets in Java
How does Java represent a set? As an abstract data type, specified by the Set
interface. First we’ll talk about the properties and assumptions we might expect from a Set
, in the abstract. Then we’ll talk about two concrete implementations of the data type provided by the Java API and see how they work.
- sets, like lists, are unbounded, that is they don’t have a fixed size
- duplicate elements are not allowed (only new elements are added; attempts to re-add existing elements are ignored – no error, just ignored)
- sets are unordered (usually – though there is a subtype called an ordered set)
- sets, like lists, can contain a null element (I hope you like NullPointerExceptions! though note some implementations might forbid null elements)
- sets support an add (and an addAll) operation, which can modify the current set
- sets support a remove operation of a specific value
- sets support a size operation to determine how many elements are currently in the set
- sets support a contains (and a containsAll) operation to check membership
- and more, but we’ll get to them later when we look at the full API that Java supplies.
Let’s take a look at the interface: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Set.html
Not too different from List
, though you’ll note some things (like remove
at an index, or get
) are not present, as those operations don’t make sense in the context of sets – they’re unordered, so there is no index!
Pay special attention to a few things:
“…sets contain no pair of elements e1 and e2 such that e1.equals(e2)” – the equals method is very important to sets, and if you stick objects in that don’t have an equals method, they’ll use Object’s equals method. Make sure that’s what you want if so!
Also note that “great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.” In other words, if you have a setter that changes an instance variable in an object, and that instance variable is considered by the object’s equals method, Set will have undefined (read: bad) behavior.
So putting relatively immutable things into sets is OK. Like Integers or Strings. Putting arbitrary objects that can be changed is not so good. Putting things that can be changed, but that you won’t change is OK but dangerous – what if you accidentally do end up changing the object? The Set will almost certainly misbehave in a weird way.
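Here is what “misbehave” can look like. This is a sketch with a made-up mutable class, Counter. The documentation only says the behavior is unspecified, so the second result is what typical JDKs do, not a guarantee.

```java
import java.util.HashSet;
import java.util.Set;

class MutableElementDemo {
    // Hypothetical mutable class whose equals and hashCode depend on a changeable field.
    static class Counter {
        int value;

        Counter(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Counter && ((Counter) o).value == value;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(value);
        }
    }

    public static void main(String[] args) {
        Set<Counter> set = new HashSet<>();
        Counter c = new Counter(1);
        set.add(c);

        c.value = 2; // mutate the object while it is in the set

        System.out.println(set.size());      // 1: the object is still stored
        System.out.println(set.contains(c)); // false on typical JDKs: the set looks in the wrong place
    }
}
```

The set still holds the object, but can no longer find it, because it was filed under the hash code it had when it was added.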
Other than those two restrictions, you can use Sets almost like Lists. Let’s do some examples:
Set<Integer> s = new HashSet<Integer>();
s.add(1);
s.add(2);
System.out.println(s); // like lists, you can print them and their contents are printed
Set<Integer> t = new HashSet<Integer>();
t.add(2);
t.add(3);
t.add(4);
for (Integer i : t) {
System.out.println(i); // like lists, you can iterate over them
}
s.addAll(t); // all elements in t are added to s; t is unchanged but s is not!
System.out.println(s);
System.out.println(t);
s.removeAll(t); // all elements in t are removed from s, as above
System.out.println(s);
System.out.println(t);
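The operations from the math section map onto Set methods fairly directly: addAll is union, retainAll is intersection, and removeAll is difference. Since those methods modify the set they are called on, this sketch copies first. The helper names are invented for the example, and it reuses A = {1, 3, -6} and C = {9, 3, 4} from above.

```java
import java.util.HashSet;
import java.util.Set;

class SetOperationsDemo {
    static <E> Set<E> union(Set<E> a, Set<E> b) {
        Set<E> result = new HashSet<>(a); // copy, so a itself is not modified
        result.addAll(b);
        return result;
    }

    static <E> Set<E> intersection(Set<E> a, Set<E> b) {
        Set<E> result = new HashSet<>(a);
        result.retainAll(b); // keep only elements also in b
        return result;
    }

    static <E> Set<E> difference(Set<E> a, Set<E> b) {
        Set<E> result = new HashSet<>(a);
        result.removeAll(b); // drop every element of b
        return result;
    }

    // Everything in the union that is not in the intersection.
    static <E> Set<E> symmetricDifference(Set<E> a, Set<E> b) {
        return difference(union(a, b), intersection(a, b));
    }

    public static void main(String[] args) {
        Set<Integer> a = Set.of(1, 3, -6);
        Set<Integer> c = Set.of(9, 3, 4);
        // HashSet does not promise any particular print order.
        System.out.println(union(a, c));               // the elements 1, 3, -6, 9, 4
        System.out.println(intersection(a, c));        // [3]
        System.out.println(difference(a, c));          // the elements 1, -6
        System.out.println(symmetricDifference(a, c)); // the elements 1, -6, 9, 4
    }
}
```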
And you generally do want to use Sets when the set properties (of uniqueness and lack-of-intrinsic-order) apply to your data set, especially if your data set is going to be large.
Why? (you might ask.) Because sets have much, much better general performance for insertion, removal, and containment-testing than lists. How? (you might ask.) Well, now we have to talk a little about how the two most common implementations of Sets work: HashSets and TreeSets.
On HashSets
One possible implementation of the set is the HashSet, which depends upon a correct hashCode method. Why? “This class implements the Set interface, backed by a hash table (actually a HashMap instance).” Now let’s look at the documentation for hashCode: “This method is supported for the benefit of hash tables such as those provided by HashMap.”
Wow, hash tables are so important that every object in Java must supply a hashCode method – it’s built into Object.
hashCode returns an integer, and must obey the contract in its documentation. Let’s look at each piece:
- It provides something like (but not exactly like!) equality: If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result. This implies that hashCode must not depend on any field that equals ignores; the simplest way to stay safe (and to keep hash tables fast) is to use the same fields in both methods.
- It is consistent: Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
- It is not an equality check, though: It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
So you could have a hashCode method that always returned the same integer, like 1, and it would technically obey the contract. But usually, the hashCode of objects is not 1, but instead a large integer. Going back to our old example of (not) aliasing:
String s = new String("x");
String t = new String("x");
System.out.println(s == t);
System.out.println(s.equals(t));
System.out.println(s.hashCode());
System.out.println(t.hashCode());
Do you expect them to be ==? No. Do you expect them to be equals? Yes. Do you expect them to have the same hash code? Yes, because of the first property above.
Why does this weird integer result in fast (“constant time”) lookups?
Because you can use it as an index into an array.
In short, “hash tables” are arrays that store objects based upon their “hash code”. If you want to put an element into the array, you figure out the right place to put it by checking its hash code. And if you want to see if an element is in the array, you look up its hash code, then jump to the right spot in the array.
In a perfect world, the array would be big enough to hold everything, and the hash codes would always be unique per-object, and this would all just work. In practice, sometimes there are collisions – more than one object ends up in the same spot in the array. We resolve these collisions in different ways (one way: each element of the array might be a short linked list of elements with the hash code corresponding to that element’s index), and things usually work out with near-constant-time performance.
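To make the idea concrete, here is a toy hash set that resolves collisions with a short list per slot, as described above. It is a teaching sketch under simplifying assumptions (a fixed number of buckets that must be positive, no resizing, no null elements, no remove), and it is not how java.util.HashSet is actually written. TinyHashSet and its method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class TinyHashSet<E> {
    // One short list ("bucket") per slot; a List of Lists stands in for the array.
    private final List<List<E>> buckets = new ArrayList<>();
    private int size = 0;

    public TinyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new LinkedList<>());
        }
    }

    // Maps a hash code to a valid slot index, even when the hash code is negative.
    private List<E> bucketFor(Object o) {
        int index = Math.floorMod(o.hashCode(), buckets.size());
        return buckets.get(index);
    }

    // Adds e unless an equal element is already present; returns true if it was added.
    public boolean add(E e) {
        List<E> bucket = bucketFor(e);
        if (bucket.contains(e)) {
            return false; // equals is only checked within one short bucket
        }
        bucket.add(e);
        size++;
        return true;
    }

    public boolean contains(Object o) {
        return bucketFor(o).contains(o);
    }

    public int size() {
        return size;
    }
}

class TinyHashSetDemo {
    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>(4);
        System.out.println(set.add("x"));             // true
        System.out.println(set.add(new String("x"))); // false: equal element already present
        System.out.println(set.contains("x"));        // true
        System.out.println(set.contains("z"));        // false
        System.out.println(set.size());               // 1
    }
}
```

Lookups stay fast only while the buckets stay short, which is why a hashCode that always returns 1 is legal but terrible: everything lands in one bucket and the set degrades into a plain list.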