Lecture 12: Maps, spam, pams, anagrams
Announcements
Programming assignment 06 is due Friday.
The last day to withdraw is Thursday. You must return a signed “Course Change Request” to the registrar by Thursday at 5pm. The Computer Science main office (CS room 100) staff are allowed to sign on my behalf, if you cannot easily find me.
A visitor from TEFD will conduct a MAP at the start of lecture today.
Today’s agenda
Today we’re going to do another worked example. In particular, we’re going to write an anagram finder, that is, a program that, given a word list, looks for words in that list that are anagrams of one another. An anagram of a word is a rearrangement of the letters in the word that results in a new (different) word, using each of the letters from the original word exactly once. I’ll go start-to-finish again today, building the project in Eclipse.
Program sketch
What should our program do?
First, how might we recognize anagrams? For example, we could write a method to count the occurrences of each letter in a word, then store that in an array (or object), then check those arrays/objects for equals-style equality. And then we could perhaps write a hashCode method for those objects, etc.
You could do that, and it would work. But for purposes of illustration, I’m going to take advantage of a well-known trick, which is that if you sort the letters of two words and compare the sorted letters, they’ll be equal iff the words are anagrams of one another.
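For comparison, here is a minimal sketch of the counting approach described above. The class and method names (LetterCounts, counts, sameLetters) are made up for illustration, and the code assumes every word consists only of the lowercase letters a through z.

```java
import java.util.Arrays;

class LetterCounts {
    // Counts how often each letter occurs.
    // Assumption: the word contains only 'a'..'z'; anything else would
    // index outside the 26-element array.
    static int[] counts(String word) {
        int[] counts = new int[26];
        for (char c : word.toCharArray()) {
            counts[c - 'a']++;
        }
        return counts;
    }

    // Two words use the same letters exactly when their counts match.
    static boolean sameLetters(String a, String b) {
        return Arrays.equals(counts(a), counts(b));
    }
}
```

Note that a Java array inherits equals and hashCode from Object, so an int[] cannot serve directly as a HashMap key. That is why the counting approach would need a wrapper object with its own equals and hashCode, and part of why the sorted-letter String is the more convenient key.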
So now we’ve got the core of the algorithm. How can we group words into clusters of anagrams? I’m going to suggest a Map<String, List<String>>. The keys are going to be the sorted-letter version of the words, and the values will be lists of all the words that are associated with this sorted-letter version of the word. This structure, where a key is associated with many values, is sometimes called a “Multimap”; Java’s standard library doesn’t provide a multimap type, but mapping each key to a collection of values is an ad hoc version of this.
So what are we going to do? Something like the following:
- read a list of words (a list? an array? or process one-by-one? up to us; the first is probably simplest, the last might be more memory efficient; it depends a lot upon how big you expect the list to be)
- create a multimap
- for each word:
- compute its sorted version
- insert it into the multimap
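The steps above can be sketched compactly with the streams API. This is not the version built in the rest of the lecture, only one possible way to see the whole pipeline at a glance; the names GroupingSketch, sortLetters, and group are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingSketch {
    // Sorted-letter version of a word, e.g. "bird" becomes "bdir".
    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Groups the words by their sorted-letter key; each map value is the
    // list of words that share that key.
    static Map<String, List<String>> group(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(GroupingSketch::sortLetters));
    }
}
```

For the input list bird, drib, and, the resulting map associates "bdir" with the list [bird, drib] and "adn" with the list [and].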
Then, we can write methods to query the multimap. Let’s get started.
Coding up AnagramFinder
First, the instance variable:
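import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AnagramFinder {
    // Maps a sorted-letter key to every word added so far that has those letters.
    private final Map<String, List<String>> anagrams;

    public AnagramFinder() {
        anagrams = new HashMap<String, List<String>>();
    }
}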
Next, the add method, to add a word to the AnagramFinder. What should it do? It should look up the word in the map, and add it to the associated list. What if there is no list? It should make a new one and insert it into the map.
To look up the word, we’ll need a method to return the letter-by-letter alphabetization of a String. There are several ways to do this. Here’s one:
private static String alphabetized(String word) {
    char[] a = word.toCharArray();
    Arrays.sort(a);
    return new String(a);
}
Why is this method static? It does not depend upon the instance in any way, so there is no need to make it an instance method. If it does later change to be part of the instance (that is, if we attempt to call an instance method from it, or use an instance variable from it), the type checker will alert us. Further, static methods could be moved (or copied) easily to another class if appropriate – this is part of a process called “refactoring”.
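A quick way to get a feel for the method is to call it on a few words. The demo class below (AlphabetizedDemo is a made-up name) repeats the method body without the private modifier so that it can be called from outside the class.

```java
import java.util.Arrays;

class AlphabetizedDemo {
    // Same body as the lecture's alphabetized method.
    static String alphabetized(String word) {
        char[] a = word.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    public static void main(String[] args) {
        System.out.println(alphabetized("bird")); // prints bdir
        System.out.println(alphabetized("drib")); // prints bdir
        // Arrays.sort orders chars by numeric value, so uppercase
        // letters come before lowercase ones.
        System.out.println(alphabetized("Bird")); // prints Bdir
    }
}
```

The last call shows that the comparison is case-sensitive: "Dan" and "and" would get different keys. If a word list mixes capitalized and lowercase entries, lowercasing each word before sorting is one option, though whether that is the desired behavior is a design decision the lecture does not settle.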
OK, back to add. There are a couple of different ways you could write this. For example, you could handle the two cases completely separately:
public void add(String word) {
    String key = alphabetized(word);
    if (!anagrams.containsKey(key)) {
        List<String> l = new ArrayList<String>();
        l.add(word);
        anagrams.put(key, l);
    } else {
        List<String> l = anagrams.get(key);
        l.add(word);
    }
}
Or you could deal with the not-in-map problem first, and unify things otherwise:
public void add(String word) {
    String key = alphabetized(word);
    if (!anagrams.containsKey(key)) {
        anagrams.put(key, new ArrayList<String>());
    }
    List<String> l = anagrams.get(key);
    l.add(word);
}
I find them both fairly readable, but things being otherwise equal, I will always choose the shorter solution.
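A third option, available since Java 8, is Map.computeIfAbsent, which folds the "create the list if missing" step and the lookup into one call. The sketch below is self-contained for illustration; the class name ComputeIfAbsentSketch and the groupFor accessor are hypothetical and exist only so the result can be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ComputeIfAbsentSketch {
    private final Map<String, List<String>> anagrams = new HashMap<>();

    private static String alphabetized(String word) {
        char[] a = word.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    public void add(String word) {
        // computeIfAbsent returns the list already stored under the key,
        // or stores and returns a new empty list if there was none.
        anagrams.computeIfAbsent(alphabetized(word), k -> new ArrayList<>())
                .add(word);
    }

    // Hypothetical accessor, here only to inspect the map's contents.
    List<String> groupFor(String key) {
        return anagrams.get(key);
    }
}
```

Whether this reads better than the explicit if is a matter of taste; it is shorter, but it relies on the reader knowing what computeIfAbsent returns.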
Let’s write some code in our main method to test this out.
public static void main(String[] args) {
    AnagramFinder af = new AnagramFinder();
    af.add("bird");
    af.add("drib");
    af.add("and");
    af.add("nad");
    af.add("dan");
    af.add("it");
}
OK, but we forgot to write methods to get anything out of the AnagramFinder! Let’s do so now.
In-class exercise 1
Returns the anagrams of a given word, or an empty list if there are no such anagrams.
public List<String> anagramsOf(String word) {
    return anagrams.getOrDefault(alphabetized(word), new ArrayList<String>());
}
Does it work?
System.out.println(af.anagramsOf("and"));
System.out.println(af.anagramsOf("it"));
System.out.println(af.anagramsOf("boo"));
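To check the answer by hand, here is a consolidated copy of the code written so far, with the expected output in comments. The class is renamed AnagramFinderDemo (a made-up name) so that it can sit next to the real AnagramFinder; the method bodies follow the lecture's versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnagramFinderDemo {
    private final Map<String, List<String>> anagrams =
            new HashMap<String, List<String>>();

    private static String alphabetized(String word) {
        char[] a = word.toCharArray();
        Arrays.sort(a);
        return new String(a);
    }

    public void add(String word) {
        String key = alphabetized(word);
        if (!anagrams.containsKey(key)) {
            anagrams.put(key, new ArrayList<String>());
        }
        anagrams.get(key).add(word);
    }

    public List<String> anagramsOf(String word) {
        return anagrams.getOrDefault(alphabetized(word), new ArrayList<String>());
    }

    public static void main(String[] args) {
        AnagramFinderDemo af = new AnagramFinderDemo();
        for (String w : new String[] {"bird", "drib", "and", "nad", "dan", "it"}) {
            af.add(w);
        }
        System.out.println(af.anagramsOf("and")); // prints [and, nad, dan]
        System.out.println(af.anagramsOf("it"));  // prints [it]
        System.out.println(af.anagramsOf("boo")); // prints []
    }
}
```

Two things are worth noticing. The result for "it" is [it], not an empty list, because every added word is stored in its own group. And for a key that is present, anagramsOf hands back the map's own list rather than a copy, so a caller who modifies the returned list also modifies the finder's internal state.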
Let’s add the ability to read from a file:
public void addFromFile(Path path) throws IOException {
    // try-with-resources closes the reader even if reading fails partway
    try (BufferedReader br = Files.newBufferedReader(path)) {
        for (String word = br.readLine(); word != null; word = br.readLine()) {
            add(word);
        }
    }
}
(Note that there are lots of ways to read from files; this is just one.)
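As an illustration of two other ways, the sketch below uses Files.readAllLines and Files.lines from java.nio.file. The class and method names (ReadWordsSketch, readWords, forEachWord) are hypothetical; both methods assume the file holds one word per line in UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

class ReadWordsSketch {
    // Reads the whole file into memory at once: simple, but the entire
    // word list lives in one List.
    static List<String> readWords(Path path) throws IOException {
        return Files.readAllLines(path);
    }

    // Streams the file line by line and hands each word to the caller.
    // The stream holds the file open, so it is closed with
    // try-with-resources.
    static void forEachWord(Path path, Consumer<String> action) throws IOException {
        try (Stream<String> lines = Files.lines(path)) {
            lines.forEach(action);
        }
    }
}
```

With an AnagramFinder named af, a call such as ReadWordsSketch.forEachWord(path, af::add) would feed each line to add, much as addFromFile does.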
Now let’s write a method to find the word(s) with the most anagrams. There could be more than one, but let’s just return any such list with the most anagrams (or an empty list if there are none yet). You’ll probably want to use the Map.values method.
In-class exercise 2
public List<String> mostAnagrams() {
    int longest = -1;
    List<String> list = new ArrayList<String>();
    for (List<String> grams : anagrams.values()) {
        if (grams.size() > longest) {
            longest = grams.size();
            list = grams;
        }
    }
    return list;
}
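Since several groups can tie for the largest size, a variant could return all of them instead of just one. The following is only a sketch of that idea, written as a static method over the map so that it stands alone; AllLargestGroups and allLargest are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AllLargestGroups {
    // Returns every group whose size equals the largest group size, or an
    // empty list if the map is empty.
    static List<List<String>> allLargest(Map<String, List<String>> anagrams) {
        int longest = -1;
        List<List<String>> result = new ArrayList<>();
        for (List<String> grams : anagrams.values()) {
            if (grams.size() > longest) {
                // Strictly larger group found: discard the earlier ties.
                longest = grams.size();
                result.clear();
                result.add(grams);
            } else if (grams.size() == longest) {
                result.add(grams);
            }
        }
        return result;
    }
}
```

Because a HashMap does not promise any iteration order, the order of the tied groups in the result is unspecified.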
Now let’s test it:
System.out.println(af.mostAnagrams());
af.addFromFile(new File("/usr/share/dict/words").toPath());
System.out.println(af.mostAnagrams());