CMPSCI 187: Programming With Data Structures

David Mix Barrington

Fall, 2011

Q&A for Programming Project #7: Some Games With Words

Question text in black, answers in blue.

Please note that there were several updates in the specification and the posted stub code -- these were often in response to student questions but I haven't explicitly answered those questions here.

Question P7.1, posted 7 December: I understand how a digit sequence codes a number of phonewords, but how do I look at all those possible phonewords?
Consider the digit sequence "22222". There are 3⁵ = 243 possible phonewords that could be made from this, ranging from "aaaaa" to "ccccc", since 2 expands to 'a', 'b', or 'c'. You could make five nested for loops to cycle through all these possibilities, and then test whether each one is in the prefix tree using the contains method. This would get the right answer, but would waste a lot of time.
It's important to prune this search when you encounter prefix strings that do not have nodes in the prefix tree. For example, "", "a", and "aa" have nodes, but "aaa" does not. So you move on to "aab" and "aac" (which have no nodes) without looking into any strings that extend those. Then you find "ab" and "aba" which do, and search "aba" until you find the word "abaca" (a type of banana from the Philippines), whereupon you can calculate its Scrabble score and put it in the priority queue.
There are at least three good ways to manage this pruned version of the general search for all extensions:
1. The five for loops, with a break statement when you find that you want to move on.
2. A recursive method, where you search for strings that start with some letters and finish with letters coded by some digits. You could search ("", "22222") by doing the three searches ("a", "2222"), ("b", "2222"), and ("c", "2222"). If the second argument is empty, you are in the base case and you should report your word. If your first argument is a string that has no node in the prefix tree, you move on.
3. You could do a backtrack search similar to that in Project #2, with an explicit stack or just keeping track of the string you are building up. When the current string has no node in the prefix tree, you backtrack.
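Here is a minimal sketch of option 2, the recursive method. It is not the project solution: SketchNode, PrunedSearchSketch, add, and search are hypothetical names, the keypad table is the standard phone layout, and treating '*' as "any letter" is an assumption based on Question P7.9 below. It also assumes, as on Knuth's list, that every stored word has the same length as the digit string, so reaching a node with no digits left means a word was found.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal node: one child slot per letter 'a'..'z', all null at first.
class SketchNode {
    SketchNode[] children = new SketchNode[26];
}

class PrunedSearchSketch {
    // Standard phone keypad, indexed by digit; 0 and 1 code no letters.
    static final String[] KEYS =
        {"", "", "abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};
    // Assumption: a star stands for any letter.
    static final String ALL = "abcdefghijklmnopqrstuvwxyz";

    final SketchNode root = new SketchNode();

    // Adds a lowercase word, creating nodes only where none exist yet.
    void add(String word) {
        SketchNode cur = root;
        for (char c : word.toCharArray()) {
            int i = c - 'a';
            if (cur.children[i] == null) cur.children[i] = new SketchNode();
            cur = cur.children[i];
        }
    }

    // Returns every stored word coded by the digit string.
    List<String> search(String digits) {
        List<String> found = new ArrayList<>();
        search(root, "", digits, found);
        return found;
    }

    // Searches for words that start with prefix and finish with letters coded by digits.
    private void search(SketchNode node, String prefix, String digits, List<String> found) {
        if (node == null) return;       // prefix has no node in the tree: prune here
        if (digits.isEmpty()) {         // base case: no digits left, report the word
            found.add(prefix);
            return;
        }
        char d = digits.charAt(0);
        // A character that is neither a digit nor a star throws an exception here.
        String letters = (d == '*') ? ALL : KEYS[d - '0'];
        for (char c : letters.toCharArray()) {
            search(node.children[c - 'a'], prefix + c, digits.substring(1), found);
        }
    }
}
```

In the real project you would compute a Scrabble score and insert into the priority queue at the base case rather than adding to a list. Note that a 0 or 1 in the input gives an empty letter set, so the loop body never runs and the search simply finds nothing.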
Question P7.2, posted 7 December: How should I handle the case of 0's or 1's in the digit string?
These inputs are valid, because I said that an input could be any five characters that were digits or stars. But of course any digit sequence with a 0 or a 1 in it has no phonewords. You should thus return an array of length 0. You probably don't even need special code for this -- your search will stop without finding any words, and you will then copy the (empty) priority queue into an array of length 0 and return it.
If you get invalid characters (not digits or stars) in the parameter of list, you may just print an error message or throw any kind of exception -- you are not responsible for what happens then.
Question P7.3, posted 7 December: In response to another question, you said that the size field of PrefixTree should contain the number of leaves, which you said was the number of words stored in the tree. But what if I put in the word "pesto" -- we then create a node for "pest", and isn't that a word?
In this case, it isn't a "word" because we are defining "words" to be just the ones on Knuth's list, all of which have exactly five letters. No one of those words can be a prefix of another -- if that were possible you would have a good point.
The easiest way to maintain the size of a PrefixTree is to set it to 0 in the constructor and then add 1 every time you run addString -- if you want to be careful, you can wait to add 1 until you know that the added word is new.
Question P7.4, posted 7 December: In a prefix tree, do the labels on the nodes represent the new letters added to the string, or the string itself? If I add "koala" to an empty tree, are the five new nodes labeled "k", "o", "a", "l", and "a", or "k", "ko", "koa", "koal", and "koala"?
You could actually do it either way, because the driver is never going to look at those labels, but I think it makes more sense to do it your second way, with the label storing the string whose presence in the tree is due to that node.
Question P7.5, posted 7 December: What do I do if two words have the same Scrabble score?
Put them both in your final output array, in either order. You have no instructions as to which to put first, so any order among words with equal scores is correct. We will make sure that the driver's tests don't ask about words in this sort of situation.
Question P7.6, posted 7 December: How exactly do I get the word list from the internet into my program?
The first step is to copy the text file from the internet to the project7 directory in your edlab account. (We need it to be there so that we can test your code on the file that you used with the name that you gave it.) Then your constructor has to do some file I/O. There are various magic words in Java that will make this happen, such as creating a Scanner or a BufferedReader. You should be able to adapt any code that correctly reads from a text file, for example from CMPSCI 121 or whatever equivalent you took. Read the strings from the file (each is on a separate line) and add each one to the PrefixTree once you have read it. Be sure to use a relative address for the file, like "KnuthWords.txt", rather than an address that has directory names in it. If we copy all your files from Project7 to a different directory, your code should still work because it only needs the text file and the code to be in the same directory as each other.
Can I take code to read the file off of the web someplace?
As long as you attribute it, sure.
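For the file I/O described above, here is a minimal sketch using a Scanner. WordFileSketch, readWords, and readWordsFromFile are hypothetical names, and skipping blank lines is my own precaution rather than anything the word list requires. In your constructor you would add each word to the PrefixTree instead of collecting it in a list.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class WordFileSketch {
    // Reads one word per line from the scanner, skipping blank lines.
    static List<String> readWords(Scanner in) {
        List<String> words = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (!line.isEmpty()) words.add(line);
        }
        return words;
    }

    // Opens the file by a relative name such as "KnuthWords.txt", so the code
    // keeps working when the whole directory is copied somewhere else.
    static List<String> readWordsFromFile(String fileName) throws FileNotFoundException {
        try (Scanner in = new Scanner(new File(fileName))) {
            return readWords(in);
        }
    }
}
```

A call like WordFileSketch.readWordsFromFile("KnuthWords.txt") looks for the file in the directory the program is run from, which is why the text file and the code need to sit together.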
Question P7.7, posted 7 December: You said that we need to build a HeapPQ class for full credit, using L&C's code as we can. So we take their PriorityQueue<T> and ArrayHeap<T> classes in their entirety?
You could do that, or you could build the heap as a non-generic array of the type you need and then mimic their code for the heap operations.
I took their classes as written, and they don't seem to work for large priority queues like the one with 5757 nodes from list("*****").
There seems to be a problem in their heapifyRemove method on pages 352-3, where they look at "children" that are past the end of the array. When they look for, e.g., tree[left] == null, they have to consider that left might be beyond the end of the tree.
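To illustrate the kind of bounds check I mean, here is a generic sift-down on an array heap. This is not L&C's heapifyRemove: SiftDownSketch and siftDown are hypothetical names, and the min-at-the-root ordering is an assumption. The point is only that each child index is compared against count before the array is touched.

```java
class SiftDownSketch {
    // Restores min-heap order after the root of tree[0..count-1] has been replaced.
    static <T extends Comparable<T>> void siftDown(T[] tree, int count) {
        int node = 0;
        while (true) {
            int left = 2 * node + 1;
            int right = left + 1;
            int smallest = node;
            // Check the index first; only then is it safe to look at the entry.
            if (left < count && tree[left].compareTo(tree[smallest]) < 0) smallest = left;
            if (right < count && tree[right].compareTo(tree[smallest]) < 0) smallest = right;
            if (smallest == node) return;   // heap order holds here, so we are done
            T tmp = tree[node];
            tree[node] = tree[smallest];
            tree[smallest] = tmp;
            node = smallest;
        }
    }
}
```

Because the index test comes first in each condition, a completely full array never gets indexed past its end, which is the situation that arises with a large queue.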
I don't have time to debug their code! I'll just use a java.util.PriorityQueue and take the grade penalty. Does that mean I need to write a compareTo method for the String objects I put into the priority queue?
Well, you can't rewrite the compareTo method of String because it belongs to Oracle, not you. You need to make a new class, called ScrabbleWord or something, where an object in the new class has a word and a Scrabble score, and you compare two of those objects by Scrabble score.
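A minimal sketch of such a class follows. The letter values are the standard English Scrabble tile values, which I am assuming match the project's scoring, and putting the highest score first is also an assumption, so flip the comparison if your specification wants the opposite order. ScrabbleQueueSketch and inScoreOrder are hypothetical helpers for the demonstration.

```java
import java.util.PriorityQueue;

class ScrabbleWord implements Comparable<ScrabbleWord> {
    // Standard tile values for 'a' through 'z' (assumed to match the project).
    private static final int[] VALUES =
        {1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10};

    final String word;
    final int score;

    ScrabbleWord(String word) {
        this.word = word;
        int total = 0;
        for (char c : word.toCharArray()) total += VALUES[c - 'a'];
        this.score = total;
    }

    // java.util.PriorityQueue removes the "least" element first, so a higher
    // score is made to compare as smaller (assumption: highest score first).
    @Override
    public int compareTo(ScrabbleWord other) {
        return Integer.compare(other.score, this.score);
    }
}

class ScrabbleQueueSketch {
    // Puts the words in a priority queue and copies them out into an array.
    static String[] inScoreOrder(String... words) {
        PriorityQueue<ScrabbleWord> pq = new PriorityQueue<>();
        for (String w : words) pq.add(new ScrabbleWord(w));
        String[] out = new String[pq.size()];
        for (int i = 0; i < out.length; i++) out[i] = pq.remove().word;
        return out;
    }
}
```

With no words at all the queue is empty and the result is an array of length 0, as in Question P7.2. Words with equal scores come out in whatever order the queue happens to produce, which is fine by Question P7.5.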
Question P7.8, posted 7 December: When I create a new PTNode, does it automatically have the array of 26 pointers to other PTNode objects?
That array gets created by the initializing statement in the field declaration, but its entries are all null until you change them -- the array is not populated. But you want those entries to be null until you create some of the child nodes in the addString method.
Question P7.9, posted 7 December: I've got code that works but it took an hour and a half to run list("*****"). Am I doing something wrong?
Probably. One student reports that method on that input running in a minute on a two-year-old PC, and another says their code runs in a quarter of a second. My best guess is that you are not pruning your search and are thus testing all 26⁵ (about 11.9 million) five-letter strings rather than just looking at prefixes that could possibly lead to words in the list.
Question P7.10, posted 7 December: The driver uses the getTree method of PhonewordLister, but that isn't in your stub code. Do we have to write it?
It may not have been in my stub code when you first copied it, but it is there now. (Not that public PrefixTree getTree( ) {return tree;} would be all that difficult to write...)
Question P7.11, posted 9 December: I seem to have a problem where words that I added to the prefix tree are no longer there when the driver looks for them. I put in a few words and it worked fine, but with the whole word list lots of words seem to be missing.
I looked at your code for addString and found an interesting error. When you add a five-letter string, you are always adding five new nodes even if the nodes for those strings already existed. So if you add "koala" you get the five nodes you should, but if you then add "kayak" you make a new node for "k", which has no "o" child, so that the node for "koala" is cut off and lost. You need an if statement so that you only call "setChild" if a new node is really needed.
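Here is a sketch of the corrected loop, under some assumptions: PTNodeSketch and PrefixTreeSketch are hypothetical stand-ins for your PTNode and PrefixTree, getChild and hasNode are names I made up (only setChild and addString come from the discussion above), and the labels follow the full-prefix convention from Question P7.4. It also shows the "careful" way to maintain size from Question P7.3.

```java
// Hypothetical stand-in for PTNode.
class PTNodeSketch {
    final String label;   // the string whose presence in the tree is due to this node
    // Created by the field initializer; every entry is null until setChild is called.
    private final PTNodeSketch[] children = new PTNodeSketch[26];

    PTNodeSketch(String label) { this.label = label; }

    PTNodeSketch getChild(char c) { return children[c - 'a']; }
    void setChild(char c, PTNodeSketch child) { children[c - 'a'] = child; }
}

// Hypothetical stand-in for PrefixTree.
class PrefixTreeSketch {
    private final PTNodeSketch root = new PTNodeSketch("");
    private int size = 0;

    int size() { return size; }

    void addString(String word) {
        PTNodeSketch cur = root;
        boolean createdNode = false;
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (cur.getChild(c) == null) {   // only make a node that is really needed
                cur.setChild(c, new PTNodeSketch(word.substring(0, i + 1)));
                createdNode = true;
            }
            cur = cur.getChild(c);           // otherwise reuse the existing node
        }
        // A new node means a new word, provided no word is a prefix of another
        // (true for a list where every word has exactly five letters).
        if (createdNode) size++;
    }

    // True if some node in the tree represents the string s.
    boolean hasNode(String s) {
        PTNodeSketch cur = root;
        for (int i = 0; i < s.length() && cur != null; i++) {
            cur = cur.getChild(s.charAt(i));
        }
        return cur != null;
    }
}
```

With this version, adding "koala" and then "kayak" reuses the existing "k" node, so the "ko" branch stays attached.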
Question P7.12, posted 9 December: I went through the list of all the words and found which numbers have the most phonewords. Here are the top seven of them. They should be good examples to test ordering against. I figured the class might find them useful:
```
   22737 (13)
   72837 (12)
   46637 (10)
   76737 (9)
   24337 (9)
   78737 (8)
   72937 (8)
```
Thanks, you get a point!

Last modified 7 December 2011