The test cases referenced in the autograder’s output for Assignment 08 are available for download here: assignment08-test-cases.tar.gz
A sample solution in Java is available here: DecisionTree.tar.gz. It mostly follows the algorithm in Russell and Norvig. It omits the first termination condition (“if examples is empty”) since that case will not arise in a tree composed of only boolean attributes (though it didn’t hurt to leave it in). You can run the program with only training data specified on the command line to see a crude printout of the tree it learns.
Some notes on the grading:
- Unlike the naive Bayes classifier, there is ambiguity in which tree your program could correctly learn: if there is a tie in information gain, it is unspecified which attribute the program should select to split on.
- The small test cases are all intended to be unambiguous in terms of output, so your program's output was evaluated strictly: it had to match the expected output exactly. If you believe I am mistaken here, please send me an email outlining your reasoning. If I'm wrong, I'll be more than happy to give you (and everyone else) your points back.
- The large test cases are somewhat ambiguous, depending upon the exact tree your program constructed, so I evaluated them liberally: I checked only that the labels were correct (I did not examine the probabilities), and I allowed a small fraction of misclassification relative to my sample implementation (up to 15% different still received full credit).
- A common mistake was to stop splitting early when all attributes had an information gain of zero. Some functions require splitting on several attributes; the XOR small test caught this error.
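To see why stopping at zero gain fails, consider the two-attribute XOR function: at the root, each attribute alone has an information gain of exactly zero, yet the tree must still split on both to classify correctly. The following is a minimal sketch (not the graded sample solution; the class and method names here are my own) that computes information gain over the four XOR examples:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: information gain on the two-attribute XOR function.
// Both attributes have zero gain at the root, yet the correct
// tree must still split on them.
public class XorGain {

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    // Binary entropy of a set of boolean labels.
    static double entropy(List<Boolean> labels) {
        long pos = labels.stream().filter(b -> b).count();
        double p = (double) pos / labels.size();
        if (p == 0.0 || p == 1.0) return 0.0;
        return -p * log2(p) - (1 - p) * log2(1 - p);
    }

    // Information gain from splitting on attribute index attr.
    static double gain(boolean[][] examples, boolean[] labels, int attr) {
        List<Boolean> all = new ArrayList<>();
        List<Boolean> neg = new ArrayList<>(), pos = new ArrayList<>();
        for (int i = 0; i < examples.length; i++) {
            all.add(labels[i]);
            (examples[i][attr] ? pos : neg).add(labels[i]);
        }
        double remainder = 0.0;
        if (!neg.isEmpty())
            remainder += (double) neg.size() / all.size() * entropy(neg);
        if (!pos.isEmpty())
            remainder += (double) pos.size() / all.size() * entropy(pos);
        return entropy(all) - remainder;
    }

    public static void main(String[] args) {
        // XOR: label = a ^ b over all four assignments.
        boolean[][] x = {{false,false},{false,true},{true,false},{true,true}};
        boolean[]   y = { false,        true,        true,        false };
        System.out.printf("gain(a) = %.4f%n", gain(x, y, 0)); // 0.0000
        System.out.printf("gain(b) = %.4f%n", gain(x, y, 1)); // 0.0000
    }
}
```

Both gains come out to zero (each split leaves a 50/50 label mix in every branch), so a tie-breaking rule must still pick an attribute and recurse rather than declaring a leaf.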
As usual, if you think something in the grading was incorrect (and not in your favor), please email or come to see Patrick or me.