The test cases referenced in the autograder’s output for Assignment 09 are available for download here: assignment09-test-cases.tar.gz
A sample solution in Java is available here: KMeans.tar.gz. It implements the k-means clustering algorithm with the constraints given in the assignment. When computing the Euclidean distance between points, it computes the sum of the squares of the dimension-wise differences, but doesn’t take the square root of that sum. This is fine: since the square root is a monotonically increasing function of its input, comparing squared distances orders candidate centers the same way as comparing true distances.
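To make the square-root shortcut concrete, here is a minimal sketch (names and values are illustrative, not taken from the sample solution) showing that squared distances pick the same nearest center as true Euclidean distances:

```java
// Sketch: comparing squared distances gives the same nearest-center
// choice as comparing true Euclidean distances, because sqrt is
// monotonically increasing.
public class SquaredDistanceDemo {
    // Sum of squared dimension-wise differences (no square root).
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p  = {1.0, 2.0};
        double[] c1 = {0.0, 0.0};
        double[] c2 = {4.0, 4.0};
        // Both comparisons pick the same nearest center (c1 here).
        boolean bySquared = squaredDistance(p, c1) < squaredDistance(p, c2);
        boolean byEuclid  = Math.sqrt(squaredDistance(p, c1))
                          < Math.sqrt(squaredDistance(p, c2));
        System.out.println(bySquared == byEuclid);  // true
    }
}
```

Skipping the square root avoids one `Math.sqrt` call per point-center pair, which is the innermost loop of the algorithm.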
Some notes on the grading from Patrick:
Some people misunderstood the algorithm.
- For example, one person updated each cluster center by assigning it the value of the data point closest to the mean of the cluster rather than the mean itself.
- Another person did something with random numbers that I didn’t understand.
- Another did not iterate, outputting the first cluster assignment.
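For reference, the intended update step assigns each center the mean of the points currently in its cluster, not the data point nearest that mean. A minimal sketch of that step (names are illustrative, not from the sample solution):

```java
import java.util.List;

public class CenterUpdateDemo {
    // The intended k-means update: each center becomes the
    // dimension-wise mean of the points assigned to it.
    static double[] clusterMean(List<double[]> points, int dims) {
        double[] mean = new double[dims];
        for (double[] p : points) {
            for (int i = 0; i < dims; i++) mean[i] += p[i];
        }
        for (int i = 0; i < dims; i++) mean[i] /= points.size();
        return mean;
    }

    public static void main(String[] args) {
        List<double[]> cluster = List.of(
            new double[]{1.0, 1.0},
            new double[]{3.0, 5.0});
        double[] m = clusterMean(cluster, 2);
        // The mean (2.0, 3.0) need not coincide with any data point.
        System.out.println(m[0] + " " + m[1]);  // 2.0 3.0
    }
}
```

Note that the resulting center is generally not one of the data points, which is exactly where the first misunderstanding above went wrong.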
I also noticed two mistakes that cropped up more than once.
- One mistake was not handling more than one dimension correctly (by ignoring the additional values, or doing something else that was unclear to me). In this case, the two 1d tests passed, but the other tests had errors.
- One more subtle mistake was using Manhattan distance (the sum of the absolute value of the dimension-wise differences) rather than the Euclidean distance as specified. This error resulted in only a small number of mis-labelings in the 2d case, but many in the 10d case.
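To illustrate why the Manhattan-distance mistake causes mislabelings, here is a minimal sketch (the point and centers are made up for illustration) of a case where the two metrics disagree about which center is nearer:

```java
public class DistanceMetricDemo {
    // Manhattan distance: sum of absolute dimension-wise differences.
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
        return sum;
    }

    // Squared Euclidean distance (square root omitted, which
    // preserves the ordering).
    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] p  = {0.0, 0.0};
        double[] c1 = {3.0, 0.0};  // one large difference
        double[] c2 = {2.0, 2.0};  // two moderate differences
        // Manhattan says c1 is nearer (3 < 4) ...
        System.out.println(manhattan(p, c1) < manhattan(p, c2));              // true
        // ... but Euclidean says c2 is nearer (9 > 8).
        System.out.println(squaredEuclidean(p, c1) < squaredEuclidean(p, c2)); // false
    }
}
```

Euclidean distance penalizes one large difference more than several moderate ones, while Manhattan distance treats them the same; with more dimensions there are more ways for such disagreements to arise, which is consistent with the 10d tests showing many more mislabelings than the 2d tests.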
As usual, if you think something in the grading was incorrect (and not in your favor), please email or come to see Patrick or me.