CMPSCI 383: Artificial Intelligence

Fall 2014 (archived)

Assignment 07 Graded, Sample Solution Posted

The Assignment 07 submissions have been graded. I have posted grades on Moodle.

The test cases referenced in the autograder’s output are available for download here: assignment07-test-cases.tar.gz

A Java sample solution is available here: NaiveBayesClassifier.tar.gz. The useAlternativeMethod flag indicates which way the classifier will handle ? values (see below).

Some notes on the tests and grading:

  • Correctness was defined as outputting both the correct class and the correct probability of that class. With a few minor exceptions, detailed here, that we accounted for when grading, there was no ambiguity in the correct output. Outputting the right class label alone was not sufficient for credit; the probability also had to be within ±0.01 of the correct value.

  • There were two sets of tests. The small tests used synthetic data and were designed to provoke specific types of errors. For one of them, I used a test a student posted to Moodle; many people used it to validate their submissions, so I thought you should get credit for it. The large tests were on subsets of the full vote data.

  • No test explicitly pushed at the boundaries of underflow. (Do not take this to mean that the tests in Assignment 10 won’t!) The log transform can lose a small bit of precision, but not enough to matter to the tests we did (within ±0.01 of the correct value). It’s not actually necessary to use it unless underflow occurs—look at estimateProbability() in the solution for an example.

  • We accounted for floating-point roundoff errors around 0.5, for example, democrat,0.5000000000000004 was acceptable when republican,0.5 was the correct answer.

  • As noted in the assignment question and answer section, we accepted two distinct ways of handling unknown values. If your program’s output didn’t match the way that I specified, the autograder re-checked it against the way that Patrick specified. These are the .alternative solutions.
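
On the log transform mentioned above: here is a minimal sketch of computing class posteriors in log space, the standard way to avoid underflow when multiplying many small likelihoods. The class and method names are illustrative, not taken from the posted solution.

```java
public class LogSpaceDemo {
    // Normalize per-class scores computed in log space: subtract the max
    // log-score before exponentiating so the intermediate values stay
    // representable even when the raw products would underflow a double.
    static double[] posterior(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) max = Math.max(max, s);
        double[] post = new double[logScores.length];
        double total = 0.0;
        for (int i = 0; i < logScores.length; i++) {
            post[i] = Math.exp(logScores[i] - max);
            total += post[i];
        }
        for (int i = 0; i < post.length; i++) post[i] /= total;
        return post;
    }

    public static void main(String[] args) {
        // Each class multiplies ~400 likelihoods of about 0.1. The raw
        // products (~1e-400) underflow to 0.0, but the normalized
        // posterior is still recovered.
        double[] logScores = { 400 * Math.log(0.1),
                               400 * Math.log(0.1) + Math.log(3.0) };
        double[] post = posterior(logScores);
        System.out.printf("%.4f %.4f%n", post[0], post[1]); // 0.2500 0.7500
    }
}
```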

As usual, if you think something is amiss, please email or come see me or Patrick.

Assignment 11 Posted

Assignment 11 has been posted. It is optional.

To complete it, you can re-do a previous assignment or choose a new one.

Assignment 06 Graded

The Assignment 06 submissions have been graded. Patrick has posted grades on Moodle.

The test cases referenced in the autograder’s output are available for download here: assignment06-test-cases.tar.gz

Two notes:

  • Because the simulator(s) you wrote were stochastic, the autograder accepted as correct any output within 15% of the correct value. That is, for each possible setting of query values, if |prob_expected - prob_observed| / prob_expected < 0.15, then your simulator passed the given test case. There was a chance some values would be wrong just due to randomness. Patrick watched for this case and adjusted grades upward if he suspected it was occurring. If you think he overlooked such a case, send us both email and we’ll investigate. Generally, if this occurred at all, it occurred on test case 5-j-m.query.
  • According to Patrick, “Many students parsed and/or created json with their own code rather than a library, which caused many mistakes.”
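
The acceptance rule from the first note, written out in code (the method name is mine, not the autograder's):

```java
public class ToleranceCheck {
    // Relative-error acceptance test: a simulated probability passes iff
    // |expected - observed| / expected < 0.15.
    static boolean withinTolerance(double expected, double observed) {
        return Math.abs(expected - observed) / expected < 0.15;
    }

    public static void main(String[] args) {
        System.out.println(withinTolerance(0.40, 0.45)); // rel. error 0.125 -> true
        System.out.println(withinTolerance(0.40, 0.47)); // rel. error 0.175 -> false
    }
}
```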

As usual, if you think something is amiss, please email or come see me or Patrick.

Assignment 06 Sample Solution

A sample solution in Java for Assignment 06 is available here: ApproximateInference.tar.gz.

I had the brilliant idea (ha-ha) that I’d write this one in a largely imperative style, passing and mutating arrays and array indices, in an attempt to make it run quickly. It does run fairly fast (the Gibbs sampler gets around 2M samples / second on my desktop computer). Part of the cost of this style is that it took a little longer than I expected to complete, and it is less readable than I’d like.

Exam 2 Graded

Exam 2 has been graded, and the grades are up on Moodle. I’ll have additional comments in class on Wednesday.

A note: Each term exam is weighted equally. This exam is not worth twice as much as Exam 1!

Assignment 05 Sample Solution in Java

At multiple students’ requests, I’ve transliterated the Python solution to Assignment 05 into Java. You can download it here: FJDQuery.tar.gz.

This is a fairly direct translation that takes the same approach as the Python solution previously posted. The entire dataset is loaded, then filtered by the condition in the query. Each line of output is then generated by taking another subset of the data, corresponding to exactly one possible setting of the query variables.
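
A minimal sketch of that filter-then-count approach, using hypothetical names and toy data rather than the actual FJDQuery code:

```java
import java.util.*;

public class FilterCount {
    // Keep only the rows consistent with a (variable -> value) condition.
    static List<Map<String, String>> filter(List<Map<String, String>> rows,
                                            Map<String, String> condition) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean match = true;
            for (Map.Entry<String, String> e : condition.entrySet()) {
                if (!e.getValue().equals(row.get(e.getKey()))) {
                    match = false;
                    break;
                }
            }
            if (match) kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Map<String, String>> data = List.of(
            Map.of("buying", "high", "safety", "low"),
            Map.of("buying", "high", "safety", "high"),
            Map.of("buying", "low",  "safety", "high"));

        // P(safety = high | buying = high): filter on the condition,
        // then count the rows matching one setting of the query variable.
        List<Map<String, String>> given = filter(data, Map.of("buying", "high"));
        List<Map<String, String>> both  = filter(given, Map.of("safety", "high"));
        System.out.println((double) both.size() / given.size()); // 0.5
    }
}
```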

I suggest you eyeball the Cartesian product code in Query.java. The implementation I wrote is recursive. Recursion is a convenient way to think about such problems, but deeply nested recursive calls can overflow the stack in some languages (like Java). With prior knowledge of the data set we were working with, I knew that wouldn’t be a problem here.
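
For comparison only (this is not the code in Query.java), a recursive Cartesian product might look like this:

```java
import java.util.*;

public class Cartesian {
    // Recursively enumerate every combination of values, one domain per position.
    static List<List<String>> product(List<List<String>> domains) {
        List<List<String>> out = new ArrayList<>();
        build(domains, 0, new ArrayDeque<>(), out);
        return out;
    }

    private static void build(List<List<String>> domains, int i,
                              Deque<String> partial, List<List<String>> out) {
        if (i == domains.size()) {           // one full assignment completed
            out.add(new ArrayList<>(partial));
            return;
        }
        for (String value : domains.get(i)) {
            partial.addLast(value);          // choose a value for position i
            build(domains, i + 1, partial, out);
            partial.removeLast();            // undo and try the next value
        }
    }

    public static void main(String[] args) {
        List<List<String>> result = product(List.of(
            List.of("yes", "no"), List.of("a", "b", "c")));
        System.out.println(result.size()); // 2 * 3 = 6 combinations
    }
}
```

Note that the recursion depth equals the number of variables, not the number of combinations, which is why a modest variable count keeps the stack shallow.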

You should understand how this method works, and perhaps take the time to translate it to an iterative form if you’re rusty on the equivalence between recursive and iterative algorithms. If you don’t know how to do that, then dust off your 220 notes, or come see me and we can talk about it.
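
One possible iterative form of a Cartesian product enumerator, treating an array of indices as an odometer (illustrative, not the posted code):

```java
import java.util.*;

public class CartesianIterative {
    // Iterative Cartesian product: advance an int[] of indices like an
    // odometer until the most significant digit rolls over.
    static List<List<String>> product(List<List<String>> domains) {
        List<List<String>> out = new ArrayList<>();
        int n = domains.size();
        int[] idx = new int[n];
        while (true) {
            List<String> combo = new ArrayList<>(n);
            for (int i = 0; i < n; i++) combo.add(domains.get(i).get(idx[i]));
            out.add(combo);
            int pos = n - 1;
            while (pos >= 0 && ++idx[pos] == domains.get(pos).size()) {
                idx[pos] = 0;            // this digit rolled over; carry left
                pos--;
            }
            if (pos < 0) return out;     // every digit rolled over: done
        }
    }

    public static void main(String[] args) {
        System.out.println(product(List.of(
            List.of("yes", "no"), List.of("a", "b", "c"))).size()); // 6
    }
}
```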

The solution clocks in at about 150 source lines of code, which is within the typical range of expansion I expect when going from Python to Java (1.5x – 3x). The extra length in Java comes from a few things. For example, Python has less boilerplate and supports very useful syntax sugar to reduce code length (such as list comprehensions).

Python also has a more comprehensive standard library. The input code in Python is a line or two, where Java took about a dozen; similarly, generating the Cartesian product of variable settings is a library call in Python but needs to be written in Java. You can add third-party JARs to Java to help cut down on code length (e.g., OpenCSV to read car.data and Guava for many things, including Cartesian product). I chose not to do that so as to keep this solution self-contained, but you might consider it in future assignments. A drawback of relying on third-party libraries is that you have to understand their APIs well enough to be sure you’re using them correctly.