CMPSCI 383: Artificial Intelligence

Fall 2014 (archived)

Assignment 05 Graded

Update on 2014-11-01 19:30: The grades on Moodle now reflect the correction below regarding door versus doors.

Update on 2014-11-01 10:32: Patrick and I noticed the grades were lower than we expected. In thirteen cases, this may have been due to submissions using door rather than doors. This problem is my fault, as the assignment was posted incorrectly and later corrected. Patrick will be re-grading the assignments affected by this problem, and giving credit if they work correctly when test cases contain door rather than doors. I will update this post again when the new grades are up.

The Assignment 05 submissions have been graded. Patrick has posted grades on Moodle.

The test cases referenced in the autograder’s output are available for download here: assignment05-test-cases.tar.gz

I’ve already received one question about the check for “within 1% of the correct value.” To clarify, your submission received credit for a test case if it correctly enumerated all settings of the query variables, and computed the probability of each setting to within 1% of the correct value. In other words, for each setting, (|prob_expected - prob_observed| / prob_expected) < 0.01.
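The check can be sketched in a few lines of Python. This is only an illustration of the formula above, not the autograder's actual code; the function name `within_tolerance` and the handling of an expected probability of zero are assumptions.

```python
def within_tolerance(prob_expected, prob_observed, tolerance=0.01):
    """Return True if prob_observed is within `tolerance` (relative error)
    of prob_expected, following the formula in the text above."""
    if prob_expected == 0:
        # Relative error is undefined when the expected value is zero.
        # Requiring an exact match here is an assumption of this sketch.
        return prob_observed == 0
    return abs(prob_expected - prob_observed) / prob_expected < tolerance


# Relative error of 0.4%: passes the check.
print(within_tolerance(0.250, 0.251))
# Relative error of 1.2%: fails the check.
print(within_tolerance(0.250, 0.253))
```

Note that the error is relative to the expected value, so small probabilities must be matched more tightly in absolute terms than large ones.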

As usual, if you think something is amiss, please email or come see me or Patrick.

Thursday Office Hours in CS343

My office fits about two people comfortably. On the off chance there are more than that many people in office hours this Thursday, I’ve reserved room 343 (in the CS building) for office hours. It’s just down the hall from 356A. This change only applies this week. The regular location (CS356A) resumes next week.

Assignment 05 Sample Solution

…and an editorial

I’ve provided sample solutions or fairly extensive templates to most of the assignments so far in Java. I’ve also talked about the importance of learning multiple programming languages. Even if you don’t use them in your classes (or your day job, etc.), they’ll help you think about problems and abstractions in new ways that will improve your work in other languages.

In Ye Olden Days, AI at UMass was taught in Common Lisp. For various reasons, we don’t do that anymore, but you can still benefit from learning a language other than Java. I’ve touted Python or Ruby as good second languages for a Java programmer in class. I prefer Python, but there are fine reasons to choose Ruby also (or a Lisp dialect, or another language).

Ruby and Python have similar benefits:

  • each provides nice syntactic sugar around commonly used abstractions like lists and maps (aka associative arrays, dictionaries, hash tables), enabling brevity Java cannot match
  • both can treat functions as first-class citizens, that is, they can be stored in variables and passed around as values, enabling types of abstraction that are difficult to express succinctly in Java
  • like Java, both have well-developed standard libraries and package systems; Python’s standard library is very comprehensive, though Ruby’s package management system is arguably a little saner than Python’s
  • each provides a nice REPL, which supports rapid and interactive development; I’m partial to IPython
  • both are well supported in many IDEs; if you don’t want to dive into Emacs, the makers of my preferred Java IDE also make PyCharm
  • testing in each (both unit testing and spot-checking) is more straightforward than in Java
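As a small illustration of the first two points, here is a sketch in Python. The data and every name in it (`grades`, `mean`, `summarize`) are made up for this example.

```python
# Literal syntax for dictionaries and lists: no constructors or put() calls.
grades = {'alice': [90, 85], 'bob': [70, 95]}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def summarize(scores, how):
    """Apply the function `how` to each student's list of scores."""
    return {name: how(vals) for name, vals in scores.items()}


# Functions are ordinary values: they can be stored in a dictionary
# and passed as arguments, including built-ins such as max.
summaries = {'mean': mean, 'best': max}

for label, function in summaries.items():
    print(label, summarize(grades, function))
```

The dictionary comprehension in `summarize` and the dictionary of functions in `summaries` each take one line here; the equivalent Java needs noticeably more ceremony.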

Now onto the solution

Here’s a sample solution to Assignment 05, written in under 50 source lines of Python 3. Whitespace and documentation bump that up to about 100 lines.

(fjdquery.py) download
# Marc Liberatore
# CMPSCI 383 / Fall 2014
# Sample solution to Assignment 05

import csv
import itertools
import sys

CAR_DATA_PATH = 'car.data'
VARIABLE_NAMES = ('buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'car')


def load_data(path):
    """
    :param path:
    :return: a list of dictionaries, one per instance, mapping variable names to values
    """
    with open(path) as f:
        data_reader = csv.DictReader(f, fieldnames=VARIABLE_NAMES)
        return list(data_reader)


def make_values_dict(data):
    """
    Returns the set of possible values associated with each variable in the data.
    :param data: a list of dictionaries, as from load_data
    :return: a dictionary mapping each variable to a set of its possible values
    """
    values = {name: set() for name in VARIABLE_NAMES}
    for instance in data:
        for (variable, vals) in values.items():
            vals.add(instance[variable])
    return values


def load_query(path):
    """
    :param path:
    :return: a pair: the list of query variables, and a dictionary mapping each
             evidence variable to its list of values
    """
    with open(path) as f:
        query_variables = f.readline().split()
        conditions = {}
        for line in f.readlines():
            ls = line.split()
            conditions[ls[0]] = ls[1:]
    return query_variables, conditions


def matches_conditions(instance, conditions={}):
    """
    :param instance: a dictionary mapping each variable to a value
    :param conditions: a dictionary mapping zero or more variables to one
           or more acceptable values
    :return: true iff the instance matches the conditions
    """
    for (variable, values) in conditions.items():
        if instance[variable] not in values:
            return False
    return True


def filter_data(data, conditions={}):
    """
    Returns the subset of the data that matches the given condition(s)
    :param data: a list of dictionaries, as from load_data
    :param conditions: a possibly-empty dictionary of conditions, as from load_query
    :return: a matching subset of the data
    """
    return [i for i in data if matches_conditions(i, conditions)]


def make_query_conditions(query_variables, all_values):
    """
    Returns a list of dictionaries in the format expected by filter_data; each
    dictionary corresponds to one of the possible settings of the query variables.
    :param query_variables: a list of query variables
    :param all_values: a dictionary mapping each variable to its possible
           values as from make_values_dict
    :return: a list of conditions, corresponding to each setting of the query variables
    """
    values_list = [all_values[variable] for variable in query_variables]
    values_product = itertools.product(*values_list)
    return [dict(zip(query_variables, [[v] for v in values]))
            for values in values_product]


def main():
    data = load_data(CAR_DATA_PATH)
    all_values = make_values_dict(data)
    query_variables, conditions = load_query(sys.argv[1])
    conditional_data = filter_data(data, conditions)
    conditional_count = len(conditional_data)
    for query_values in make_query_conditions(query_variables, all_values):
        for variable in query_variables:
            print(query_values[variable][0], end=' ')
        print(len(filter_data(conditional_data, query_values)) / conditional_count)

if __name__ == '__main__':
    main()

I wrote this solution with an emphasis on modularity and testability. I load the entire data file, then filter it based upon the conditions. I then enumerate the possible combinations of settings for the query variables, and count the number of occurrences of each.

As a result this program is perhaps not as efficient as it could be, though each piece is straightforward. I viewed this as a reasonable trade-off, given that the data set contains only a couple of thousand instances. Another reasonable approach would have been to do a single pass through the data, and to accumulate counts (or probabilities) as I went.
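For comparison, the single-pass alternative might look something like the sketch below. It assumes the same data format as `load_data` and the same conditions format as `load_query` in the listing above; the function name `single_pass_distribution` and the tiny data set are made up for illustration.

```python
from collections import Counter


def single_pass_distribution(data, query_variables, conditions):
    """Make one pass over the data, counting each observed setting of the
    query variables among the instances that match the conditions."""
    counts = Counter()
    total = 0
    for instance in data:
        if all(instance[var] in vals for var, vals in conditions.items()):
            total += 1
            counts[tuple(instance[var] for var in query_variables)] += 1
    # Settings that never occur are absent here; a full solution would still
    # need to enumerate them and report a probability of zero.
    return {setting: count / total for setting, count in counts.items()}


# A tiny hand-made data set, not the real car.data file.
data = [
    {'safety': 'high', 'car': 'acc'},
    {'safety': 'high', 'car': 'unacc'},
    {'safety': 'low', 'car': 'unacc'},
    {'safety': 'high', 'car': 'acc'},
]
print(single_pass_distribution(data, ['car'], {'safety': ['high']}))
```

This version touches each instance once, but it mixes filtering, counting, and normalizing in one function, which makes the pieces harder to test separately.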

Compare this code with your own, regardless of the language you chose. Is it shorter or longer? Are the purposes of the individual methods clear and distinct? What approach did you take?

Assignment 03 Regraded

The Assignment 03 resubmissions have been graded. Patrick replaced the grades on Moodle with the new ones, and added the autograder’s output to the already-present feedback.

As usual, if you think something is amiss, please come see me or Patrick.

Assignment 06 Parsing Example

I’ve written a short example showing how to use Google’s GSON library to parse JSON, as you might do for Assignment 06. You can download the code here: RejectionSampler.tar.gz.

Several people asked for guidance in adding GSON to the classpath, either in Eclipse or on the command line. If you’re interested, read on. Know that some of what I’ve written below is a simplification, and that you should someday read more elsewhere so that you fully understand Java’s class loader semantics.

What is the classpath?

Often, when you tell a compiler (javac Foo.java) to convert your source (.java) files into bytecode (.class) files, and when you tell the Java virtual machine to execute that bytecode (java Foo), your code isn’t standalone. It might depend upon part of the Java Class Library. For example, System is actually java.lang.System, or you might import java.util.ArrayList or the like. How do the compiler and VM know where these files are?

By default, they look in three places. An install of the JDK or JRE places the bootstrap classes, that is the Java Class Library and some other stuff, on your machine. This is always searched when looking for classes. There’s also a system-wide extension directory where an administrator might install additional classes. Finally, your classpath is searched.

The classpath is, by default, the current working directory of the compiler or JVM. Suppose class Foo depends upon class Bar. When you type javac Foo.java in the directory containing Foo.java, javac searches the bootstrap and extension directories first (where, of course, there is no Bar class), then it searches your classpath. Since you haven’t set it, it uses the default value, the current working directory. If Bar.java is in the current directory, it compiles and uses it.

Setting the classpath

Sometimes you want to add other paths to the classpath, or you want to add a .jar to the classpath. To do so, you specify it manually. There are two ways to do this. One is to set an environment variable named CLASSPATH. Another is to pass an argument to the javac and java commands themselves. This argument is -classpath (or -cp for short), and it’s followed by a list of paths to be searched.

When specifying this list, there are two things to keep in mind. First, once you specify a classpath, the default is no longer used. So if you want to continue searching the current directory as part of the classpath (and you almost certainly do), then you need to remember to add it to the manually specified classpath yourself. The shorthand for “current working directory” in most OSes is . (a single dot).

Second, you need to specify the list in an OS-dependent way. On Windows-based systems, you use \ to separate directories and ; to separate paths; on Unix-based systems (including OS X and the Edlab Linux machines), you use / to separate directories and : to separate paths.

A command-line example

To compile and execute the sample code I’ve provided successfully, you need to include the GSON jar on your classpath. You could change directory into the src directory, then, on Windows, you’d use:

javac -cp .;..\lib\gson-2.3.jar RejectionSampler.java
java -cp .;..\lib\gson-2.3.jar RejectionSampler ..\wetgrass.json

On a Unix system, you’d use:

javac -cp .:../lib/gson-2.3.jar RejectionSampler.java
java -cp .:../lib/gson-2.3.jar RejectionSampler ../wetgrass.json

Note that .. means “the parent of the current directory.”

The command lines above tell both the compiler and the VM to search both the current working directory, and within the .jar file – they both know to “look inside” .jars for the .class files contained within.
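If you ever script these commands, you can let the language pick the right separators rather than hard-coding them. The sketch below uses Python's standard `os.pathsep` and `os.path.join`; the helper name `build_classpath` is made up for this example, and the jar location is the one used in the commands above.

```python
import os


def build_classpath(*entries):
    """Join classpath entries with the separator for the current OS
    (';' on Windows, ':' on Unix-like systems)."""
    return os.pathsep.join(entries)


# os.path.join uses the directory separator for the current OS.
gson_jar = os.path.join('..', 'lib', 'gson-2.3.jar')

# Include '.' explicitly, since setting a classpath replaces the default.
classpath = build_classpath('.', gson_jar)

# The argument list you might hand to subprocess.run to invoke the compiler.
compile_command = ['javac', '-cp', classpath, 'RejectionSampler.java']
print(' '.join(compile_command))
```

On a Unix system this prints the same javac command shown above; on Windows it prints the variant with ; and \ instead.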

Adding JARs to the classpath in Eclipse

When you first import the files in the archive above, you’ll see something like this:

Note that the import statement shows an error. To fix this, you need to open the properties of the RejectionSampler project. Either click on the project once in the Package Explorer to highlight it and then select Properties in the File menu, or right-click on it in the Package Explorer and select Properties.

Select the Java Build Path option in the list at left, then select the Libraries tab. It should look like this:

Since the jar we want is already in the project (in /lib), click the Add JARs… button. In the window that pops up, select the gson-2.3.jar file as shown here:

Eclipse should then be able to resolve the error and build the project. Further, when you set up a Run Configuration to execute the program from within Eclipse, it will automatically add the jar to the classpath.

Assignment 06 Posted

Assignment 06 has been posted.

I still have some minor updates I’m going to make to the assignment, like a JSON parsing example and directions for adding jars to your classpath. But if you want to start thinking about it now (or you don’t need my help parsing JSON and want to get started) you can do so. Nothing major will change.

Please let me know if you spot any typos, inconsistencies, or ambiguities, and I will correct them.

News for the Week of 13 October

No class Thursday 16 October

Class and office hours are canceled this Thursday. I will be unavailable by email this week after Wednesday. If you need an extension for Assignment 04, ask for it before about 1500 on Wednesday, as I can’t guarantee I’ll be checking email right up until 1700.

TA Office Hours Canceled

Patrick’s office hours on Wednesday (1330) are canceled. I will be in my office if you would like to drop by.

Assignment 04

It is still due on 17 October.

The simple backtracking solver I posted last week is capable of handling the 6x6 board embedded in the assignment, as well as the 6x6 sample boards that a student kindly posted to Moodle. You can use it to produce solutions to check your own code against. I posted solutions to the 9x9 and 13x13 boards in a reply to the post on Moodle where the problems were posted.

You can use the solver I posted as a basis for your own work. As you know, it takes far too long to complete on larger boards.

Below is some information on its runtime on some simpler boards (either included in the posted solver tarball, or in the .zip file on Moodle):

board name    open cells  possible assignments  assignments considered  runtime
3x3-simple         4             6561                     10             < 1 sec
4x4-simple         8             4.3E7                   971             < 1 sec
6x6-easy          16             1.85E15               43098             < 1 sec
6x6-hard          16             1.85E15               31045             < 1 sec
6x6-assignment    20             1.21E19              862301               4 sec
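The “possible assignments” column in the table above appears to be 9 raised to the number of open cells, that is, nine candidate values for every open cell. That reading is inferred from the numbers rather than stated in the post, so treat the short sketch below (with the made-up name `possible_assignments`) as an assumption.

```python
def possible_assignments(open_cells, values_per_cell=9):
    """Size of the unpruned search space, assuming every open cell can
    independently take any of `values_per_cell` values."""
    return values_per_cell ** open_cells


# Board names and open-cell counts are taken from the table above.
boards = [('3x3-simple', 4), ('4x4-simple', 8), ('6x6-easy', 16),
          ('6x6-assignment', 20)]
for name, open_cells in boards:
    print(f'{name:15} {open_cells:3d} {possible_assignments(open_cells):.2E}')
```

The exponential growth is the point: adding four open cells multiplies the space by 6561, which is why the number of assignments the solver actually considers matters far more than the raw size.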

I chose to implement forward checking on the AllDifferent constraint, the MRV heuristic for variable assignment choice, and an ad hoc rule to use the “naked pairs” strategy (essentially, additional checking on the AllDifferent constraint). I also wrote code to enumerate the valid possibilities for a cell, given the remaining possibilities in cells that participated in the same constraints. This took me about an hour, and can be done in about 100 lines of changed or added code, based upon the simple solver I provided to you.
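To make two of these ideas concrete, here is a minimal sketch of MRV variable selection and forward checking on AllDifferent constraints, written in Python. It is not the solver described above: it ignores the SumsTo constraint and the “naked pairs” rule, and every name in it (`select_mrv_variable`, `forward_check`, `solve`, the `groups` format) is an assumption made for illustration.

```python
def select_mrv_variable(domains, assignment):
    """MRV: choose the unassigned variable with the fewest remaining values."""
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned, key=lambda v: len(domains[v]))


def forward_check(domains, groups, variable, value):
    """Assign value to variable, then remove value from every other variable
    that shares an AllDifferent group with it. Returns the pruned domains,
    or None if some domain becomes empty."""
    pruned = {v: set(vals) for v, vals in domains.items()}
    pruned[variable] = {value}
    for group in groups:
        if variable in group:
            for other in group:
                if other != variable:
                    pruned[other].discard(value)
                    if not pruned[other]:
                        return None
    return pruned


def solve(domains, groups, assignment=None):
    """Backtracking search using MRV ordering and forward checking."""
    assignment = dict(assignment or {})
    if len(assignment) == len(domains):
        return assignment
    variable = select_mrv_variable(domains, assignment)
    for value in sorted(domains[variable]):
        pruned = forward_check(domains, groups, variable, value)
        if pruned is not None:
            result = solve(pruned, groups, {**assignment, variable: value})
            if result is not None:
                return result
    return None


# Three cells in one AllDifferent group; 'b' has a single candidate value,
# so MRV assigns it first and forward checking prunes the others.
domains = {'a': {1, 2}, 'b': {1}, 'c': {1, 2, 3}}
groups = [['a', 'b', 'c']]
print(solve(domains, groups))
```

Forward checking detects a dead end as soon as any cell runs out of candidates, instead of waiting for the search to reach that cell.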

Using this approach, inference runs once before the search begins, and as part of each assignment made by the search. This approach significantly reduces the search space and runtime:

                                possible      after      assignments 
  board name    open cells      assignments  inference    considered    runtime
  3x3-simple         4              6561            3           2       < 1 sec
  4x4-simple         8             4.3E7            1           1       < 1 sec
  6x6-easy          16           1.85E15      2097152          32       < 1 sec
  6x6-hard          16           1.85E15       6.00E7         127       < 1 sec
  6x6-assignment    20           1.21E19        20736          17       < 1 sec
  9x9-easy          46           7.85E43      5.24E18        1538         1 sec
  9x9-hard          40           1.47E38      1.62E19         159         1 sec
13x13-easy          88           9.40E83      1.76E43      636434        33 sec
13x13-hard          88           9.40E83      1.38E50    38447279        29 min

Your submission should work about this well for full credit, though you could also try other approaches, e.g., other ad hoc rules on the constraints, arc consistency checking (on the AllDifferent constraint, the SumsTo constraint, or both), path consistency checking, parallelizing the search, etc.

Update on 15 October: In fact, at least one student has done better. By performing forward checking on the SumsTo constraint, he is able to solve the 16x16 puzzles in under a second. Well done.