So far, I’ve provided sample solutions or fairly extensive templates, written in Java, for most of the assignments. I’ve also talked about the importance of learning multiple programming languages. Even if you don’t use them in your classes (or your day job, etc.), they’ll help you think about problems and abstractions in new ways that will improve your work in other languages.
In Ye Olden Days, AI at UMass was taught in Common Lisp. For various reasons, we don’t do that anymore, but you can still benefit from learning a language other than Java. In class, I’ve touted Python and Ruby as good second languages for a Java programmer. I prefer Python, but there are fine reasons to choose Ruby instead (or a Lisp dialect, or another language).
Ruby and Python have similar benefits:
- each provides nice syntactic sugar around commonly used abstractions like lists and maps (aka associative arrays, dictionaries, hash tables), enabling brevity Java cannot match
- both can treat functions as first-class citizens, that is, they can be stored in variables and passed around as values, enabling types of abstraction that are difficult to express succinctly in Java
- like Java, both have well-developed standard libraries and package systems; Python’s standard library is very comprehensive, though Ruby’s package management system is arguably a little saner than Python’s
- each provides a nice REPL, which supports rapid and interactive development; I’m partial to IPython
- both are well supported in many IDEs; if you don’t want to dive into Emacs, the makers of my preferred Java IDE also make PyCharm
- testing in each (both unit testing and spot-checking) is more straightforward than in Java
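To make the first two points concrete, here is a small sketch in Python. Every name in it (`words`, `counts`, `apply_twice`, `add_three`) is made up for illustration and has nothing to do with the assignment.

```python
# Illustrative only: all names below are hypothetical.

# Literal syntax for lists and dictionaries.
words = ['spam', 'eggs', 'spam', 'ham', 'spam']
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# A dictionary comprehension: map each distinct word to its length.
lengths = {word: len(word) for word in counts}


# Functions are values: they can be stored in variables and passed as arguments.
def apply_twice(f, x):
    """Apply the one-argument function f to x, then to the result."""
    return f(f(x))


def add_three(n):
    return n + 3


operation = add_three
result = apply_twice(operation, 10)

# A bound method used as a sort key: order the words by descending count.
by_count = sorted(counts, key=counts.get, reverse=True)
```

The equivalent Java needs explicit map and list types at each step, which is where most of the extra length comes from.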
Now on to the solution
Here’s a sample solution to Assignment 05, written in under 50 source lines of Python 3. Whitespace and documentation bump that up to about 100 lines.
    # Marc Liberatore
    # CMPSCI 383 / Fall 2014
    # Sample solution to Assignment 05

    import csv
    import itertools
    import sys

    CAR_DATA_PATH = 'car.data'
    VARIABLE_NAMES = ('buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'car')


    def load_data(path):
        """
        :param path:
        :return: a list of dictionaries, one per instance, mapping variable names to values
        """
        with open(path) as f:
            data_reader = csv.DictReader(f, fieldnames=VARIABLE_NAMES)
            return list(data_reader)


    def make_values_dict(data):
        """
        Returns the set of possible values associated with each variable in the data.
        :param data: a list of dictionaries, as from load_data
        :return: a dictionary mapping each variable to a set of its possible values
        """
        values = {name: set() for name in VARIABLE_NAMES}
        for instance in data:
            for (variable, vals) in values.items():
                vals.add(instance[variable])
        return values


    def load_query(path):
        """
        :param path:
        :return: a pair: the list of query variables, and a dictionary mapping each evidence variable to its list of values
        """
        with open(path) as f:
            query_variables = f.readline().split()
            conditions = {}
            for line in f.readlines():
                ls = line.split()
                conditions[ls[0]] = ls[1:]
            return query_variables, conditions


    def matches_conditions(instance, conditions={}):
        """
        :param instance: a dictionary mapping each variable to a value
        :param conditions: a dictionary mapping zero or more variables to one or more acceptable values
        :return: true iff the instance matches the conditions
        """
        for (variable, values) in conditions.items():
            if instance[variable] not in values:
                return False
        return True


    def filter_data(data, conditions={}):
        """
        Returns the subset of the data that matches the given condition(s)
        :param data: a list of dictionaries, as from load_data
        :param conditions: a possibly-empty dictionary of conditions, as from load_query
        :return: a matching subset of the data
        """
        return [i for i in data if matches_conditions(i, conditions)]


    def make_query_conditions(query_variables, all_values):
        """
        Returns a list of dictionaries in the format expected by filter_data; each dictionary corresponds to one of the possible
        settings of the query variables.
        :param query_variables: a list of query variables
        :param all_values: a dictionary mapping each variable to its possible values as from make_values_dict
        :return: a list of conditions, corresponding to each setting of the query variables
        """
        values_list = [all_values[variable] for variable in query_variables]
        values_product = itertools.product(*values_list)
        return [dict(zip(query_variables, [[v] for v in values])) for values in values_product]


    def main():
        data = load_data(CAR_DATA_PATH)
        all_values = make_values_dict(data)
        query_variables, conditions = load_query(sys.argv[1])
        conditional_data = filter_data(data, conditions)
        conditional_count = len(conditional_data)
        for query_values in make_query_conditions(query_variables, all_values):
            for variable in query_variables:
                print(query_values[variable][0], end=' ')
            print(len(filter_data(conditional_data, query_values)) / conditional_count)


    if __name__ == '__main__':
        main()
I wrote this solution with an emphasis on modularity and testability. I load the entire data file, then filter it based upon the conditions. I then enumerate the possible combinations of settings for the query variables, count the instances that match each combination, and divide by the number of instances that satisfied the conditions.
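Because the pieces are plain functions over lists and dictionaries, they can be spot-checked without reading `car.data` at all. The sketch below restates `matches_conditions` and `filter_data` compactly so that it runs on its own. The four-instance `toy_data` list is invented for illustration and is not drawn from the real data set.

```python
def matches_conditions(instance, conditions):
    """True iff every conditioned variable takes one of its acceptable values."""
    return all(instance[var] in values for var, values in conditions.items())


def filter_data(data, conditions):
    """Return the instances in data that match the conditions."""
    return [i for i in data if matches_conditions(i, conditions)]


# Invented toy instances (NOT taken from car.data), using two of the variables.
toy_data = [
    {'safety': 'high', 'car': 'acc'},
    {'safety': 'high', 'car': 'unacc'},
    {'safety': 'low', 'car': 'unacc'},
    {'safety': 'med', 'car': 'acc'},
]

# Evidence: safety is high or med. Query setting: car is acc.
evidence = {'safety': ['high', 'med']}
conditional_data = filter_data(toy_data, evidence)
acc = filter_data(conditional_data, {'car': ['acc']})

# Relative frequency of car == acc among the instances matching the evidence.
estimate = len(acc) / len(conditional_data)
```

Pasting these lines into a REPL and inspecting `conditional_data` and `estimate` is the kind of spot-checking I have in mind; the same assertions drop straight into a unit test.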
As a result, this program is perhaps not as efficient as it could be, though each piece is straightforward. I viewed this as a reasonable trade-off, given that the data set contains only a couple of thousand instances. Another reasonable approach would have been to do a single pass through the data, and to accumulate counts (or probabilities) as I went.
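For comparison, here is a rough sketch of what the single-pass alternative might look like. It is not part of the sample solution: the function name `single_pass_distribution` and the example data are hypothetical, and it assumes the same data and condition formats as above. One behavioral difference is worth noting: it reports only the settings of the query variables that actually occur, whereas the sample solution also prints settings with a count of zero.

```python
from collections import Counter


def single_pass_distribution(data, query_variables, conditions):
    """
    Hypothetical alternative to filter-then-count: one pass over the data.
    :param data: a list of dictionaries mapping variable names to values
    :param query_variables: a list of query variables
    :param conditions: a dictionary mapping evidence variables to acceptable values
    :return: a dictionary mapping each observed tuple of query-variable values
             to its relative frequency among instances matching the conditions
    """
    counts = Counter()
    total = 0
    for instance in data:
        if all(instance[var] in vals for var, vals in conditions.items()):
            total += 1
            counts[tuple(instance[var] for var in query_variables)] += 1
    if total == 0:
        # No instance matches the evidence, so there is nothing to normalize.
        return {}
    return {setting: n / total for setting, n in counts.items()}


# Invented example instances (NOT taken from car.data).
example_data = [
    {'safety': 'high', 'car': 'acc'},
    {'safety': 'high', 'car': 'unacc'},
    {'safety': 'low', 'car': 'unacc'},
    {'safety': 'med', 'car': 'acc'},
]
distribution = single_pass_distribution(
    example_data, ['car'], {'safety': ['high', 'med']})
```

This touches each instance once instead of once per setting of the query variables, at the cost of folding filtering and counting into a single function.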
Compare this code with your own, regardless of the language you chose. Is it shorter or longer? Are the purposes of the individual methods clear and distinct? What approach did you take?