CMPSCI 691GM : Graphical Models
Spring 2011
Homework #1: Directed Graphical Models

Due Dates:
Tuesday, February 1, 2011: Email working source code
Thursday, February 3, 2011: Email report and revised source code

In this homework assignment you will implement and experiment with directed graphical models, and write a short report describing your experiences and findings. We will provide you with simulated "medical domain" data based on the "flu" example in class; however, you are welcome to find and use your own data instead.

Medical Domain Data

We have provided you with a joint probability distribution of symptoms, conditions and diseases based on the "flu" example in class. Certain diseases are more likely than others given certain symptoms, and a model such as this can be used to help doctors make a diagnosis. (Don't actually use this for diagnosis, though!) The ground-truth joint probability distribution consists of twelve binary random variables and contains 2^12 = 4096 possible configurations (numbered 0 to 4095), which is small enough that you can enumerate them exhaustively. The variables are as follows:

You can download all the data here. The archive contains two files:

Core Tasks (for everyone)

  1. Graphical Model: Use your intuition to design a directed graphical model for the twelve variables outlined above. Implement it in the programming language of your choice. You could begin your implementation with randomly assigned parameters. Given these parameters and an assignment to all 12 variables, your implementation should be able to return the probability of the full assignment.
  2. Estimating Parameters: Use the dataset (i.e., dataset.dat) to estimate the parameters of your graphical model. You can do this by simply counting and normalizing: enumerate all the assignments in the dataset and, for each variable v, count the number of times v is true for each assignment to its parents, then normalize each count by the total number of times the parents had that assignment.
  3. Model Accuracy: Measure the similarity of your model to the true joint probability distribution (i.e., joint.dat). That is, for each assignment, how similar is the probability returned by your model to the probability under the true distribution? To keep things simple, you can compare the distributions based on their L1-distance. That is, for each assignment ai to all the variables, obtain p_true(ai) from the true joint distribution ((i+1)th row in joint.dat) and p_model(ai) using your model. The distance is defined as |p_true(a0)-p_model(a0)| + |p_true(a1)-p_model(a1)| + ... + |p_true(a4095)-p_model(a4095)|. An alternative distance measure more appropriate to probability distributions is KL-divergence. If you know what that is and want to use it, you can evaluate using KL-divergence also.
  4. Querying: Use the graphical model above to answer some queries. A query consists of observed variables (for which we have an assignment) and query variables over which we want the distribution. The remaining variables need to be marginalized (by summing them out). Since the domain is small, you can implement this conditioning and marginalizing process by exhaustively enumerating all assignments (note that only assignments that are consistent with the observed values should be taken into account). Compare the results of these queries on your model to the results obtained from the true joint probability distribution. Try to think of some interesting queries that will demonstrate causal reasoning, evidential reasoning, and inter-causal reasoning. To get you started, here are some examples of queries to consider (but also create new ones of your own design):
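
As a rough illustration of how the four core tasks fit together, here is a minimal Python sketch on a hypothetical three-variable network (Flu, Fever, Cough). The structure, the variable names, the in-memory samples, and the 0.5 fallback for unseen parent configurations are all assumptions made for illustration. They are not the twelve assignment variables, and the sketch does not read dataset.dat or joint.dat.

```python
import itertools
from collections import Counter, defaultdict

# Hypothetical toy structure (NOT the assignment's twelve variables):
# each variable maps to a tuple of its parents, listed in topological order.
PARENTS = {"Flu": (), "Fever": ("Flu",), "Cough": ("Flu",)}
VARS = list(PARENTS)

def all_assignments():
    """Enumerate every full assignment as a dict of variable -> bool."""
    for values in itertools.product([False, True], repeat=len(VARS)):
        yield dict(zip(VARS, values))

def joint_prob(cpts, a):
    """Task 1: product over variables of p(v = a[v] | parents of v)."""
    p = 1.0
    for v in VARS:
        p_v_true = cpts[v][tuple(a[u] for u in PARENTS[v])]
        p *= p_v_true if a[v] else 1.0 - p_v_true
    return p

def estimate_cpts(samples):
    """Task 2: count and normalize. cpts[v][parent_values] = p(v=True | parents)."""
    true_counts, totals = Counter(), Counter()
    for a in samples:
        for v in VARS:
            key = (v, tuple(a[u] for u in PARENTS[v]))
            totals[key] += 1
            true_counts[key] += int(a[v])
    cpts = {}
    for v in VARS:
        cpts[v] = {}
        for pa in itertools.product([False, True], repeat=len(PARENTS[v])):
            n = totals[(v, pa)]
            # Unseen parent configurations fall back to 0.5 (an arbitrary choice).
            cpts[v][pa] = true_counts[(v, pa)] / n if n else 0.5
    return cpts

def l1_distance(p, q):
    """Task 3: sum of |p(a_i) - q(a_i)| over two aligned lists of probabilities."""
    return sum(abs(x - y) for x, y in zip(p, q))

def query(prob, query_vars, evidence):
    """Task 4: p(query_vars | evidence) by exhaustive enumeration.

    Assumes the evidence has nonzero probability under `prob`.
    """
    scores = defaultdict(float)
    for a in all_assignments():
        # Only assignments consistent with the observed values contribute.
        if all(a[v] == val for v, val in evidence.items()):
            scores[tuple(a[v] for v in query_vars)] += prob(a)
    z = sum(scores.values())
    return {k: s / z for k, s in scores.items()}

# Made-up samples standing in for a dataset.
samples = [
    {"Flu": True, "Fever": True, "Cough": True},
    {"Flu": True, "Fever": False, "Cough": True},
    {"Flu": False, "Fever": False, "Cough": False},
    {"Flu": False, "Fever": False, "Cough": True},
]
cpts = estimate_cpts(samples)
model = [joint_prob(cpts, a) for a in all_assignments()]
# Evidential reasoning: how likely is Flu once Cough is observed?
posterior = query(lambda a: joint_prob(cpts, a), ["Flu"], {"Cough": True})
```

Here `posterior` is keyed by tuples of query-variable values, so `posterior[(True,)]` is the estimated p(Flu = true | Cough = true). For the real assignment, the same `l1_distance` call would compare the list of 4096 model probabilities against the list read from joint.dat, provided both lists use the same assignment ordering.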

Further Fun

Although not required, we hope you will be eager to experiment further with your model. Here are some ideas for additional things to try. You may come up with some even more exciting ideas of your own, and we encourage that. Be sure to tell us what you did in your write-up.

What to hand in

The homework should be emailed to 691gm-staff@cs.umass.edu before 5pm Eastern time on the due date.

Grading

The assignment will be graded on (a) core task completion and correctness, (b) effort and creativity in the optional extras, and (c) quality and clarity of your written report.

Questions?

Please ask! Send email to 691gm-staff@cs.umass.edu or come to the office hours. If you'd like your classmates to be able to help answer your question, feel free to use 691gm-all@cs.umass.edu.