CMPSCI 383: Artificial Intelligence

Fall 2014 (archived)

Assignment 06

This assignment is due at 1700 on Wednesday, 05 November.

The goal of this assignment is to write one or two programs to perform approximate inference on a Bayesian network, and to evaluate their performance. The first program will implement rejection sampling; the second will implement a Gibbs sampler (a form of Markov chain Monte Carlo approximation).

I will be updating the assignment with questions (and their answers) as they are asked.

Problem

We have learned that given a Bayes net and a query, we can compute the exact distribution of the query variable. But we’ve also learned that exact inference is generally feasible only in Bayes nets that are singly connected. For large, multiply connected Bayes nets, we can instead use approximate inference techniques.

One such technique is rejection sampling. The Bayes net is treated as a generative model, and samples are generated. Samples that are inconsistent with the evidence are discarded. The final set of samples is used to empirically estimate the underlying distribution or probability.
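
To make the procedure concrete, here is a rough sketch of the two pieces: generating a sample from the prior, and checking it against the evidence. BayesNet, Node, name(), cptEntry(), and nodesInTopologicalOrder() are hypothetical helpers of my own naming, not provided code:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch only: sample every variable, parents first, using the CPT row
// selected by the parents' already-sampled values.
Map<String, Boolean> priorSample(BayesNet net, Random random) {
    Map<String, Boolean> sample = new HashMap<>();
    for (Node node : net.nodesInTopologicalOrder()) {
        double pTrue = node.cptEntry(sample);  // P(node = true | sampled parents)
        sample.put(node.name(), random.nextDouble() < pTrue);
    }
    return sample;
}

// A sample is kept only if it agrees with every evidence variable.
boolean consistentWithEvidence(Map<String, Boolean> sample,
                               Map<String, Boolean> evidence) {
    for (Map.Entry<String, Boolean> e : evidence.entrySet()) {
        if (!sample.get(e.getKey()).equals(e.getValue())) {
            return false;  // reject this sample (but still count it; see below)
        }
    }
    return true;
}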

Another is a Markov chain Monte Carlo technique called Gibbs sampling. We covered this in class, and it is described in detail in Section 14.5.2 of the text. Your Gibbs sampler should iterate through the list of non-evidence variables, sampling each in turn.

The key operation in Gibbs sampling is sampling a variable Xi at each step. To do so, we need to compute its conditional probability distribution, given its Markov blanket:

$$P(X_i | \mathrm{MarkovBlanket}(X_i)) = \alpha P(X_i | \mathrm{Parents}(X_i)) \times \prod_{Y_j \in \mathrm{Children}(X_i)} P(y_j | \mathrm{Parents}(Y_j))$$

Some observations and notes on this equation:

  • There’s a normalizing constant, α. For a binary variable X, you’ll need to compute P(X=true) and P(X=false), then normalize.
  • When computing the probability of X given its Markov blanket, it might not be clear what value to use for each variable in the Markov blanket of X. The answer: its most recently chosen value. Remember, Gibbs samplers are based on Markov chains: they retain the chosen value of each variable from one sample to the next.
  • Finally, note that Xi is itself a parent of each child Yj, so its trial value appears in each child’s CPT lookup. For a binary variable Xi, when computing P(Xi=true), set its value to true; when computing P(Xi=false), set its value to false.

As noted in class and in the text, MCMC performs best after a burn-in period. In other words, it should be allowed to iterate for a while and converge on (or at least, move toward) its stationary distribution. To support this, your Gibbs sampler should have a burn-in period of at least 100 samples. That is, throw at least the first 100 samples away: do not consider them when estimating probabilities, and do not count them toward the number of samples taken.
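
Putting the equation and the notes above together, one update of a single binary variable might look roughly like this (a sketch, reusing the hypothetical Node/cptEntry helpers from the rejection-sampling sketch; state holds the most recently chosen value of every variable):

import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch only: resample one binary variable x given its Markov blanket.
void gibbsStep(Node x, List<Node> childrenOfX,
               Map<String, Boolean> state, Random random) {
    double[] unnormalized = new double[2];
    boolean[] values = {true, false};
    for (int i = 0; i < 2; i++) {
        state.put(x.name(), values[i]);  // trial value of Xi, seen by its children
        double pTrueGivenParents = x.cptEntry(state);
        double p = values[i] ? pTrueGivenParents : 1.0 - pTrueGivenParents;
        for (Node child : childrenOfX) {  // product over Xi's children
            double pChildTrue = child.cptEntry(state);
            p *= state.get(child.name()) ? pChildTrue : 1.0 - pChildTrue;
        }
        unnormalized[i] = p;
    }
    double total = unnormalized[0] + unnormalized[1];  // alpha = 1 / total
    state.put(x.name(), random.nextDouble() < unnormalized[0] / total);
}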

In both rejection and Gibbs sampling, the result is a sequence of samples, each corresponding to an atomic event. This sequence can then be used to estimate the distribution you’re interested in, like what you did in Assignment 05.

Input data format

In this problem, you will work with Bayes nets that consist of boolean random variables, and queries that involve these nets.

Bayes network format

The Bayes net will be provided as a JSON-encoded data structure. The structure will consist of an array of nodes. Each node is an array and will contain, in order:

  • its name
  • an array of parent names (which may be of length 0)
  • a conditional probability table, represented compactly as an array. Each entry gives the conditional probability that this node’s variable is true. The rows are ordered such that the values of the node’s parent variable(s) are enumerated in the traditional way. That is, in a table, the rightmost variable alternates T, F, T, F, …; the variable to its left goes T, T, F, F, T, T, F, F, …; and so on.

For example, the WetGrass node from Figure 14.12 in the text would be represented as:

["WetGrass", ["Sprinkler", "Rain"], [0.99, 0.9, 0.9, 0.0]]

The array of nodes will be ordered such that the parents of any given node are specified before the node itself.

The Cloudy/Sprinkler/Rain/WetGrass example from class and from Figure 14.12 in the text could be represented as follows:

[["Cloudy", [], [0.5]],
 ["Sprinkler", ["Cloudy"], [0.1, 0.5]],
 ["Rain", ["Cloudy"], [0.8, 0.2]],
 ["WetGrass", ["Sprinkler", "Rain"], [0.99, 0.9, 0.9, 0.0]]]

I have posted a sample parser (and discussion of it) based upon GSON. If you are using Java, you can assume this library will be on your program’s CLASSPATH when we grade.

Query format

A query will ask you to compute a possibly conditional distribution over a set of variables, such as P(WetGrass, Rain) or P(Rain | Cloudy = false, Sprinkler = true). Queries will always be for a distribution, not a specific event’s probability.

Queries will be in JSON format, represented as an array, with the following three elements:

  • One or more query variables, in an array
  • Zero or more evidence variables, in an array
  • The values of the evidence variables, in an array of the same length as the evidence variable array

The first of the above two queries would be represented as:

[["Rain", "WetGrass"], [], []]

and the second as:

[["Rain"], ["Cloudy", "Sprinkler"], [false, true]]

Output data format

The output of your sampler will be an estimated distribution over the n query variables. Write this output as a JSON-formatted array with 2^n entries, in the traditional order described in Bayes network format above.

For the first query above, there would be four entries, corresponding to, in order, the probabilities of:

Rain   WetGrass
 T        T
 T        F
 F        T
 F        F

If your sampler found them equiprobable, the output would be:

[0.25, 0.25, 0.25, 0.25]

For the second query, if your sampler found rain and ¬rain equiprobable, it would output

[0.5, 0.5]

A special case: A sample in rejection sampling is one sampling of the network, regardless of whether it is rejected. That is, when counting the number of samples taken, include rejected samples. If every sample is rejected, you’ll have no data upon which to estimate probabilities. If this happens, output an empty array instead of an array of probabilities:

[]
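
Tying the two cases together, the output step might look roughly like this (a sketch; counts is a hypothetical 2^n-long tally of kept samples, indexed in the same true-before-false order as CPT rows):

// Sketch: convert tallies over the joint assignments of the query
// variables into the JSON output array, emitting [] when every
// sample was rejected.
String formatDistribution(long[] counts) {
    long total = 0;
    for (long count : counts) {
        total += count;
    }
    if (total == 0) {
        return "[]";  // no kept samples, so nothing to estimate from
    }
    StringBuilder out = new StringBuilder("[");
    for (int i = 0; i < counts.length; i++) {
        if (i > 0) {
            out.append(", ");
        }
        out.append((double) counts[i] / total);
    }
    return out.append("]").toString();
}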

In any case, the JSON output above should be written to standard output (for example, using System.out.println()).

Additionally, your sampler should output its sampling rate, in samples per second, to standard error (for example, using System.err.println()). You can check the time before your sampler starts and after it concludes, using System.nanoTime(), to compute this value. It should be written with no decoration, for example:

25643.0
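
The computation itself is a few lines (a sketch; runSampler and numSamples stand in for your own sampler and its third command line argument):

long start = System.nanoTime();
runSampler(numSamples);  // take all the samples, including rejected ones
double seconds = (System.nanoTime() - start) / 1e9;  // nanoseconds to seconds
System.err.println(numSamples / seconds);  // samples per second, undecorated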

What to submit

You should submit two, or possibly three, things: a rejection sampler, a Gibbs sampler (optional, but see Grading), and a readme.txt.

  • Your samplers should use their first command line argument as the path to file describing a Bayes net, their second command line argument as the path to a file describing a query, and their third command line argument as the number of samples to generate. If, for example, your sampler’s main method is in a Java class named RejectionSampler, we should be able to use java RejectionSampler /Users/liberato/bayesnet.json /Users/liberato/query.json 1000 to direct your program to read the query in the file located at /Users/liberato/query.json and the Bayes net at /Users/liberato/bayesnet.json, and to generate 1,000 samples when estimating probabilities.

    We will put gson-2.3.jar on the classpath during compilation and execution of your program (if it is written in Java).

  • Your program should print the computed distribution to standard output, in exactly the format described above.

Submit the source code of your programs, written in the language of your choice. Name the files containing the main() methods RejectionSampler.java and GibbsSampler.java or your language’s equivalent. If the file(s) you submit depend(s) upon other files, be sure to submit these other files as well.

As in the previous assignment, while you may use library calls for parsing, data structures, and the like, you must implement the sampler itself yourself. Do not use a library either for sampling or for representing the Bayes net. We will consider it plagiarism if you do. Check with us if you think there’s any ambiguity.

Your readme.txt should contain the following items:

  • your name
  • if the language of your choice is not Java, Python, Ruby, node.js-compatible JavaScript, ANSI C or C++ (or if you’re concerned it’s not completely obvious to us how to compile and execute it), a description of how to compile and execute the submitted files
  • a description of what you got working, what is partially working and what is completely broken

If you’re using language features that require a specific version of your language or runtime, check for that version at program start and fail if it’s not present, emitting an understandable error message indicating this fact. Your program must compile and execute on the Edlab Linux machines.

If your program does not compile or execute, you will receive no credit. Check with us in advance if you’re concerned.

Grading

We will run your program on a variety of test cases. The exact test cases will not be available to you before grading. You are welcome to write and distribute your own test cases.

You will earn up to 80% of the possible points on this assignment by correctly implementing a rejection sampler. The remaining 20% of points will be earned if you correctly implement a Gibbs sampler.

If your readme.txt is missing or judged insufficient, your overall score may be penalized by up to ten percent.

We’re not going to feed your program incorrectly formatted input, so you need only concern yourself with handling input in the format described in the assignment.

We expect valid output. Generating output that is not in the format described in the assignment will result in a failed test case. We will check that your output values are reasonably close to the correct values after a large number of iterations.

I do not expect anything in a solution to this assignment to be particularly memory-intensive. But as usual, if your program exceeds available heap memory (which we’ll set to 1 GB in Java, using the -Xmx1024M argument if necessary), or if it does not terminate in twenty seconds with a reasonably sized sample count (for example, 1,000) on a reasonably sized Bayes net (for example, thirty nodes), we will consider the test case failed.

Notes for nerds*

It’s hard to know how long a burn-in period is appropriate for an MCMC-based approximation technique: it depends upon the underlying Bayes net, the initial choice of variable settings, the evidence variables, and the query variables. If your Gibbs sampler is relatively fast, try increasing the length of the burn-in period to potentially get it closer to the stationary distribution, and thereby reduce the error in its estimates.

Stochastic simulation techniques rely on having a reasonably good pseudorandom number generator. Java’s built-in Random suffices for this assignment, but probably isn’t something you’d use in practice if the quality of the random numbers matters. It’s based upon a linear congruential generator, which is known to be flawed in various ways. Instead, you’d use something like the Mersenne Twister or, for secure random numbers, a hardware source. The latter is overkill for stochastic simulations.

Speaking of pseudorandom numbers, it’s not a bad idea to instantiate a single, seeded instance of Random and use it everywhere in your program, at least when testing. If you use a fixed seed, it will always generate the same sequence of random numbers. This can help with debugging as it makes your program’s operation deterministic (and thus, repeatable).

Floating-point underflow. It’s a thing. If you test on networks where probabilities get small in magnitude, particularly in the large product in the Gibbs sampler, you may run into this problem. It will manifest as probabilities going to zero when they should not (for example, when all terms in the product are non-zero).

There are several ways to mitigate underflow in a Java-based Gibbs sampler. One (slow) option is to use arbitrary-precision decimal numbers, that is, the BigDecimal class. Another is to multiply each term by the same large factor, though picking this factor a priori is difficult; as long as the factor is the same for each value in the distribution, it will be automagically removed when you normalize. A third is to perform calculations under a log transform and reverse the transform after normalizing.
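
For the last approach, the normalization can stay in log space until the very end. A minimal sketch for the binary case (the method name is mine, not part of the assignment):

// Sketch: normalize two unnormalized log-probabilities without
// underflowing, by subtracting the max before exponentiating
// (the "log-sum-exp" trick).
double[] normalizeFromLogs(double logTrue, double logFalse) {
    double max = Math.max(logTrue, logFalse);
    double expTrue = Math.exp(logTrue - max);
    double expFalse = Math.exp(logFalse - max);
    double sum = expTrue + expFalse;
    return new double[] {expTrue / sum, expFalse / sum};
}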

*In case it is not obvious, I mean “nerds” affectionately, since of course I am one. As an instructor, I love it when you want to dig deeper into problems and ask interesting questions.

Questions and answers

I have been able to compile assignment 06 on the command line with javac -cp .:../lib/gson-2.3.jar RejectionSampler.java, but when I try to run the sampler on the command line, I get a NoClassDefFoundError for com/google/gson/JsonElement. Have I missed an intermediary step in setting the classpath?

Maybe. Try adding the jar to the classpath of the java command as well, for example:

java -cp .:../lib/gson-2.3.jar RejectionSampler ../wetgrass.json ../query.json 1000

I’ve updated the example to include this line.

I have a question about output ordering. So, for query variables like ["Rain", "WetGrass", "Sprinkler"], the correct order for the output would be:

Rain WetGrass  Sprinkler
[true, true, true]
[true, true, false]
[true, false, true]
[true, false, false]
[false, true, true]
[false, true, false]
[false, false, true]
[false, false, false]

Is this correct?

Yes, it looks right to me.

If the input is always in order, do you think it is okay to use TreeMap instead of HashMap? Although HashMap is more efficient, I think it will be easier for me to use TreeMap, because I’m thinking about sampling from the top of the BN (tree), since “each variable is sampled according to the conditional distribution given the values already sampled for the variable’s parents.” To me, this quote from the book means you have to first sample a parent in order to sample its children.

I ask this because when I iterate over the BN map, the order is not the same order in which your code stores each node into the map.

You are welcome to do whatever you like with the code I provided; it’s just to get you started. Depending upon your approach some sort of ordered data structure (like TreeMap) might make your code more straightforward. A List may suffice for the rejection sampler. It’s entirely up to you.

A question related to sampling an event. My approach is:

Start from the very first node (the “great-…-grandfather” of every other node; I assume this node is the first element in the BN JSON, please let me know if this is not always the case). Since this node does not have a parent, generate its value according to its probability; in the case of Cloudy, there is a 50% chance the value will be true (achieved by a Mersenne Twister, like you said in your post). After I sample this node, go ahead and sample its children.

Or, should I start sampling from the nodes that don’t have parents? Because there can be many nodes without parents, right? I’m a little confused about where to start sampling in the Bayes net.

Treat the nodes as a partially-ordered set, and put them in an order such that each parent is before each of its children. Not coincidentally, the input comes in such an order.

Then, sample the value of each in that order. When a node is conditioned on its parents, use the current sample value of each parent (that is, the value you have already sampled this iteration, given the partial ordering), to determine which entry in your CPT to use when sampling it.

If you enter 1000 samples for rejection sampling, does that mean you want 1000 generated and some number under that used, or do you want 1000 samples used?

The former. To quote the assignment, “A sample in rejection sampling is one sampling of the network, regardless of whether it is rejected.”

How do I pick the outcome of a weighted coin flip?

Use Java’s java.util.Random class’s nextDouble() method to get a value chosen uniformly at random on the interval [0.0, 1.0), then check to see if it’s below your cutoff. For example:

// random is an instance variable in this example
Random random = new Random();

boolean weightedBinaryChoice(double probabilityTrue) {
    // strictly less than: nextDouble() is uniform on [0.0, 1.0)
    return (random.nextDouble() < probabilityTrue);
}

If you had P(x=true) = 0.3, you’d call weightedBinaryChoice(0.3) to get the results of your weighted coin flip.

I read through the questions and answers and noticed the one about a weighted coin flip. I thought samples could be generated randomly or using some weights, as the question (I believe) is asking. Is random alright to use? (I coded using random, but changing it shouldn’t be hard; finding a function to match the Java one would take most of that time.)

I assume from the next question that you’re asking about Python’s random package. If so, then yes, random.random() is fine.

And I’ve never attempted to print to stderr in Python, nor have I ever read from it. Should only a single float be printed to stderr? I’m not sure if Python appends anything to the string beforehand and whether that will throw off the autograder.

The easiest approach here is probably to use sys.stderr.write(str(samples_per_second) + '\n') or the equivalent. That will produce the desired effect.

Is the sample rate computed for all samples generated or just the ones rejected over time?

(For the latter, I think you mean the ones kept, not the ones rejected.) The former. But either is fine. Both are interesting for various reasons — originally there was going to be a little more to the assignment involving relative sampling rates, but I decided to shorten the assignment somewhat and so left it out.

My program output is:

[0.8004351265822784, 0.19956487341772153]
94339.6
  1. Is my sampling rate in the right place?

  2. Is the sampling rate (# of samples you ask to generate / time) or (# of samples left after rejections / time)?

  3. Is the number of decimal digits in my output okay, or do you want us to round to some point?

  1. Yes, that’s certainly high enough. It’s not an explicit part of the grade, though if it were very slow it would help indicate to Patrick or me that something was amiss in your implementation.
  2. Both are interesting. The assignment technically asks for the former, but we’re not going to check that this value achieves some cutoff.
  3. Yes, this output is OK (assuming the samples per second are going to standard error, not standard output). No need to round.