# Two notes

First, the law of total probability:

    P(A) != P(A|B) + P(A|!B)
    P(A) = P(A ^ B) + P(A ^ !B)
    P(A ^ B) = P(A|B)P(B), etc.

(used for Bayesian inference)

Second, inference in a small network: Draw selects one of three coins (a, b, or c), and X1, X2, X3 are flips of the selected coin.

        Draw
       / | \
      /  |  \
    X1   X2  X3

CPT for Draw:

- P(Draw = a) = 1/3
- P(Draw = b) = 1/3
- P(Draw = c) = 1/3

CPT for each of X1, X2, X3 (identical):

| Draw | P(Xi = heads) |
|------|---------------|
| a    | .2            |
| b    | .6            |
| c    | .8            |

By Bayes' rule:

    P(Draw | X1 = heads, X2 = heads, X3 = tails) = P(H, H, T | Draw) P(Draw) / P(H, H, T)

1/P(H, H, T) is alpha, and P(Draw) is the same for each value of Draw, so all that matters is P(H, H, T | Draw) for each value of Draw. Apply conditional independence:

    P(H, H, T | Draw) = P(X1 = H | Draw) P(X2 = H | Draw) P(X3 = T | Draw)

and compute. Alternatively, compute the joint for each value of Draw; dropping the common 1/3 and scaling each coin probability by 5 gives small whole numbers:

    P(a, H, H, T) = P(X1=H|a) P(X2=H|a) P(X3=T|a) P(a) = .2 * .2 * .8 * 1/3 ~ 1 * 1 * 4 = 4
    P(b, H, H, T) = P(X1=H|b) P(X2=H|b) P(X3=T|b) P(b) = .6 * .6 * .4 * 1/3 ~ 3 * 3 * 2 = 18
    P(c, H, H, T) = P(X1=H|c) P(X2=H|c) P(X3=T|c) P(c) = .8 * .8 * .2 * 1/3 ~ 4 * 4 * 1 = 16

Normalizing, P(Draw | H, H, T) = <4/38, 18/38, 16/38> ~ <.105, .474, .421>.

# Constructing Bayes Nets

Best is to use domain knowledge. But if that is not available (and conditional probabilities are), then one method is the following. We will explore others later in the semester.

Choose an ordering of variables X1, …, Xn.
For i = 1 to n:

- add Xi to the network
- select parents from X1, …, Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)

This choice of parents guarantees:

    P(X1, …, Xn) = π P(Xi | X1, …, Xi-1)    (chain rule)
                 = π P(Xi | Parents(Xi))    (by construction)

(see ppt for example)

# Independence

(ppt illustration)

A node X is conditionally independent of its non-descendants given its parents.

A node X is conditionally independent of *all other* nodes given its Markov blanket (parents, children, children's parents).

# Conditional independence, revisited

    A -> M
    |
    v
    J

Q. Are JohnCalls and MaryCalls independent?

- No, not in general; whether they are independent depends on conditioning on the value of Alarm

Q. If the value of Alarm is known, are JohnCalls and MaryCalls independent?

- Yes, for each known value of A, J and M are independent

    B
    |
    v
    A <- E

Q. Are Burglary and Earthquake conditionally independent?

- Yes: nodes are conditionally independent of their non-descendants given their parents (and B, E have no parents)

Q. Are they independent under all conditions?

- No: if the value of Alarm is known, one can "explain away" the other

# Types of Inference

Simple queries

- Compute the posterior marginal P(Xi | E = e)
- e.g., P(NoGas | Gauge = empty, Lights = on, Starts = false)

Conjunctive queries

- P(Xi, Xj | E = e) = P(Xi | E = e) P(Xj | Xi, E = e)

Optimal decisions

- Need utility information, but also need P(outcome | action, evidence)

Value of information: "What info do I need now?"
Sensitivity analysis: "Which values matter most?"
Explanation: "Why do I need a new starter?"

# Inference in Bayes Nets

Recall the burglary network:

    B   E
     \ /
      A
     / \
    J   M

Evidence variables, the things you typically observe: e.g., J or M
Query variables, what you want to know: e.g., B
Hidden variables, neither observed nor queried (but they affect outcomes and are thus usually worth modeling): e.g., A, E

# Simple inferences

- P(B): B is the query variable
- P(A | b, e): A is the query variable; B, E are evidence variables
- P(J, M | a): J, M are the query variables; A is the evidence variable

# More difficult

What about P(B | j, m)?

Recall that

    P(X|e) = alpha P(X, e) = alpha * sum_y P(X, e, y)

where y ranges over the hidden variables. How to find P(X, e, y)? It is a product of conditional probabilities from the network. A query can be answered by computing sums of products of conditional probabilities from the network.
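Before working the burglary query, here is a quick numeric check of that recipe on the three-coin network above. This is a minimal sketch in plain Python; the variable names are mine, and the numbers come from the CPTs in the first section.

```python
# Posterior over Draw given X1 = H, X2 = H, X3 = T, by enumeration.
# There are no hidden variables here, so each "sum of products" term
# is a single product: P(d) * P(X1=H|d) * P(X2=H|d) * P(X3=T|d).

p_draw = {"a": 1/3, "b": 1/3, "c": 1/3}    # CPT for Draw
p_heads = {"a": 0.2, "b": 0.6, "c": 0.8}   # P(Xi = heads | Draw = d)

unnorm = {d: p_draw[d] * p_heads[d] * p_heads[d] * (1 - p_heads[d])
          for d in p_draw}

alpha = 1 / sum(unnorm.values())           # alpha = 1 / P(H, H, T)
posterior = {d: alpha * v for d, v in unnorm.items()}

print(posterior)  # roughly {'a': 0.105, 'b': 0.474, 'c': 0.421}
```

The unnormalized values are in the ratio 4 : 18 : 16, matching the hand computation above.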
    P(B | j, m) = alpha * P(B, j, m) = alpha sum_e sum_a P(B, j, m, e, a)

For Burglary = true:

    P(b | j, m) = alpha sum_e sum_a P(b) P(e) P(a|b,e) P(j|a) P(m|a)

(A code sketch of this computation appears at the end of these notes.)

# Simplifying

We can move P(b) and P(e) outside some of the sums:

    alpha sum_e sum_a P(b) P(e) P(a|b,e) P(j|a) P(m|a)
      = alpha P(b) sum_e P(e) sum_a P(a|b,e) P(j|a) P(m|a)

Better, but there are still duplicates in the enumeration tree:

- P(b) appears once at the root
- P(E) twice, once for each value of E in the summation
- P(A|b,E) splits again for each of those
- etc.; in particular, the P(j|a) P(m|a) subtrees are recomputed for each value of E

Problems:

- Repeated computations
- Irrelevant variables

# Solution: variable elimination

Don't repeat computations, i.e., variable elimination. Treat each P term in

    alpha P(b) sum_e P(e) sum_a P(a|b,e) P(j|a) P(m|a)

as a factor, and compute the factors separately, bottom up (i.e., right to left). (Details in text.)

# Solution: ignore irrelevant variables

Imagine a query P(J | b):

    P(j|b) = alpha * P(j, b)
           = alpha * sum_e sum_a sum_m P(b) P(e) P(a|b,e) P(j|a) P(m|a)
           = alpha P(b) sum_e P(e) sum_a P(a|b,e) P(j|a) sum_m P(m|a)

But note that sum_m P(m|a) = 1 by definition! M is irrelevant to the query.

Generally, we can recursively remove any *leaf* node that is neither a query variable nor an evidence variable.

# Not necessarily easy in practice

(see ppt for examples)

# Complexity

Singly connected networks (polytrees)

- At most one undirected path between any two nodes in the network
- Time and space complexity is linear in n

Multiply connected networks

- Time and space complexity is exponential even when the number of parents per node is bounded
- Consider: inference in propositional logic is a special case of Bayesian network inference, so inference is at least as hard as counting the satisfying assignments of a propositional formula (#P-hard)

Thus, we may want to consider lower-complexity inference methods that are approximate. Next week!
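For concreteness, here is a minimal sketch of the enumeration computation for P(B | j, m) referenced above. These notes do not give the CPT numbers for the burglary network; the values below are the standard textbook (AIMA) CPTs, so treat them as assumptions.

```python
# Enumeration for P(B | j, m) on the burglary network.
# ASSUMPTION: CPT values are the standard AIMA textbook numbers;
# these notes do not state them.

P_B = {True: 0.001, False: 0.999}       # P(B)
P_E = {True: 0.002, False: 0.998}       # P(E)
P_A_true = {                            # P(A = true | B, E)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J_true = {True: 0.90, False: 0.05}    # P(J = true | A)
P_M_true = {True: 0.70, False: 0.01}    # P(M = true | A)

def unnormalized(b):
    # P(b, j, m) = P(b) sum_e P(e) sum_a P(a|b,e) P(j|a) P(m|a),
    # with P(b) and P(e) already pulled outside the sums.
    total = 0.0
    for e in (True, False):
        inner = 0.0
        for a in (True, False):
            p_a = P_A_true[(b, e)] if a else 1 - P_A_true[(b, e)]
            inner += p_a * P_J_true[a] * P_M_true[a]
        total += P_E[e] * inner
    return P_B[b] * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})
# With the assumed CPTs, P(B = true | j, m) is roughly 0.284.
```

Note the inefficiency the "Simplifying" section points out: the inner loop recomputes P(j|a) P(m|a) for each value of E; variable elimination would cache that product as a factor.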