# Administrivia

- Exam 1 tonight in HAS 134
- A03 due next Wednesday, 01 October

# Today

- Uncertainty
- Probability
- Syntax and Semantics
- Inference
- Independence and Bayes' Rule

# Uncertainty

Let action A_t = leave for airport t minutes before flight.

Will A_t get me there on time? Problems:

- Partial observability (road state, other drivers' plans, etc.)
- Noisy sensors (traffic reports)
- Uncertainty in action outcomes (flat tire, etc.)
- Immense complexity of modeling and predicting traffic

# Logical approach?

A purely logical approach either

- risks falsehood: "A_25 will get me there on time", or
- leads to conclusions that are too weak for decision making: "A_25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc., etc."

"A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport ..."

# Options for handling uncertainty

Default or nonmonotonic logic:

- Assume my car does not have a flat tire
- Assume A_25 works unless contradicted by evidence
- However: What assumptions are reasonable? How to handle contradiction?

Rules with fudge factors:

- A_25 →_0.3 get there on time
- Sprinkler →_0.99 WetGrass
- WetGrass →_0.7 Rain
- However: problems with combination, e.g., does Sprinkler cause Rain?

Probability:

- Model the agent's degree of belief
- Given the available evidence, A_25 will get me there on time with probability 0.04

# Probability

Probabilistic assertions summarize effects of:

- Laziness — failure to enumerate exceptions, qualifications, etc.
- Ignorance — lack of relevant facts, initial conditions, etc.
- Fundamental stochastic nature of phenomena

# Probability is subjective!

Probabilities relate propositions to the agent's own state of knowledge, e.g.,

P(A_25 | no reported accidents) = 0.06

These are not assertions about the world; they are assertions about *belief*.

Probabilities of propositions *change* with *new evidence*, e.g.,

P(A_25 | no reported accidents, 5 a.m.) = 0.15

# Making decisions under uncertainty

Suppose I believe the following:

- P(A_25 gets me there on time | ...) = 0.04
- P(A_90 gets me there on time | ...) = 0.70
- P(A_120 gets me there on time | ...) = 0.95
- P(A_1440 gets me there on time | ...) = 0.9999

Which action to choose? It depends on my preferences for missing the flight vs. time spent waiting, etc.

*Utility theory* is used to represent and infer preferences.

*Decision theory* = probability theory + utility theory

# Syntax

Basic element: the random variable

- Boolean random variables, e.g., Cavity (do I have a cavity?) is one of <true, false>
- Discrete random variables, e.g., Weather is one of <sunny, rainy, cloudy, snow>
- Continuous random variables, e.g., Age is in the interval [0, 120]

Domain values must be exhaustive and mutually exclusive (makes no sense for the domain of a die roll not to include 3!)
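These domain conventions are easy to make concrete in code. A minimal Python sketch, using the Cavity and Weather priors that appear later in these notes; the dict representation and the `is_valid_distribution` helper are illustrative assumptions, not course-provided code:

```python
def is_valid_distribution(dist, tol=1e-9):
    """A distribution assigns each domain value a probability in [0, 1],
    and because the domain is exhaustive and mutually exclusive,
    the probabilities must sum to 1."""
    return (all(0.0 <= p <= 1.0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) < tol)

# Boolean random variable: Cavity has domain <true, false>
cavity_prior = {True: 0.1, False: 0.9}

# Discrete random variable: Weather has domain <sunny, rainy, cloudy, snow>
weather_prior = {"sunny": 0.72, "rainy": 0.1, "cloudy": 0.08, "snow": 0.1}

assert is_valid_distribution(cavity_prior)
assert is_valid_distribution(weather_prior)
```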
Elementary propositions are constructed by assigning a value to a random variable, e.g., Weather = sunny; Cavity = false (abbreviated as ¬cavity or !cavity).

Complex propositions are formed from elementary propositions and standard logical connectives, e.g., Weather = sunny ∨ Cavity = false.

An atomic event is a complete specification of the state of the world about which the agent is uncertain. If the world consists of only two Boolean variables, Cavity and Toothache, then there are 4 distinct atomic events:

- Cavity = false ∧ Toothache = false
- Cavity = false ∧ Toothache = true
- Cavity = true ∧ Toothache = false
- Cavity = true ∧ Toothache = true

The set of all atomic events must be mutually exclusive and exhaustive (for discrete events; continuous events are slightly more complicated).

Q1. Consider all five-card poker hands from a 52-card deck.

a. How many atomic events are there (i.e., how many different 5-card hands)?
b. What is the probability of each such event?
c. What is the probability of being dealt a royal straight flush?
d. Four of a kind?

# Axioms of probability

For any propositions A, B:

- 0 <= P(A) <= 1
- P(true) = 1 and P(false) = 0
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

Venn diagram for the last one: two overlapping circles; adding P(A) and P(B) counts the overlap P(A ∧ B) twice, so subtract it once.

# Prior probabilities

Prior or unconditional probabilities of propositions, e.g.,

P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72

correspond to belief prior to the arrival of any (new) evidence.

A probability distribution gives values for all possible assignments:

P(Weather) = <0.72 sunny, 0.1 rainy, 0.08 cloudy, 0.1 snow> (normalized, i.e., sums to 1)

A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables. P(Weather, Cavity) is a 4 × 2 matrix of values:

| Weather =      | sunny | rainy | cloudy | snow |
|----------------|-------|-------|--------|------|
| Cavity = true  | 0.144 | 0.02  | 0.016  | 0.02 |
| Cavity = false | 0.576 | 0.08  | 0.064  | 0.08 |

Every probabilistic question about a domain can in principle be answered by its joint distribution - we'll see this shortly.

# Conditional probabilities

Conditional or *posterior* probabilities, e.g., P(cavity | toothache) = 0.8

- the probability of cavity given toothache
- **and nothing else**: it's easy but incorrect to say "if toothache, then 80% chance of cavity"

Notation for complete conditional distributions: P(Cavity | Toothache) = a 2-element vector of 2-element vectors

- can be confusing
- P(Cavity | Toothache) is the distribution; P(cavity | toothache) is shorthand for P(cavity = true | toothache = true)
- will usually be clear from context

If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1.

New evidence may be irrelevant, allowing simplification, e.g.,

P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8

- This property is called *independence* and is a form of domain knowledge; more on this later

# CondProb: definitions and rules

P(a | b) = P(a ∧ b) / P(b), if P(b) > 0

Alternative formulation (the product rule):

P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

Venn diagrams FTW

# CondProb: also holds over distributions

A general version holds for whole distributions, e.g.,

P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)

It's not matrix multiplication, though; it's more like a set of 4 × 2 equations, one for each pair of values.

# Chain rule

The chain rule is derived by successive application of the product rule:

P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
                 = P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
                 = ...
                 = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
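The product rule and its distribution form can be checked numerically. A minimal Python sketch, assuming the joint P(Weather, Cavity) from the table above is stored as a dict keyed by value pairs; the representation and helper names are illustrative, not from any library:

```python
joint = {  # P(Weather = w, Cavity = c), the 4 x 2 table above
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def P_cavity(c):
    """Marginal P(Cavity = c), summing Weather out of the joint."""
    return sum(p for (w, cav), p in joint.items() if cav == c)

def P_weather_given_cavity(w, c):
    """P(Weather = w | Cavity = c) via the definition P(a | b) = P(a, b) / P(b)."""
    return joint[(w, c)] / P_cavity(c)

# The distribution form P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
# is a set of 4 x 2 scalar equations, one per value pair:
for (w, c), p in joint.items():
    assert abs(p - P_weather_given_cavity(w, c) * P_cavity(c)) < 1e-12
print("product rule holds for all 8 value pairs")
```

Each assertion is one of the 4 × 2 scalar equations; nothing here is a matrix product.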
# Inference by Enumeration

Start with the joint probability distribution:

|         | toothache ∧ catch | toothache ∧ !catch | !toothache ∧ catch | !toothache ∧ !catch |
|---------|-------------------|--------------------|--------------------|---------------------|
| cavity  | 0.108             | 0.012              | 0.072              | 0.008               |
| !cavity | 0.016             | 0.064              | 0.144              | 0.576               |

To evaluate a proposition, sum the atomic events where it's true. For example:

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

Can also compute conditional probabilities:

P(!cavity | toothache) = P(!cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.016 + 0.064 + 0.108 + 0.012) = 0.4

Q a. P(catch)
Q b. P(Cavity)
Q c. P(Toothache | cavity)
Q d. P(Cavity | toothache ∨ catch)

# Normalization

Same computation for P(cavity | toothache):

(0.108 + 0.012) / (0.016 + 0.064 + 0.108 + 0.012) = 0.6

The denominator here is the same both times! Intuition: the denominator makes the distribution P(Cavity | toothache) add up to one. It is sometimes called a normalization constant, α. In other words:

P(Cavity | toothache) = α P(Cavity, toothache)
                      = α [P(Cavity, toothache, catch) + P(Cavity, toothache, !catch)]
                      = α [<0.108, 0.016> + <0.012, 0.064>]
                      = α <0.12, 0.08>
                      = <0.6, 0.4>

General idea: compute the distribution on the query variable by fixing the observed (evidence) variables and summing over the unobserved (hidden) variables.
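This general idea translates almost directly into code. A minimal sketch of inference by enumeration over the Toothache/Cavity/Catch joint above, assuming the joint is stored as a dict keyed by atomic events; the `prob` and `query` helpers are illustrative names of my own, not a standard API:

```python
joint = {  # P(Cavity, Toothache, Catch) for all 8 atomic events
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
VARS = ("Cavity", "Toothache", "Catch")  # order of the key tuples

def prob(event):
    """P(event): sum the atomic events consistent with a partial assignment,
    e.g., prob({"Toothache": True}) == 0.2."""
    return sum(p for world, p in joint.items()
               if all(world[VARS.index(v)] == val for v, val in event.items()))

def query(var, evidence):
    """P(var | evidence): fix the evidence variables, sum over the hidden
    variables (done implicitly by prob), then normalize."""
    unnormalized = {val: prob({var: val, **evidence}) for val in (True, False)}
    alpha = 1.0 / sum(unnormalized.values())  # the normalization constant
    return {val: alpha * p for val, p in unnormalized.items()}

print(query("Cavity", {"Toothache": True}))  # ≈ {True: 0.6, False: 0.4}
```

The 1/0.2 factor computed inside `query` is exactly the normalization constant α from the slide, so the call reproduces the <0.6, 0.4> result above.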