# Administrivia

- Exam 1 tonight in HAS 134
- A03 due next Wednesday, 01 October

# Today

- Uncertainty
- Probability
- Syntax and Semantics
- Inference
- Independence and Bayes' Rule

# Uncertainty

Let action A_t = leave for airport t minutes before flight.

Will A_t get me there on time? Problems:

- Partial observability (road state, other drivers' plans, etc.)
- Noisy sensors (traffic reports)
- Uncertainty in action outcomes (flat tire, etc.)
- Immense complexity of modeling and predicting traffic

# Logical approach?

A purely logical approach either

- risks falsehood: "A_25 will get me there on time", or
- leads to conclusions that are too weak for decision making: "A_25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc., etc."

"A_1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport ..."

# Options for handling uncertainty

Default or nonmonotonic logic:

- Assume my car does not have a flat tire
- Assume A_25 works unless contradicted by evidence
- However: What assumptions are reasonable? How to handle contradiction?

Rules with fudge factors:

- A_25 →_0.3 get there on time
- Sprinkler →_0.99 WetGrass
- WetGrass →_0.7 Rain
- However: problems with combination, e.g., does Sprinkler cause Rain?

Probability:

- Model the agent's degree of belief
- Given the available evidence, A_25 will get me there on time with probability 0.04

# Probability

Probabilistic assertions summarize effects of:

- Laziness — failure to enumerate exceptions, qualifications, etc.
- Ignorance — lack of relevant facts, initial conditions, etc.
- Fundamental stochastic nature of phenomena

# Probability is subjective!

Probabilities relate propositions to the agent's own state of knowledge, e.g.,

P(A_25 | no reported accidents) = 0.06

These are not assertions about the world; they are assertions about *belief*.

Probabilities of propositions *change* with *new evidence*, e.g.,

P(A_25 | no reported accidents, 5 a.m.) = 0.15

# Making decisions under uncertainty

Suppose I believe the following:

- P(A_25 gets me there on time | ...) = 0.04
- P(A_90 gets me there on time | ...) = 0.70
- P(A_120 gets me there on time | ...) = 0.95
- P(A_1440 gets me there on time | ...) = 0.9999

Which action to choose? It depends on my preferences for missing the flight vs. time spent waiting, etc.

*Utility theory* is used to represent and infer preferences.

*Decision theory* = probability theory + utility theory

# Syntax

Basic element: the random variable

- Boolean random variables, e.g., Cavity (do I have a cavity?) is one of <true, false>
- Discrete random variables, e.g., Weather is one of <sunny, rainy, cloudy, snow>
- Continuous random variables, e.g., Age is in the interval [0, 120]

Domain values must be exhaustive and mutually exclusive (makes no sense for the domain of a die roll not to include 3!)
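These domain conventions are easy to make concrete in code. A minimal Python sketch, using the Cavity and Weather priors that appear later in these notes; the dict representation and the `is_valid_distribution` helper are illustrative assumptions, not course-provided code:

```python
def is_valid_distribution(dist, tol=1e-9):
    """A distribution assigns each domain value a probability in [0, 1],
    and because the domain is exhaustive and mutually exclusive,
    the probabilities must sum to 1."""
    return (all(0.0 <= p <= 1.0 for p in dist.values())
            and abs(sum(dist.values()) - 1.0) < tol)

# Boolean random variable: Cavity has domain <true, false>
cavity_prior = {True: 0.1, False: 0.9}

# Discrete random variable: Weather has domain <sunny, rainy, cloudy, snow>
weather_prior = {"sunny": 0.72, "rainy": 0.1, "cloudy": 0.08, "snow": 0.1}

assert is_valid_distribution(cavity_prior)
assert is_valid_distribution(weather_prior)
```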
Elementary propositions are constructed by assigning a value to a random variable, e.g., Weather = sunny; Cavity = false (abbreviated as ¬cavity or !cavity).

Complex propositions are formed from elementary propositions and standard logical connectives, e.g., Weather = sunny ∨ Cavity = false.

An atomic event is a complete specification of the state of the world about which the agent is uncertain. If the world consists of only two Boolean variables, Cavity and Toothache, then there are 4 distinct atomic events:

- Cavity = false ∧ Toothache = false
- Cavity = false ∧ Toothache = true
- Cavity = true ∧ Toothache = false
- Cavity = true ∧ Toothache = true

The set of all atomic events must be mutually exclusive and exhaustive (for discrete events; continuous events are slightly more complicated).

Q1. Consider all five-card poker hands from a 52-card deck.

a. How many atomic events are there (i.e., how many different 5-card hands)?
b. What is the probability of each such event?
c. What is the probability of being dealt a royal straight flush?
d. Four of a kind?

# Axioms of probability

For any propositions A, B:

- 0 <= P(A) <= 1
- P(true) = 1 and P(false) = 0
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)

Venn diagram for the last one: two overlapping circles; adding P(A) and P(B) counts the overlap P(A ∧ B) twice, so subtract it once.

# Prior probabilities

Prior or unconditional probabilities of propositions, e.g.,

P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72

correspond to belief prior to the arrival of any (new) evidence.

A probability distribution gives values for all possible assignments:

P(Weather) = <0.72 sunny, 0.1 rainy, 0.08 cloudy, 0.1 snow> (normalized, i.e., sums to 1)

A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables. P(Weather, Cavity) is a 4 × 2 matrix of values:

| Weather =      | sunny | rainy | cloudy | snow |
|----------------|-------|-------|--------|------|
| Cavity = true  | 0.144 | 0.02  | 0.016  | 0.02 |
| Cavity = false | 0.576 | 0.08  | 0.064  | 0.08 |

Every probabilistic question about a domain can in principle be answered by its joint distribution - we'll see this shortly.

# Conditional probabilities

Conditional or *posterior* probabilities, e.g., P(cavity | toothache) = 0.8

- the probability of cavity given toothache
- **and nothing else**: it's easy but incorrect to say "if toothache, then 80% chance of cavity"

Notation for complete conditional distributions: P(Cavity | Toothache) = a 2-element vector of 2-element vectors

- can be confusing
- P(Cavity | Toothache) is the distribution; P(cavity | toothache) is shorthand for P(cavity = true | toothache = true)
- will usually be clear from context

If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1.

New evidence may be irrelevant, allowing simplification, e.g.,

P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8

- This property is called *independence* and is a form of domain knowledge; more on this later

# CondProb: definitions and rules

P(a | b) = P(a ∧ b) / P(b), if P(b) > 0

Alternative formulation (the product rule):

P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)

Venn diagrams FTW

# CondProb: also holds over distributions

A general version holds for whole distributions, e.g.,

P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)

It's not matrix multiplication, though; it's more like a set of 4 × 2 equations, one for each pair of values.

# Chain rule

The chain rule is derived by successive application of the product rule:

P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
                 = P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
                 = ...
                 = ∏_{i=1}^{n} P(X_i | X_1, ..., X_{i-1})
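The product rule and its distribution form can be checked numerically. A minimal Python sketch, assuming the joint P(Weather, Cavity) from the table above is stored as a dict keyed by value pairs; the representation and helper names are illustrative, not from any library:

```python
joint = {  # P(Weather = w, Cavity = c), the 4 x 2 table above
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

def P_cavity(c):
    """Marginal P(Cavity = c), summing Weather out of the joint."""
    return sum(p for (w, cav), p in joint.items() if cav == c)

def P_weather_given_cavity(w, c):
    """P(Weather = w | Cavity = c) via the definition P(a | b) = P(a, b) / P(b)."""
    return joint[(w, c)] / P_cavity(c)

# The distribution form P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
# is a set of 4 x 2 scalar equations, one per value pair:
for (w, c), p in joint.items():
    assert abs(p - P_weather_given_cavity(w, c) * P_cavity(c)) < 1e-12
print("product rule holds for all 8 value pairs")
```

Each assertion is one of the 4 × 2 scalar equations; nothing here is a matrix product.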
# Inference by Enumeration

Start with the joint probability distribution:

|         | toothache ∧ catch | toothache ∧ !catch | !toothache ∧ catch | !toothache ∧ !catch |
|---------|-------------------|--------------------|--------------------|---------------------|
| cavity  | 0.108             | 0.012              | 0.072              | 0.008               |
| !cavity | 0.016             | 0.064              | 0.144              | 0.576               |

To evaluate a proposition, sum the atomic events where it's true. For example:

P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

Can also compute conditional probabilities:

P(!cavity | toothache) = P(!cavity ∧ toothache) / P(toothache)
                       = (0.016 + 0.064) / (0.016 + 0.064 + 0.108 + 0.012) = 0.4

Q a. P(catch)
Q b. P(Cavity)
Q c. P(Toothache | cavity)
Q d. P(Cavity | toothache ∨ catch)

# Normalization

Same computation for P(cavity | toothache):

(0.108 + 0.012) / (0.016 + 0.064 + 0.108 + 0.012) = 0.6

The denominator here is the same both times! Intuition: the denominator makes the distribution P(Cavity | toothache) add up to one. It is sometimes called a normalization constant, α. In other words:

P(Cavity | toothache) = α P(Cavity, toothache)
                      = α [P(Cavity, toothache, catch) + P(Cavity, toothache, !catch)]
                      = α [<0.108, 0.016> + <0.012, 0.064>]
                      = α <0.12, 0.08>
                      = <0.6, 0.4>

General idea: compute the distribution on the query variable by fixing the observed (evidence) variables and summing over the unobserved (hidden) variables.
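This general idea translates almost directly into code. A minimal sketch of inference by enumeration over the Toothache/Cavity/Catch joint above, assuming the joint is stored as a dict keyed by atomic events; the `prob` and `query` helpers are illustrative names of my own, not a standard API:

```python
joint = {  # P(Cavity, Toothache, Catch) for all 8 atomic events
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}
VARS = ("Cavity", "Toothache", "Catch")  # order of the key tuples

def prob(event):
    """P(event): sum the atomic events consistent with a partial assignment,
    e.g., prob({"Toothache": True}) == 0.2."""
    return sum(p for world, p in joint.items()
               if all(world[VARS.index(v)] == val for v, val in event.items()))

def query(var, evidence):
    """P(var | evidence): fix the evidence variables, sum over the hidden
    variables (done implicitly by prob), then normalize."""
    unnormalized = {val: prob({var: val, **evidence}) for val in (True, False)}
    alpha = 1.0 / sum(unnormalized.values())  # the normalization constant
    return {val: alpha * p for val, p in unnormalized.items()}

print(query("Cavity", {"Toothache": True}))  # ≈ {True: 0.6, False: 0.4}
```

The 1/0.2 factor computed inside `query` is exactly the normalization constant α from the slide, so the call reproduces the <0.6, 0.4> result above.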