« Bayesian Methods for Text :: Contents :: Beta–binomial unigram language model »

Probabilistic modeling

Bayesian modeling

Explanation: use the model to explain the observed data; in particular, learn latent random variables that explain the data
Exploration: examine possible values of the unknown random variables; draw samples
Prediction

Explanation

Choose model structure
Define data generating process

Specify the following:
- random variables: observed random variables and unobserved random variables
- probability distributions of the random variables
- independence assumptions
- any other relevant assumptions; hyper-parameters
Form posterior to explain the data
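
As a minimal sketch of what "define the data generating process" can look like, consider a hypothetical coin-flip model (not part of these notes): the hyper-parameters `alpha` and `beta` define an assumed Beta prior, the bias `theta` is the unobserved random variable, and the flips are the observed random variables, assumed conditionally independent given `theta`. All names here are illustrative.

```python
import random


def generate_coin_data(n_flips, alpha=1.0, beta=1.0, seed=0):
    """Sample one dataset from a hypothetical coin-flip generative process."""
    rng = random.Random(seed)
    # Unobserved random variable: coin bias theta, drawn from the
    # assumed Beta(alpha, beta) prior (alpha, beta are hyper-parameters).
    theta = rng.betavariate(alpha, beta)
    # Observed random variables: flips, assumed conditionally
    # independent given theta (1 = head, 0 = tail).
    flips = [1 if rng.random() < theta else 0 for _ in range(n_flips)]
    return theta, flips


theta, flips = generate_coin_data(n_flips=10)
print(f"latent theta = {theta:.3f}")
print("observed flips =", flips)
```

In a real problem only `flips` would be available; forming the posterior means reasoning backwards from `flips` to `theta`.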

learning: point estimation
inference: represent uncertainty using distributions

NOTE: A Bayesian approach maintains uncertainty at all times. MAP estimation returns a point estimate, so it is not Bayesian.

Notation

$\Psi$ : all unknown random variables of the model
$\D$ : observed data
$\H$ : modeling assumptions
$\D'$ : new data

Bayes rule

$P(\Psi|\D,\H) = \frac{P(\D|\Psi,\H)P(\Psi|\H)}{P(\D|\H)}$

$P(\Psi|\D,\H)$ : posterior distribution over $\Psi$ given $\D$ and $\H$
$P(\D|\Psi,\H)$ : likelihood of $\Psi$ (a function of $\Psi$ with $\D$ fixed). Calling it the "likelihood of the data" is incorrect wording.
$P(\Psi|\H)$ : prior distribution over $\Psi$
$P(\D|\H)$ : evidence of data $\D$

$P(\D|\H) = \int d\Psi P(\D|\Psi,\H)P(\Psi|\H)$

where $\int d\Psi$ is a multivariate integral (over all unknown random variables).
For a single unknown $\psi\in\Psi$ , the marginal posterior is obtained by integrating out all the other unknowns:

$P(\psi|\D,\H) = \int d\Psi_{\setminus \psi} P(\Psi|\D,\H)$
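
The formulas above can be checked numerically on a small discrete grid, where the integrals become sums. The sketch below assumes a made-up model with two unknowns, `theta_a` and `theta_b` (the biases of two coins), where each flip uses one of the two coins with equal probability; the grid values and the head/tail counts are invented for illustration.

```python
from itertools import product

grid = [0.1, 0.5, 0.9]      # candidate values for each unknown
heads, tails = 7, 3         # observed data D (made-up counts)


def likelihood(theta_a, theta_b):
    # Assumed model H: each flip uses coin a or coin b with probability 1/2.
    p_head = 0.5 * theta_a + 0.5 * theta_b
    return p_head ** heads * (1 - p_head) ** tails


# Prior P(Psi | H): uniform over the joint grid of (theta_a, theta_b).
prior = {psi: 1.0 / len(grid) ** 2 for psi in product(grid, grid)}

# Evidence P(D | H): sum of likelihood * prior over all unknowns.
evidence = sum(likelihood(*psi) * prior[psi] for psi in prior)

# Posterior P(Psi | D, H) by Bayes rule.
posterior = {psi: likelihood(*psi) * prior[psi] / evidence for psi in prior}

# Marginal posterior of theta_a: sum out the other unknown, theta_b.
marginal_a = {a: sum(posterior[(a, b)] for b in grid) for a in grid}

for a in grid:
    print(f"P(theta_a = {a} | D, H) = {marginal_a[a]:.3f}")
```

With continuous unknowns the same sums become the integrals written above, which is where tractability becomes an issue.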

Exploration

Draw samples from posterior: typical values of a distribution
Common statistics: mean, mode

The mean is generally more representative of the entire distribution than the mode, since the mode only marks the single most probable point
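
A small sketch of exploration by sampling, assuming (purely for illustration) that the posterior over a coin bias is Beta(1, 11), which is what a uniform prior and 10 tails in 10 flips would give. The density is proportional to $(1-\theta)^{10}$, so its mode sits at exactly 0, while the samples spread over small positive values.

```python
import random

rng = random.Random(0)
a, b = 1.0, 11.0    # assumed posterior Beta(a, b); illustrative values

# Draw samples from the posterior to see typical values.
samples = [rng.betavariate(a, b) for _ in range(10_000)]

sample_mean = sum(samples) / len(samples)
exact_mean = a / (a + b)            # mean of Beta(a, b)
# Mode of Beta(a, b); with a = 1 and b > 1 the density is decreasing,
# so the formula gives the boundary value 0.
mode = (a - 1) / (a + b - 2)

print(f"sample mean = {sample_mean:.3f}, exact mean = {exact_mean:.3f}")
print(f"mode = {mode:.3f}")
print("five typical samples:", [round(s, 3) for s in samples[:5]])
```

Reporting only the mode here would suggest the coin never lands heads, whereas the mean and the samples still leave room for a small but nonzero head probability.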

Prediction

Prediction with learned parameters: $P(\D'|\widehat{\Psi},\H)$
Inference approach: derive predictive distribution $P(\D'|\D,\H)$

$P(\D'|\D,\H) = \int d\Psi P(\D'|\Psi,\H) P(\Psi|\D,\H)$
- step 1: compute posterior $P(\Psi|\D,\H)$
- step 2: compute integral (may be intractable)
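
The two steps above can be sketched for a one-parameter coin model, where the integral is easy to approximate on a grid. The model (uniform prior over the bias `theta`), the data (3 heads, 0 tails), and the grid size are all assumptions made for illustration. The plug-in prediction uses the maximum-likelihood estimate as $\widehat{\Psi}$.

```python
heads, tails = 3, 0     # observed data D (made-up counts)
n_grid = 1000
# Midpoints of n_grid equal-width cells covering [0, 1].
thetas = [(i + 0.5) / n_grid for i in range(n_grid)]

# Step 1: posterior over theta on the grid, with an assumed uniform prior
# (the constant prior and the grid spacing cancel in the normalization).
unnorm = [t ** heads * (1 - t) ** tails for t in thetas]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# Step 2: predictive probability that the next flip is a head,
# i.e. the sum over theta of P(head | theta) * P(theta | D, H).
p_head_bayes = sum(t * w for t, w in zip(thetas, posterior))

# Plug-in prediction with a point estimate (maximum likelihood).
theta_hat = heads / (heads + tails)
p_head_plugin = theta_hat

print(f"predictive P(head | D, H)      = {p_head_bayes:.4f}")
print(f"plug-in    P(head | theta_hat) = {p_head_plugin:.4f}")
```

Under these assumptions the exact predictive value is $(3+1)/(3+2) = 0.8$, while the plug-in prediction is 1: the point estimate treats a tail as impossible after only three flips, whereas the predictive distribution keeps the uncertainty about `theta`.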
