Bayesian Methods for Text

Probabilistic modeling

Bayesian modeling

  • Explanation: use the model to explain the observed data; in particular, learn latent random variables that explain the data
  • Exploration: examine plausible values of the unknown random variables by drawing samples
  • Prediction: predict new data given the observed data

Explanation

  1. Choose model structure

  2. Define data generating process

    Specify the following:

    • random variables: which are observed and which are unobserved
    • probability distributions
    • independence assumptions
    • any other relevant assumptions; hyper-parameters
  3. Form posterior to explain the data

  • learning: point estimation of the unknowns
  • inference: represent uncertainty over the unknowns using distributions

NOTE: Bayesian methods maintain uncertainty at all times. MAP estimation collapses the posterior to a single point, so it is not Bayesian.
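The data generating process in step 2 can be sketched concretely. Below is a minimal toy example (assumed for illustration, not from the notes): a beta–Bernoulli coin, where the bias theta is the unobserved random variable, the flips are the observed data, and ALPHA/BETA are the hyper-parameters.

```python
# Toy generative process for a beta-Bernoulli coin model
# (assumed illustration): theta is unobserved, the flips are observed.
import random

random.seed(0)

ALPHA, BETA = 2.0, 2.0   # hyper-parameters of the Beta prior
N = 10                   # number of observed coin flips

# step 1: draw the latent parameter from its prior
theta = random.betavariate(ALPHA, BETA)

# step 2: draw each observation independently given theta
data = [1 if random.random() < theta else 0 for _ in range(N)]
```

Writing the process down this way makes the random variables, distributions, and independence assumptions of step 2 explicit.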

Notation

  • \Psi: all unknown random variables of the model
  • \D: observed data
  • \H: modeling assumptions
  • \D': new data

Bayes rule

P(\Psi|\D,\H) = \frac{P(\D|\Psi,\H)P(\Psi|\H)}{P(\D|\H)}

  • P(\Psi|\D,\H): posterior distribution over \Psi given \D and \H

  • P(\D|\Psi,\H): the likelihood of \Psi. "Likelihood of the data" is wrong wording: the data are fixed, and the likelihood is a function of \Psi

  • P(\Psi|\H): prior distribution over \Psi

  • P(\D|\H): evidence of data \D

    P(\D|\H) = \int d\Psi P(\D|\Psi,\H)P(\Psi|\H)

    where \int d\Psi denotes a multivariate integral over all unknown random variables

  • for a single variable \psi\in\Psi, the marginal posterior is obtained by integrating out the remaining variables:

    P(\psi|\D,\H) = \int d\Psi_{\setminus \psi} P(\Psi|\D,\H)
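Each term of Bayes' rule can be computed numerically for a one-parameter model. The sketch below (an assumed toy setup: Bernoulli parameter, uniform prior on a grid, 7 heads and 3 tails observed) evaluates the likelihood, the evidence, and the posterior exactly as the formula prescribes.

```python
# Grid approximation of Bayes' rule for a Bernoulli parameter theta
# (assumed toy setup: uniform prior, 7 heads / 3 tails observed).
K = 10000
grid = [(k + 0.5) / K for k in range(K)]       # midpoints in (0, 1)
prior = [1.0 / K] * K                          # uniform prior mass

heads, tails = 7, 3
likelihood = [t**heads * (1 - t)**tails for t in grid]

# evidence P(D|H): the integral of likelihood * prior over theta
evidence = sum(l * p for l, p in zip(likelihood, prior))

# posterior P(theta|D,H) = likelihood * prior / evidence
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

posterior_mean = sum(t * w for t, w in zip(grid, posterior))
```

With a uniform prior the posterior is Beta(8, 4), whose mean is 8/12, so the grid result can be checked against the closed form.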

Exploration

  • Draw samples from the posterior to see typical values of the distribution
  • Common statistics: mean, mode

The mean is usually more representative of the entire distribution than the mode: on a skewed distribution the mode can sit far from where most of the probability mass lies.
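The mean/mode distinction is easy to see on a skewed posterior. The sketch below (an assumed example) puts a Beta(2, 5) kernel on a grid; the exact mode is 0.2 while the exact mean is 2/7 ≈ 0.286, so the two summaries genuinely disagree.

```python
# Mean vs. mode on a skewed posterior: a grid version of Beta(2, 5)
# (assumed example; exact mean = 2/7, exact mode = 0.2).
K = 2000
grid = [(k + 0.5) / K for k in range(K)]
unnorm = [t**1 * (1 - t)**4 for t in grid]     # Beta(2, 5) kernel
Z = sum(unnorm)
weights = [u / Z for u in unnorm]

mean = sum(t * w for t, w in zip(grid, weights))
mode = max(zip(weights, grid))[1]              # grid point of highest mass
```

The mean pulls toward the long right tail, reflecting mass the mode ignores.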

Prediction

  • Prediction with learned (point-estimated) parameters: P(\D'|\widehat{\Psi},\H)

  • Inference approach: derive predictive distribution P(\D'|\D,\H)

    P(\D'|\D,\H) = \int d\Psi P(\D'|\Psi,\H) P(\Psi|\D,\H)

    • step 1: compute posterior P(\Psi|\D,\H)
    • step 2: compute the integral (may be intractable)
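For one-parameter models the two steps above can be carried out numerically. The sketch below (an assumed toy setup: Bernoulli model, uniform prior, 7 heads and 3 tails) computes the predictive probability of heads by integrating the likelihood against the posterior, and contrasts it with the plug-in prediction from the MLE.

```python
# Posterior predictive by numerical integration:
# P(x'=1 | D) = integral of theta * P(theta | D) d theta.
# Assumed toy setup: uniform prior, 7 heads / 3 tails, so the exact
# answer is Laplace's rule (7+1)/(10+2).
K = 10000
grid = [(k + 0.5) / K for k in range(K)]
heads, tails = 7, 3

# step 1: compute the posterior on the grid (the uniform prior cancels)
unnorm = [t**heads * (1 - t)**tails for t in grid]
Z = sum(unnorm)
posterior = [u / Z for u in unnorm]

# step 2: integrate the predictive likelihood against the posterior
p_heads = sum(t * w for t, w in zip(grid, posterior))

# plug-in prediction with the MLE ignores posterior uncertainty
mle = heads / (heads + tails)
```

Here the predictive probability (8/12 ≈ 0.667) is pulled toward 1/2 relative to the plug-in estimate 0.7, because the integral averages over the remaining uncertainty in theta.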
