Beta–binomial unigram language model
Recap
Task
- specify model structure
- specify probability distribution
- specify independence assumption
- specify other modeling assumption
Goal
- explanation: form the posterior over the latent variables to explain the data
- prediction: form the predictive distribution of unseen data
Notation
- $\theta$: all unknown random variables of the model
- $D$: observed data
- $M$: modeling assumptions
- $\tilde{D}$: new data
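The two goals can be written out with Bayes' rule. The block below is a sketch in standard notation, assuming $\theta$ denotes the unknowns, $D$ the observed data, $M$ the modeling assumptions and $\tilde{D}$ the new data. It also assumes that new data are conditionally independent of the observed data given $\theta$.

```latex
% Explanation: posterior over the unknowns given the observed data
p(\theta \mid D, M) = \frac{p(D \mid \theta, M)\, p(\theta \mid M)}{p(D \mid M)},
\qquad
p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta

% Prediction: average the likelihood of the new data over the posterior
p(\tilde{D} \mid D, M) = \int p(\tilde{D} \mid \theta, M)\, p(\theta \mid D, M)\, d\theta
```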
Text model
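The corpus consists of $N$ word tokens $w_1, \dots, w_N$, each of type “yes” or “no”. The tokens are characterized by $\theta$, the probability of type “no” (each token is Bernoulli distributed).
NOTE: no document boundaries are considered
$p(w_i = \text{no} \mid \theta) = \theta$, $p(w_i = \text{yes} \mid \theta) = 1 - \theta$
- Independence assumption: $p(w_i, w_j \mid \theta) = p(w_i \mid \theta)\, p(w_j \mid \theta)$ for all $i \neq j$
- Exchangeable: bag-of-words model, e.g. $p(\text{no}, \text{yes}, \text{no} \mid \theta) = p(\text{yes}, \text{no}, \text{no} \mid \theta) = \theta^{2} (1 - \theta)$. In general:
$p(w_1, \dots, w_N \mid \theta) = \theta^{n_0} (1 - \theta)^{n_1}$
where $n_0$ and $n_1$ are the numbers of “no” and “yes” tokens, so the probability does not depend on token order.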
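The exchangeability property can be checked numerically. The following is a minimal sketch, assuming `theta` is the probability of type "no"; the function names, the value of `theta` and the toy token list are illustrative and not part of the notes.

```python
from itertools import permutations
from math import isclose


def sequence_likelihood(tokens, theta):
    """Probability of a token sequence given theta = P(token is "no")."""
    prob = 1.0
    for tok in tokens:
        # Tokens are conditionally independent given theta, so factors multiply.
        prob *= theta if tok == "no" else 1.0 - theta
    return prob


def count_likelihood(n_no, n_yes, theta):
    """Same probability computed from the counts alone (bag of words)."""
    return theta ** n_no * (1.0 - theta) ** n_yes


theta = 0.3  # illustrative value
tokens = ["no", "yes", "no", "yes", "yes"]
n_no, n_yes = tokens.count("no"), tokens.count("yes")

# Likelihood of every distinct reordering of the same tokens.
likelihoods = [sequence_likelihood(p, theta) for p in set(permutations(tokens))]
target = count_likelihood(n_no, n_yes, theta)
all_equal = all(isclose(value, target) for value in likelihoods)
print(len(likelihoods), all_equal)
```

Every reordering yields the same value, so only the counts of each type are needed to evaluate the likelihood.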
Specify prior $p(\theta)$: specify the degree of belief in the value of $\theta$ on the unit interval $[0, 1]$, since $\theta$ is a probability.
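Hyperparameters of Beta distribution
$p(\theta \mid \alpha, m) = \frac{\Gamma(\alpha)}{\Gamma(\alpha m)\,\Gamma(\alpha(1-m))}\, \theta^{\alpha m - 1} (1-\theta)^{\alpha(1-m) - 1}$
- $\alpha > 0$: concentration parameter, i.e., how tightly the samples concentrate around the mean
- $m \in (0, 1)$: mean of the beta distribution, i.e., $\mathbb{E}[\theta] = m$
Properties of Gamma function
$\Gamma(x + 1) = x\,\Gamma(x)$, $\Gamma(1) = 1$
Mean of Beta distribution
$\mathbb{E}[\theta] = \int_0^1 \theta\, p(\theta \mid \alpha, m)\, d\theta = \frac{\Gamma(\alpha)}{\Gamma(\alpha m)\,\Gamma(\alpha(1-m))} \cdot \frac{\Gamma(\alpha m + 1)\,\Gamma(\alpha(1-m))}{\Gamma(\alpha + 1)} = \frac{\alpha m}{\alpha} = m$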
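The roles of the two hyperparameters can be illustrated by sampling. This is a rough sketch, assuming the prior is parameterized by a concentration `alpha` and a mean `m`, which map to the usual Beta shape parameters `alpha * m` and `alpha * (1 - m)`. The helper names and numeric values are illustrative; only Python's standard `random.betavariate` is used.

```python
import random
from statistics import mean, pvariance


def beta_shapes(alpha, m):
    """Convert (concentration alpha, mean m) to the usual Beta shape parameters."""
    return alpha * m, alpha * (1.0 - m)


def sample_theta(alpha, m, size, rng):
    """Draw `size` samples of theta from Beta(alpha * m, alpha * (1 - m))."""
    a, b = beta_shapes(alpha, m)
    return [rng.betavariate(a, b) for _ in range(size)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
m = 0.3                 # illustrative prior mean of P("no")

for alpha in (2.0, 20.0, 200.0):
    draws = sample_theta(alpha, m, 20_000, rng)
    # Analytic variance of Beta(alpha*m, alpha*(1-m)) is m(1-m)/(alpha+1).
    analytic_var = m * (1.0 - m) / (alpha + 1.0)
    print(alpha, round(mean(draws), 3), round(pvariance(draws), 5), round(analytic_var, 5))
```

The sample mean should stay near `m` for every `alpha`, while the spread shrinks as `alpha` grows.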
Beta–binomial model
Generating process
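- draw $\theta \sim \mathrm{Beta}(\alpha m, \alpha(1-m))$
- for $i$ from 1 to $N$: draw $w_i \sim \mathrm{Bernoulli}(\theta)$, i.e., $w_i = \text{no}$ with probability $\theta$
Observed data
- $D = (w_1, \dots, w_N)$, $w_i \in \{\text{no}, \text{yes}\}$
- number of “no”: $n_0$
- number of “yes”: $n_1 = N - n_0$
Likelihood
$p(D \mid \theta) = \theta^{n_0} (1-\theta)^{n_1}$
Prior
$p(\theta \mid \alpha, m) = \frac{\Gamma(\alpha)}{\Gamma(\alpha m)\,\Gamma(\alpha(1-m))}\, \theta^{\alpha m - 1} (1-\theta)^{\alpha(1-m) - 1}$, with concentration $\alpha$ and mean $m$
Evidence
$p(D \mid \alpha, m) = \int_0^1 p(D \mid \theta)\, p(\theta \mid \alpha, m)\, d\theta = \frac{\Gamma(\alpha)}{\Gamma(\alpha m)\,\Gamma(\alpha(1-m))} \cdot \frac{\Gamma(n_0 + \alpha m)\,\Gamma(n_1 + \alpha(1-m))}{\Gamma(N + \alpha)}$
Posterior
$p(\theta \mid D, \alpha, m) = \frac{p(D \mid \theta)\, p(\theta \mid \alpha, m)}{p(D \mid \alpha, m)} = \frac{\Gamma(N + \alpha)}{\Gamma(n_0 + \alpha m)\,\Gamma(n_1 + \alpha(1-m))}\, \theta^{n_0 + \alpha m - 1} (1-\theta)^{n_1 + \alpha(1-m) - 1}$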
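The generating process and the conjugate update can be sketched in a few lines. The code assumes a Beta prior with concentration `alpha` and mean `m` (shape parameters `alpha * m` and `alpha * (1 - m)`) and computes the evidence for one specific token sequence. All names and numeric values are illustrative rather than taken from the notes.

```python
import random
from math import lgamma


def generate_corpus(alpha, m, n_tokens, rng):
    """Draw theta from the Beta prior, then n_tokens Bernoulli tokens."""
    theta = rng.betavariate(alpha * m, alpha * (1.0 - m))
    tokens = ["no" if rng.random() < theta else "yes" for _ in range(n_tokens)]
    return theta, tokens


def posterior_shapes(n_no, n_yes, alpha, m):
    """Shape parameters of the Beta posterior: prior pseudo-counts plus counts."""
    return n_no + alpha * m, n_yes + alpha * (1.0 - m)


def log_beta_fn(a, b):
    """log of the Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_evidence(n_no, n_yes, alpha, m):
    """log p(D | alpha, m) for one particular sequence with these counts."""
    a, b = alpha * m, alpha * (1.0 - m)
    return log_beta_fn(n_no + a, n_yes + b) - log_beta_fn(a, b)


rng = random.Random(0)   # fixed seed so the sketch is repeatable
alpha, m = 4.0, 0.5      # illustrative hyperparameters, i.e. a Beta(2, 2) prior
theta, tokens = generate_corpus(alpha, m, 50, rng)
n_no, n_yes = tokens.count("no"), tokens.count("yes")

print("sampled theta:", round(theta, 3))
print("counts (no, yes):", n_no, n_yes)
print("posterior shapes:", posterior_shapes(n_no, n_yes, alpha, m))
print("log evidence:", round(log_evidence(n_no, n_yes, alpha, m), 3))
```

Working with `lgamma` rather than `gamma` keeps the evidence computable for large corpora, where the Gamma function itself would overflow.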
Remarks
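- In the Bayesian framework, the prior performs a theoretically sound smoothing of the likelihood: its hyperparameters enter the posterior as pseudo-counts added to the observed counts.
- Conjugate prior: the posterior and the prior belong to the same family of distributions (here, both are Beta).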
Explanation
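Summarize the posterior by its mean. With $n_0$ “no” tokens among the $N$ observed tokens, prior concentration $\alpha$ and prior mean $m$:
$\mathbb{E}[\theta \mid D] = \frac{n_0 + \alpha m}{N + \alpha}$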
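The smoothing effect of the prior shows up when the posterior mean is compared with the maximum-likelihood estimate. This is a small sketch, assuming a Beta prior with concentration `alpha` and mean `m`; the counts and hyperparameter values are made up for illustration.

```python
from math import isclose


def mle_no(n_no, n_total):
    """Maximum-likelihood estimate of theta = P("no")."""
    return n_no / n_total


def posterior_mean_no(n_no, n_total, alpha, m):
    """Posterior mean of theta under a Beta prior (concentration alpha, mean m)."""
    return (n_no + alpha * m) / (n_total + alpha)


alpha, m = 2.0, 0.5  # illustrative hyperparameters

for n_no, n_total in [(0, 3), (0, 30), (12, 30), (120, 300)]:
    weight = n_total / (n_total + alpha)  # weight placed on the data
    blended = weight * mle_no(n_no, n_total) + (1.0 - weight) * m
    smoothed = posterior_mean_no(n_no, n_total, alpha, m)
    # The posterior mean is a weighted blend of the MLE and the prior mean.
    assert isclose(blended, smoothed)
    print(n_no, n_total, round(mle_no(n_no, n_total), 4), round(smoothed, 4))
```

With zero observed "no" tokens the maximum-likelihood estimate is exactly 0, whereas the posterior mean stays positive and moves toward the MLE as the corpus grows.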
Prediction
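The predictive distribution of a single unseen example $\tilde{w}$ is the posterior mean of $\theta$:
$p(\tilde{w} = \text{no} \mid D) = \int_0^1 \theta\, p(\theta \mid D)\, d\theta = \frac{n_0 + \alpha m}{N + \alpha}$
where $n_0$ is the number of “no” tokens among the $N$ observed tokens, and $\alpha$ and $m$ are the prior concentration and mean.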
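In general, supposing there are $\tilde{n}_0$ “no” tokens and $\tilde{n}_1$ “yes” tokens in the new data $\tilde{D}$, the posterior predictive probability of $\tilde{D}$ is as follows.
$p(\tilde{D} \mid D) = \frac{\Gamma(N + \alpha)}{\Gamma(n_0 + \alpha m)\,\Gamma(n_1 + \alpha(1-m))} \cdot \frac{\Gamma(n_0 + \tilde{n}_0 + \alpha m)\,\Gamma(n_1 + \tilde{n}_1 + \alpha(1-m))}{\Gamma(N + \tilde{n}_0 + \tilde{n}_1 + \alpha)}$
where $n_0$ and $n_1$ are the “no” and “yes” counts in $D$, $N = n_0 + n_1$, and $\alpha$ and $m$ are the prior concentration and mean.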
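The predictive probability of several new tokens can be computed either in closed form or one token at a time with the chain rule. The sketch below assumes a Beta prior with concentration `alpha` and mean `m` and scores one specific ordering of the new tokens. The function names, counts and hyperparameters are illustrative.

```python
from math import exp, isclose, lgamma


def predictive_no(n_no, n_yes, alpha, m):
    """P(next token is "no" | D), i.e. the posterior mean of theta."""
    return (n_no + alpha * m) / (n_no + n_yes + alpha)


def log_predictive_sequence(new_no, new_yes, n_no, n_yes, alpha, m):
    """log P(new sequence | D) for one specific sequence with the given counts."""
    a = n_no + alpha * m             # posterior shape for "no"
    b = n_yes + alpha * (1.0 - m)    # posterior shape for "yes"
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + new_no) + lgamma(b + new_yes)
            - lgamma(a + b + new_no + new_yes))


def sequential_predictive(new_tokens, n_no, n_yes, alpha, m):
    """Same probability via the chain rule, updating the counts token by token."""
    prob = 1.0
    for tok in new_tokens:
        p_no = predictive_no(n_no, n_yes, alpha, m)
        if tok == "no":
            prob *= p_no
            n_no += 1
        else:
            prob *= 1.0 - p_no
            n_yes += 1
    return prob


alpha, m = 2.0, 0.5              # illustrative hyperparameters
n_no, n_yes = 7, 3               # illustrative observed counts
new_tokens = ["no", "yes", "no"]

closed_form = exp(log_predictive_sequence(2, 1, n_no, n_yes, alpha, m))
chained = sequential_predictive(new_tokens, n_no, n_yes, alpha, m)
print(closed_form, chained, isclose(closed_form, chained))
```

Both routes agree because each single-token prediction is the posterior mean under the counts seen so far, and the product of those ratios telescopes into the ratio of Gamma functions.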