Beta–binomial unigram language model
Recap
Task
- specify model structure
- specify probability distributions
- specify independence assumptions
- specify other modeling assumptions
Goal
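- explanation: form the posterior of the latent variables, $p(\theta \mid D, \mathcal{M})$, to explain the data
- prediction: form the probability of unseen data, $p(D' \mid D, \mathcal{M})$
Notation
$\theta$: all unknown random variables of the model
$D$: observed data
$\mathcal{M}$: modeling assumptions
$D'$: new data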
Text model
A corpus of documents consists of $N$ word tokens $w_1, \dots, w_N$, each of type "no" or "yes", which are characterized by $\theta$, the probability of type "no" (each token is Bernoulli distributed).
NOTE: no document boundaries are considered
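$p(w_n = \text{no} \mid \theta) = \theta$, $p(w_n = \text{yes} \mid \theta) = 1 - \theta$
- Independence assumption: $w_i \perp w_j \mid \theta$ for all $i \neq j$, so $p(w_1, \dots, w_N \mid \theta) = \prod_{n=1}^{N} p(w_n \mid \theta)$
- Exchangeable: bag-of-words model; the order of the tokens does not matter, only their counts.
e.g. $p(\text{no}, \text{yes}, \text{yes} \mid \theta) = p(\text{yes}, \text{yes}, \text{no} \mid \theta) = \theta (1 - \theta)^2$, where each "no" contributes a factor $\theta$ and each "yes" a factor $1 - \theta$.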
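Specify prior $p(\theta)$:
specify the degree of belief in each value of $\theta$ over the interval $[0, 1]$, using a Beta distribution:
$$p(\theta \mid \alpha, m) = \mathrm{Beta}(\theta; \alpha m, \alpha(1-m)) = \frac{\Gamma(\alpha)}{\Gamma(\alpha m)\,\Gamma(\alpha(1-m))}\,\theta^{\alpha m - 1}(1-\theta)^{\alpha(1-m) - 1}$$
Hyperparameters of the Beta distribution
$\alpha > 0$: concentration parameter; how concentrated the samples are around the mean (larger $\alpha$, smaller spread)
$m \in (0, 1)$: mean of the Beta distribution, i.e., $\mathbb{E}[\theta] = m$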
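A small sketch of the reparameterization, assuming the shape parameters are $a = \alpha m$ for "no" and $b = \alpha(1-m)$ for "yes" as in the density above; the helper names are hypothetical. It also uses the standard Beta variance $m(1-m)/(\alpha+1)$ to show that a larger concentration $\alpha$ keeps samples closer to the mean:

```python
from math import isclose


def beta_shape(alpha, m):
    """Map (concentration alpha, mean m) to the Beta shape parameters (a, b)."""
    if alpha <= 0 or not 0 < m < 1:
        raise ValueError("need alpha > 0 and 0 < m < 1")
    return alpha * m, alpha * (1 - m)


def concentration_and_mean(a, b):
    """Inverse map: alpha = a + b, m = a / (a + b)."""
    return a + b, a / (a + b)


def beta_variance(alpha, m):
    """Var[theta] = a b / ((a + b)^2 (a + b + 1)) = m (1 - m) / (alpha + 1)."""
    return m * (1 - m) / (alpha + 1)


a, b = beta_shape(10.0, 0.3)
assert isclose(a, 3.0) and isclose(b, 7.0)
# Same mean, ten times the concentration: the spread around m shrinks.
assert beta_variance(100.0, 0.3) < beta_variance(10.0, 0.3)
```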
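Properties of the Gamma function: $\Gamma(x + 1) = x\,\Gamma(x)$, $\Gamma(n) = (n - 1)!$ for positive integers $n$, and $\int_0^1 \theta^{a-1}(1-\theta)^{b-1}\,d\theta = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$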
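Mean of the Beta distribution: for $\theta \sim \mathrm{Beta}(a, b)$,
$$\mathbb{E}[\theta] = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \int_0^1 \theta^{a}(1-\theta)^{b-1}\,d\theta = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)} \cdot \frac{\Gamma(a+1)\,\Gamma(b)}{\Gamma(a+b+1)} = \frac{a}{a+b},$$
so with $a = \alpha m$ and $b = \alpha(1-m)$ the mean is $m$.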
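The Gamma-function identities can be checked numerically with Python's `math.gamma`. This sketch (helper name hypothetical) evaluates the middle expression of the derivation and compares it with $a/(a+b)$:

```python
from math import gamma, isclose


def beta_mean_via_gamma(a, b):
    """E[theta] for theta ~ Beta(a, b), written as a ratio of normalizing constants."""
    # Normalizer of Beta(a, b) times the integral of theta^a (1 - theta)^(b - 1) over [0, 1].
    return (gamma(a + b) / (gamma(a) * gamma(b))) * (gamma(a + 1) * gamma(b) / gamma(a + b + 1))


for a, b in [(1.0, 1.0), (3.0, 7.0), (0.5, 2.5)]:
    # Gamma(x + 1) = x * Gamma(x) is what collapses the ratio to a / (a + b).
    assert isclose(gamma(a + 1), a * gamma(a))
    assert isclose(beta_mean_via_gamma(a, b), a / (a + b))
```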
Beta–binomial model
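Generating process
$\theta \sim \mathrm{Beta}(\alpha m, \alpha(1-m))$
for $n$ from 1 to $N$: $w_n \sim \mathrm{Bernoulli}(\theta)$, i.e., $w_n = \text{no}$ with probability $\theta$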
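The generating process translates almost line for line into a sampler. A minimal sketch using Python's standard `random` module (`Random.betavariate` for the Beta draw); the function name and the seeded `rng` argument are illustrative assumptions:

```python
import random


def generate_corpus(n_tokens, alpha, m, rng=None):
    """Sample (theta, tokens) from the Beta-Bernoulli generating process."""
    if rng is None:
        rng = random.Random()
    # theta ~ Beta(alpha * m, alpha * (1 - m))
    theta = rng.betavariate(alpha * m, alpha * (1 - m))
    # w_n ~ Bernoulli(theta) for n = 1..N: "no" with probability theta, else "yes"
    tokens = ["no" if rng.random() < theta else "yes" for _ in range(n_tokens)]
    return theta, tokens


theta, tokens = generate_corpus(20, alpha=2.0, m=0.5, rng=random.Random(0))
print(round(theta, 3), tokens.count("no"), tokens.count("yes"))
```

Passing a seeded `random.Random` makes a draw reproducible, which is convenient when comparing inference code against known $\theta$.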
Observed data
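$D = (w_1, \dots, w_N)$
- number of "no": $N_{\text{no}} = \sum_{n=1}^{N} \mathbb{1}[w_n = \text{no}]$
- number of "yes": $N_{\text{yes}} = N - N_{\text{no}}$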
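Likelihood
$$p(D \mid \theta) = \prod_{n=1}^{N} p(w_n \mid \theta) = \theta^{N_{\text{no}}}(1-\theta)^{N_{\text{yes}}}$$
Prior
$$p(\theta \mid \alpha, m) = \mathrm{Beta}(\theta; \alpha m, \alpha(1-m))$$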
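Evidence
$$p(D \mid \alpha, m) = \int_0^1 p(D \mid \theta)\, p(\theta \mid \alpha, m)\, d\theta = \frac{\Gamma(\alpha)}{\Gamma(\alpha m)\,\Gamma(\alpha(1-m))} \cdot \frac{\Gamma(N_{\text{no}} + \alpha m)\,\Gamma(N_{\text{yes}} + \alpha(1-m))}{\Gamma(N + \alpha)}$$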
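The closed-form evidence is usually computed in log space to avoid overflow of the Gamma functions. A minimal sketch with `math.lgamma`, cross-checked against a midpoint-rule approximation of the integral; both function names are hypothetical, and the quadrature is only a rough sanity check that behaves well when both shape parameters are at least 1:

```python
from math import exp, isclose, lgamma


def log_evidence(n_no, n_yes, alpha, m):
    """log p(D | alpha, m) for a token sequence with n_no "no" and n_yes "yes"."""
    a, b = alpha * m, alpha * (1 - m)
    return (lgamma(alpha) - lgamma(a) - lgamma(b)
            + lgamma(n_no + a) + lgamma(n_yes + b) - lgamma(n_no + n_yes + alpha))


def evidence_by_quadrature(n_no, n_yes, alpha, m, steps=20_000):
    """Midpoint-rule estimate of the integral of p(D | theta) p(theta | alpha, m) over [0, 1]."""
    a, b = alpha * m, alpha * (1 - m)
    norm = exp(lgamma(alpha) - lgamma(a) - lgamma(b))  # Beta normalizing constant
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        likelihood = t ** n_no * (1 - t) ** n_yes
        prior = norm * t ** (a - 1) * (1 - t) ** (b - 1)
        total += likelihood * prior
    return total * h


# 3 "no", 2 "yes", prior Beta(2, 2): the exact evidence is 3/140.
assert isclose(exp(log_evidence(3, 2, 4.0, 0.5)), 3 / 140)
assert isclose(evidence_by_quadrature(3, 2, 4.0, 0.5), 3 / 140, rel_tol=1e-6)
```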
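Posterior
$$p(\theta \mid D, \alpha, m) = \frac{p(D \mid \theta)\, p(\theta \mid \alpha, m)}{p(D \mid \alpha, m)} = \mathrm{Beta}(\theta; N_{\text{no}} + \alpha m,\ N_{\text{yes}} + \alpha(1-m))$$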
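Because the prior is conjugate, the posterior update only adds counts to the prior's shape parameters. A minimal sketch (helper names hypothetical, tokens assumed to be "no" or "yes") that also checks that updating one token at a time reaches the same posterior as the batch formula:

```python
from math import isclose


def posterior_shape(n_no, n_yes, alpha, m):
    """Beta shape parameters of p(theta | D, alpha, m): prior pseudo-counts plus counts."""
    return n_no + alpha * m, n_yes + alpha * (1 - m)


def update_one(a, b, token):
    """Conjugate update after one token (assumed "no" or "yes"); the result is still Beta."""
    return (a + 1, b) if token == "no" else (a, b + 1)


tokens = ["no", "yes", "no", "no", "yes"]
alpha, m = 4.0, 0.5
a, b = alpha * m, alpha * (1 - m)  # start from the prior
for w in tokens:
    a, b = update_one(a, b, w)

# Sequential updating gives the same posterior as a single batch update.
batch = posterior_shape(tokens.count("no"), tokens.count("yes"), alpha, m)
assert isclose(a, batch[0]) and isclose(b, batch[1])
```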
Remarks
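- In the Bayesian framework, the prior performs a theoretically sound smoothing of the likelihood: it acts as $\alpha m$ pseudo-counts of "no" and $\alpha(1-m)$ pseudo-counts of "yes" added to the observed counts.
- Conjugate prior: the posterior and the prior belong to the same family (here, both are Beta distributions).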
Exploration
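Summarize the posterior by its mean:
$$\mathbb{E}[\theta \mid D, \alpha, m] = \frac{N_{\text{no}} + \alpha m}{N + \alpha} = \frac{N}{N + \alpha} \cdot \frac{N_{\text{no}}}{N} + \frac{\alpha}{N + \alpha} \cdot m,$$
a weighted average of the maximum-likelihood estimate $N_{\text{no}}/N$ and the prior mean $m$.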
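A short sketch of the posterior mean and its interpolation form; the function name and the example counts are illustrative assumptions:

```python
from math import isclose


def posterior_mean(n_no, n_yes, alpha, m):
    """E[theta | D] = (N_no + alpha m) / (N + alpha)."""
    return (n_no + alpha * m) / (n_no + n_yes + alpha)


n_no, n_yes, alpha, m = 3, 2, 4.0, 0.5
n = n_no + n_yes
weight = n / (n + alpha)  # weight on the data grows with N
mle = n_no / n            # maximum-likelihood estimate
assert isclose(posterior_mean(n_no, n_yes, alpha, m), weight * mle + (1 - weight) * m)

# Smoothing: with no "no" tokens the ML estimate is 0, but the posterior mean is not.
assert posterior_mean(0, 10, 2.0, 0.5) > 0.0
```

As $N \to \infty$ the weight tends to 1 and the posterior mean approaches the maximum-likelihood estimate.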
Prediction
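The predictive distribution of a single unseen token $w_{N+1}$ is
$$p(w_{N+1} = \text{no} \mid D, \alpha, m) = \int_0^1 \theta\, p(\theta \mid D, \alpha, m)\, d\theta = \frac{N_{\text{no}} + \alpha m}{N + \alpha},$$
i.e., the posterior mean of $\theta$, and $p(w_{N+1} = \text{yes} \mid D, \alpha, m) = \frac{N_{\text{yes}} + \alpha(1-m)}{N + \alpha}$.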
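The integral over $\theta$ can also be approximated by simulation, which makes "integrating out $\theta$" concrete. A minimal sketch (function names, sample count and seed are illustrative assumptions) compares a Monte Carlo estimate with the closed form:

```python
import random


def predictive_no(n_no, n_yes, alpha, m):
    """p(w_{N+1} = "no" | D, alpha, m), equal to the posterior mean of theta."""
    return (n_no + alpha * m) / (n_no + n_yes + alpha)


def predictive_no_monte_carlo(n_no, n_yes, alpha, m, samples=50_000, seed=0):
    """Sample theta from the posterior, then the next token given theta."""
    rng = random.Random(seed)
    a, b = n_no + alpha * m, n_yes + alpha * (1 - m)  # posterior Beta parameters
    hits = 0
    for _ in range(samples):
        theta = rng.betavariate(a, b)
        hits += rng.random() < theta  # next token is "no" with probability theta
    return hits / samples
```

With a fixed seed the estimate is reproducible; its standard error shrinks like $1/\sqrt{\text{samples}}$.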
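In general, supposing there are $N'_{\text{no}}$ "no" tokens and $N'_{\text{yes}}$ "yes" tokens in the new data $D'$ (with $N' = N'_{\text{no}} + N'_{\text{yes}}$), the posterior predictive probability of $D'$ is as follows.
$$p(D' \mid D, \alpha, m) = \int_0^1 p(D' \mid \theta)\, p(\theta \mid D, \alpha, m)\, d\theta = \frac{\Gamma(N + \alpha)}{\Gamma(N_{\text{no}} + \alpha m)\,\Gamma(N_{\text{yes}} + \alpha(1-m))} \cdot \frac{\Gamma(N_{\text{no}} + N'_{\text{no}} + \alpha m)\,\Gamma(N_{\text{yes}} + N'_{\text{yes}} + \alpha(1-m))}{\Gamma(N + N' + \alpha)}$$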
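The closed form can be cross-checked against the chain rule, $p(D' \mid D) = \prod_k p(w'_k \mid D, w'_1, \dots, w'_{k-1})$, where each factor is the single-token predictive with updated counts. A minimal sketch; the function names are hypothetical and tokens are assumed to be "no" or "yes":

```python
from math import exp, isclose, lgamma


def log_predictive(new_no, new_yes, n_no, n_yes, alpha, m):
    """log p(D' | D, alpha, m) for a specific sequence D' with new_no "no" and new_yes "yes"."""
    a, b = n_no + alpha * m, n_yes + alpha * (1 - m)  # posterior Beta parameters
    return (lgamma(a + b) - lgamma(a) - lgamma(b)
            + lgamma(a + new_no) + lgamma(b + new_yes) - lgamma(a + b + new_no + new_yes))


def sequential_predictive(new_tokens, n_no, n_yes, alpha, m):
    """Chain rule: multiply one-step predictive probabilities, updating counts after each token."""
    prob = 1.0
    for w in new_tokens:
        p_no = (n_no + alpha * m) / (n_no + n_yes + alpha)
        if w == "no":
            prob *= p_no
            n_no += 1
        else:
            prob *= 1.0 - p_no
            n_yes += 1
    return prob


new_tokens = ["no", "yes", "no"]
closed_form = exp(log_predictive(2, 1, 3, 2, 4.0, 0.5))
assert isclose(closed_form, sequential_predictive(new_tokens, 3, 2, 4.0, 0.5))
```

The result depends only on the counts in $D'$, not on their order, which is exchangeability again; with a single new "no" it reduces to the posterior mean.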