Here are some further notes on language models that we didn’t get to during the 9/9 lecture.
I used the terms “pseudocount smoothing” and “Dirichlet smoothing”; you might also call it “add-α” smoothing. The Laplace smoothing method in the textbook, where you add a pseudocount of 1 to each word type, is the special case of pseudocount/Dirichlet smoothing with α = 1.
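To make this concrete, here is a minimal sketch of add-α smoothing for unigram probabilities (the function name and toy corpus are mine, just for illustration): each word's probability is (count + α) / (N + α·V), where N is the number of tokens and V the vocabulary size.

```python
from collections import Counter

def add_alpha_prob(word, counts, vocab_size, alpha=1.0):
    """Add-alpha (pseudocount/Dirichlet) smoothed unigram probability.
    With alpha = 1 this is Laplace smoothing."""
    total = sum(counts.values())  # N: total token count
    # Counter returns 0 for unseen words, so they still get probability alpha / (N + alpha*V)
    return (counts[word] + alpha) / (total + alpha * vocab_size)

corpus = "the cat sat on the mat".split()
counts = Counter(corpus)
V = len(counts) + 1  # pretend the vocabulary has one extra, unseen type

p_seen = add_alpha_prob("the", counts, V, alpha=1.0)    # (2 + 1) / (6 + 6) = 0.25
p_unseen = add_alpha_prob("dog", counts, V, alpha=1.0)  # (0 + 1) / (6 + 6) ≈ 0.083
```

Note that the unseen word gets nonzero probability, which is the whole point: an unsmoothed estimate would assign it probability 0 and make any sentence containing it impossible.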
As for actual software to implement n-gram language models, two important things to know are:
SRILM has long been the standard software for training language models. For example, it can fit the interpolation parameters we were talking about. J&M discuss SRILM briefly.
KenLM may now be the most popular open-source LM software; its uses only partially overlap with SRILM’s. It has many engineering optimizations.