Machine Learning and Friends Lunch
Learning Linguistic Structure using Nonparametric Bayesian Techniques
Sharon Goldwater, Brown

Abstract
Adopting a Bayesian approach to language learning is useful for
investigating the nature of linguistic representations and learning
biases, and the kinds of information that are helpful for learning.
In this talk, I present a computational framework for modeling lexical
acquisition that uses nonparametric Bayesian statistical methods to
induce linguistic structure from unannotated data. This framework has
previously been applied to learning basic morphological structure
(stems and suffixes). Here, I discuss its application to the problem
of discovering word boundaries in phonemic transcriptions of
child-directed speech. I first develop a unigram model based on the
Dirichlet process, and compare its results to two previously proposed
models (Brent, 1999; Venkataraman, 2001). I then show how bigram
dependencies can be incorporated into the model using a hierarchical
Dirichlet process, leading to superior results.