Sequence Prediction With Neural Segmental Models
Segments that span contiguous parts of the input, such as phonemes in speech, named entities in sentences, and actions in videos, occur frequently in sequence prediction problems. Recent work has shown that segmental models, a class of models that explicitly hypothesizes segments, can significantly improve accuracy. However, segmental models suffer from slow decoding, hampering the use of computationally expensive features. In addition, training segmental models requires detailed manual annotation, which makes collecting datasets expensive.
In the first part of the talk, I will introduce discriminative segmental cascades, a multi-pass framework that allows us to improve accuracy by adding higher-order features and neural segmental features while maintaining efficiency. I will also show how the cascades can be used to speed up inference and training. In the second part of the talk, I will discuss end-to-end training for segmental models with various loss functions. I will address the difficulty of end-to-end training from random initialization by comparing it to two-stage training. Finally, I will show how end-to-end training can eliminate the need for detailed manual annotation.
Hao Tang is a Ph.D. candidate at Toyota Technological Institute at Chicago. His main interests are in machine learning and its application to speech recognition, with particular interests in discriminative training and segmental models. His work on segmental models was nominated for the Best Paper Award at ASRU 2015, and an application of such models to fingerspelling recognition received a Best Student Paper Award at ICASSP 2016. He received a B.S. degree in Computer Science and an M.S. degree in Electrical Engineering from National Taiwan University in 2007 and 2010, respectively.