Advances in Scalable Probabilistic Modeling: Theory, Applications, and Challenges
Probabilistic modeling is a popular approach to solving machine learning problems. We begin by reviewing variational inference, which casts Bayesian inference as a non-convex optimization problem. We then introduce variational tempering, which augments the probabilistic model with a temperature random variable. This leads to an adaptive annealing mechanism that prevents the algorithm from getting stuck in poor local optima. In addition, defining the temperature locally for every data point results in a more robust model that automatically downweights outliers, leading to better density estimates. In the second part of the talk, we give a Bayesian analysis of stochastic gradient descent with constant learning rates (constant SGD). In particular, we relate this algorithm to Markov chain Monte Carlo (MCMC) sampling. Drawing on the tools of variational inference and stochastic differential equations, we investigate and formalize this connection and show how constant SGD can be used as a cheap and scalable approximate MCMC sampler that can compete with more complicated state-of-the-art variational approaches. Finally, we introduce exponential family embeddings, which give a more statistical view of word embeddings and allow us to generalize them to other kinds of high-dimensional data.
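As a rough illustration of the constant SGD idea, the following minimal sketch runs SGD with a fixed step size on a toy problem (estimating the mean of one-dimensional data under a squared loss) and treats the iterates as approximate samples. The toy model, the function name constant_sgd_samples, and the step size 2S/(N+S) are assumptions made for this example, not taken from the talk; the step size comes from matching the stationary variance of this particular scalar recursion to roughly 1/N, which is the flat-prior posterior variance of the mean for unit-variance data.

```python
import numpy as np

def constant_sgd_samples(x, batch_size, step_size, num_steps, rng):
    """Constant-step SGD on L(theta) = mean_i 0.5 * (x_i - theta)^2.

    Returns every iterate, to be read as approximate samples
    (hypothetical helper written for this illustration).
    """
    n = x.shape[0]
    # Pre-sample minibatches with replacement and take their means.
    idx = rng.integers(0, n, size=(num_steps, batch_size))
    batch_means = x[idx].mean(axis=1)
    theta = x.mean()  # start at the optimum, so burn-in is negligible
    samples = np.empty(num_steps)
    for t in range(num_steps):
        grad = theta - batch_means[t]      # minibatch gradient of the loss
        theta = theta - step_size * grad   # the step size is never decayed
        samples[t] = theta
    return samples

rng = np.random.default_rng(0)
n, batch_size = 1000, 10
x = rng.normal(loc=2.0, scale=1.0, size=n)

# Assumed step size: for this scalar recursion it makes the stationary
# variance equal to x.var() / n, which is close to 1 / n here.
step_size = 2 * batch_size / (n + batch_size)
samples = constant_sgd_samples(x, batch_size, step_size, 200_000, rng)

print("iterate mean:", samples.mean(), "data mean:", x.mean())
print("iterate variance:", samples.var(), "target variance:", x.var() / n)
```

In this toy setting the iterates fluctuate around the data mean with a spread set by the step size and the minibatch size, which is the sense in which a fixed learning rate can act as a tuning knob for an approximate sampler. Richer models would not admit such a simple closed-form choice.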
Stephan Mandt is a Research Scientist at Disney Research Pittsburgh, where he leads the statistical machine learning group. Previously, he was a postdoctoral researcher with David Blei at Columbia University, where he worked on scalable approximate Bayesian inference algorithms. Trained as a statistical physicist, he previously held a postdoctoral fellowship at Princeton University and holds a Ph.D. from the University of Cologne, where he was a fellow of the German National Academic Foundation.