Exponential Stochastic Cellular Automata For Massively Parallel Inference
We propose an embarrassingly parallel, memory-efficient inference algorithm for latent variable models whose complete-data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. Applied to latent Dirichlet allocation, we find that our algorithm is over an order of magnitude faster than the fastest current approaches: a simple C++/MPI implementation on a 4-node cluster samples more than half a billion tokens per second. We process 3 billion documents and achieve predictive power competitive with collapsed Gibbs sampling and variational inference.
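To illustrate the idea, here is a minimal toy sketch (not the authors' C++/MPI implementation) of a stochastic-cellular-automaton-style sweep for LDA. Unlike collapsed Gibbs sampling, which updates one token at a time against fresh counts, every token's topic is resampled simultaneously from counts frozen at the previous sweep, which is what makes the update embarrassingly parallel. The corpus, hyperparameters, and sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, V, K = 4, 6, 3          # documents, vocabulary size, topics (toy sizes)
alpha, beta = 0.1, 0.01    # Dirichlet hyperparameters (assumed values)

# Toy corpus: one (doc_id, word_id) pair per token.
N = 50
docs = rng.integers(0, D, size=N)
words = rng.integers(0, V, size=N)
z = rng.integers(0, K, size=N)     # random initial topic assignments

def counts(z):
    """Rebuild doc-topic, topic-word, and topic totals from assignments z."""
    ndk = np.zeros((D, K)); nkw = np.zeros((K, V)); nk = np.zeros(K)
    np.add.at(ndk, (docs, z), 1)
    np.add.at(nkw, (z, words), 1)
    np.add.at(nk, z, 1)
    return ndk, nkw, nk

for sweep in range(20):
    ndk, nkw, nk = counts(z)       # counts frozen for the whole sweep
    # Per-token conditional p(z=k) ∝ (ndk+α)(nkw+β)/(nk+Vβ), evaluated
    # for all tokens at once — the synchronous "cellular automaton" update.
    p = (ndk[docs] + alpha) * (nkw[:, words].T + beta) / (nk + V * beta)
    p /= p.sum(axis=1, keepdims=True)
    # Draw each token's new topic independently (vectorized Gumbel-max trick).
    z = np.argmax(np.log(p) - np.log(-np.log(rng.random(p.shape))), axis=1)
```

Because the counts are read-only during a sweep, the per-token draws can be sharded across threads or machines with no locking; only the count rebuild requires communication.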
This work was done at Oracle Labs with Michael L. Wick, Jean-Baptiste Tristan, and Guy L. Steele.
Manzil Zaheer is a PhD student at Carnegie Mellon University, advised by Alexander Smola. He is currently a research intern at Oracle Labs. He is broadly interested in machine learning, and works on the design and implementation of scalable machine learning algorithms for distributed and parallel architectures. Recently he has been interested in recurrent neural networks, representation learning, and automatic theorem proving. One of his research aims is to close the mismatch between statistical models and computational resources, developing scalable approaches that can handle enormous amounts of data in order to solve practical problems.