Grid Search Is A Bad Hyper-parameter Optimization Algorithm
Abstract: Grid search and manual search are the most widely used strategies for hyper-parameter optimization. Manual search is well known to produce results that are difficult to reproduce. In this talk, I will argue that grid search and manual search are inefficient and ineffective compared with alternatives based on Bayesian optimization, and even with random search. I will argue empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. The empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. A Gaussian-process analysis of the function mapping hyper-parameters to validation-set performance reveals that, for most data sets, only a few of the hyper-parameters really matter, and, moreover, that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms on new data sets. The analysis also sheds some light on why recent ``High Throughput'' methods achieve surprising success: they appear to search through a large number of hyper-parameters, but most of those hyper-parameters do not matter much.
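The core argument, that random trials beat grid trials when only a few hyper-parameters matter, can be sketched with a toy example (a hypothetical objective for illustration only, not the study's actual experiments). With the same budget of nine trials, a 3 x 3 grid tests only three distinct values of the one hyper-parameter that matters, while random search tests nine:

```python
import random

# Toy objective in which only one of the two hyper-parameters affects the
# score (hypothetical, for illustration only); the best setting is 0.7.
def validation_error(important, unimportant):
    return (important - 0.7) ** 2

n_trials = 9

# Grid search: a 3 x 3 grid spends nine trials but tests only three
# distinct values of the hyper-parameter that matters.
grid_values = [0.0, 0.5, 1.0]
grid_best = min(validation_error(a, b)
                for a in grid_values for b in grid_values)

# Random search: the same nine trials test nine distinct values of the
# important hyper-parameter.
random.seed(0)
random_best = min(validation_error(random.random(), random.random())
                  for _ in range(n_trials))

print(f"grid best: {grid_best:.4f}, random best: {random_best:.4f}")
```

In higher dimensions the gap widens: a grid's resolution along any one axis shrinks exponentially with the number of hyper-parameters, while random search keeps full resolution along every axis.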
In settings where brute-force methods are not sufficiently efficient, we must naturally turn to algorithms that use the results of earlier trials to inform the course of the experiment. Manual search is adaptive in this sense, but it is not principled (and generally not an algorithm at all!). In contrast, Bayesian optimization algorithms are both adaptive and principled. I will discuss recent and ongoing work on practical Bayesian optimization algorithms for hyper-parameter optimization that outperform both the manual optimization of Deep Belief Networks and high-throughput random search for HT-L3 models.
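A sequential Bayesian-optimization loop of the kind described above can be sketched as follows. This is a minimal one-dimensional toy, not the speaker's implementation: the Gaussian-process surrogate, RBF kernel, length scale, and objective function are all assumptions chosen for illustration, and the acquisition function is expected improvement for minimization.

```python
import numpy as np
from math import erf

def rbf(a, b, length=0.2):
    # Squared-exponential (RBF) kernel; prior variance 1 on the diagonal.
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def objective(x):
    # Hypothetical "validation error" as a function of one hyper-parameter.
    return np.sin(3 * x) + 0.5 * x

rng = np.random.default_rng(0)
X = list(rng.uniform(0, 2, size=3))      # a few initial random trials
y = [objective(x) for x in X]
candidates = np.linspace(0, 2, 200)      # candidate hyper-parameter settings

for _ in range(10):
    Xa, ya = np.array(X), np.array(y)
    # Zero-mean GP posterior at the candidate points.
    K = rbf(Xa, Xa) + 1e-6 * np.eye(len(Xa))     # jitter for stability
    Ks = rbf(candidates, Xa)
    mu = Ks @ np.linalg.solve(K, ya)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    sd = np.sqrt(np.maximum(var, 1e-12))
    # Expected improvement over the best trial so far (minimization).
    best = ya.min()
    z = (best - mu) / sd
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    ei = (best - mu) * Phi + sd * phi
    # Run the next trial where expected improvement is highest.
    x_next = candidates[int(np.argmax(ei))]
    X.append(x_next)
    y.append(objective(x_next))

print(f"best value found: {min(y):.3f}")
```

The loop is "adaptive and principled" in exactly the sense of the paragraph above: each new trial is placed where the surrogate model predicts the greatest expected improvement, given all earlier results.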
Bio: James Bergstra is a post-doctoral researcher in David Cox's biological and computer vision group at the Rowland Institute at Harvard. He completed his doctoral studies at the University of Montreal in July 2011 under the direction of Professor Yoshua Bengio, with a dissertation on how to incorporate neurophysiological findings (``complex cells'') into neural networks for pattern recognition and into deep learning models. In the course of his graduate work he co-developed Theano, an open-source optimizing compiler for tensor-valued expressions that can use Graphics Processing Units (GPUs) for high-performance computation. He completed a Master's degree in 2006 under the direction of Douglas Eck, on algorithms for classifying recorded music by genre.