Testing Big Data Distributions
Given samples from an unknown distribution p, is it possible to distinguish whether p has a certain property P, versus far from P? This fundamental question has received tremendous attention in statistics, and information theory under hypothesis testing focusing on the asymptotic analysis, and more recently initiated in the computer science literature, where the emphasis has been on small sample size and computational complexity. Nevertheless, even for basic properties of distributions such as monotonicity, unimodality, and independence the optimal sample complexity of this problem is not known.
We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. Underlying our approach is an algorithm that distinguishes whether two distributions are close in the chi-squared distance or far in the L1 distance.
The talk is based on joint work with Constantinos Daskalakis and Gautam Kamath.
Jayadev Acharya is a postdoc in EECS department at MIT. He obtained a Ph.D in ECE from UC San Diego. His research interests lie in information theory, machine learning, algorithms, and statistics.