Efficient Density Estimation via Piecewise Polynomial Approximation
S. Chan and I. Diakonikolas and R. Servedio and X. Sun.
In ACM Symposium on Theory of Computing (STOC), 2014, pp. 604--613.

Abstract:

We give a computationally efficient semi-agnostic algorithm for learning univariate probability distributions that are well approximated by piecewise polynomial density functions. Let $p$ be an arbitrary distribution over an interval $I$, and suppose that $p$ is $\tau$-close (in total variation distance) to an unknown probability distribution $q$ that is defined by an unknown partition of $I$ into $t$ intervals and $t$ unknown degree-$d$ polynomials specifying $q$ over each of the intervals. We give an algorithm that draws $\tilde{O}(t(d+1)/\eps^2)$ samples from $p$, runs in time $\poly(t,d+1,1/\eps)$, and with high probability outputs a piecewise polynomial hypothesis distribution $h$ that is $(4 \tau +\eps)$-close to $p$ in total variation distance. Our algorithm combines tools from real approximation theory, uniform convergence, linear programming, and dynamic programming. Its sample complexity is simultaneously near-optimal in all three parameters $t, d$ and $\eps$; we show that even for $\tau=0$, any algorithm that learns an unknown $t$-piecewise degree-$d$ probability distribution over $I$ to accuracy $\eps$ must use $\Omega({\frac {t(d+1)} {\poly(\log(d+2))}} \cdot {\frac 1 {\eps^2}})$ samples from the distribution, regardless of its running time.

We apply this general algorithm to obtain a wide range of results for many natural density estimation problems over both continuous and discrete domains. These include state-of-the-art results for learning mixtures of log-concave distributions; mixtures of $t$-modal distributions; mixtures of Monotone Hazard Rate distributions; mixtures of Poisson Binomial Distributions; mixtures of Gaussians; and mixtures of $k$-monotone densities. Our general technique gives improved results, with provably optimal sample complexities (up to logarithmic factors) in all parameters in most cases, for all these problems via a single unified algorithm.

pdf of conference version