Detecting low-degree truncation.
A. De and H. Li and S. Nadimpalli and R. Servedio.
In 56th Annual Symposium on Theory of Computing (STOC), 2024, to appear.


Abstract:

We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\R^n$ and a collection of data points in $\R^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \R^n$.

We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d.~product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\R^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound:

  • For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$);
  • Even for the simplest case of ${\cal D}$ being the uniform distribution over $\bits^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $\Omega(n^{d/2})$ samples.
  • Link to full version


    Back to main papers page