Hi all,
Our next meeting in the Information Theory and Applications seminar will
take place on Monday, March 4 at 10:00, in room A500.
The speaker is Alon Kipnis, who will tell us about uniformity testing from
a minimax risk perspective.
See you there,
Or, Oron, Yuval and Alex
---------------------------------------------
*Title*: The minimax risk in testing uniformity under missing ball
alternatives
*Abstract*: We study the problem of "uniformity testing," i.e., testing the
goodness of fit of categorical data to the uniform distribution over the
categories. As a class of alternative hypotheses, we consider the removal
of an $\ell_p$ ball of radius $\epsilon$ around the uniform rate sequence
for $p \leq 2$. When the number of samples $n$ and number of categories $N$
go to infinity while $\epsilon$ is small, we show that the minimax risk
$R_\epsilon^*$ in testing based on the sample's histogram (number of absent
categories, singletons, collisions, ...) asymptotes to
$2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$, where $\Phi$ is the standard
normal CDF. As it turns out, the minimax test relies on collisions in the
very small sample limit but otherwise behaves like the chi-squared test.
Empirical studies over a range of problem parameters show that our estimate
is accurate in finite samples and that the minimax test is significantly
better than the chi-squared test or a test that only uses collisions.
Our result settles several known challenges in uniformity and "identity"
testing from the last two decades. In particular, it allows the comparison
of the many estimators previously proposed for this problem at the constant
level rather than at the rate of convergence of the risk or the scaling
order of the sample complexity.
Our analysis introduces several new methods by adapting techniques
previously used by Ingster and Suslina for Gaussian signal detection.
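For readers who want a feel for the asymptotic risk formula above, here is a minimal Python sketch (our own illustration, not part of the talk) that evaluates $2\Phi(-n N^{2-2/p} \epsilon^2/\sqrt{8N})$ for given problem parameters; the function names are ours:

```python
import math

def normal_cdf(x):
    # Standard normal CDF, expressed via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def asymptotic_minimax_risk(n, N, eps, p):
    """Asymptotic minimax risk 2*Phi(-n * N^(2-2/p) * eps^2 / sqrt(8N)),
    for n samples, N categories, ell_p ball radius eps (p <= 2)."""
    arg = n * N ** (2.0 - 2.0 / p) * eps ** 2 / math.sqrt(8.0 * N)
    return 2.0 * normal_cdf(-arg)
```

As the formula suggests, the risk approaches 1 (pure guessing) when the argument is small, e.g. for few samples or a small radius $\epsilon$, and decays to 0 as $n$ or $\epsilon$ grows.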