Learning Sums of Independent Integer Random Variables

Bibliographic Details
Published in	2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 217-226
Main Authors	Daskalakis, Constantinos; Diakonikolas, Ilias; O'Donnell, Ryan; Servedio, Rocco A.; Tan, Li-Yang
Format	Conference Proceeding
Language	English
Published	IEEE 01.10.2013
Subjects	Accuracy; Approximation methods; Complexity theory; Digital TV; discrete distribution learning; Gaussian distribution; limit theorem; Random variables; sums of independent integer random variables
ISSN	0272-5428
DOI	10.1109/FOCS.2013.31

Summary:	Let S = X_1 + ··· + X_n be a sum of n independent integer random variables X_i, where each X_i is supported on {0, 1, ..., k - 1} but otherwise may have an arbitrary distribution (in particular, the X_i's need not be identically distributed). How many samples are required to learn the distribution of S to high accuracy? In this paper we show that the answer is completely independent of n, and moreover we give a computationally efficient algorithm which achieves this low sample complexity. More precisely, our algorithm learns any such S to ε-accuracy (with respect to the total variation distance between distributions) using poly(k, 1/ε) samples, independent of n. Its running time is poly(k, 1/ε) in the standard word RAM model. Thus we give a broad generalization of the main result of [DDS12b], which gave a similar learning result for the special case k = 2 (when the distribution S is a Poisson Binomial Distribution). Prior to this work, no nontrivial results were known for learning these distributions even in the case k = 3. A key difficulty is that, in contrast to the case of k = 2, sums of independent {0, 1, 2}-valued random variables may behave very differently from (discretized) normal distributions, and in fact may be rather complicated: they are not log-concave, they can be Θ(n)-modal, there is no relationship between Kolmogorov distance and total variation distance for the class, etc. Nevertheless, the heart of our learning result is a new limit theorem which characterizes what the sum of an arbitrary number of arbitrary independent {0, 1, ..., k - 1}-valued random variables may look like. Previous limit theorems in this setting made strong assumptions on the "shift invariance" of the random variables X_i in order to force a discretized normal limit. We believe that our new limit theorem, as the first result for truly arbitrary sums of independent {0, 1, ..., k - 1}-valued random variables, is of independent interest.
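
The contrast the summary draws between k = 2 and k = 3 can be checked numerically on a small example. The sketch below is an illustration only, not the paper's learning algorithm: it computes the exact distribution of S by repeated convolution and compares a sum of fair {0, 1}-valued variables with a sum of variables uniform on {0, 2}. The helper names (`sum_pmf`, `count_modes`, `total_variation`) and the choice n = 20 are assumptions made for this example.

```python
from typing import List, Sequence


def convolve(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """PMF of the sum of two independent variables with PMFs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def sum_pmf(pmfs: Sequence[Sequence[float]]) -> List[float]:
    """Exact PMF of S = X_1 + ... + X_n, given each X_i's PMF on {0, ..., k-1}."""
    result = [1.0]  # point mass at 0 (the empty sum)
    for pmf in pmfs:
        result = convolve(result, pmf)
    return result


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between the PMFs."""
    m = max(len(p), len(q))
    p_pad = list(p) + [0.0] * (m - len(p))
    q_pad = list(q) + [0.0] * (m - len(q))
    return 0.5 * sum(abs(a - b) for a, b in zip(p_pad, q_pad))


def count_modes(pmf: Sequence[float]) -> int:
    """Number of strict local maxima of the PMF (zero outside its support)."""
    padded = [0.0] + list(pmf) + [0.0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


n = 20  # hypothetical number of summands
binomial = sum_pmf([[0.5, 0.5]] * n)      # k = 2: fair {0, 1}-valued X_i
lumpy = sum_pmf([[0.5, 0.0, 0.5]] * n)    # k = 3: X_i uniform on {0, 2}
shifted = [0.0] + lumpy                   # distribution of (lumpy sum) + 1

print("modes, k = 2:", count_modes(binomial))
print("modes, k = 3:", count_modes(lumpy))
print("TV(S, S + 1), k = 3:", total_variation(lumpy, shifted))
```

In this example the {0, 2}-valued sum lives only on even integers, so every support point is a local maximum (n + 1 modes) and shifting it by one moves all of its mass onto the odd integers. This is the kind of failure of "shift invariance" that rules out a discretized normal approximation in total variation distance.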