Testing Collections of Properties
Reut Levi, Dana Ron, Ronitt Rubinfeld
ICS 2011
What properties do your distributions have?
[Figure: shopping distribution]
Testing closeness of two distributions:
• Transactions in California vs. transactions in New York: trend change?
Testing independence:
• Shopping patterns: independent of zip code?
One distribution: D
• D is an arbitrary black-box distribution over [n] that generates i.i.d. samples.
• Sample complexity in terms of n? (Can it be sublinear?)
[Figure: samples from D are fed to a test that outputs Pass/Fail]
Some answers…
• Uniformity: $\tilde{\Theta}(n^{1/2})$ [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08]
• Identity: $\tilde{\Theta}(n^{1/2})$ [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01]
• Closeness: $\tilde{\Theta}(n^{2/3})$ [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08]
• Independence: $\tilde{O}(n_1^{2/3} n_2^{1/3})$, $\Omega(n_1^{2/3} n_2^{1/3})$ [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01], this work
• Entropy: $n^{1/\beta^2 + o(1)}$ [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08]
• Support size: $\tilde{\Theta}(n/\log n)$ [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10]
• Monotonicity on total order: $\tilde{\Theta}(n^{1/2})$ [Batu, Kumar, Rubinfeld 04]
• Monotonicity on poset: $n^{1-o(1)}$ [Bhattacharyya, Fischer, Rubinfeld, Valiant 10]
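To make the uniformity entry concrete, here is a minimal sketch of a collision-based test, one standard route to sample complexity on the order of $n^{1/2}$. It relies on two facts: the uniform distribution over [n] has collision probability exactly $1/n$, and any distribution that is $\epsilon$-far from uniform in $L_1$ distance has collision probability at least $(1+\epsilon^2)/n$. The function names and the midpoint threshold $(1+\epsilon^2/2)/n$ are illustrative assumptions, not taken from the talk, and the sketch does not fix the number of samples needed for a given confidence.

```python
import random
from collections import Counter

def collision_rate(samples):
    """Fraction of unordered sample pairs that hold the same value."""
    s = len(samples)
    counts = Counter(samples)
    # A value seen c times contributes c*(c-1)/2 colliding pairs.
    colliding = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding / (s * (s - 1) / 2)

def looks_uniform(samples, n, eps):
    """Pass iff the collision rate is at most the assumed threshold.

    Uniform over range(n) has collision probability 1/n; eps-far (in L1)
    distributions have at least (1 + eps^2)/n. The midpoint threshold
    below is an illustrative choice, not a tuned constant.
    """
    return collision_rate(samples) <= (1 + eps * eps / 2) / n

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    # About sqrt(n) samples up to a constant; the constant is arbitrary here.
    samples = [rng.randrange(n) for _ in range(20 * int(n ** 0.5))]
    print(looks_uniform(samples, n, eps=0.5))
```

The statistic depends only on how often values repeat, which is why the test can get by without seeing most of the domain.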
Collection of distributions: $D_1, D_2, \ldots, D_m$
• Two models:
  • Sampling model: get $(i, x)$ for random $i$, $x \sim D_i$
    • Further refinement: known or unknown distribution on the $i$'s?
  • Query model: get $(i, x)$ for query $i$ and $x \sim D_i$
• Sample complexity in terms of $n, m$?
[Figure: samples from $D_1, \ldots, D_m$ are fed to a test that outputs Pass/Fail]
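The difference between the two access models can be sketched as a small oracle. The `Collection` class, its explicit probability tables, and the default of a uniform distribution on the indices are all hypothetical choices made for illustration; in the setting of the talk the distributions are black boxes and the tester only sees the returned pairs.

```python
import random

class Collection:
    """Hypothetical oracle for m explicit distributions over range(n)."""

    def __init__(self, dists, weights=None, seed=None):
        self.dists = dists            # dists[i][x] = Pr[x] under D_i
        self.m = len(dists)
        # Distribution on the indices i; uniform is an assumed default.
        self.weights = weights or [1 / self.m] * self.m
        self.rng = random.Random(seed)

    def _draw(self, i):
        """Draw one x ~ D_i."""
        n = len(self.dists[i])
        return self.rng.choices(range(n), weights=self.dists[i])[0]

    def sample(self):
        """Sampling model: the index i is drawn for the tester."""
        i = self.rng.choices(range(self.m), weights=self.weights)[0]
        return i, self._draw(i)

    def query(self, i):
        """Query model: the tester chooses the index i."""
        return i, self._draw(i)
```

In the query model a tester can concentrate its samples on a few chosen indices, whereas in the sampling model it has to work with whatever indices arrive.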
Properties considered:
• Equivalence
  • All distributions are equal
• "Clusterability"
  • Distributions can be clustered into k clusters such that within a cluster, all distributions are close
Equivalence vs. independence
• Process of drawing pairs:
  • Draw $i \in [m]$, $x \sim D_i$, output $(i, x)$
• Easy fact: $(i, x)$ independent iff the $D_i$'s are equal
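The easy fact can be checked exactly on small examples. The sketch below builds the joint distribution of $(i, x)$ and compares it with the product of its marginals using exact rational arithmetic. The helper names are hypothetical, and the sketch assumes every index has positive weight, since an index that is never drawn cannot affect independence.

```python
from fractions import Fraction

def joint(dists, weights):
    """Joint pmf of (i, x) when i ~ weights and x ~ D_i."""
    return [[w * p for p in d] for w, d in zip(weights, dists)]

def is_independent(dists, weights):
    """True iff Pr[(i, x)] = Pr[i] * Pr[x] for every pair (i, x)."""
    J = joint(dists, weights)
    n = len(dists[0])
    # Marginal of x; the marginal of i is weights[i] because each D_i sums to 1.
    marg_x = [sum(row[x] for row in J) for x in range(n)]
    return all(J[i][x] == weights[i] * marg_x[x]
               for i in range(len(dists)) for x in range(n))

half = Fraction(1, 2)
equal = [[half, half], [half, half]]
unequal = [[Fraction(1), Fraction(0)], [half, half]]
print(is_independent(equal, [half, half]))    # True
print(is_independent(unequal, [half, half]))  # False
```

When all $D_i$ equal a common $D$, the marginal of $x$ is $D$ itself and the joint factorizes; conversely, independence forces the conditional distribution of $x$ given $i$, which is $D_i$, to be the same for every $i$.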
Results
• Def: $(D_1, \ldots, D_m)$ has the Equivalence property if $D_i = D_{i'}$ for all $1 \le i, i' \le m$.
• Also yields "tight" lower bound for independence testing
Clusterability
• Can we cluster the distributions s.t. within each cluster, the distributions are (very) close?
• Sample complexity of the test is $\tilde{O}(k n^{2/3})$, for n = domain size, k = number of clusters
  • No dependence on the number of distributions
  • Closeness requirement is very stringent
Open Questions
• Clusterability in the sampling model, less stringent notion of close
• Other properties of collections?
  • E.g., all distributions are shifts of each other?