Sampling Lower Bounds via Information Theory

  1. Sampling Lower Bounds via Information Theory. Ziv Bar-Yossef, IBM Almaden.

  2. Standard Approach to Hardness of Approximation
  Hardness of approximation for f: X^n → Y is derived from the hardness of a decision "promise problem".
  "Promise problem": disjoint subsets A, B ⊆ X^n such that:
  • ∀ a ∈ A, b ∈ B, f(a) is "far" from f(b).
  • Given x ∈ A ∪ B, decide whether x ∈ A.

  3. The "Election Problem"
  • Input: a sequence x of n votes to k parties.
  • Want an estimate μ̂ of the vote distribution μ_x s.t. ||μ̂ − μ_x|| < ε.
  • How big a poll should we conduct?
  [Figure: vote distribution μ_x for n = 18, k = 6: 7/18, 1/18, 4/18, 3/18, 2/18, 1/18.]
  • ∀ S ⊆ [k], it is easy to decide between A = { x | μ_x(S) ≥ ½ + ε } and B = { x | μ_x(S) ≤ ½ − ε }.
  • Hardness comes from the abundance of such decision problems → the poll has to be of size Ω(k).
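
  For contrast with the lower bound, here is a minimal sketch (Python, hypothetical names) of the obvious upper-bound strategy: poll voters uniformly at random and report the empirical distribution; the bounds cited on slide 13 say how large such a poll must be.

```python
import random
from collections import Counter

def poll_estimate(votes, poll_size, rng=random.Random(0)):
    """Estimate the vote distribution over the parties from a uniform poll.

    votes: list of party labels; poll_size: number of sampled voters.
    Returns a dict mapping party -> estimated fraction of the vote.
    """
    sample = [rng.choice(votes) for _ in range(poll_size)]  # i.i.d. uniformly chosen voters
    counts = Counter(sample)
    return {party: counts[party] / poll_size for party in counts}

# Toy run: the slide's example with n = 18 votes and k = 6 parties.
votes = [0]*7 + [1]*1 + [2]*4 + [3]*3 + [4]*2 + [5]*1
print(poll_estimate(votes, poll_size=500))
```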

  4. Similarity Hardness vs. Abundance Hardness
  • Similarity hardness: hardness of a single decision "promise problem".
  • Abundance hardness: abundance of decision "promise problems".
  Each translates into hardness of approximation for f: X^n → Y.
  In this talk: a lower bound technique that captures both types of hardness in the context of sampling algorithms.

  5. Why Sampling?
  The algorithm makes a small number of queries to the input data set.
  • Queries can be chosen randomly.
  • Output is typically approximate.
  • Sub-linear time & space.

  6. Some Examples
  Statistics:
  • Statistical decision and estimation
  • Statistical learning
  • …
  CS:
  • PAC and machine learning
  • Property testing
  • Sub-linear time approximation algorithms
  • Extractors and dispersers
  • …

  7. Query Complexity
  Query complexity of a function f: the number of queries required to approximate f.
  Examples:
  • High query complexity: parity, # of distinct elements.
  • Low query complexity: mean in [0,1], median.
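
  To see why the mean has low query complexity, here is a minimal sketch (assumed helper name): by Hoeffding's inequality, O(log(1/δ)/ε²) uniform i.i.d. queries suffice regardless of the input length n, whereas parity changes if any single entry is flipped and therefore needs all n queries.

```python
import math
import random

def estimate_mean(x, eps, delta, rng=random.Random(0)):
    """Estimate the mean of x (values in [0,1]) to within eps with probability >= 1 - delta.

    Hoeffding's inequality: q = ceil(ln(2/delta) / (2*eps^2)) i.i.d. uniform queries suffice,
    independent of len(x).
    """
    q = math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))
    sample = [x[rng.randrange(len(x))] for _ in range(q)]  # i.i.d. uniform query positions
    return sum(sample) / q

x = [random.Random(1).random() for _ in range(10**5)]
print(estimate_mean(x, eps=0.05, delta=0.01), sum(x) / len(x))
```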

  8. Our Main Result
  • A technique for obtaining lower bounds on the query complexity of approximating functions.
  • Template for obtaining specific lower bounds.
  • Arbitrary domain and range; all types of approximation.
  • Usable for wide classes of functions with symmetry properties.
  • Outperforms previous techniques for functions with "abundance hardness".
  • Matches previous techniques for functions with "similarity hardness".

  9. Previous Work
  Statistics:
  • Cramér-Rao inequality
  • VC dimension
  • Optimality of the sequential probability ratio test
  CS:
  • Lower bounds via the Hellinger distance [B., Kumar, Sivakumar 01]
  • Specific lower bounds [Canetti, Even, Goldreich 95], [Radhakrishnan, Ta-Shma 96], [Dagum, Karp, Luby, Ross 95], [Schulman, Vazirani 99], [Charikar, Chaudhuri, Motwani, Narasayya 00]
  None of these addresses abundance hardness!

  10. Multi-Way Reduction from a Binary Promise Problem
  [Figure: f: X^n → Y maps pairwise "disjoint" inputs a, b, c to pairwise far values f(a), f(b), f(c).]
  Binary promise problem: given x ∈ { a, b }, decide whether x = a or x = b.
  Multi-way version: given x from a set of pairwise "disjoint" inputs { a, b, c, … }, decide whether x = a or x = b or x = c, etc.
  Either version can be solved by any sampling algorithm approximating f.

  11. Main Result
  The lower bound "recipe" for f: X^n → Y, a function with an appropriate symmetry property:
  • Identify a set S = { x1,…,xm } of "pairwise disjoint" inputs.
  • Calculate the "dissimilarity" D(x1,…,xm) among x1,…,xm (D(·,…,·) is a distance measure taking values in [0, log m]).
  Theorem: Any algorithm approximating f requires q queries, where q = Ω(log m / D(x1,…,xm)).
  This expresses a tradeoff between "similarity hardness" (small dissimilarity D) and "abundance hardness" (many disjoint inputs m).

  12. Measure of Dissimilarity
  πi: the distribution of the value of a uniformly chosen entry of xi.
  Then D(x1,…,xm) = JS(π1,…,πm), the Jensen-Shannon divergence among π1,…,πm (defined on slide 23).
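
  A minimal sketch (hypothetical names) of this measure: form each πi as the distribution of a uniformly chosen entry of xi, then compute their Jensen-Shannon divergence using the formula from slide 23, here in bits so that D lies in [0, log2 m].

```python
import math
from collections import Counter

def entry_distribution(x):
    """pi_i: distribution of the value of a uniformly chosen entry of x."""
    counts = Counter(x)
    return {v: c / len(x) for v, c in counts.items()}

def kl(p, q):
    """KL divergence KL(p || q) in bits; assumes supp(p) is contained in supp(q)."""
    return sum(pv * math.log(pv / q[v], 2) for v, pv in p.items() if pv > 0)

def dissimilarity(inputs):
    """D(x_1,...,x_m) = JS(pi_1,...,pi_m) = (1/m) * sum_i KL(pi_i || mean of the pi_i)."""
    dists = [entry_distribution(x) for x in inputs]
    support = set().union(*dists)
    mean = {v: sum(d.get(v, 0.0) for d in dists) / len(dists) for v in support}
    return sum(kl(d, mean) for d in dists) / len(dists)

# Near-identical inputs have tiny dissimilarity; disjoint-support inputs reach D = log2 m.
print(dissimilarity(["aab", "abb"]), dissimilarity(["aaa", "bbb"]))
```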

  13. Application I: The Election Problem
  Previous bounds on the query complexity:
  • Ω(1/ε^2) [BKS01]
  • Ω(k) [Batu et al. 00]
  • O(k/ε^2) [BKS01]
  Theorem [This paper]: the query complexity is Ω(k/ε^2).

  14. Combinatorial Designs
  t-design: a family of pairwise "far apart" subsets of [k] (the figure shows B1, B2, B3 ⊆ [k]).
  Proposition: For all k and for all t ≥ 12, there exists a t-design of size m = 2^Ω(k).
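
  The precise design property appeared only in the slide's figure. The sketch below assumes it asks for (k/2)-size sets with pairwise set difference |Bi \ Bj| ≥ k/t, which is consistent with how the design is used on slide 17, and collects such sets greedily from random candidates; this is the standard probabilistic route by which exponentially many such sets can be shown to exist.

```python
import random

def random_design(k, t, m, rng=random.Random(0), max_tries=10000):
    """Greedily collect m subsets of [k], each of size k//2, with pairwise
    |B_i \\ B_j| >= k/t (assumed design property).  A random (k/2)-subset
    violates the constraint with probability 2^(-Omega(k)), which is why
    exponentially many sets exist for large k."""
    design = []
    for _ in range(max_tries):
        if len(design) == m:
            break
        b = frozenset(rng.sample(range(k), k // 2))
        if all(len(b - c) >= k / t and len(c - b) >= k / t for c in design):
            design.append(b)
    return design

print(len(random_design(k=60, t=12, m=20)))
```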

  15. Proof of the Lower Bound
  Step 1: Identify a set S of pairwise disjoint inputs.
  B1,…,Bm ⊆ [k]: a t-design of size m = 2^Ω(k).
  S = { x1,…,xm }, where xi is a vote sequence that puts a ½ + ε fraction of the votes on the parties in Bi and the remaining ½ − ε fraction on [k] \ Bi.
  Step 2: Dissimilarity calculation: D(x1,…,xm) = O(ε^2).
  By the main theorem, the number of queries is at least Ω(k/ε^2).

  16. Application II: Low Rank Matrix Approximation
  Exact low rank approximation:
  • Given an m × n real matrix M and k ≤ m, n, find the m × n matrix M_k of rank k for which ||M − M_k||_F is minimized.
  • Solution: SVD. Requires querying all of M.
  Approximate low rank approximation (LRM_k):
  • Get a rank-k matrix A s.t. ||M − A||_F ≤ ||M − M_k||_F + ε·||M||_F.
  Theorem [This paper]: Computing LRM_k requires Ω(m + n) queries.
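
  For reference, a minimal numpy sketch of the exact, full-access solution: M_k is obtained by truncating the SVD of M.

```python
import numpy as np

def best_rank_k(M, k):
    """M_k: the rank-k matrix minimizing ||M - M_k||_F, via the truncated SVD.
    Requires reading all of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 30))
Mk = best_rank_k(M, k=5)
print(np.linalg.matrix_rank(Mk), np.linalg.norm(M - Mk, 'fro'))
```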

  17. Proof of the Lower Bound
  Step 1: Identify a set S of pairwise disjoint inputs.
  B1,…,Bt ⊆ [2k]: a combinatorial design of size t = 2^Ω(k).
  S = { M1,…,Mt }, where Mi is all-zero except for the diagonal, whose first 2k entries form the characteristic vector of Bi. [Figure: the 2k × 2k top-left block of Mi, with Bi on its diagonal.]
  • Mi is of rank k ⇒ (Mi)_k = Mi.
  • ||Mi||_F = k^{1/2}.
  • ||Mi − Mj||_F ≥ (|Bi \ Bj|)^{1/2} ≥ (k/12)^{1/2} ≥ ε·(||Mi||_F + ||Mj||_F).
  Step 2: Dissimilarity calculation: D(M1,…,Mt) = 2k/m.
  By the main theorem, the number of queries is at least Ω(m).
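
  A small numpy sketch (toy sizes, hypothetical helper) of this construction, checking the three stated properties on two example sets of size k in [2k]:

```python
import numpy as np

def diag_from_set(B, dim):
    """M_i: all-zero except the diagonal, which is the characteristic vector of B_i."""
    M = np.zeros((dim, dim))
    for j in B:
        M[j, j] = 1.0
    return M

k = 8
B1, B2 = {0, 1, 2, 3, 4, 5, 6, 7}, {4, 5, 6, 7, 8, 9, 10, 11}   # two sets of size k in [2k]
M1, M2 = diag_from_set(B1, 2 * k), diag_from_set(B2, 2 * k)

print(np.linalg.matrix_rank(M1))                 # k, so (M_i)_k = M_i
print(np.linalg.norm(M1, 'fro'), np.sqrt(k))     # ||M_i||_F = k^(1/2)
print(np.linalg.norm(M1 - M2, 'fro') ** 2,       # = |B_i symmetric-difference B_j|
      len(B1 - B2) + len(B2 - B1))               #   >= |B_i \ B_j|
```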

  18. Low Rank Matrix Approximation (cont.)
  Theorem [Frieze, Kannan, Vempala 98]: By querying an s × s submatrix of M chosen using any distributions which "approximate" the row and column weight distributions of M, one can solve LRM_k with s = O(k^4/ε^3).
  Theorem [This paper]: Solving LRM_k by querying an s × s submatrix of M chosen even according to the exact row and column weight distributions of M requires s = Ω(k/ε^2).
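
  A minimal sketch of what "querying an s × s submatrix chosen according to the row and column weight distributions" means. The weights here are taken as squared Euclidean norms, the usual choice in this line of work; treat that as an assumption, since the slide does not pin the weights down. This illustrates only the sampling scheme, not the low-rank reconstruction itself.

```python
import numpy as np

def weighted_submatrix(M, s, rng=np.random.default_rng(0)):
    """Sample an s x s submatrix of M: rows with probability proportional to their
    squared norms, columns likewise (the 'row and column weight distributions')."""
    row_p = np.linalg.norm(M, axis=1) ** 2
    row_p /= row_p.sum()
    col_p = np.linalg.norm(M, axis=0) ** 2
    col_p /= col_p.sum()
    rows = rng.choice(M.shape[0], size=s, replace=True, p=row_p)
    cols = rng.choice(M.shape[1], size=s, replace=True, p=col_p)
    return M[np.ix_(rows, cols)], rows, cols

M = np.random.default_rng(1).standard_normal((100, 80))
S, rows, cols = weighted_submatrix(M, s=10)
print(S.shape, rows, cols)
```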

  19. Oblivious Sampling
  • Query positions are independent of the given input.
  • The algorithm has a fixed query distribution π on [n]^q.
  • i.i.d. queries: the queries are independent and identically distributed, i.e., π = μ^q, where μ is a distribution on [n].
  Phase 1: choose query positions i1,…,iq.
  Phase 2: query xi1,…,xiq.

  20. Main Theorem: Outline of the Proof
  Adaptive sampling is reduced (for functions with symmetry properties) to oblivious sampling with i.i.d. queries, which is viewed as statistical classification, for which lower bounds are proved via information theory.

  21. Statistical Classification
  A black box holds one of the distributions π1,…,πm; the classifier receives q i.i.d. samples from the black box and outputs an index i ∈ [m].
  • π1,…,πm are distributions on Z.
  • The classifier is required to identify the black box's distribution correctly with probability ≥ 1 − δ.
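
  A minimal sketch of one such classifier (maximum likelihood over the m known distributions); the lower bound of course applies to any classifier, this is only to make the setting concrete.

```python
import math
import random

def classify(samples, dists):
    """Given q i.i.d. samples from one of the known distributions pi_1,...,pi_m
    (dicts value -> probability), output the index maximizing the log-likelihood."""
    def loglik(pi):
        return sum(math.log(pi.get(z, 1e-12)) for z in samples)  # small floor avoids log(0)
    return max(range(len(dists)), key=lambda i: loglik(dists[i]))

pi1 = {'a': 0.6, 'b': 0.4}
pi2 = {'a': 0.4, 'b': 0.6}
rng = random.Random(0)
samples = rng.choices(list(pi1), weights=list(pi1.values()), k=200)  # black box holds pi_1
print(classify(samples, [pi1, pi2]))   # prints 0 with high probability
```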

  22. From Sampling to Classification
  • T: an oblivious algorithm with query distribution π = μ^q that approximates f: X^n → Y.
  • μ_x: the joint distribution of a query and its answer when T runs on input x (a distribution on [n] × X).
  • S = { x1,…,xm }: a set of pairwise disjoint inputs.
  The black box holds μ_{xi} for some i; the classifier feeds the q i.i.d. samples to T and decides i iff T's output ∈ A(xi).

  23. Jensen-Shannon Divergence [Lin 91]
  • KL divergence between distributions μ, ν on Z: KL(μ || ν) = Σ_z μ(z) log (μ(z)/ν(z)).
  • Jensen-Shannon divergence among distributions π1,…,πm on Z: JS(π1,…,πm) = (1/m) Σ_i KL(πi || π̄), where π̄ = (1/m) Σ_i πi.
  [Figure: distributions π1,…,π8 surrounding their average π̄.]

  24. Main Result
  Theorem [Classification lower bound]: Any δ-error classifier for π1,…,πm requires q queries, where
  q ≥ ((1 − δ)·log m − 1) / JS(π1,…,πm).
  Corollary [Query complexity lower bound]: For any oblivious algorithm with query distribution π = μ^q that (ε,δ)-approximates f, and for any set S = { x1,…,xm } of "pairwise disjoint" inputs, the number of queries q is at least
  ((1 − δ)·log m − 1) / JS(μ_{x1},…,μ_{xm}).

  25. Outline of the Proof
  Lemma 1 [Classification error lower bound]: any classifier with error δ satisfies I(V; Z1,…,Zq) ≥ (1 − δ)·log m − 1, where V is the (uniformly random) index of the black box's distribution and Z1,…,Zq are the samples. Proof: by Fano's inequality.
  Lemma 2 [Decomposition of Jensen-Shannon]: I(V; Z1,…,Zq) ≤ q · JS(π1,…,πm). Proof: by subadditivity of entropy and conditional independence.
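
  A sketch of how the two lemmas combine, under the reading of the lemmas given above; the identity between the one-sample mutual information and the Jensen-Shannon divergence is the standard fact behind Lemma 2.

```latex
% V: uniform index in [m]; given V = i, the samples Z_1,\dots,Z_q are i.i.d. from \pi_i.
\begin{align*}
(1-\delta)\log m - 1
   &\le I(V; Z_1,\dots,Z_q)              && \text{Lemma 1 (Fano's inequality)}\\
   &\le \sum_{j=1}^{q} I(V; Z_j)         && \text{Lemma 2 (subadditivity, cond.\ independence)}\\
   &= q \cdot \mathrm{JS}(\pi_1,\dots,\pi_m)
      && \text{since } I(V;Z_j) = H(\bar{\pi}) - \tfrac{1}{m}\sum\nolimits_i H(\pi_i)
         = \mathrm{JS}(\pi_1,\dots,\pi_m),
\end{align*}
% which rearranges to q >= ((1-\delta) log m - 1) / JS(pi_1,...,pi_m), the bound on slide 24.
```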

  26. Conclusions
  • A general lower bound technique for the query complexity:
    • Template for obtaining specific bounds
    • Works for wide classes of functions
    • Captures both "similarity hardness" and "abundance hardness"
  • Applications:
    • The "Election Problem"
    • Low rank matrix approximation
    • Matrix reconstruction
  • Also proved:
    • A lower bound technique for the expected query complexity
    • Tightly captures similarity hardness but not abundance hardness
  • Open problems:
    • Tight bounds for low rank matrix approximation
    • Better lower bounds on the expected query complexity
    • Lower bounds for non-symmetric functions

  27. Simulation of Adaptive Sampling by Oblivious Sampling
  Definition: f: X^n → Y is symmetric if ∀x and ∀σ ∈ Sn, f(σ(x)) = f(x). f is ε-symmetric if ∀x, ∀σ, A(σ(x)) = A(x), where A(x) is the set of acceptable approximate answers on x.
  Lemma [BKS01]: Any q-query algorithm approximating an ε-symmetric f can be simulated by a q-query oblivious algorithm whose queries are uniform without replacement.
  Corollary: If q < n/2, it can be simulated by a 2q-query oblivious algorithm whose queries are uniform with replacement.

  28. Simulation Lemma: Outline of the Proof
  • T: a q-query sampling algorithm approximating f.
  • WLOG, T never queries the same location twice.
  Simulation:
  • Pick a random permutation σ.
  • Run T on σ(x).
  • By ε-symmetry, the output is likely to be in A(σ(x)) = A(x).
  • The queries to x are uniform without replacement.
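
  A minimal sketch of this simulation (hypothetical interface: the adaptive algorithm T is a function taking a query oracle), illustrating why the positions of x actually read end up uniform without replacement.

```python
import random

def simulate_oblivious(T, x, rng=random.Random(0)):
    """Simulate adaptive algorithm T on input x by running it on a random permutation of x.
    Because sigma is a uniformly random permutation and T never repeats a query, the
    positions of x that get read are uniform without replacement."""
    sigma = list(range(len(x)))
    rng.shuffle(sigma)                      # pick a random permutation sigma
    queried = []
    def oracle(i):                          # T's query to position i of sigma(x) ...
        queried.append(sigma[i])            # ... is a query to position sigma[i] of x
        return x[sigma[i]]
    return T(oracle, len(x)), queried

# Toy adaptive algorithm: reads the first 3 positions of its (permuted) input and sums them.
T = lambda oracle, n: sum(oracle(i) for i in range(3))
print(simulate_oblivious(T, list(range(10))))
```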

  29. Extensions
  Definitions:
  • f is (g,ε)-symmetric if ∀x, ∀σ, ∀y ∈ A(σ(x)), g(σ, y) ∈ A(x).
  • A function f on m × n matrices is ε-row-symmetric if, for all matrices M and for all row-permutation matrices Π, A(Π·M) = A(M).
  Similarly: ε-column-symmetry, and (g,ε)-row- and column-symmetry.
  We prove that similar simulations hold for all of the above.
