
Efficient Algorithms via Precision Sampling


Presentation Transcript


  1. Efficient Algorithms via Precision Sampling Robert Krauthgamer (Weizmann Institute) joint work with: Alexandr Andoni (Microsoft Research) Krzysztof Onak (CMU)

  2. Goal • Compute the fraction of Dacians in the empire • Estimate S = a1 + a2 + … + an, where each ai ∈ [0,1]

  3. Sampling • Send accountants to a subset J of the provinces, |J| = m • Estimator: S̃ = (n/m) · ∑_{j∈J} aj • Chebyshev bound: with 90% success probability, 0.5·S – O(n/m) < S̃ < 2·S + O(n/m) • For constant additive error, need m ~ n
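The naive estimator on this slide can be sketched in a few lines of Python. This is an illustrative toy (the names `sample_estimate`, `a`, `m` and all parameter values are ours, not from the talk):

```python
import random

def sample_estimate(a, m, rng=random.Random(0)):
    """Naive sampling estimator: visit a uniformly random subset J of
    m provinces and rescale the sampled sum by n/m."""
    n = len(a)
    J = rng.sample(range(n), m)          # random subset of m indices
    return sum(a[j] for j in J) * n / m  # S~ = (n/m) * sum_{j in J} a_j

# Demo: estimate S = sum(a_i) for random a_i in [0,1]
rng = random.Random(1)
a = [rng.random() for _ in range(10_000)]
S = sum(a)
est = sample_estimate(a, m=1_000)
```

With m = n/10 the standard deviation of the estimate is on the order of n/m · √m ≈ 100 here, which matches the slide's point that a constant *additive* error would require m ~ n.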

  4. Precision Sampling Framework • Send accountants to each province, but require only approximate counts • Estimate ai up to a pre-selected precision ui, i.e., |ai – ãi| < ui • Challenge: achieve a good tradeoff between • quality of the approximation to S • total cost of computing each ãi (within precision ui)

  5. Formalization • What is our cost model? • Achieving precision ui requires 1/ui “resources”: e.g., if ai is itself a sum ai = ∑j aij computed by subsampling, then one needs Θ(1/ui) samples • Here, average cost = (1/n) · ∑i 1/ui • For example, can choose all ui = 1/n • Average cost ≈ n • This is best possible if the estimator is S̃ = ∑i ãi • The game between the estimator (Alg) and the adversary: • 1. Alg fixes precisions ui; the adversary fixes (hidden) a1, a2, …, an • 2. The adversary fixes ã1, ã2, …, ãn s.t. |ai – ãi| < ui • 3. Alg reports S̃ s.t. |∑i ai – S̃| < 1

  6. Precision Sampling Lemma • Goal: estimate ∑i ai from {ãi} satisfying |ai – ãi| < ui • Precision Sampling Lemma: can get, with 90% success: • O(1) additive error and 1.5 multiplicative error: S – O(1) < S̃ < 1.5·S + O(1) • with average cost O(log n) • More generally: ε additive error and 1+ε multiplicative error, i.e., S – ε < S̃ < (1+ε)·S + ε, with average cost O(ε^-3 log n) • Example: distinguish Σai = 3 vs. Σai = 1 • Consider two extreme cases: • if three ai = 1: estimate all ai with a crude approximation (ui = 0.1) • if all ai = 3/n: estimate a few with a good approximation ui = 1/n, and the rest with ui = 1

  7. Precision Sampling Algorithm • Precision Sampling Lemma: can get, with 90% success: • O(1) additive error and 1.5 multiplicative error: S – O(1) < S̃ < 1.5·S + O(1) • with average cost O(log n) • Algorithm: • Choose each ui ∈ [0,1] i.i.d. uniformly • Estimator: S̃ = count of the i’s with ãi/ui > 6 (suitably normalized) • Outline of analysis: • E[S̃] = ∑i Pr[ãi/ui > 6] = ∑i Pr[ai > (6±1)·ui] ≈ ∑i ai/6 • Actually, ãi may also have 1.5-multiplicative error w.r.t. ai • E[1/ui] = O(log n) w.h.p. (after truncation) • For the 1+ε version (S – ε < S̃ < (1+ε)·S + ε, average cost O(ε^-3 log n)): use a concrete distribution, each ui = minimum of O(ε^-3) uniform r.v.’s, and an estimator that is a function of [ãi/ui – 4/ε]+ and the ui’s
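A minimal Monte Carlo sketch of the count-and-normalize estimator above, in the easy case where the reported values are exact (ãi = ai, so only the sampling part is exercised). Since Pr[ai/ui > 6] = ai/6 for ui uniform on [0,1] and ai ∈ [0,1], the normalized count is an unbiased estimate of S. All names and parameter values here are illustrative:

```python
import random

def psl_estimate(a, rng):
    """One run of the slide-7 estimator with exact values a~_i = a_i:
    draw u_i ~ Uniform(0,1] i.i.d., count indices with a_i/u_i > 6,
    and normalize by the factor 6."""
    count = sum(1 for ai in a if ai / (1.0 - rng.random()) > 6)
    return 6 * count

# Pr[a_i/u_i > 6] = a_i/6, so E[S~] = sum_i a_i = S (unbiased).
rng = random.Random(0)
a = [0.5] * 200            # S = 100
trials = 2000
avg = sum(psl_estimate(a, rng) for _ in range(trials)) / trials
```

A single run is noisy (the count is roughly binomial), but averaging over trials shows the estimator centering on S; the lemma's actual guarantee needs only one run plus the truncation step mentioned on the slide.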

  8. Why? • Save time: • Problem: computing edit distance between two strings [FOCS’10] • new algorithm that obtains (log n)^(1/ε) approximation in n^(1+O(ε)) time • via a property-testing-like algorithm using Precision Sampling (recursively) • Save space: • Problem: compute norms/moments of frequencies in a data stream [FOCS’11] • a simple and unified approach to compute all lp-norms/moments, and related problems

  9. Streaming/sketching • Challenge: log statistics of the data, using small space • [Figure: a router observing a stream of IP addresses, e.g. 131.107.65.14, 18.0.1.12, 80.97.56.20]

  10. Streaming moments • Setup: • 1+ε estimate of frequencies in small space • Let xi = frequency of IP i • p-th moment: Σi xi^p • p = 1: keep one counter! • p ∈ [0,2]: space O(ε^-2 · log n) [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11] • p > 2: space Õ_ε(n^(1-2/p)) [AMS’96, SS’02, BJKS’02, CKS’03, IW’05, BGKS’06, BO’11] • Generally, x ∈ R^n (updates: ±1 to coordinate i) • Sketch = embedding into a “space” of small dimension • Usually linear, L: R^n → R^m for m ≪ n, thus L(x ± ei) = Lx ± L·ei
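The linearity property L(x ± ei) = Lx ± L·ei is what makes a linear sketch maintainable under ±1 stream updates. A toy demonstration with a random sign matrix as the sketch (the matrix choice and all sizes are ours, purely for illustration):

```python
import random

rng = random.Random(0)
n, m = 16, 4
# A random +-1 matrix as a toy linear sketch L: R^n -> R^m.
L = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def sketch(x):
    """Apply the linear map L to the vector x."""
    return [sum(row[i] * x[i] for i in range(n)) for row in L]

x = [rng.random() for _ in range(n)]
e3 = [1.0 if i == 3 else 0.0 for i in range(n)]        # +1 update to coord 3

lhs = sketch([xi + ei for xi, ei in zip(x, e3)])       # L(x + e_3)
rhs = [s + t for s, t in zip(sketch(x), sketch(e3))]   # Lx + L e_3
```

So a stream update to coordinate i only requires adding the (precomputable) column L·ei to the stored sketch, never touching x itself.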

  11. lp moments • Theorem: linear sketch for lp with O(1) approximation and O(n^(1-2/p) log n) space (90% succ. prob.) • = weak embedding of lp^n into l∞^m of dimension m = O(n^(1-2/p) log n) • Sketch: • pick random ui ∈ [0,1], ri ∈ {±1} and let yi = ri · xi / ui^(1/p) • throw the yi’s into a hash table H with m = O(n^(1-2/p) log n) cells • Estimator: • via PSL, or just max over j ∈ [m] of |H[j]|^p • Randomness: O(1)-wise independence suffices
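A toy, fully-random version of this sketch (full independence instead of the O(1)-wise independence the theorem needs; all sizes and names are illustrative). Each run hashes yi = ri · xi / ui^(1/p) into m signed cells and returns the simple estimator max_j |H[j]|^p; taking a median over independent runs gives a constant-factor estimate of ||x||_p^p:

```python
import random

def lp_sketch_estimate(x, p, m, rng):
    """One run of the slide-11 sketch: y_i = r_i * x_i / u_i^(1/p),
    hashed into m cells with random signs; return max_j |H[j]|^p."""
    H = [0.0] * m
    for xi in x:
        u = 1.0 - rng.random()            # Uniform(0,1], avoids u = 0
        r = rng.choice((-1, 1))
        j = rng.randrange(m)              # random hash cell for item i
        H[j] += r * xi / u ** (1.0 / p)
    return max(abs(h) for h in H) ** p

p, n, m = 3, 100, 512
x = [1.0] * n
x[0] = 10.0                               # one heavy coordinate
F = sum(abs(xi) ** p for xi in x)         # ||x||_p^p = 1099

rng = random.Random(0)
runs = sorted(lp_sketch_estimate(x, p, m, rng) for _ in range(101))
median = runs[50]                         # median over independent runs
```

Intuitively, max_i |xi|^p/ui concentrates around ||x||_p^p up to a constant factor (Pr[|xi|^p/ui > t] = |xi|^p/t), which is why the max-cell estimator works; the PSL-based estimator sharpens this to 1+ε.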

  12. Under the Hood: Using PSL • Idea: use PSL to compute the sum ||x||_p^p = ∑i |xi|^p • Assume ||x||_2 = 1 by scaling • Set the PSL additive error ε small compared to ||x||_2^p / n^(p/2-1) ≤ ||x||_p^p • Outline: • 1. Pick the ui’s according to PSL and let yi = xi / ui^(1/p) • 2. Compute every yi^p = xi^p / ui within additive approximation 1 • done via heavy hitters of the vector y • 3. Use PSL on |yi^p · ui| = |xi|^p to compute the sum ∑i |xi|^p • The space bound is controlled by the norm ||y||_2^2, since heavy hitters under l2 is the best we can do • Notice E||y||_2^2 = ||x||_2^2 · E[1/u^(2/p)] ≤ (1/ε)^(2/p) = (n^(p/2-1))^(2/p) = n^(1-2/p)

  13. More Streaming Algorithms • Other streaming algorithms: • Same algorithm for all p-moments, including p ≤ 2 • For p > 2, improves existing space bounds [AMS96, IW05, BGKS06, BO10] • For p ≤ 2, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11] • Algorithms for mixed norms (lp of lq) [CM05, GBD08, JW09] • Space bounded by the (Rademacher) p-type constant • Algorithm for the lp-sampling problem [MW’10] • This work was extended to give tight bounds by [JST’10] • Connections: • Inspired by the streaming algorithm of [IW05], but simpler • Turns out to be a distant relative of Priority Sampling [DLT’07]

  14. Finale • Other applications of the Precision Sampling framework? • Better algorithms for precision sampling? • For average cost (for 1+ε approximation): • Upper bound: O(ε^-3 log n) (tight for our algorithm) • Lower bound: Ω(ε^-2 log n) • Bounds for other cost models? • E.g., for cost 1/√(precision), the bound is O(ε^(-3/2)) • Other forms of “access” to the ai’s? Thank you!
