1 / 25

Continuous Distributed Counting for Non-monotonic Streams

Continuous Distributed Counting for Non-monotonic Streams. Milan Vojnović Microsoft Research Joint work with Zhenming Liu and Božidar Radunović. January 14, 2012. SUM Tracking Problem. : . Maintain estimate . k. 1. 2. 3. SUM:. SUM Tracking: Applications.

wood
Download Presentation

Continuous Distributed Counting for Non-monotonic Streams

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Continuous Distributed Counting for Non-monotonic Streams Milan Vojnović Microsoft Research Joint work with Zhenming Liu and BožidarRadunović January 14, 2012

  2. SUM Tracking Problem : Maintain estimate k 1 2 3 SUM:

  3. SUM Tracking: Applications • Ex 1: database queriesSELECT SUM(AdBids)from Ads • Ex 2: iterative solving input data

  4. Related Work • Count tracking [Huang, Yi and Zhang, 2011] • Worst-case input, monotonic sum • Expected communication cost, for : and lower bound • Lower bound for worst case input[Arackaparambil, Brody and Chakrabarti, 2009] • Expected communication cost:

  5. Questions • Worst case complexity Ex. +1, -1, +1, … • Complexity under random input? • Random permutation • Random i.i.d. • Fractional Brownian motion

  6. Outline • Upper bounds • Lower bounds • Applications

  7. Our Tracker Algorithm • Sampling based algorithm: upon arrival of update , send a message to the coordinator w. p. • If any site sends a message: sync all S S1 S, S1 site S = S1+ … + Sk coordinator S, Sk Mi = 1 Xi Sk S site

  8. Algorithm’s Modes Sample Use two monotonic counters Always report Sample

  9. Communication Cost Upper BoundSingle Site • Input: i.i.d. Bernoulli • Sampling probability with and Expected communication cost:

  10. Proof Key Idea message sent

  11. Communication Cost Upper BoundMultiple Sites • sites • Updates i.i.d. Bernoulli • large enough and Expected communication cost:

  12. Communication Cost Upper BoundUnknown Drift Case • Input: i.i.d. Bernoulli • : unknown drift parameter Expected communication cost:

  13. Communication Cost Upper BoundRandom Permutation Input • Input: a random permutation of values • sufficiently large and Expected communication cost:

  14. Communication Cost Upper BoundFractional Brownian Motion • Input: a fractional Brownian motion with Hurst parameter Expected communication cost:

  15. Outline • Upper bounds • Lower bounds • Applications

  16. Lower BoundsSingle Site, Zero Drift • Input: i.i.d. Bernoulli Expected communication cost:

  17. Lower BoundsMultiple Sites • Input: i.i.d. Bernoulli or a random permutation Expected communication cost:

  18. Proof Key Ideas • Under , maximum deviation 1 2 k

  19. Proof Key Ideas (cont’d) k-input problem: • Query: or ? • Answer: incorrect only if and the answer is under or under • Lemma: messages is necessary to answer the query correctly with a constant positive probability 1 2 k i. i. d.

  20. Outline • Upper bounds • Lower bounds • Applications

  21. App 1: Tracking (cont’d) • Input: random permutation of • , • Problem: track within a prescribed relative tolerance with high probability

  22. AMS Sketch • , 4-wise independent hash function

  23. App 1: tracking (cont’d) • AMS: within w. p. using and • Sum tracking: Expected total communication:

  24. App 2: Bayesian Linear Regression • Feature vector , output • Prior osterior • Sum tracking: • Under random permutation input, the expected communication cost =

  25. Summary • We considered the sum tracking problem with non-monotonic distributed streams under random permutation, random i. i. d. and fractional Brownian motion • Derived a practical algorithm that has order optimal communication complexity

More Related