A deterministic algorithm for summarizing asynchronous streams over a sliding window
Sponsored Links
This presentation is the property of its rightful owner.
1 / 58

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window PowerPoint PPT Presentation


  • 95 Views
  • Uploaded on
  • Presentation posted in: General

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window. Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura Iowa State University. Outline of Talk. Introduction. Algorithm. Analysis. Time. Data stream:. For simplicity assume

Download Presentation

A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


A Deterministic Algorithm forSummarizing Asynchronous Streamsover a Sliding Window

Costas Busch

Rensselaer Polytechnic Institute

Srikanta Tirthapura

Iowa State University


Outline of Talk

Introduction

Algorithm

Analysis


Time

Data stream:

For simplicity assume

unit valued elements


Most recent time window of duration

Current

time

Data stream:

Goal:

Compute the sum of elements with

time stamps in time window


  • Example I: All packets on a network link, maintain the number of different ip sources in the last one hour

  • Example II: Large database, continuously maintain averages and frequency moments


  • Synchronous stream

    • ti: In ascending order

  • Asynchronous stream

    • ti: No order guaranteed

Data stream:


Why Asynchronous Data Streams?

Synchronous stream

Asynchronous stream

Network delay & multi-path routing

Synchronous

Asynchronous

Synchronous

Merge w/o control


  • Processing Requirements:

    • One pass processing

    • Small workspace: poly-logarithmic in the size of data

    • Fast processing time per element

    • Approximate answers are ok


Our results:

A deterministic data aggregation algorithm

Time:

Space:

Relative Error:


Previous Work:

[Datar, Gionis, Indyk, Motwani.

SIAM Journal on Computing, 2002]

Deterministic, Synchronous

Merging buckets

[Tirthapura, Xu, Busch, PODC, 2006]

Randomized, Asynchronous

Random sampling


Outline of Talk

Introduction

Algorithm

Analysis


Time

Current

time

Data stream:

For simplicity assume

unit valued elements


Most recent time window of duration

Current

time

Data stream:

Goal:

Compute the sum of elements with

time stamps in time window


Divide time into periods of duration


sliding window

The sliding window may span at most

two time periods


sliding window

Sum can be written as two sub-sums

In two time periods


sliding window

Data structure that

maintains an estimate of

In left time period


Without loss of Generality,

Consider data structure

in time period


Data structure consists of various levels

is an upper bound of the sum in a period


Consider level

Bucket at Level

Time period

Counts up to elements


Stream:

Increase counter value


Stream:

Increase counter value


Stream:

Increase counter value


Stream:

Increase counter value


Stream:

Split bucket

Counter threshold of reached


Stream:

New buckets have threshold also


Stream:

Increase appropriate bucket


Stream:

Increase appropriate bucket


Stream:

Increase appropriate bucket


Stream:

Split bucket


Stream:


Stream:

Increase appropriate bucket


Stream:

Split bucket


Stream:


Splitting Tree


Max depth =

Leaf buckets of duration 1

are not split any further


Leaf buckets

The initial bucket may be split into

many buckets


Leaf buckets

Due to space limitations

we only keep the last

buckets


Suppose we want to find the sum

of elements in time period


Consider various levels

of splitting threshold


First level with a leaf bucket

that intersects timeline


Estimate of S:

Consider buckets on right of timeline


OR

First level with a leaf bucket

On right timeline


Outline of Talk

Introduction

Algorithm

Analysis


Suppose that we use level

in order to compute the estimate


Stream:

Consider splitting threshold level

A data element is counted in the

appropriate bucket


Stream:

We can assume that the element is

placed in the respective bucket


Stream:

We can assume that when bucket

splits the element is placed in an

arbitrary child bucket


Stream:

If:

GOOD!

Element counted in correct bucket


Stream:

If:

BAD!

Element counted in wrong bucket


Consider Leaf Buckets

GOOD!

If


Consider Leaf Buckets

BAD!

If

Element counted in wrong bucket


Consider Leaf Buckets

:elements of left part counted on right

:elements of right part counted on left


elements of left part

counted on right

Must have been initially inserted

in one of these buckets


Since tree depth


Since tree depth

Similarly, we can prove

Therefore:


Since

It can be proven


Since

It can be proven

Combined with

We obtain relative error :


  • Login