A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window

Costas Busch

Rensselaer Polytechnic Institute

Srikanta Tirthapura

Iowa State University

Outline of Talk

Introduction

Algorithm

Analysis

Time

Data stream:

For simplicity, assume unit-valued elements

Most recent time window of duration

Current

time

Goal: compute the sum of the elements whose timestamps fall in the time window
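As a point of reference for the goal above, here is a minimal exact baseline that stores every element. It is not the algorithm of the talk, only a sketch of what the summary has to approximate. The names `window_sum`, `current_time`, and `window`, and the half-open window convention `(current_time - window, current_time]`, are assumptions made for illustration.

```python
def window_sum(elements, current_time, window):
    """Exact sum of the values whose timestamps lie in the sliding window.

    `elements` is a list of (value, timestamp) pairs in arrival order.
    The window is taken as (current_time - window, current_time], which is
    an assumed convention, not one stated in the slides.
    """
    start = current_time - window
    return sum(value for value, t in elements if start < t <= current_time)


# Unit-valued elements arriving out of timestamp order (asynchronous stream).
stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 8)]
exact = window_sum(stream, current_time=12, window=5)
```

This baseline needs space linear in the stream, which is what the small-workspace requirement rules out.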

- Example I: Over all packets on a network link, maintain the number of distinct IP sources in the last hour
- Example II: Over a large database, continuously maintain averages and frequency moments

- Synchronous stream: timestamps t_i arrive in ascending order
- Asynchronous stream: no order of the timestamps t_i is guaranteed

Why Asynchronous Data Streams?

- Network delay and multi-path routing: a synchronous stream can arrive as an asynchronous stream
- Merging without control: two synchronous streams can merge into an asynchronous stream
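The merge case can be illustrated with a small sketch: two streams that are each in ascending timestamp order are interleaved round-robin, and the result is no longer ordered. The round-robin policy and the names `is_ascending` and `merge_without_control` are assumptions chosen for illustration; the slides do not say how the merge happens.

```python
from itertools import chain, zip_longest


def is_ascending(timestamps):
    """True if timestamps never decrease (a synchronous stream)."""
    return all(a <= b for a, b in zip(timestamps, timestamps[1:]))


def merge_without_control(stream_a, stream_b):
    """Interleave two streams round-robin, ignoring timestamps (assumed policy)."""
    missing = object()  # sentinel for the shorter stream running out
    pairs = zip_longest(stream_a, stream_b, fillvalue=missing)
    return [t for t in chain.from_iterable(pairs) if t is not missing]


link_a = [1, 4, 7]   # synchronous
link_b = [2, 3, 9]   # synchronous
merged = merge_without_control(link_a, link_b)
```

Here `merged` contains 4 before 3, so the observer sees an asynchronous stream even though each source was synchronous.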

Processing Requirements:

- One-pass processing
- Small workspace: polylogarithmic in the size of the data
- Fast processing time per element
- Approximate answers are OK

Our results:

A deterministic data aggregation algorithm

Time:

Space:

Relative Error:

Previous Work:

[Datar, Gionis, Indyk, Motwani, SIAM Journal on Computing, 2002]: deterministic, synchronous; merging buckets

[Tirthapura, Xu, Busch, PODC, 2006]: randomized, asynchronous; random sampling

Outline of Talk

Introduction

Algorithm

Analysis

Divide time into periods of duration

The sliding window may span at most two time periods, so the sum can be written as two sub-sums, one in each time period
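The decomposition into two sub-sums can be sketched as follows. The sketch assumes integer timestamps, periods whose duration equals the window duration and that start at multiples of it, and a window of the form `(current_time - window, current_time]`. These conventions and the name `split_window_sum` are assumptions for illustration.

```python
def split_window_sum(elements, current_time, window):
    """Return (left_sum, right_sum): the window sum split at the period boundary.

    Periods are assumed to be [k * window, (k + 1) * window), so the window
    (current_time - window, current_time] touches at most two of them.
    """
    start = current_time - window
    boundary = (current_time // window) * window  # start of the current period
    in_window = [(v, t) for v, t in elements if start < t <= current_time]
    left_sum = sum(v for v, t in in_window if t < boundary)
    right_sum = sum(v for v, t in in_window if t >= boundary)
    return left_sum, right_sum


stream = [(1, 3), (1, 9), (1, 5), (1, 12), (1, 8)]
left_sum, right_sum = split_window_sum(stream, current_time=12, window=5)
```

With `current_time=12` and `window=5` the boundary falls at 10: timestamps 8 and 9 contribute to the left period and timestamp 12 to the right one.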

Data structure that

maintains an estimate of

In left time period

Without loss of generality, consider the data structure in time period

Data structure consists of various levels

is an upper bound of the sum in a period

Consider level

Bucket at Level

Time period

Counts up to elements

Stream:

- Each arriving element increases the counter value of the bucket
- When the counter threshold is reached, the bucket is split; the new buckets have the same threshold
- Each later element increases the counter of the appropriate bucket
- A bucket whose counter reaches the threshold is split in turn
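The counting and splitting steps at a single level can be sketched as below. Several details are assumptions, not taken from the slides: the class name `Level`, the half-open time spans, splitting at the midpoint, and in particular dividing the parent's count evenly between the two children. The threshold value is a free parameter here because its value is not legible in the source.

```python
class Level:
    """One level of the summary: buckets over a time period, split on threshold."""

    def __init__(self, period_start, period_end, threshold):
        self.threshold = threshold
        # Each bucket is [start, end, count] and covers timestamps start..end-1.
        self.buckets = [[period_start, period_end, 0]]

    def insert(self, timestamp):
        """Count a unit-valued element in the bucket covering its timestamp."""
        for i, bucket in enumerate(self.buckets):
            if bucket[0] <= timestamp < bucket[1]:
                break
        else:
            raise ValueError("timestamp outside the period")
        bucket[2] += 1
        start, end, count = bucket
        # Buckets of duration 1 are leaves of the splitting tree: never split.
        if count >= self.threshold and end - start > 1:
            mid = (start + end) // 2
            # Assumption: the count is divided evenly between the children.
            left = [start, mid, count // 2]
            right = [mid, end, count - count // 2]
            self.buckets[i:i + 1] = [left, right]


level = Level(period_start=0, period_end=8, threshold=4)
for t in [1, 6, 2, 5, 7, 3, 6]:   # asynchronous arrival order
    level.insert(t)
```

Because a split only redistributes the count, the total over all buckets always equals the number of elements inserted; what is lost is the exact position of each element inside its bucket.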

Splitting Tree

Max depth =

Leaf buckets of duration 1 are not split any further

The initial bucket may be split into many buckets

Leaf buckets

Due to space limitations

we only keep the last

buckets
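Discarding old buckets can be sketched as a simple truncation. The bucket limit is not legible in the source, so `max_buckets` is a hypothetical parameter, and "last" is assumed to mean the buckets with the most recent time spans.

```python
def keep_last(buckets, max_buckets):
    """Keep only the max_buckets most recent buckets.

    `buckets` is a list of (start, end, count) tuples; `max_buckets` is a
    hypothetical limit standing in for the bound on the slide.
    """
    if max_buckets <= 0:
        return []
    ordered = sorted(buckets, key=lambda b: b[0])  # oldest time span first
    return ordered[-max_buckets:]


buckets = [(4, 6, 2), (0, 4, 3), (6, 8, 2)]
kept = keep_last(buckets, max_buckets=2)
```

After truncation a level no longer covers the oldest part of the period.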

Suppose we want to find the sum

of elements in time period

Consider the various levels of splitting threshold

First level with a leaf bucket that intersects the timeline

Estimate of S:

Consider the buckets to the right of the timeline

OR

First level with a leaf bucket to the right of the timeline
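The step "consider the buckets to the right of the timeline" can be sketched as a sum over bucket counters. The exact estimate formula is not legible in the source, so this sketch covers only that step: it adds up the buckets lying entirely at or after the timeline and deliberately leaves out any bucket that intersects it. The function name and the `(start, end, count)` layout are assumptions.

```python
def count_right_of_timeline(buckets, timeline):
    """Sum the counters of buckets whose time span starts at or after `timeline`.

    A bucket that straddles the timeline is excluded here; how the talk's
    estimate treats such a bucket is not recoverable from the slides.
    """
    return sum(count for start, end, count in buckets if start >= timeline)


buckets = [(0, 4, 3), (4, 6, 2), (6, 8, 2)]
on_boundary = count_right_of_timeline(buckets, timeline=4)
inside_bucket = count_right_of_timeline(buckets, timeline=5)
```

When the timeline falls inside the bucket `(4, 6, 2)`, that bucket's two elements are left out, which gives a sense of why the bucket intersecting the timeline matters for the error analysis.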

Outline of Talk

Introduction

Algorithm

Analysis

Suppose that we use level

in order to compute the estimate

Consider splitting threshold level

A data element is counted in the appropriate bucket. We can assume that the element is placed in the respective bucket and that, when a bucket splits, the element is placed in an arbitrary child bucket.

If:

GOOD! Element counted in correct bucket

If:

BAD! Element counted in wrong bucket

Consider Leaf Buckets

GOOD!

If

Consider Leaf Buckets

BAD!

If

Element counted in wrong bucket

Consider Leaf Buckets

:elements of left part counted on right

:elements of right part counted on left

Elements of the left part counted on the right must have been initially inserted in one of these buckets

Since tree depth

Similarly, we can prove

Therefore:

Since

It can be proven

Combined with

We obtain relative error :