Monitoring Streams -- A New Class of Data Management Applications

Don Carney Brown University

Uğur Çetintemel Brown University

Mitch Cherniack Brandeis University

Christian Convey Brown University

Sangdon Lee Brown University

Greg Seidman Brown University

Michael Stonebraker MIT

Nesime Tatbul Brown University

Stan Zdonik Brown University

Background
  • MIT/Brown/Brandeis team
  • First Aurora, then Borealis
    • Practical system
    • Designed for Scalability: 10^6 stream inputs, queries
    • QoS-Driven Resource Management
    • Stream Storage Management
    • Reliability / Fault Tolerance
    • Distribution and Adaptivity
  • First stream startup: StreamBase
    • Financial applications
Example Stream Applications
  • Market Analysis
    • Streams of Stock Exchange Data
  • Critical Care
    • Streams of Vital Sign Measurements
  • Physical Plant Monitoring
    • Streams of Environmental Readings
  • Biological Population Tracking
    • Streams of Positions from Individuals of a Species
Not Your Average DBMS
  • External, Autonomous Data Sources
  • Querying Time-Series
  • Triggers-in-the-large
  • Real-time response requirements
  • Noisy Data, Approximate Query Results
Outline

2. Aurora Overview / Query Model

  • Runtime Operation
  • Adaptivity

Aurora from 100,000 Feet
  • Each Application Provides:
    • A Query over input data streams
    • A Quality-Of-Service Specification (QoS)
      • (specifies utility of partial or late results)

[Figure: three applications (App), each paired with a QoS specification, attached to queries over the input streams]
Aurora from 100 Feet

[Figure: an Aurora network of operator boxes (σ, μ, ∪, Slide, Tumble) connected by arcs and feeding three applications (App), each with a QoS specification]
  • Query Operators (Boxes)
    • Simple: FILTER, MAP, RESTREAM
    • Binary: UNION, JOIN, RESAMPLE
    • Windowed: TUMBLE, SLIDE, XSECTION, WSORT
  • Queries = Workflow (Boxes and Arcs)
    • Workflow Diagram = “Aurora Network”
    • Boxes = Query Operators
    • Arcs = Streams
  • Streams (Arcs)
    • stream: tuple sequence from common source
      • (e.g., sensor)
    • tuples timestamped on arrival (Internal use: QoS)
Aurora in Action

[Figure: an Aurora network in operation, with operator boxes (σ, μ, ∪, Slide, Tumble) delivering results to applications (App), each with a QoS specification]
  • Arcs → Tuple Queues
  • “Box-at-a-time” Scheduling
  • Outputs Monitored for QoS

Continuous and Historical Queries

[Figure: operator network (O1–O5, O7–O9) with queues on its arcs, showing a continuous query feeding applications (App) with QoS specifications, plus an ad-hoc query and a view attached at a Connection Point; history labels: 3 Days, 1 Hour]

Quality-of-Service (QoS)
  • Specifies “Utility” of Imperfect Query Results
    • Delay-Based (specify utility of late results)
    • Delivery-Based, Value-Based (specify utility of partial results)
  • QoS Influences…
    • Scheduling, Storage Management, Load Shedding

[Figure: three QoS graphs labeled A, B, C, with x-axes Delay, % Tuples Delivered, and Output Value]
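A QoS graph of the kind described on this slide can be sketched as a piecewise-linear function from a measured quantity (delay, % tuples delivered) to utility. The helper name `make_qos_graph` and both sample curves are assumptions made for illustration; the slides do not give concrete graph shapes.

```python
from bisect import bisect_right

def make_qos_graph(points):
    """Build a piecewise-linear utility function from (x, utility) points.

    `points` must be sorted by x. Values left of the first point take the
    first utility; values right of the last point take the last utility.
    """
    xs = [x for x, _ in points]
    us = [u for _, u in points]

    def utility(x):
        if x <= xs[0]:
            return us[0]
        if x >= xs[-1]:
            return us[-1]
        i = bisect_right(xs, x)              # xs[i-1] <= x < xs[i]
        x0, x1 = xs[i - 1], xs[i]
        u0, u1 = us[i - 1], us[i]
        return u0 + (u1 - u0) * (x - x0) / (x1 - x0)

    return utility

# Hypothetical delay-based graph: full utility up to 1 s, zero utility from 5 s on.
delay_qos = make_qos_graph([(0.0, 1.0), (1.0, 1.0), (5.0, 0.0)])
# Hypothetical delivery-based graph: utility grows with the % of tuples delivered.
delivery_qos = make_qos_graph([(0.0, 0.0), (50.0, 0.8), (100.0, 1.0)])
```

A scheduler or load shedder could evaluate such a function on observed outputs to estimate how much utility an application is currently receiving.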

Talk Outline

1. Introduction

2. Aurora Overview

3. Runtime Operation

4. Adaptivity

5. Related Work and Conclusions

Runtime Operation: Basic Architecture

[Figure: architecture diagram with inputs, outputs, Router, Scheduler, Box Processors (σ, μ, ∪), QoS Monitor, Storage Manager (queues q1 … qn, Buffer, Persistent Store), and Catalog]

Runtime Operation: Scheduling (Maximize Overall QoS)
  • Choice 1: Box A (Cost: 1 sec), tuple (…, age: 1 sec)
    • Delay = 2 sec, Utility = 0.5
  • Choice 2: Box B (Cost: 2 sec), tuple (…, age: 3 sec)
    • Delay = 5 sec, Utility = 0.8
  • Schedule Box A now rather than later
  • Ideal: Maximize Overall Utility
  • Presently exploring scalable heuristics (e.g., feedback-based)
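The "maximize overall utility" ideal can be illustrated with a brute-force sketch that tries every execution order and sums the utility of each output. The two QoS curves are hypothetical; they were chosen only so that running A immediately yields utility 0.5 at a 2 sec delay and running B immediately yields 0.8 at a 5 sec delay, as on the slide. Exhaustive search does not scale, which is presumably why the slide points to heuristics.

```python
from itertools import permutations

def output_delay(age, cost, wait=0.0):
    """Delay of a tuple at output if its box starts after `wait` seconds."""
    return age + wait + cost

def best_order(boxes, qos):
    """Try every execution order and return the one with the highest total utility.

    boxes: {name: (tuple_age_sec, box_cost_sec)}
    qos:   {name: function mapping delay in seconds to utility}
    Exhaustive search is only workable for a handful of boxes.
    """
    best = None
    for order in permutations(boxes):
        elapsed, total = 0.0, 0.0
        for name in order:
            age, cost = boxes[name]
            total += qos[name](output_delay(age, cost, wait=elapsed))
            elapsed += cost                  # later boxes wait for earlier ones
        if best is None or total > best[1]:
            best = (order, total)
    return best

boxes = {"A": (1.0, 1.0), "B": (3.0, 2.0)}
# Hypothetical QoS graphs: A's utility falls off steeply with delay, B's slowly.
qos = {
    "A": lambda d: max(0.0, 1.0 - 0.25 * d),
    "B": lambda d: max(0.0, 1.0 - 0.04 * d),
}
order, total = best_order(boxes, qos)
```

Under these assumed curves, running B first would leave A with no utility at all, so the A-first order wins even though B's immediate utility is higher.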

Runtime Operation: Scheduling (Minimizing Per-Tuple Processing Overhead)
  • Train Scheduling (boxes A and B, queued tuples x, y, z)
    • Default Operation: A (x), A (y), A (z), B (A (x)), B (A (y)), B (A (z)) as separate box calls
    • Tuple Trains: A (z, y, x), then B (A (z), A (y), A (x))
    • Box Trains: combined box AB computes B (A (x)), B (A (y)), B (A (z))

[Figure: the three execution modes side by side; the legend marks each Context Switch]
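The difference between the three modes can be sketched by counting box invocations, used here as a stand-in for context switches. `CountingBox`, `tuple_at_a_time`, `tuple_train`, and `box_train` are hypothetical names for this illustration and are not Aurora's API.

```python
class CountingBox:
    """Wraps a per-tuple function and counts how many times the box is invoked."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def run(self, tuples):
        self.calls += 1                      # one scheduler call per invocation
        return [self.fn(t) for t in tuples]

def tuple_at_a_time(boxes, queue):
    """Default operation: each box is invoked once per tuple."""
    out = []
    for t in queue:
        batch = [t]
        for box in boxes:
            batch = box.run(batch)
        out.extend(batch)
    return out

def tuple_train(boxes, queue):
    """Tuple train: each box is invoked once for the whole queue."""
    batch = list(queue)
    for box in boxes:
        batch = box.run(batch)
    return batch

def box_train(boxes):
    """Box train: fuse a chain of boxes into a single box."""
    def fused(t):
        for box in boxes:
            t = box.fn(t)
        return t
    return CountingBox(fused)

inc = lambda v: v + 1
dbl = lambda v: v * 2

a, b = CountingBox(inc), CountingBox(dbl)
default_out = tuple_at_a_time([a, b], [1, 2, 3])
default_calls = a.calls + b.calls            # one call per box per tuple

a, b = CountingBox(inc), CountingBox(dbl)
train_out = tuple_train([a, b], [1, 2, 3])
train_calls = a.calls + b.calls              # one call per box

ab = box_train([CountingBox(inc), CountingBox(dbl)])
fused_out = ab.run([1, 2, 3])                # one call in total
```

All three modes produce the same results; only the number of invocations changes.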

Runtime Operation: Storage Management

1. Run-time Queue Management

  • Prefetch Queues Prior to Being Scheduled
  • Drop Tuples from Queues to Improve QoS

2. Connection Point Management

  • Support Efficient (Pull-Based) Access to Historical Data
  • E.g., indexing, sorting, clustering, …
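A connection point with bounded history and pull-based reads might look like the sketch below. The class name, the retention parameter, and the time-range lookup are assumptions for illustration; the slide only says that historical data should be accessible efficiently. Timestamps are assumed to arrive in nondecreasing order.

```python
from bisect import bisect_left, bisect_right
from collections import deque

class ConnectionPoint:
    """Keeps the last `retention` seconds of a stream for pull-based reads."""

    def __init__(self, retention):
        self.retention = retention
        self.history = deque()               # (timestamp, value), in arrival order

    def push(self, timestamp, value):
        """Record a tuple and discard anything older than the retention window."""
        self.history.append((timestamp, value))
        while self.history and self.history[0][0] < timestamp - self.retention:
            self.history.popleft()

    def pull(self, start, end):
        """Return values with start <= timestamp <= end.

        Rebuilding the timestamp list on every call keeps the sketch short;
        a real store would maintain an index instead.
        """
        items = list(self.history)
        stamps = [ts for ts, _ in items]
        lo = bisect_left(stamps, start)
        hi = bisect_right(stamps, end)
        return [value for _, value in items[lo:hi]]

cp = ConnectionPoint(retention=3600)         # hypothetical one-hour history
cp.push(0, "a")
cp.push(1800, "b")
cp.push(4000, "c")                           # pushes "a" out of the window
recent = cp.pull(0, 5000)
```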

Talk Outline

1. Introduction

2. Aurora Overview

3. Runtime Operation

4. Adaptivity

5. Related Work and Conclusions

Stream Query Optimization
  • Differences with Traditional Query Optimization?
Stream Query Optimization
  • New classes of operators (windows) may mean new rewrites
  • New execution modes (continuous/pipelining)
  • More dynamic fluctuations in statistics → compile-time optimization not possible
  • Global optimization not practical for huge query networks → adaptive optimization
  • Other cost models: taking memory into account, output rate rather than throughput, etc.
  • Query optimization and load shedding
Query Optimization
  • Compile-time, Global Optimization Infeasible
    • Too Many Boxes
    • Too Much Volatility in Network, Data
  • Dynamic, Local Optimization
    • Threshold for deciding when to optimize

Motivation of ‘Query Migration’
  • Continuous query over streams
    • Statistics unknown before start
    • Statistics changing during execution
      • Stream rates, arrival pattern, distribution, etc
  • Need for dynamic adaptation
    • Plan re-optimization
      • Change the shape of query plan tree
Run-time Plan Re-Optimization
  • Step 1 – Decide when to optimize
    • Statistics Monitoring
  • Step 2 – Generate new query plan
    • Query Optimization
  • Step 3 – Replace current plan by new plan
    • Plan Migration
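The three steps can be sketched for the simplest case, a chain of stateless filters. Everything here is an assumption for illustration: the `Reoptimizer` class, the drift threshold on selectivity, and the ordering rule (ascending cost / (1 - selectivity), a standard rule for independent filters) are not taken from the slides. With stateless filters, step 3 reduces to swapping the plan; stateful operators need a real migration protocol.

```python
def plan_cost(plan, stats):
    """Expected per-tuple cost of running filters in the given order.

    stats: {name: (cost, selectivity)}; a tuple reaches a filter only if it
    passed every earlier one (selectivities assumed independent).
    """
    total, reach = 0.0, 1.0
    for name in plan:
        cost, sel = stats[name]
        total += reach * cost
        reach *= sel
    return total

def optimize(stats):
    """Order filters by rank cost / (1 - selectivity), lowest rank first."""
    def rank(name):
        cost, sel = stats[name]
        return cost / (1.0 - sel) if sel < 1.0 else float("inf")
    return sorted(stats, key=rank)

class Reoptimizer:
    def __init__(self, stats, threshold=0.2):
        self.threshold = threshold           # hypothetical drift threshold
        self.baseline = dict(stats)          # statistics the current plan was built on
        self.plan = optimize(stats)

    def observe(self, stats):
        # Step 1: statistics monitoring decides whether to optimize.
        drifted = any(
            abs(stats[n][1] - self.baseline[n][1]) > self.threshold for n in stats
        )
        if not drifted:
            return self.plan
        # Step 2: generate a new plan from the fresh statistics.
        new_plan = optimize(stats)
        # Step 3: replace the current plan (a plain swap for stateless filters).
        self.plan = new_plan
        self.baseline = dict(stats)
        return self.plan

r = Reoptimizer({"f1": (1.0, 0.9), "f2": (1.0, 0.5)})
initial_plan = list(r.plan)
small_drift = list(r.observe({"f1": (1.0, 0.85), "f2": (1.0, 0.5)}))
new_stats = {"f1": (1.0, 0.1), "f2": (1.0, 0.5)}
after_drift = list(r.observe(new_stats))
```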
Adaptivity in Query Optimization

Dynamic Optimization : Migration

1. Identify Subnetwork

2. Buffer Inputs

3. Drain Subnetwork

4. Optimize Subnetwork

5. Turn on Taps
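The five migration steps can be sketched for a subnetwork of stateless boxes. The `Subnetwork` class and its `migrate` method are hypothetical; step 1 (identifying the subnetwork) is represented simply by the object itself, and the optimizer is any function that returns an equivalent list of boxes.

```python
from collections import deque

class Subnetwork:
    """A chain of stateless boxes fed by one input queue."""

    def __init__(self, boxes):
        self.boxes = boxes                   # functions: value -> value, or None to drop
        self.queue = deque()                 # tuples already inside the subnetwork
        self.held = deque()                  # tuples buffered while the tap is off
        self.tap_on = True
        self.output = []

    def push(self, value):
        (self.queue if self.tap_on else self.held).append(value)

    def drain(self):
        """Run every queued tuple through the current boxes."""
        while self.queue:
            value = self.queue.popleft()
            for box in self.boxes:
                value = box(value)
                if value is None:            # a box filtered the tuple out
                    break
            else:
                self.output.append(value)

    def migrate(self, optimizer):
        self.tap_on = False                  # 2. buffer inputs
        self.drain()                         # 3. drain subnetwork
        self.boxes = optimizer(self.boxes)   # 4. optimize subnetwork
        self.tap_on = True                   # 5. turn on taps
        self.queue.extend(self.held)         # release the buffered inputs
        self.held.clear()

positive = lambda v: v if v > 0 else None
even = lambda v: v if v % 2 == 0 else None

net = Subnetwork([positive, even])
for v in (4, -2, 3):
    net.push(v)

def reorder(boxes):
    net.push(6)                              # simulates a tuple arriving mid-migration
    return list(reversed(boxes))             # the two filters commute

net.migrate(reorder)
net.drain()
```

The tuple that arrives during migration is held back and processed only by the new box order, so nothing is lost or reordered.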

Stateful Operator in CQ
  • Example: Symmetric NL join w/ window constraints
  • Why stateful
    • Need non-blocking operators in CQ
    • Operator needs to output partial results
    • State data structure keeps received tuples
  • Key Observation: The purge of tuples in states relies on processing of new tuples.

[Figure: join A⋈B with State A and State B; input stream A carries ax and input stream B carries b1 … b5; the output shows ax paired with b2 and with b3]
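A symmetric nested-loop join with a time window can be sketched as follows. The class name, the purge-probe-insert order, and the assumption that timestamps arrive in nondecreasing order across both inputs are choices made for this illustration.

```python
from collections import deque

class SymmetricWindowJoin:
    """Symmetric nested-loop join of streams A and B over a time window."""

    def __init__(self, window, predicate):
        self.window = window
        self.predicate = predicate
        self.state = {"A": deque(), "B": deque()}   # (timestamp, value) per input

    def insert(self, side, timestamp, value):
        """Process one arriving tuple and return the join results it produces."""
        other = "B" if side == "A" else "A"
        opposite = self.state[other]
        # Purge: expired tuples leave the opposite state only when a new tuple arrives.
        while opposite and opposite[0][0] < timestamp - self.window:
            opposite.popleft()
        # Probe: nested-loop scan of the opposite state.
        results = []
        for _, stored in opposite:
            pair = (value, stored) if side == "A" else (stored, value)
            if self.predicate(*pair):
                results.append(pair)
        # Insert: keep the new tuple so later arrivals on the other input can match it.
        self.state[side].append((timestamp, value))
        return results

join = SymmetricWindowJoin(window=10, predicate=lambda a, b: a[0] == b[0])
r1 = join.insert("A", 0, ("x", "a1"))
r2 = join.insert("B", 5, ("x", "b1"))
r3 = join.insert("B", 20, ("x", "b2"))       # purges a1 from State A
```

After the last call, State B still holds b1 even though it is outside the window: nothing has arrived on A to trigger its purge, which is the key observation in concrete form.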

Naïve Migration Strategy Revisited
  • Steps
    • (1) Pause execution of old plan
    • (2) Drain out all tuples inside old plan
    • (3) Replace old plan by new plan
    • (4) Resume execution of new plan
  • Deadlock Waiting Problem: step (2) waits for the states to empty, but purging relies on processing new tuples, which step (1) has paused

[Figure: old plan with joins A⋈B and B⋈C over inputs A, B, C, annotated (2) All tuples drained, (3) Old replaced by new, (4) Processing resumed]

Adaptivity: Query Optimization
  • State Movement Protocol
  • Parallel Track Protocol

Moving State Strategy
  • Basic idea
    • Share common states between two migration boxes
  • Key steps
    • State Matching
      • Match states based on IDs.
    • State Moving
      • Create new pointers for matched states in new box
    • What’s left?
      • Unmatched states in new box
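State matching and state moving can be sketched with dictionaries keyed by state ID. The function name and the assignment of state IDs to the old and new box are assumptions based on the labels in the slide's figure; how unmatched states get filled is left open here, as it is on the slide.

```python
def move_states(old_states, new_state_ids):
    """Match states by ID and share the matched ones with the new box.

    old_states:    {state_id: list of tuples} held by the old box
    new_state_ids: IDs of the states the new box needs
    Returns (new_states, unmatched_ids).
    """
    new_states, unmatched = {}, []
    for state_id in new_state_ids:
        if state_id in old_states:
            new_states[state_id] = old_states[state_id]   # same object, no copy
        else:
            new_states[state_id] = []                     # still has to be filled
            unmatched.append(state_id)
    return new_states, unmatched

# Hypothetical old box ((A⋈B)⋈C)⋈D and new box A⋈((B⋈C)⋈D).
old_states = {
    "SA": ["a1"], "SB": ["b1"], "SC": ["c1"], "SD": ["d1"],
    "SAB": [("a1", "b1")], "SABC": [],
}
new_states, unmatched = move_states(
    old_states, ["SA", "SB", "SC", "SD", "SBC", "SBCD"]
)
old_states["SA"].append("a2")                # visible in the new box: the list is shared
```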

[Figure: Old Box and New Box, each reading input queues QA, QB, QC, QD and producing QABCD through join operators (A⋈B, B⋈C, C⋈D); states shown: SA, SB, SC, SD, SAB, SBC, SABC, SBCD]

Parallel Track Strategy
  • Basic idea
    • Execute both plans in parallel and gradually “push” old tuples out of old box by purging
  • Key steps
    • Connect boxes
    • Execute in parallel
      • Until old box “expired” (no old tuple or sub-tuple)
    • Disconnect old box
    • Start executing new box only
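The "expired" test for the old box can be sketched for a single window state. This covers only the expiry check: sub-tuples (intermediate join results) and the removal of duplicate outputs produced by the two parallel boxes are not modeled, and `OldBoxTracker` is a hypothetical name.

```python
from collections import deque

class OldBoxTracker:
    """Tracks when an old box's window state no longer holds pre-migration tuples."""

    def __init__(self, window, migration_start):
        self.window = window
        self.migration_start = migration_start
        self.state = deque()                 # timestamps of tuples held by the old box

    def on_tuple(self, timestamp):
        """Both boxes receive the tuple; only the old box's state is maintained here."""
        while self.state and self.state[0] < timestamp - self.window:
            self.state.popleft()             # purging pushes old tuples out
        self.state.append(timestamp)

    def expired(self):
        """True once every remaining tuple arrived after migration started."""
        return all(ts >= self.migration_start for ts in self.state)

tracker = OldBoxTracker(window=10, migration_start=100)
tracker.on_tuple(95)
tracker.on_tuple(98)                         # two tuples predate the migration
before = tracker.expired()
tracker.on_tuple(104)                        # old tuples are still inside the window
during = tracker.expired()
tracker.on_tuple(109)                        # window now starts at 99: 95 and 98 purged
after = tracker.expired()
```

Once `expired()` returns true, the old box can be disconnected and the new box runs alone.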

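[Figure: old and new boxes connected in parallel, both reading input queues QA, QB, QC, QD and producing QABCD through join operators (A⋈B, B⋈C, C⋈D); states shown: SA, SB, SC, SD, SAB, SBC, SABC, SBCD]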

Adaptivity: Load Shedding

1. Two Load Shedding Techniques:

  • Random Tuple Drops
    • Add DROP box to network (DROP is a special case of FILTER)
    • Position to affect queries w/ tolerant delivery-based QoS reqts
  • Semantic Load Shedding
    • FILTER values with low utility (acc. to value-based QoS)

2. Triggered by QoS Monitor

  • e.g., after Latency Analysis reveals certain applications are continuously receiving poor QoS
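Both shedding techniques can be sketched as filter predicates. The function names, the drop probability, and the sample value-based utility function are assumptions for illustration only.

```python
import random

def drop_box(p_drop, rng=None):
    """Random tuple drop: a FILTER whose predicate ignores the tuple's content."""
    rng = rng or random.Random()
    return lambda t: rng.random() >= p_drop

def semantic_filter(value_utility, min_utility):
    """Semantic load shedding: keep only tuples whose value has enough utility."""
    return lambda t: value_utility(t) >= min_utility

def run_filter(keep, stream):
    """Apply a keep/drop predicate to a finite stream."""
    return [t for t in stream if keep(t)]

readings = list(range(100))

# Random drops: shed roughly 30% of tuples regardless of their values.
randomly_kept = run_filter(drop_box(0.3, random.Random(0)), readings)

# Hypothetical value-based QoS: readings of 90 and above matter most.
value_utility = lambda v: 1.0 if v >= 90 else 0.1
semantically_kept = run_filter(semantic_filter(value_utility, 0.5), readings)
```

The random variant lowers the delivered percentage evenly, while the semantic variant spends the remaining capacity on the values the application cares about.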

Adaptivity: Detecting Overload
  • Throughput Analysis
    • Cost = c, Selectivity = s, Input rate = r
    • Output rate = min (1/c, r) * s
    • 1/c < r ⇒ Problem
  • Latency Analysis
    • Monitor each application’s Delay-based QoS
    • Problem: Too many apps in “bad zone”

[Figure: network of boxes, each annotated with I, O, (C, S), and P]
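Throughput analysis can be sketched by pushing rates through a chain of boxes with the formula output rate = min(1/c, r) * s. Because the min caps a box at 1/c tuples per second, a box is flagged when its input rate r exceeds that capacity. The function name and the sample costs and selectivities are hypothetical.

```python
def throughput_analysis(input_rate, boxes):
    """Propagate rates through a chain of boxes.

    boxes: list of (name, cost_sec_per_tuple, selectivity).
    Returns (final_output_rate, names of boxes whose input rate exceeds 1/cost).
    """
    rate, overloaded = input_rate, []
    for name, cost, selectivity in boxes:
        capacity = 1.0 / cost                # most tuples/sec the box can process
        if rate > capacity:
            overloaded.append(name)          # the queue in front of this box grows
        rate = min(capacity, rate) * selectivity
    return rate, overloaded

# Hypothetical chain: a cheap filter followed by an expensive join.
chain = [("filter", 0.001, 0.5), ("join", 0.01, 1.0)]
out_rate, overloaded = throughput_analysis(400.0, chain)
```

Here the filter keeps up with 400 tuples/sec and emits 200, but the join can only handle 100 tuples/sec, so it is the box reported as overloaded.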

Conclusions

Aurora Stream Query Processing System

  • Designed for Scalability
  • QoS-Driven Resource Management
  • Continuous and Historical Queries
  • Stream Storage Management
  • Implemented Prototype

Web site: www.cs.brown.edu/research/aurora/