Software-defined Measurement

Minlan Yu

University of Southern California

Joint work with Lavanya Jose, Rui Miao, Masoud Moshref, Ramesh Govindan, Amin Vahdat



Management = Measurement + Control

  • Accounting

    • Count resource usage for tenants

  • Traffic engineering

    • Identify large traffic aggregates, traffic changes

    • Understand flow characteristics (flow size, etc.)

  • Performance diagnosis

    • Why does my application have high delay or low throughput?



Yet, measurement is underexplored

  • Measurement is an afterthought in network devices

    • Control functions are optimized w/ many resources

    • Limited, fixed measurement support with NetFlow/sFlow

  • Traffic analysis is incomplete and indirect

    • Incomplete: May not catch all the events from samples

    • Indirect: Offline analysis based on pre-collected logs

  • Network-wide view of traffic is especially difficult

    • Data are collected at different times/places



Software-defined Measurement

[Figure: the controller (re)configures measurement resources at switches and fetches statistics for tasks such as heavy-hitter detection and change detection]

  • SDN offers unique opportunities for measurement

    • Simple, reusable primitives at switches

    • Diverse and dynamic analysis at controller

    • Network-wide view



Challenges

  • Diverse measurement tasks

    • Generic measurement primitives for diverse tasks

    • Measurement library for easy programming

  • Limited resources at switches

    • New data structures to reduce memory usage

    • Multiplexing across many tasks



Software-defined Measurement

  • OpenSketch (NSDI’13) vs. DREAM (SIGCOMM’14)

    • Data plane: OpenSketch is sketch-based, built from commodity switch components; DREAM is flow-based, built on OpenFlow TCAM

    • Resource allocation across tasks: OpenSketch optimizes with provable resource-accuracy bounds; DREAM allocates dynamically with an accuracy estimator

    • Open-source prototypes: OpenSketch on NetFPGA with a sketch library; DREAM on networks of hardware switches and Open vSwitch



Software-defined Measurement with Sketches (NSDI’13)



Software Defined Networking

[Figure: the controller configures devices and collects measurements through an API to the data plane (OpenFlow); switches forward and measure packets. Example flow entry — Fields: Src=1.2.3.4; Action: drop; Counters: #packets, #bytes]

Rethink the abstractions for measurement


Tradeoff of generality and efficiency

Tradeoff of Generality and Efficiency

  • Generality

    • Supporting a wide variety of measurement tasks

    • Who’s sending a lot to 23.43.0.0/16?

    • Is someone being DDoS-ed?

    • How many people downloaded files from 10.0.2.1?

  • Efficiency

    • Enabling high link speed (40 Gbps or larger)

    • Ensuring low cost (Cheap switches with small memory)

    • Easy to implement with commodity switch components



NetFlow: General, Not Efficient

  • Cisco NetFlow/sFlow

    • Log sampled packets, or flow-level counters

  • General

    • Ok for many measurement tasks

    • Not ideal for any single task

  • Not efficient

    • It’s hard to determine the right sampling rate

    • Measurement accuracy depends on traffic distribution

    • Turned off or not even available in datacenters



Streaming Algo: Efficient, Not General

[Figure: a Count-Min sketch. In the data plane, three hash functions map the key 23.43.12.1 to one counter in each of three rows; a control-plane query for “# bytes from 23.43.12.1” reads the three counters and picks the minimum: 3]

  • Streaming algorithms

    • Summarize packet information with Sketches

    • E.g. Count-Min Sketch, Who’s sending a lot to host A?

  • Not general: each algorithm solves just one question

    • Require customized hardware or network processors

    • Hard to implement every solution in practice
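The Count-Min sketch described above fits in a few lines of code. This is a minimal illustration, not switch code: md5 with a per-row salt stands in for the pairwise-independent hash functions real hardware would use.

```python
import hashlib

class CountMinSketch:
    """Count-Min sketch: d rows of w counters. Each update increments
    one counter per row; a query returns the minimum over the d
    counters, which overestimates the true count by a bounded amount."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Salt the key per row to emulate d independent hash functions.
        h = hashlib.md5(f"{row}:{key}".encode()).hexdigest()
        return int(h, 16) % self.width

    def update(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(key, r)] += count

    def query(self, key):
        return min(self.rows[r][self._index(key, r)]
                   for r in range(self.depth))
```

Because collisions only inflate counters, a query never underestimates — which is why sketches like this never miss a heavy sender.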



Where is the Sweet Spot?

[Figure: the generality/efficiency trade-off. NetFlow/sFlow is general but too expensive; streaming algorithms are efficient but not practical]

  • OpenSketch

  • General, and efficient data plane based on sketches

  • Modularized control plane with automatic configuration



Flexible Measurement Data Plane

  • Picking the packets to measure

    • Hashes to represent a compact set of flows

      • A set of blacklisting IPs

    • Classify flows with different resources/accuracy

      • Filter out traffic for 23.43.0.0/16

  • Storing and exporting the data

    • A table with flexible indexing

    • Complex indexing using hashes and classification

    • Diverse mappings between counters and flows
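A Bloom filter is the standard structure for "a compact set of flows" such as a blacklist of IPs; a minimal sketch (again with md5 standing in for hardware hash functions — sizes here are illustrative, not OpenSketch defaults):

```python
import hashlib

class BloomFilter:
    """Compact set membership for flow keys (e.g., blacklisted IPs).
    Never reports a false negative; the false-positive rate is tuned
    by the bit-array size and the number of hash functions."""

    def __init__(self, bits=8192, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.array = [False] * bits

    def _positions(self, key):
        # One salted hash per "hash function".
        for i in range(self.hashes):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.bits

    def add(self, key):
        for p in self._positions(key):
            self.array[p] = True

    def contains(self, key):
        return all(self.array[p] for p in self._positions(key))
```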



A three-stage pipeline

[Figure: the three-stage pipeline counting “# bytes from 23.43.12.1” — three hash functions each select a counter in a row of the counter table, and the selected counters are updated]
  • Hashing: A few hash functions on packet source

  • Classification: based on hash values or packet fields

  • Counting: Update a few counters with simple calc.
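The three stages can be mimicked in software. A hedged sketch in which the classification stage keeps only traffic to 23.43.0.0/16; the packet field names and 256-counter rows are illustrative choices, not OpenSketch's API:

```python
import hashlib

def stage_hash(pkt):
    """Hashing stage: a few hash values over the packet source."""
    key = pkt["src"]
    return [int(hashlib.md5(f"{i}:{key}".encode()).hexdigest(), 16) % 256
            for i in range(3)]

def stage_classify(pkt, hashes):
    """Classification stage: match on packet fields and/or hash values
    (a TCAM rule in hardware). Here: keep traffic to 23.43.0.0/16."""
    return pkt["dst"].startswith("23.43.")

def stage_count(counters, hashes, nbytes):
    """Counting stage: update one counter per hash row."""
    for row, h in enumerate(hashes):
        counters[row][h] += nbytes

counters = [[0] * 256 for _ in range(3)]
packets = [
    {"src": "10.0.0.1", "dst": "23.43.12.1", "bytes": 100},
    {"src": "10.0.0.2", "dst": "8.8.8.8", "bytes": 400},
]
for pkt in packets:
    hs = stage_hash(pkt)
    if stage_classify(pkt, hs):
        stage_count(counters, hs, pkt["bytes"])
```

Only the first packet matches the classifier, so each row accumulates 100 bytes.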



Build on Existing Switch Components

  • A few simple hash functions

    • 4-8 three-wise or five-wise independent hash functions

    • Leverage traffic diversity to approx. truly random func.

  • A few TCAM entries for classification

    • Match on both packets and hash values

    • Avoid matching on individual micro-flow entries

  • Flexible counters in SRAM

    • Many logical tables for different sketches

    • Different numbers and sizes of counters

    • Access counters by addresses



Modularized Measurement Library

  • A measurement library of sketches

    • Bitmap, Bloom filter, Count-Min Sketch, etc.

    • Easy to implement with the data plane pipeline

    • Support diverse measurement tasks

  • Implement Heavy Hitters with OpenSketch

    • Who’s sending a lot to 23.43.0.0/16?

    • Count-Min sketch to count the volume of flows

    • Reversible sketch to identify the flows with heavy counts in the Count-Min sketch
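A simplified software version of this task. Note that it remembers candidate keys directly instead of using a reversible sketch — inverting hash values is exactly the hard part the reversible sketch solves, so this is a sketch of the counting side only:

```python
import hashlib

def detect_heavy_hitters(packets, threshold, width=512, depth=3):
    """Flag sources whose total volume exceeds `threshold` bytes,
    using Count-Min style counting rows. `packets` is a list of
    (source, byte_count) pairs."""
    rows = [[0] * width for _ in range(depth)]

    def idx(key, r):
        # Per-row salted hash; md5 stands in for hardware hashes.
        return int(hashlib.md5(f"{r}:{key}".encode()).hexdigest(), 16) % width

    seen = set()
    for src, nbytes in packets:
        seen.add(src)
        for r in range(depth):
            rows[r][idx(src, r)] += nbytes

    def estimate(key):
        # Count-Min query: minimum over the rows (an overestimate).
        return min(rows[r][idx(key, r)] for r in range(depth))

    return {k for k in seen if estimate(k) > threshold}
```

Since estimates never undercount, every true heavy hitter is flagged; collisions can only add false positives.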



Support Many Measurement Tasks



Resource management

  • Automatic configuration within a task

    • Pick the right sketches for measurement tasks

    • Allocating resources across sketches

    • Based on provable resource-accuracy curves

  • Resource allocation across tasks

    • Operators simply specify relative importance of tasks

    • Minimizing weighted error using convex optimization

    • Decompose to optimization problem of individual tasks
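To illustrate the decomposition, assume each task's error falls off as c_i/r_i for allocated resource r_i, with importance weight w_i — an illustrative convex model, not the paper's measured resource-accuracy curves. Setting the Lagrangian's derivative to zero gives r_i proportional to sqrt(w_i * c_i):

```python
import math

def allocate(weights, difficulty, total):
    """Split a resource budget across tasks to minimize the weighted
    error sum(w_i * c_i / r_i) subject to sum(r_i) = total.
    The closed-form optimum: r_i proportional to sqrt(w_i * c_i)."""
    scores = [math.sqrt(w * c) for w, c in zip(weights, difficulty)]
    norm = sum(scores)
    return [total * s / norm for s in scores]
```

For example, a task with 4x the importance weight (equal difficulty) receives 2x the resources, reflecting the diminishing return of each extra unit.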



OpenSketch Architecture



Evaluation

  • Prototype on NetFPGA

    • No effect on data plane throughput

    • Line speed measurement performance

  • Trace Driven Simulators

    • OpenSketch, NetFlow, and streaming algorithm

    • One-hour CAIDA packet traces on a backbone link

  • Tradeoff between generality and efficiency

    • How efficient is OpenSketch compared to NetFlow?

    • How accurate is OpenSketch compared to specific streaming algorithms?



Heavy Hitters: false positives/negatives

  • Identify flows taking > 0.5% bandwidth

OpenSketch requires less memory with higher accuracy



Tradeoff Efficiency for Generality

In theory, OpenSketch requires 6 times the memory of a complex streaming algorithm



OpenSketch Conclusion

  • OpenSketch:

    • Bridging the gap between theory and practice

  • Leveraging good properties of sketches

    • Provable accuracy-memory tradeoff

  • Making sketches easy to implement and use

    • Generic support for different measurement tasks

    • Easy to implement with commodity switch hardware

    • Modularized library for easy programming



Dynamic Resource Allocation for TCAM-based Measurement (SIGCOMM’14)



SDM Challenges

[Figure: many management tasks (heavy-hitter detection, change detection) run at the controller; a dynamic resource allocator (re)configures the limited TCAM resources at each switch and fetches statistics]



Dynamic Resource Allocator

Recall = detected true HHs / all true HHs

  • Diminishing return of resources

    • More resources make smaller accuracy gain

    • More resources find less significant outputs

    • Operators can accept an accuracy bound <100%



Dynamic Resource Allocator

Recall = detected true HHs / all true HHs

  • Temporal and spatial resource multiplexing

    • Traffic varies over time and switches

    • The resources needed to meet an accuracy bound depend on traffic



Challenges

  • No ground truth of resource-accuracy

    • Hard to do traditional convex optimization

    • New ways to estimate accuracy on the fly

    • Adaptively increase/decrease resources accordingly

  • Spatial & temporal changes

    • Task and traffic dynamics

    • Coordinate multiple switches to keep a task accurate

    • Spatial and temporal resource adaptation



Dynamic Resource Allocator

[Figure: the dynamic resource allocator at the controller assigns resources to heavy-hitter and change detection tasks across switches based on each task’s estimated accuracy]

  • Decompose the resource allocator to each switch

    • Each switch separately increases/decreases resources

    • When and how to change resources?



Per-switch Resource Allocator: When?

[Figure: one task on two switches. Switch A detects 5 of 20 local HHs (local accuracy 25%); switch B detects 9 of 10 (local accuracy 90%); overall 14 of 30 HHs are detected (global accuracy 47%)]

  • When does a task on a switch need more resources?

    • A’s local accuracy (25%) alone is not enough to decide

      • if the bound is 40%, there is no need to increase A’s resources

    • The global accuracy (47%) alone is not enough either

      • if the bound is 80%, increasing B’s resources is not helpful

    • Conclusion: adapt when max(local, global) < accuracy bound
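The max(local, global) < bound rule is one line of code; the numbers in the checks below mirror the example above (A: local 25%, B: local 90%, global 47%):

```python
def needs_more_resources(local_accuracy, global_accuracy, bound):
    """A task asks for more resources on a switch only when neither its
    local accuracy there nor its global accuracy reaches the bound."""
    return max(local_accuracy, global_accuracy) < bound
```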



Per-Switch Resource Allocator: How?

  • How to adapt resources?

    • Take from rich tasks, give to poor tasks

  • How much resource to take/give?

    • Adaptive change step for fast convergence

    • Small steps close to bound, large steps otherwise
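One way to realize the adaptive step: scale the change by the task's distance from the accuracy bound, so far-off tasks move in large steps and near-bound tasks settle in small ones. A sketch of the idea only; DREAM's exact step rule differs:

```python
def next_step(current, accuracy, bound, total, min_step=1):
    """Adaptive resource change for one task on one switch.
    `current` is its resource count, `total` the switch capacity."""
    gap = abs(bound - accuracy)            # distance from the target
    step = max(min_step, round(gap * total))
    if accuracy < bound:
        return current + step              # poor task: take more
    return max(0, current - step)          # rich task: give some back
```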



Task Implementation

[Figure: each task estimates its accuracy and reports it to the dynamic resource allocator, which (re)configures counters at the switches and fetches statistics]



Flow-based algorithms using TCAM

[Figure: a prefix trie over source IPs with a counter per prefix (e.g., 0** = 26, 1** = 10, 00* = 12, 01* = 14); TCAM entries monitor a cut of the trie, and the algorithm moves the cut between the current and new configurations]

  • Goal: Maximize accuracy given limited resources

  • A general resource-aware algorithm

    • Different tasks: e.g., HH, HHH, Change detection

    • Multiple switches: e.g., HHs from different switches

      • Assume: Each flow is seen at one switch (e.g., at sources)



Divide & Merge at Multiple Switches

[Figure: dividing prefix 0** (volume 26, monitored as 0** on switches A, B, and C) into children 00* (12) and 01* (14); after the divide the configuration is A: 00*, B: 00* and 01*, C: 01*]

  • Divide: Monitor children to increase accuracy

    • Requires more resources on a set of switches

      • Example: Needs an additional entry on switch B

  • Merge: Monitor parent to free resources

    • Each node keeps the switch set it frees after merge

    • Finding the least important prefixes to merge is the minimum set cover problem
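The classic greedy approximation handles that set-cover instance: repeatedly merge the prefix that frees entries on the most still-needed switches. Prefix importance is omitted for brevity; `candidates` maps each mergeable prefix to the set of switches a merge would free (hypothetical inputs, not DREAM's data structures):

```python
def pick_prefixes_to_merge(candidates, switches_needed):
    """Greedy set cover: choose prefixes to merge until every switch
    that must free an entry is covered. Greedy gives the standard
    logarithmic approximation to the optimal cover."""
    uncovered = set(switches_needed)
    chosen = []
    while uncovered:
        # The prefix freeing the most still-needed switches.
        best = max(candidates, key=lambda p: len(candidates[p] & uncovered))
        if not candidates[best] & uncovered:
            break                          # no candidate helps; stop
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen
```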



Accuracy Estimation: Heavy Hitter Detection

[Figure: a prefix trie with HH threshold 10. Monitored leaves with volume above the threshold are true HHs; an unexpanded counter at level 2 can miss at most 2 HHs, and an unexpanded counter of volume 26 can miss at most 2 HHs (26/10)]

  • Any monitored leaf with volume > threshold is a true HH

  • Recall:

    • Estimate missing HHs using volume and level of counter
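A sketch of the recall estimator, under the assumption suggested by the slide's figure: an internal (non-leaf) counter of volume v at level l can hide at most min(v // threshold, number of leaves below it) undetected heavy hitters:

```python
def estimate_recall(detected, internal_counters, threshold, leaf_level):
    """Estimated recall for TCAM-based heavy-hitter detection.
    `detected` is the count of HHs found at monitored leaves;
    `internal_counters` is a list of (level, volume) pairs for
    counters not expanded down to leaves; `leaf_level` is the
    trie depth. Returns a lower bound on recall."""
    missed = 0
    for level, volume in internal_counters:
        leaves_below = 2 ** (leaf_level - level)
        missed += min(volume // threshold, leaves_below)
    total = detected + missed
    return detected / total if total else 1.0
```

With threshold 10, a level-1 counter of volume 26 in a depth-3 trie can hide at most 2 HHs, matching the figure's annotation.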



DREAM Overview

  • Task type (Heavy hitter, Hierarchical heavy hitter, Change detection)

  • Task specific parameters (HH threshold)

  • Packet header field (source IP)

  • Filter (srcIP=10/24, dstIP=10.2/16)

  • Accuracy bound (80%)

Prototype implementation with DREAM algorithms on Floodlight and Open vSwitch

[Figure: DREAM control loop at the SDN controller — 1) instantiate task, 2) accept/reject, 3) configure counters, 4) fetch counters, 5) report, 6) estimate accuracy, 7) allocate/drop — with one object per task and a shared resource allocator]



Evaluation

  • Evaluation Goals

    • How accurate are tasks in DREAM?

      • Satisfaction: fraction of a task’s lifetime spent above the given accuracy

    • How many more accurate tasks can DREAM support?

      • % of rejected/dropped tasks

    • How fast is the DREAM control loop?

  • Compare to

    • Equal: divide resources equally at each switch, no reject

    • Fixed: 1/n resources to each task, reject extra tasks



Prototype Results

[Figure: 256 tasks of various types on 8 switches. DREAM keeps satisfaction high for both the mean and the 5th percentile of tasks with low rejection; Equal keeps only small tasks satisfied; Fixed rejects many tasks because it over-provisions for small ones]



Prototype Results

[Figure: DREAM keeps satisfaction high for the mean and the 5th percentile of tasks at the expense of more rejection; Equal and Fixed keep only small tasks satisfied]



Control Loop Delay

Allocation delay is negligible compared with the other delays in the control loop; saving counters incrementally reduces the save delay



DREAM Conclusion

  • Challenges with software-defined measurement

    • Diverse and dynamic measurement tasks

    • Limited resources at switches

  • Dynamic resource allocation across tasks

    • Accuracy estimators for TCAM-based algorithms

    • Spatial and temporal resource multiplexing



Summary

  • Software-defined measurement

    • Measurement is important, yet underexplored

    • SDN brings new opportunities to measurement

    • Time to rebuild the entire measurement stack

  • Our work

    • OpenSketch: generic, efficient measurement on sketches

    • DREAM: Dynamic resource allocation for many tasks



Thanks!

