
Software-defined Measurement

Minlan Yu

University of Southern California

Joint work with Lavanya Jose, Rui Miao, Masoud Moshref, Ramesh Govindan, Amin Vahdat

Management = Measurement + Control
  • Accounting
    • Count resource usage for tenants
  • Traffic engineering
    • Identify large traffic aggregates, traffic changes
    • Understand flow characteristics (flow size, etc.)
  • Performance diagnosis
    • Why does my application have high delay or low throughput?

Yet, measurement is underexplored
  • Measurement is an afterthought in network devices
    • Control functions are optimized w/ many resources
    • Limited, fixed measurement support with NetFlow/sFlow
  • Traffic analysis is incomplete and indirect
    • Incomplete: May not catch all the events from samples
    • Indirect: Offline analysis based on pre-collected logs
  • Network-wide view of traffic is especially difficult
    • Data are collected at different times/places
Software-defined Measurement

Figure: the controller runs measurement tasks such as heavy hitter detection and change detection. It (1) configures resources at the switches, (2) fetches statistics, and then (1) reconfigures resources.

  • SDN offers unique opportunities for measurement
    • Simple, reusable primitives at switches
    • Diverse and dynamic analysis at controller
    • Network-wide view
Challenges
  • Diverse measurement tasks
    • Generic measurement primitives for diverse tasks
    • Measurement library for easy programming
  • Limited resources at switches
    • New data structures to reduce memory usage
    • Multiplexing across many tasks
Software-defined Measurement
  • OpenSketch (NSDI’13)
    • Data plane primitives: sketch-based, built from commodity switch components
    • Resource allocation across tasks: optimization with provable resource-accuracy bounds
    • Prototype: NetFPGA + sketch library
  • DREAM (SIGCOMM’14)
    • Data plane primitives: flow-based, using OpenFlow TCAM
    • Resource allocation across tasks: dynamic allocation with an accuracy estimator
    • Prototype: networks of hardware switches and Open vSwitch
  • Open source

Software Defined Networking
  • Controller: configures devices and collects measurements
  • API to the data plane (OpenFlow)
    • Each rule has fields, an action, and counters
    • E.g., Src=1.2.3.4, drop, #packets, #bytes
  • Switches: forward/measure packets
  • Rethink the abstractions for measurement

Tradeoff of Generality and Efficiency
  • Generality
    • Supporting a wide variety of measurement tasks
    • Who’s sending a lot to 23.43.0.0/16?
    • Is someone being DDoS-ed?
    • How many people downloaded files from 10.0.2.1?
  • Efficiency
    • Enabling high link speed (40 Gbps or larger)
    • Ensuring low cost (Cheap switches with small memory)
    • Easy to implement with commodity switch components
NetFlow: General, Not Efficient
  • Cisco NetFlow/sFlow
    • Log sampled packets, or flow-level counters
  • General
    • Ok for many measurement tasks
    • Not ideal for any single task
  • Not efficient
    • It’s hard to determine the right sampling rate
    • Measurement accuracy depends on traffic distribution
    • Turned off or not even available in datacenters
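Streaming Algo: Efficient, Not General

Figure: Count-Min Sketch example. Data plane: the number of bytes from 23.43.12.1 is added to one counter in each of three rows, selected by Hash1, Hash2, and Hash3 (rows: 3 0 5 1 9; 0 1 9 3 0; 1 2 0 3 4). Control plane: the query for 23.43.12.1 reads the three matching counters (5, 3, 4) and picks the minimum, 3.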

  • Streaming algorithms
    • Summarize packet information with Sketches
    • E.g., Count-Min Sketch: who’s sending a lot to host A?
  • Not general: each algorithm answers just one question
    • Requires customized hardware or network processors
    • Hard to implement every solution in practice
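
The Count-Min Sketch mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name, the table dimensions, and the use of SHA-256 as a stand-in for the cheap hardware hashes a switch would use are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Count-Min Sketch: `depth` rows of `width` counters, one hash per row."""

    def __init__(self, width: int, depth: int) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Derive one hash per row by salting the key with the row number.
        # SHA-256 is used only for illustration (assumption).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def update(self, key: str, count: int) -> None:
        """Data plane: add `count` to one counter in every row."""
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def query(self, key: str) -> int:
        """Control plane: read the key's counter in every row, pick the minimum."""
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))


sketch = CountMinSketch(width=64, depth=3)
packets = [("23.43.12.1", 1500), ("23.43.12.7", 40), ("23.43.12.1", 600)]
for src, nbytes in packets:
    sketch.update(src, nbytes)

estimate = sketch.query("23.43.12.1")
print(estimate)  # never below the true total of 2100 bytes
```

Collisions can only inflate a counter, so the minimum across rows never underestimates; that is why the query picks the minimum.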
Where is the Sweet Spot?

Figure: a spectrum from general to efficient. NetFlow/sFlow sits at the general end (too expensive); streaming algorithms sit at the efficient end (not practical).

  • OpenSketch
    • General and efficient data plane based on sketches
    • Modularized control plane with automatic configuration
Flexible Measurement Data Plane
  • Picking the packets to measure
    • Hashes to represent a compact set of flows
      • E.g., a set of blacklisted IPs
    • Classify flows with different resources/accuracy
      • Filter out traffic for 23.43.0.0/16
  • Storing and exporting the data
    • A table with flexible indexing
    • Complex indexing using hashes and classification
    • Diverse mappings between counters and flows
A three-stage pipeline

Figure: the number of bytes from 23.43.12.1 is hashed by Hash1, Hash2, and Hash3 into one counter in each of three counter rows (rows: 3 0 5 1 9; 0 1 9 3 0; 1 2 0 3 4).

  • Hashing: A few hash functions on packet source
  • Classification: based on hash values or packet fields
  • Counting: Update a few counters with simple calc.
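
The three stages above can be mimicked in software roughly as follows. This is a sketch under assumptions: the function names, the counter table size, and the single classification rule are hypothetical, and SHA-256 stands in for a switch's hardware hash.

```python
import hashlib
import ipaddress

NUM_COUNTERS = 16
counters = [0] * NUM_COUNTERS  # stage 3 state: a small table of byte counters

# Hypothetical classification rule: only measure traffic sent to this prefix.
MONITORED_PREFIX = ipaddress.ip_network("23.43.0.0/16")


def hash_stage(src_ip: str) -> int:
    """Stage 1: hash the packet source into a compact integer."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return int.from_bytes(digest[:4], "big")


def classify_stage(dst_ip: str) -> bool:
    """Stage 2: TCAM-like match that selects which packets to measure."""
    return ipaddress.ip_address(dst_ip) in MONITORED_PREFIX


def count_stage(hash_value: int, nbytes: int) -> None:
    """Stage 3: update the counter addressed by the hash value."""
    counters[hash_value % NUM_COUNTERS] += nbytes


def process_packet(src_ip: str, dst_ip: str, nbytes: int) -> None:
    hash_value = hash_stage(src_ip)
    if classify_stage(dst_ip):
        count_stage(hash_value, nbytes)


process_packet("10.0.0.1", "23.43.12.1", 1500)  # measured
process_packet("10.0.0.2", "23.43.99.9", 400)   # measured
process_packet("10.0.0.1", "8.8.8.8", 900)      # ignored: outside the prefix
print(sum(counters))  # 1900
```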
Build on Existing Switch Components
  • A few simple hash functions
    • 4-8 three-wise or five-wise independent hash functions
    • Leverage traffic diversity to approx. truly random func.
  • A few TCAM entries for classification
    • Match on both packets and hash values
    • Avoid matching on individual micro-flow entries
  • Flexible counters in SRAM
    • Many logical tables for different sketches
    • Different numbers and sizes of counters
    • Access counters by addresses
Modularized Measurement Library
  • A measurement library of sketches
    • Bitmap, Bloom filter, Count-Min Sketch, etc.
    • Easy to implement with the data plane pipeline
    • Support diverse measurement tasks
  • Implement Heavy Hitters with OpenSketch
    • Who’s sending a lot to 23.43.0.0/16?
    • Count-Min Sketch to count the volume of flows
    • Reversible sketch to identify flows with heavy counts in the Count-Min Sketch
Resource management
  • Automatic configuration within a task
    • Pick the right sketches for measurement tasks
    • Allocating resources across sketches
    • Based on provable resource-accuracy curves
  • Resource allocation across tasks
    • Operators simply specify relative importance of tasks
    • Minimizing weighted error using convex optimization
    • Decompose to optimization problem of individual tasks
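
As a worked illustration of minimizing weighted error across tasks, assume (this is not stated in the slides) that each task's error shrinks inversely with its memory, error_i = a_i / m_i. Minimizing the sum of w_i * a_i / m_i subject to a fixed total memory then has a closed-form solution via a Lagrange multiplier: m_i is proportional to sqrt(w_i * a_i). The function names below are hypothetical.

```python
import math


def allocate_memory(total: float, weights: list[float],
                    scales: list[float]) -> list[float]:
    """Split `total` memory so that sum(w_i * a_i / m_i) is minimized.

    Assumes error_i = a_i / m_i; the optimum is m_i proportional to
    sqrt(w_i * a_i).
    """
    roots = [math.sqrt(w * a) for w, a in zip(weights, scales)]
    norm = sum(roots)
    return [total * r / norm for r in roots]


def weighted_error(alloc: list[float], weights: list[float],
                   scales: list[float]) -> float:
    """Objective value: the operator-weighted sum of per-task errors."""
    return sum(w * a / m for w, a, m in zip(weights, scales, alloc))


weights = [4.0, 1.0]   # the operator says task 0 is 4x as important
scales = [1.0, 1.0]    # per-task error constants (assumed equal here)
best = allocate_memory(300.0, weights, scales)
equal = [150.0, 150.0]
print(best)
print(weighted_error(best, weights, scales), weighted_error(equal, weights, scales))
```

With these inputs the more important task receives twice the memory, and the weighted error is lower than with an equal split.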
Evaluation
  • Prototype on NetFPGA
    • No effect on data plane throughput
    • Line speed measurement performance
  • Trace Driven Simulators
    • OpenSketch, NetFlow, and streaming algorithm
    • One-hour CAIDA packet traces on a backbone link
  • Tradeoff between generality and efficiency
    • How efficient is OpenSketch compared to NetFlow?
    • How accurate is OpenSketch compared to specific streaming algorithms?
Heavy Hitters: false positives/negatives
  • Identify flows taking > 0.5% bandwidth

OpenSketch requires less memory with higher accuracy

Tradeoff Efficiency for Generality

In theory, OpenSketch requires 6 times the memory of a complex streaming algorithm

OpenSketch Conclusion
  • OpenSketch:
    • Bridging the gap between theory and practice
  • Leveraging good properties of sketches
    • Provable accuracy-memory tradeoff
  • Making sketches easy to implement and use
    • Generic support for different measurement tasks
    • Easy to implement with commodity switch hardware
    • Modularized library for easy programming
SDM Challenges

Figure: many management tasks (heavy hitter detection, change detection, and others) run at the controller. A dynamic resource allocator (1) configures resources, (2) fetches statistics, and (1) reconfigures resources at switches with limited resources (TCAM).

Dynamic Resource Allocator

Recall = detected true HHs / all true HHs

  • Diminishing return of resources
    • More resources make smaller accuracy gain
    • More resources find less significant outputs
    • Operators can accept an accuracy bound <100%
Dynamic Resource Allocator


  • Temporal and spatial resource multiplexing
    • Traffic varies over time and switches
    • The resources needed for an accuracy bound depend on the traffic
Challenges
  • No ground truth of resource-accuracy
    • Hard to do traditional convex optimization
    • New ways to estimate accuracy on the fly
    • Adaptively increase/decrease resources accordingly
  • Spatial & temporal changes
    • Task and traffic dynamics
    • Coordinate multiple switches to keep a task accurate
    • Spatial and temporal resource adaptation
Dynamic Resource Allocator

Figure: the controller runs several tasks (heavy hitter detection, change detection). Each task reports its estimated accuracy to the dynamic resource allocator, which returns the task's allocated resource.

  • Decompose the resource allocator to each switch
    • Each switch separately increases/decreases resources
    • When and how to change resources?
Per-switch Resource Allocator: When?

  • Example: one heavy hitter detection task on two switches
    • Controller: detected 14 out of 30 HHs, global accuracy = 47%
    • Switch A: detected 5 out of 20 HHs, local accuracy = 25%
    • Switch B: detected 9 out of 10 HHs, local accuracy = 90%

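  • When does a task on a switch need more resources?
    • A’s local accuracy (25%) alone is not enough to decide
      • If the bound is 40%, there is no need to increase A’s resources, because the global accuracy (47%) already meets it
    • The global accuracy (47%) alone is not enough either
      • If the bound is 80%, increasing B’s resources does not help, because B’s local accuracy (90%) is already above it
    • Conclusion: increase resources when max(local, global) < accuracy bound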
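
The decision rule in the conclusion can be written as a one-line predicate. The function name is hypothetical; the numbers are the ones from this slide (A: 5 of 20, B: 9 of 10, global: 14 of 30).

```python
def needs_more_resources(local_accuracy: float, global_accuracy: float,
                         bound: float) -> bool:
    """A task on a switch needs more resources only if both its local and
    its global accuracy are below the bound."""
    return max(local_accuracy, global_accuracy) < bound


local_a = 5 / 20      # switch A: 25%
local_b = 9 / 10      # switch B: 90%
global_acc = 14 / 30  # whole task: about 47%

print(needs_more_resources(local_a, global_acc, bound=0.40))  # False
print(needs_more_resources(local_b, global_acc, bound=0.80))  # False
print(needs_more_resources(local_a, global_acc, bound=0.80))  # True
```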
Per-Switch Resource Allocator: How?
  • How to adapt resources?
    • Take from rich tasks, give to poor tasks
  • How much resource to take/give?
    • Adaptive change step for fast convergence
    • Small steps close to bound, large steps otherwise
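
One way to picture "take from rich, give to poor" with an adaptive step is shown below. The slides do not give the actual step rule, so this is only an assumed illustration: the step grows linearly with the distance from the bound, and all names and constants are hypothetical.

```python
def step_size(accuracy: float, bound: float, max_step: int = 32) -> int:
    """Entries to move: small when accuracy is near the bound, large when far."""
    gap = abs(accuracy - bound)
    return max(1, round(max_step * gap))


def rebalance(tasks: list[dict], bound: float) -> None:
    """Move entries from tasks above the bound ("rich") to tasks below it ("poor")."""
    rich = sorted((t for t in tasks if t["accuracy"] >= bound),
                  key=lambda t: t["accuracy"], reverse=True)
    poor = sorted((t for t in tasks if t["accuracy"] < bound),
                  key=lambda t: t["accuracy"])
    # Pair the poorest task with the richest one, and so on.
    for needy, donor in zip(poor, rich):
        moved = min(step_size(needy["accuracy"], bound),
                    step_size(donor["accuracy"], bound),
                    donor["entries"] - 1)  # never take a donor's last entry
        moved = max(moved, 0)
        donor["entries"] -= moved
        needy["entries"] += moved


tasks = [
    {"name": "heavy_hitter", "accuracy": 0.25, "entries": 64},
    {"name": "change_detection", "accuracy": 0.95, "entries": 64},
]
rebalance(tasks, bound=0.80)
print([(t["name"], t["entries"]) for t in tasks])
```

Capping the transfer by the donor's own step keeps a task that is only slightly above the bound from being pushed below it in one round.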
Task Implementation

Figure: tasks at the controller (heavy hitter detection, change detection) exchange estimated accuracy and allocated resource with the dynamic resource allocator. Each task (1) configures resources at the switches, (2) fetches statistics, and (1) reconfigures resources.

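Flow-based algorithms using TCAM

Figure: a prefix trie over 3-bit addresses with per-prefix traffic counts: *** = 36; 0** = 26, 1** = 10; 00* = 12, 01* = 14, 10* = 5, 11* = 5; leaf counts 5, 7, 12, 2, 0, 5, 2, 3. The labels "Current" and "New" mark two sets of monitored prefixes.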
  • Goal: Maximize accuracy given limited resources
  • A general resource-aware algorithm
    • Different tasks: e.g., HH, HHH, Change detection
    • Multiple switches: e.g., HHs from different switches
      • Assume: Each flow is seen at one switch (e.g., at sources)
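
A minimal sketch of a flow-based heavy hitter search under a TCAM entry budget follows. It is an assumed simplification, not the DREAM algorithm itself: it repeatedly splits the heaviest wildcard prefix above a threshold until the budget is used up. The leaf counts, threshold, and capacity are illustrative values.

```python
def prefix_count(prefix: str, leaf_counts: dict[str, int]) -> int:
    """Counter a TCAM rule for `prefix` would report ('*' is a wildcard bit)."""
    stem = prefix.rstrip("*")
    return sum(c for addr, c in leaf_counts.items() if addr.startswith(stem))


def children(prefix: str) -> list[str]:
    """Replace the first wildcard bit with 0 and with 1."""
    i = prefix.index("*")
    return [prefix[:i] + bit + prefix[i + 1:] for bit in "01"]


def refine(monitored: list[str], leaf_counts: dict[str, int],
           threshold: int, capacity: int) -> list[str]:
    """Divide heavy prefixes while TCAM entries remain (each divide costs +1)."""
    monitored = list(monitored)
    while len(monitored) < capacity:
        heavy = [p for p in monitored
                 if "*" in p and prefix_count(p, leaf_counts) > threshold]
        if not heavy:
            break
        heaviest = max(heavy, key=lambda p: prefix_count(p, leaf_counts))
        monitored.remove(heaviest)
        monitored.extend(children(heaviest))
    return monitored


# Illustrative traffic per 3-bit address (assumed values).
leaf_counts = {"000": 5, "001": 7, "010": 12, "011": 2,
               "100": 0, "101": 5, "110": 2, "111": 3}
monitored = refine(["***"], leaf_counts, threshold=10, capacity=4)
detected = [p for p in monitored
            if "*" not in p and prefix_count(p, leaf_counts) > 10]
print(sorted(monitored))  # ['00*', '010', '011', '1**']
print(detected)           # ['010']
```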
Divide & Merge at Multiple Switches

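Figure: prefix 0** (count 26) carries traffic seen at switches {A,B,C}. Its children 00* (count 12) and 01* (count 14) carry traffic seen at {A,B} and {B,C}.

Current: A:0**, B:0**, C:0**

New: A:00*, B:00*,01*, C:01*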

  • Divide: Monitor children to increase accuracy
    • Requires more resources on a set of switches
      • Example: Needs an additional entry on switch B
  • Merge: Monitor parent to free resources
    • Each node keeps the switch set it frees after merge
    • Finding the least important prefixes to merge is the minimum set cover problem
Accuracy Estimation: Heavy Hitter Detection

Figure: a prefix trie with threshold = 10 and per-prefix counts: *** = 76; 0** = 26, 1** = 50; 00* = 12, 01* = 14, 10* = 15, 11* = 35; leaf counts 5, 7, 12, 2, 0, 15, 20, 15. A counter at level 2 has missed at most 2 HHs; a counter of size 26 has missed at most 2 HHs.

  • Any monitored leaf with volume > threshold is a true HH
  • Recall:
    • Estimate missing HHs using volume and level of counter
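
One reading of "volume and level" is that a wildcard counter can hide at most as many heavy hitters as it has leaves, and at most as many as its volume can pay for. The sketch below turns that reading into a conservative recall estimate; the formula and names are assumptions, not the estimator from the talk, and counts are assumed to be integers.

```python
def missed_hh_bound(prefix: str, volume: int, threshold: int) -> int:
    """Upper bound on heavy hitters hidden under one monitored prefix."""
    wildcard_bits = prefix.count("*")
    if wildcard_bits == 0:
        return 0  # exact counter: nothing is hidden
    by_level = 2 ** wildcard_bits           # leaves under the prefix
    by_volume = volume // (threshold + 1)   # each HH needs volume > threshold
    return min(by_level, by_volume)


def estimate_recall(monitored: dict[str, int], threshold: int) -> float:
    """Detected HHs divided by detected plus the bound on missed HHs."""
    detected = sum(1 for p, v in monitored.items()
                   if "*" not in p and v > threshold)
    missed = sum(missed_hh_bound(p, v, threshold) for p, v in monitored.items())
    if detected + missed == 0:
        return 1.0  # no heavy hitters can exist, so nothing was missed
    return detected / (detected + missed)


# Hypothetical monitored counters: two wildcard prefixes and two exact leaves.
monitored = {"0**": 26, "10*": 15, "110": 20, "111": 15}
print(missed_hh_bound("0**", 26, threshold=10))  # 2
print(estimate_recall(monitored, threshold=10))  # 0.4
```

Because the missed count is an upper bound, the resulting recall is a lower bound, which errs on the side of requesting more resources.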
DREAM Overview
  • Task type (Heavy hitter, Hierarchical heavy hitter, Change detection)
  • Task specific parameters (HH threshold)
  • Packet header field (source IP)
  • Filter (srcIP=10/24, dstIP=10.2/16)
  • Accuracy bound (80%)

Prototype Implementation with DREAM algorithms on Floodlight and Open vSwitches

Workflow among the task objects (1 to n), the resource allocator, DREAM, and the SDN controller:

1) Instantiate task

2) Accept/Reject

3) Configure counters

4) Fetch counters

5) Report

6) Estimate accuracy

7) Allocate / Drop

Evaluation
  • Evaluation Goals
    • How accurate are tasks in DREAM?
      • Satisfaction: Task lifetime fraction above given accuracy
    • How many more accurate tasks can DREAM support?
      • % of rejected/dropped tasks
    • How fast is the DREAM control loop?
  • Compare to
    • Equal: divide resources equally at each switch, no reject
    • Fixed: 1/n of the resources to each task, reject extra tasks
Prototype Results
  • DREAM: high satisfaction for the mean and the 5th percentile of tasks, with low rejection
  • Equal: only keeps small tasks satisfied
  • Fixed: high rejection because it over-provisions for small tasks
  • Setup: 256 tasks (various task types) on 8 switches

Prototype Results
  • DREAM: high satisfaction for the mean and the 5th percentile of tasks, at the expense of more rejection
  • Equal & Fixed: only keep small tasks satisfied

Control Loop Delay

Allocation delay is negligible vs. other delays

Incremental saving reduces the save delay

DREAM Conclusion
  • Challenges with software-defined measurement
    • Diverse and dynamic measurement tasks
    • Limited resources at switches
  • Dynamic resource allocation across tasks
    • Accuracy estimators for TCAM-based algorithms
    • Spatial and temporal resource multiplexing
Summary
  • Software-defined measurement
    • Measurement is important, yet underexplored
    • SDN brings new opportunities to measurement
    • Time to rebuild the entire measurement stack
  • Our work
    • OpenSketch: Generic, efficient measurement on sketches
    • DREAM: Dynamic resource allocation for many tasks