Scalable Multi-module Switches with Quality of Service Thesis Defense. Santosh Krishnan sk@cs.columbia.edu May 1, 2006 Advisor : Prof. Henning G. Schulzrinne Co-advisor : Dr. Fabio M. Chiussi. Outline. Problem Definition Motivations, list of contributions Switching Model: Components

Scalable Multi-module Switches with Quality of Service

Thesis Defense

Santosh Krishnan

sk@cs.columbia.edu

May 1, 2006

Advisor: Prof. Henning G. Schulzrinne

Co-advisor: Dr. Fabio M. Chiussi

Outline
  • Problem Definition
    • Motivations, list of contributions
  • Switching Model: Components
  • Related work: Formal methods in switching
  • Buffered Clos Switches
    • Concept of functional equivalence
  • BCS: Throughput and Quality of Service
    • Single-path BCS: CIOQ, aggregation, pipelining
    • Multi-path BCS: Parallelization
  • Conclusions
Problem Definition

Goals:

  • How to methodically construct a high-capacity switch?
  • How to design high-performance algorithms for such switches?

Importance:

  • Physical layer improvements: 10-G Ethernet, OC-768
  • Converged network requiring QoS: IPTV, MPLS VPN
  • Case for modular design: component reuse

What exists:

  • Ad-hoc approach to switch design
  • No benchmarks, varying performance satisfaction
    • Non-blocking, 100% throughput, nominal capacity
Contributions
  • Taxonomy of multi-module switches: Buffered Clos Switches
  • Performance framework: Functional equivalence with ideal switch

Mimics circuit-switching rigor

Applications:

  • Combined I/O Queueing
    • QoS: Online maximal matching
    • Throughput: Critical matching
    • Strict stability: Maximal matching, SOQF
    • Switched Fair Airport matching
  • Aggregation
    • Shadow CIOQ and Decompose
    • Virtual Element Queueing
  • Pipelining
    • Striping and Equal Dispatch
    • Concurrent Dispatch: 3D matching
  • Parallelization
    • Flow-based PPS: Clos fitting
    • Cell-based PPS: Striping, Equal Dispatch
  • Memory Space Memory
    • Combination methods
    • Recursive BCS
Switching Model
  • Basic property: Contention
  • Flows: Guaranteed QoS, Best-effort
  • Ideal Switch: Provide bandwidth trunks, sustain link capacity
    • Black box for network engineering purposes

[Figure: router model. PPUs on the inputs and outputs attach to the switch fabric (fast path); a CPU serves the slow path.]


Switching Model: Components

[Figure: a memory element (buffers, link scheduling) and a space element (mesh, 2D matching). Other labels: conflict-free property, matching complexity. Constraints: memory bandwidth, full-mesh circuitry. Monolithic: OQ switch (ideal), IQ switch.]


  • Architecture: Interconnect memory and space elements
  • Algorithms: Meaningfully emulate the ideal switch for throughput and QoS
Background: Clos Networks
  • Strictly non-blocking: K ≥ 2M – 1 (Clos theorem)
  • Re-arrangeable: K ≥ M (Slepian-Duguid)

[Figure: Clos network between inputs and outputs, labeled with M and K and carrying one circuit; fitting algorithms]

Recognize:

  • Space-time duality
  • Fitting: matrix decomposition

Inspiration: Replace selected elements with memory
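
The two Clos conditions above can be checked mechanically. The sketch below assumes the usual reading of the symbols, which the transcript does not spell out: M is the number of inputs on each first-stage element and K is the number of middle-stage elements. The function name `clos_class` is made up for this illustration.

```python
def clos_class(m: int, k: int) -> str:
    """Classify a three-stage Clos network.

    m: inputs per first-stage element (assumed meaning of M)
    k: number of middle-stage elements (assumed meaning of K)
    """
    if k >= 2 * m - 1:
        return "strictly non-blocking"  # Clos theorem: K >= 2M - 1
    if k >= m:
        return "re-arrangeable"         # Slepian-Duguid: K >= M
    return "blocking"


for k in (3, 4, 7):
    print(f"M=4, K={k}: {clos_class(4, k)}")
```

With M = 4, the network needs K = 7 middle elements to be strictly non-blocking, while K = 4 is enough if existing circuits may be re-arranged.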

Background: CIOQ Switches

Pro:

  • Low memory bandwidth

Con:

  • Complexity of matching:
    • Switch size
    • Frequency
    • Reconfiguration rate

  • Offline: Templates
  • Maximum, Maximal, Critical
  • Heuristics

[Figure: example 3x3 queue state with rows (3 0 5), (7 0 1), (0 5 0) and the resulting configuration with rows (0 0 1), (1 0 0), (0 1 0)]

What performance results when applied to a changing queue state?
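
To make the step from a queue state to a crossbar configuration concrete, here is a minimal sketch that picks a maximum weight matching by brute force over all permutations, using queue lengths as weights. The 3x3 queue values are illustrative, the helper names are hypothetical, and exhaustive search is only practical for very small switches; it stands in for whichever matching algorithm a real scheduler would run.

```python
from itertools import permutations


def max_weight_matching(queue):
    """Return (weight, perm), where perm[i] is the output matched to input i."""
    n = len(queue)
    best = max(permutations(range(n)),
               key=lambda p: sum(queue[i][p[i]] for i in range(n)))
    return sum(queue[i][best[i]] for i in range(n)), best


def to_configuration(perm):
    """Expand a permutation into a 0/1 crossbar configuration matrix."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


# Illustrative queue state: entry [i][j] is the backlog from input i to output j.
queue_state = [[3, 0, 5],
               [7, 0, 1],
               [0, 5, 0]]

weight, perm = max_weight_matching(queue_state)
config = to_configuration(perm)
for row in config:
    print(row)
```

Each row and each column of the resulting configuration holds exactly one 1, which is the conflict-free property a space element requires.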

Background: CIOQ Switch Results

Based on combinatorics and stability theory

QoS

(Weller-Hajek ‘97)

Throughput

Auxiliary Results: Envelope matching (Kar ‘00), Packet-mode matching (Marsan ‘02)

Framework: Buffered Clos Switches

  • Parallelize: Pool memory resources (PPS)
  • Aggregate: Smaller elements (CIOQ-A, G-MSM)
  • Pipeline: Lower speed, complexity (CIOQ-P, G-MSM)

Definition:

  • Switch size
  • Type of elements
  • Number in first stage
  • Number in second
  • Speedup


  • Isomorphism: Non-blocking Clos network
  • Properties: Multi-stage, fully connected, symmetric, uniform
Framework: Functional Equivalence

Characterize relative performance: Functional equivalence

  • f1: Allocate known rates (shape: bandwidth trunks)
  • f2: Relative stability for admissible traffic (literature: 100% throughput)
  • f3: Per-output relative stability (work conserving)
  • f4: Strict relative stability: all pairs
  • f5: Exact emulation


  • Emulate an ideal switch: exact, asymptotic
  • Bandwidth trunks, independent throughput optimization
CIOQ: Bandwidth Trunks

Shaping plus online matching is sufficient for bandwidth guarantees

Offline: BVN templates from the rate matrix

  • Cons: Template storage, centralized rate processing

Online: Arbitrary arrivals, shape/batch per VOQ, weight scheduler

  • Online: Maximal (s=2)
  • Online: Critical (s=1)

Split time into intervals: T = 1/GCD(R)

Batch traffic in each interval: Simple counters

  • Extension of Weller-Hajek maximal matching theorem
  • Clos analogy: Maximal matching as a strategy for orderly assignments
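
The batching step can be illustrated with exact rational arithmetic. This is a sketch under stated assumptions: rates are fractions of link capacity, the interval length is taken as 1/GCD(R) slots (the timescale given on the backup Bandwidth Trunks slide), and each input-output pair is released R/GCD(R) cells per interval. The function names and the 2x2 rate matrix are invented for the example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def fraction_gcd(a: Fraction, b: Fraction) -> Fraction:
    """Largest fraction g such that a/g and b/g are both integers."""
    return Fraction(gcd(a.numerator * b.denominator, b.numerator * a.denominator),
                    a.denominator * b.denominator)


def batch_plan(rates):
    """Return (interval length in slots, per-pair cell quota per interval)."""
    nonzero = [r for row in rates for r in row if r > 0]
    g = reduce(fraction_gcd, nonzero)
    interval = 1 / g
    quota = [[int(r / g) for r in row] for row in rates]  # simple counters
    return interval, quota


# Hypothetical reserved rates, as fractions of link capacity.
rates = [[Fraction(1, 2), Fraction(1, 4)],
         [Fraction(1, 4), Fraction(1, 2)]]

interval, quota = batch_plan(rates)
print("interval (slots):", interval)
print("cells per interval:", quota)
```

Here GCD(R) = 1/4, so time is split into intervals of 4 slots and the counters admit 2, 1, 1 and 2 cells per interval for the four pairs.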
CIOQ: Admissible Traffic

Best Throughput Results:

  • No speedup: MWM (McKeown et al.), Speedup 2: Maximal (Dai-Prabhakar)
  • Can a simple maximum size matching suffice for admissible traffic?

Red Herring!

Critical matching suffices for asymptotic 100% throughput (f2)

[Figure: a 3x3 queue state and its critical matching; labels: Augment, MSM]

Intuition: 2x2 line buckets (rows R1, R2; columns C1, C2; Max)
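
A small sketch of what "critical" means here, using the definition from the Matching Flavors backup slide: a critical line is a row or column of the queue-state matrix with the maximum sum, and a critical matching covers every critical line. The code only finds the critical lines and checks whether a given matching covers them; it does not construct a critical matching. The queue values and function names are illustrative assumptions, and "covers" is read as "serves at least one non-empty queue on that line".

```python
def critical_lines(queue):
    """Return (max line sum, critical row indices, critical column indices)."""
    rows = [sum(r) for r in queue]
    cols = [sum(c) for c in zip(*queue)]
    top = max(rows + cols)
    crit_rows = [i for i, s in enumerate(rows) if s == top]
    crit_cols = [j for j, s in enumerate(cols) if s == top]
    return top, crit_rows, crit_cols


def covers_critical(queue, perm):
    """perm[i] is the output matched to input i, or None if input i idles."""
    _, crit_rows, crit_cols = critical_lines(queue)
    served = [(i, j) for i, j in enumerate(perm)
              if j is not None and queue[i][j] > 0]
    served_rows = {i for i, _ in served}
    served_cols = {j for _, j in served}
    return set(crit_rows) <= served_rows and set(crit_cols) <= served_cols


queue_state = [[3, 0, 6],
               [7, 0, 1],
               [0, 5, 0]]

print(critical_lines(queue_state))
print(covers_critical(queue_state, [2, 0, 1]))
print(covers_critical(queue_state, [2, None, 1]))
```

In this example the only critical line is the first column (sum 10), so any matching that serves a non-empty queue in that column qualifies, whether or not it has maximum size.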

CIOQ: Strict Relative Stability
  • Maximal matching: Keeps under-subscribed outputs stable (f3) (s=2)
  • Shortest Output-Queue First: (f4) (s=3)
    • Output element scheduler: Identical to the one in emulated switch
    • Intuition: Give preference to less congested pairs at the output
    • Asymptotic emulation of an ideal switch: long-term fairness
Switched Fair Airport
  • Integrate two policies M1 and M2:
  • M1: Provides bandwidth trunks given rate reservations
  • M2: Optimize throughput independent of above rates

[Table: speedup required for the multi-phase combination and the exclusive combination of M1 and M2]

Maximal matching is additive to any other policy, hence needs the least speedup

CIOQ-A: Aggregation

Advantages:

  • Smaller space element
  • Lower arbitration complexity
  • Heterogeneous subports

  • Shadow-Decompose: CIOQ emulation (f5)
  • VEQ Matching: Less complex, only for admissible traffic (f2)
CIOQ-P: Pipelining
  • Sequential Dispatch: CIOQ emulation (f5)
  • Concurrent Dispatch:
    • Limited candidates: stale-state issues
    • 3D Maximal Matching for relative stability
  • Striping: Shadow on envelope basis
  • Equal Dispatch:
    • Explicitly equalize load
    • Separate occupancy counters for each SE

Implement arbitrarily complex policies!

Advantages:

  • Slower space element
  • Lower arbitration complexity

G-MSM: Combination

Combination methods: CIOQ-A/P

No need for independent analysis

Recursion possible

PPS: Architecture

[Figure: PPS with a demux stage, core switches, and a mux stage]

Advantages:

  • Reuse low-capacity core switch
  • Implement arbitrarily slow memories!

Provided: Memoryless first and third stages

Performance: Emulates OQ switch

  • Pool the resources on several switching paths
  • Dual of a CIOQ-P switch
    • Matching algorithm replaced by load balancing
    • Sequence control might be necessary
PPS: Flow-based
  • Model for clustered routers:
  • Per-flow path assignment: explicit or hashed
  • No need for sequence control
  • Memory in first stage
  • High speedup (Clos fitting)
    • Unbalanced load assignment
  • Requires knowledge of loads

Split flows
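
As an illustration of hashed per-flow path assignment, the sketch below maps each flow identifier to one of K core elements with a CRC32 hash. The flow-identifier format, the helper name `core_for_flow`, and the choice of CRC32 are assumptions for the example, not details from the slides.

```python
import zlib
from collections import Counter


def core_for_flow(flow_id: str, k: int) -> int:
    """Pick a core element for a flow; the same flow always maps to the same core."""
    return zlib.crc32(flow_id.encode("utf-8")) % k


K = 4
# Hypothetical flow identifiers.
flows = [f"10.0.0.{i}->10.0.1.{i % 7}" for i in range(20)]
assignment = {flow: core_for_flow(flow, K) for flow in flows}

# Flows per core: hashing keeps each flow on one path but does not balance load.
load = Counter(assignment.values())
print(dict(load))
```

Because every cell of a flow follows the same path, no sequence control is needed, but the per-core load depends on how the flows happen to hash, which is consistent with the unbalanced load assignment noted above.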

PPS: Cell-based
  • Uniformly distribute the load of each flow
  • Premise: Each core element receives 1/K cells of each flow
  • Equal dispatch and striping suffice for asymptotic OQ emulation
  • Bandwidth trunks: Large buffers required
Summary: A Recipe Book
  • Taxonomy of multi-module switches: Buffered Clos Switches
  • Performance framework: Functional equivalence with ideal switch

Applications:

  • Combined I/O Queueing
    • QoS: Online maximal matching
    • Throughput: Critical matching
    • Strict stability: Maximal matching, SOQF
    • Switched Fair Airport matching
  • Aggregation
    • Shadow and Decompose
    • Virtual Element Queueing
  • Pipelining
    • Striping and Equal Dispatch
    • Concurrent Dispatch: 3D matching
  • Parallelization
    • Flow-based PPS: Clos fitting
    • Cell-based PPS: Striping, Equal Dispatch
  • Memory Space Memory
    • Combination methods
    • Recursive BCS
Avenues for Follow-on Research
  • Efficient policies for multicast
  • Similar treatment on other interconnection networks
  • Theory of backpressure:
    • Recent interest in buffered crossbars
  • Quality of stability: Average delay analysis
  • Short-timescale equivalence
  • Emulation of a finite-memory ideal switch
    • Interplay of buffer management with matching algorithms
Relevant Publications
Switching Algorithms:

  • Dynamic Partitioning: Switch Memory Management, Infocom ’99
  • Packet Switches with QoS Support, Hot Interconnects ’00
  • Feedback Control for Distributed Scheduling, Globecom ’00
  • Buffered Clos Switches, Columbia TR ’02

Parallel Switches:

  • Inverse Multiplexing for Switches, Globecom ’98
  • Switched Connections Inverse Multiplexing, Intl. Conf. ATM ’99
  • Recognition of Parallel Packet Switches, GBN, Infocom ’01
  • Stability Analysis of Parallel Packet Switches, ICC ’01
  • Open-loop Schemes for Multi-path Switches, ICC ’03


Proposal Conjectures

Proposal: six conjectures

  • Maximal matching is sufficient to isolate oversubscribed outputs: DONE
  • SOQF is sufficient for strict relative stability: DONE
  • Equal dispatch for strict stability in CIOQ-P: DONE
  • Equal dispatch plus decomposition for strict stability in G-MSM: DONE
  • Rate shaping plus maximal matching suffices for QoS in CIOQ: DONE
  • SOQF suffices for long-term fairness in CIOQ: DONE

Plus many more to round out the work

Additional Contributions

Background: Survey of formal methods in switching– a new perspective

Applications:

  • Combined I/O Queueing
    • Maximal Matching: Delay analysis
    • Perfect Sequences: Uniform Traffic
    • Multicast support using Recycling
  • Aggregation
    • Batch Decomposition (Optical)
    • Support for Heterogeneous Subports
  • Pipelining
    • Concurrent Dispatch: BVN and SPS
  • Parallelization
    • SMM Switches: PPS without backpressure
    • Fractional Dispatch for memoryless inputs
Matching Flavors
  • Maximal matching: Non-idling, greedy
  • Maximum-size matching: Maximum flow in a bipartite graph
    • Ford-Fulkerson, Hopcroft-Karp
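
The difference between the two flavors can be seen in a few lines. The sketch below builds a maximal matching greedily (no further input-output pair with a non-empty queue can be added) and shows a 2x2 case where the result is smaller than the maximum-size matching. The scan order and function names are arbitrary choices for the illustration.

```python
def greedy_maximal_matching(queue):
    """Scan inputs in order; match each to the first free output with backlog."""
    taken = set()
    match = {}
    for i, row in enumerate(queue):
        for j, backlog in enumerate(row):
            if backlog > 0 and j not in taken:
                match[i] = j
                taken.add(j)
                break
    return match


def is_maximal(queue, match):
    """True if no unmatched input and unmatched output share a non-empty queue."""
    used_outputs = set(match.values())
    return not any(backlog > 0 and i not in match and j not in used_outputs
                   for i, row in enumerate(queue)
                   for j, backlog in enumerate(row))


queue_state = [[1, 1],
               [1, 0]]
match = greedy_maximal_matching(queue_state)
print(match, is_maximal(queue_state, match))
```

The greedy pass connects input 0 to output 0 and leaves input 1 idle: the matching is maximal (non-idling in the sense above) with size 1, whereas the maximum-size matching {0→1, 1→0} has size 2.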

Invariant: At least one connection in the marked lines

[Figure: 3x3 queue state with rows (3 0 6), (7 0 1), (0 5 0); non-empty entries marked]


Matching Flavors (continued)
  • Critical Matching: Covers all critical rows and columns
    • Critical line: A line with the maximum sum
  • Perfect Matching: Each configuration is a permutation
  • Maximum Weight Matching: Use queue length as weights
    • Optimization problem: simplex method
  • Template Matchings:
    • BVN: Decompose rate matrix as convex combination of permutations
    • Double: Lower number of permutations, wasted slots
    • Min: N permutations will cover all entries, large number of wasted slots
  • Stable Matching: Gale-Shapley algorithm
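
The BVN template idea, decomposing a rate matrix as a convex combination of permutations, can be sketched as follows. This is a minimal illustration that assumes a doubly stochastic rate matrix (every row and column sums to 1) and finds each permutation by brute force, so it only suits small N. The rate values and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def bvn_decompose(rate):
    """Return [(weight, perm), ...] with sum(weight * P(perm)) equal to rate.

    Assumes rate is doubly stochastic; perm[i] is the output served by input i.
    """
    n = len(rate)
    residual = [row[:] for row in rate]
    terms = []
    while any(x > 0 for row in residual for x in row):
        # Birkhoff's theorem guarantees a permutation inside the support.
        perm = next(p for p in permutations(range(n))
                    if all(residual[i][p[i]] > 0 for i in range(n)))
        weight = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= weight
        terms.append((weight, perm))
    return terms


half, quarter, zero = Fraction(1, 2), Fraction(1, 4), Fraction(0)
rate = [[half, half, zero],
        [quarter, quarter, half],
        [quarter, quarter, half]]

terms = bvn_decompose(rate)
for weight, perm in terms:
    print(weight, perm)
```

Each term is one template: the permutation is the crossbar configuration, and its weight is the fraction of time slots in which that configuration is used.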
Stability Theory
  • Lyapunov functions: Kumar-Meyn ‘95
    • Mechanism to extend Foster’s criterion to a system of queues
    • Weighted cartesian product of queue lengths
    • Symmetric and co-positive
  • Fluid limits: Dai-Prabhakar ‘00
    • Function of discrete time: Interpolate
    • Limit: Scale time to infinity
    • The scaling parameter may be drawn from an increasing sequence r_n

$$F(t) = \lim_{r \to \infty} \frac{1}{r} f(rt)$$


CIOQ: Bandwidth Trunks

Arrivals into GQ: Bounded admissible

Bandwidth Trunk: Timescale = 1/GCD(R)

Covers all entries in GQ before next batch


  • Delay comparable to BVN rate decomposition
CIOQ: Perfect Sequences
  • Sub-maximal Perfect Sequence:
    • A sequence of N permutations that covers the unit matrix
    • A repeating sequence guarantees 1/N to each pair
    • Suffices for 100% throughput to uniform traffic
  • Simple implementation: Staggered round-robin
    • Not even maximal!
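
A staggered round-robin sequence is easy to write down. In the sketch below, which is one simple way to build such a sequence rather than necessarily the construction used in the thesis, slot t connects input i to output (i + t) mod N. The checker confirms that the N permutations together cover every input-output pair exactly once, so a repeating sequence gives each pair 1/N of the slots.

```python
def staggered_round_robin(n):
    """Return n permutations; in slot t, input i connects to output (i + t) % n."""
    return [[(i + t) % n for i in range(n)] for t in range(n)]


def covers_unit_matrix(sequence):
    """True if every slot is a permutation and every (input, output) pair appears."""
    n = len(sequence)
    pairs = {(i, perm[i]) for perm in sequence for i in range(n)}
    all_permutations = all(sorted(perm) == list(range(n)) for perm in sequence)
    return all_permutations and len(pairs) == n * n


sequence = staggered_round_robin(4)
for slot, perm in enumerate(sequence):
    print(slot, perm)
print(covers_unit_matrix(sequence))
```

The schedule never looks at the queue state, which is why it need not be maximal: a slot may connect an empty pair while a backlogged pair waits for its turn.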

Concurrent SPS for CIOQ-P: K turns in KN slots

Basis for iSLIP

Basis for Atlanta arbitration

CIOQ-P: Equal Dispatch

Explicitly equalize the load for each input-output pair

Implemented as counters

No mis-sequencing issues
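
One way to picture counter-based equal dispatch: keep, for each input-output pair, one counter per space element, and send the next cell to the space element with the smallest count. This is a sketch of the idea only; the class name, the tie-breaking rule, and the counter layout are assumptions and may differ from the scheme analyzed in the thesis.

```python
from collections import defaultdict


class EqualDispatcher:
    """Per input-output pair, spread cells evenly over K space elements."""

    def __init__(self, num_space_elements):
        self.k = num_space_elements
        # counters[(inp, out)][se] = cells of that pair sent through element se
        self.counters = defaultdict(lambda: [0] * self.k)

    def dispatch(self, inp, out):
        counts = self.counters[(inp, out)]
        se = counts.index(min(counts))  # least-used element, lowest index on ties
        counts[se] += 1
        return se


dispatcher = EqualDispatcher(num_space_elements=3)
choices = [dispatcher.dispatch(0, 1) for _ in range(7)]
print(choices)
print(dispatcher.counters[(0, 1)])
```

For any single pair the counters never differ by more than one, which is the "explicitly equalize the load" property stated above.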

CIOQ-P: 3D Maximal Matching

Concurrent traversal of queue state matrix

Pointers do not coincide with each other

Recursive G-MSM

[Figure: recursive G-MSM; labels: Any matching, SPS, SPS]

Memory element of a G-MSM: Replace with a CIOQ switch

Virtual Element Queues: Organized per space element