RRAPID: Real-time Recovery based on Active Probing, Introspection, and Decentralization

Takashi Suzuki

Matthew Caesar


Motivation

  • Today’s internet core has bursty losses

    • Backbones have low average loss rates (<0.2%), but experience large bursts in loss

    • Loss durations vary from 10ms to 33.72sec

    • 6 out of 7 providers experienced large outages lasting 10-220sec, 1-2 times per day

  • Difficult for multimedia applications to recover from repeated loss (e.g. with FEC)

  • Commonly used restoration techniques insufficient

    • Link layer recovery, MPLS not yet uniformly deployed

    • RON too slow (20 sec), not scalable

  • Therefore, real-time recovery is desired

  • Source: “Assessment of VoIP Quality over Internet Backbones,” Markopoulou, Tobagi, Karam (INFOCOM 2002)


Approach

  • RRAPID: Real-time Recovery based on Adaptive Probing, Introspection, and Dampening

  • Technique: Overlay based, real-time recovery

    • Use Link-state routing

    • Determine link cost from packet receipt delay

    • Adaptively dampen route advertisements

  • Desirable properties:

    • Speed: Low end-to-end failure time

    • Stability: Few route oscillations

    • Accuracy: Avoid reacting to transient failures

    • Scalability: Low probing/communication overhead
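
The link-state part of the approach can be illustrated with a small sketch. It assumes each overlay node already holds a map of advertised link costs and runs Dijkstra's algorithm over it; the node names, the cost values, and the `shortest_path` helper are hypothetical and are not taken from the slides.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over advertised link costs.

    graph: {node: {neighbor: cost}} (hypothetical layout).
    Returns (total_cost, path), or (inf, []) if dst is unreachable.
    """
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]

# Two-path overlay with made-up costs.
overlay = {
    "src": {"relay1": 1, "relay2": 2},
    "relay1": {"dst": 1},
    "relay2": {"dst": 2},
}
print(shortest_path(overlay, "src", "dst"))

# A failure on relay1 -> dst is advertised as a much higher link cost,
# so the next route computation switches to the other overlay path.
overlay["relay1"]["dst"] = 100
print(shortest_path(overlay, "src", "dst"))
```

In this reading, recovery is a cost update followed by a route recomputation, which is why dampening the advertisements matters for stability.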


System Architecture: Reaction Mechanism

  • Route Stabilization (RS):

    • Dampens route flaps

  • Adaptive Tracking (AT):

    • Filters noise

    • Reacts quickly to changes

  • Link Cost Estimation (LCE):

    • Estimates failure probability from packet loss

    • “Delay-deficit algorithm”
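
The layering above can be sketched as a small pipeline. The slides do not give the delay-deficit algorithm or the filter details, so this sketch takes the LCE output as a given input in [0, 1] and substitutes two common techniques as stand-ins: an adaptive-gain exponential filter for AT and a two-threshold hysteresis for RS. All class names, gains, and thresholds are assumptions.

```python
class AdaptiveTracker:
    """AT stand-in: exponential filter that uses a small gain for small
    deviations (noise) and a large gain for large ones (real changes)."""

    def __init__(self, slow=0.1, fast=0.6, jump=0.5):
        self.slow, self.fast, self.jump = slow, fast, jump
        self.value = 0.0

    def update(self, sample):
        error = sample - self.value
        gain = self.fast if abs(error) >= self.jump else self.slow
        self.value += gain * error
        return self.value


class RouteStabilizer:
    """RS stand-in: hysteresis so the advertised state does not flap."""

    def __init__(self, up=0.7, down=0.3):
        self.up, self.down = up, down
        self.advertised_down = False

    def update(self, metric):
        if not self.advertised_down and metric >= self.up:
            self.advertised_down = True
        elif self.advertised_down and metric <= self.down:
            self.advertised_down = False
        return self.advertised_down


def run(lce_samples):
    """Feed LCE outputs (assumed failure probabilities) through AT then RS."""
    at, rs = AdaptiveTracker(), RouteStabilizer()
    return [rs.update(at.update(s)) for s in lce_samples]


print(run([0, 0, 1, 0, 0]))   # isolated spike
print(run([1, 1, 1, 1, 1]))   # sustained failure signal
```

With these made-up parameters an isolated spike is filtered out, while a sustained failure signal eventually crosses the upper threshold and changes the advertised state.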



Simulation Results: Layered Control

  • Show detailed actions of layers

    • --- LCE output: metric representing probability link has failed

    • ---AT output: metric with noise filtered

    • ---RS output: advertised value for link

    • Red spikes result from back-to-back packet losses

  • Setup

    • Link Failure at t=[150s-170s]

    • Probe every 300ms, 10% loss

  • Results

    • First detection in 0.92s, next at 5.42s

    • Several false positives due to cold start. Stabilizes in 100s.

    • 0.92s corresponds to 3 lost probes plus propagation delay of 0.02s
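
The 0.92s figure follows from simple arithmetic: three consecutive lost probes at a 300ms interval, plus 0.02s of propagation delay. A minimal sketch of that calculation is below; the `detection_time` helper is hypothetical, and applying the same three-probe count to other intervals is an assumption made only for illustration.

```python
def detection_time(lost_probes, probe_interval_s, propagation_delay_s):
    """Time until a failure is noticed: the probes that must go missing,
    spaced one interval apart, plus the propagation delay."""
    return lost_probes * probe_interval_s + propagation_delay_s

# Setup from the slide: 3 lost probes, 300ms interval, 0.02s delay.
print(f"{detection_time(3, 0.3, 0.02):.2f}s")

# Same assumed three-probe count at other probe intervals.
for interval in (0.1, 0.2, 0.4):
    print(f"interval {interval:.1f}s -> {detection_time(3, interval, 0.02):.2f}s")
```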


Simulation Results: Reaction Speed

  • Reaction Speed

    • Probing faster improves speed

    • Probe intervals below 400ms can give ~1s reaction times

    • Loss increases reaction time

  • Overhead

    • Probe intervals above 50ms give reasonable overhead

  • Effect of packet loss

    • Increasing packet loss decreases accuracy

    • Advertisements and probes are dropped

    • Subsecond reactions even at 5% loss


Simulation Results: Comparison

  • Compared RRAPID, RON, and “Oracle-based” routing.

  • Results:

    • RON requires 4 to 10x more advertisements than RRAPID

    • RON’s overhead increases exponentially with probe speed; RRAPID’s overhead increases linearly

    • Packet loss has an extreme effect on RON, moderate effect on RRAPID


Emulation Results: Real Internet Workload

[Figure panel: Overlay path 1]

  • Method

    • Measured performance on real Internet workload

    • Traces acquired between UIUC and Stanford

    • Emulated 2-path overlay topology, one trace for each path

    • One natural failure at t=[123.4s to 133.7s]; two failures were introduced at t=[40s to 50s] and t=[60s to 70s]

  • Result

    • Stable, sub-second reactions

[Figure panel: Overlay path 2. Legend: number of flows on link #1; number of flows on link #2]


Analysis

  • Simplified model of system

    • Modeled RS layer as MIAD

      • Increase by 1, Decrease by 1/k

      • Advertisement threshold limited to n

    • Ignored AT layer effects

  • This yields an n*k state Markov chain

  • Given:

    • Probe loss probability p

    • Number of paths N

    • Probe interval I

  • We can determine:

    • Speed: Average reaction time

    • Overhead: Average advertisement rate

  • Found best-case expected Overhead and Reaction time for variable transient loss rates.

  • Results

    • Can react quickly, stably for fairly large amounts of transient packet loss

    • Overhead and reaction time increase super-linearly with loss rate
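
The simplified model can be sketched numerically. The slides do not list the exact transition rules, so this is one possible reading: the RS counter is tracked in units of 1/k, a lost probe adds 1 (k units), a received probe subtracts 1/k (one unit), and the link is advertised as failed while the counter sits at the threshold n. The function names are hypothetical, and the number of paths N is not modeled here.

```python
def rs_step(state, lost, n, k):
    """One probe update, in units of 1/k: +k on loss, -1 on receipt."""
    top = n * k
    return min(top, state + k) if lost else max(0, state - 1)

def stationary_distribution(p, n, k, iters=2000):
    """Long-run state probabilities for probe loss probability p,
    obtained by repeatedly applying the transition rule."""
    top = n * k
    dist = [0.0] * (top + 1)
    dist[0] = 1.0
    for _ in range(iters):
        nxt = [0.0] * (top + 1)
        for s, mass in enumerate(dist):
            nxt[rs_step(s, True, n, k)] += p * mass
            nxt[rs_step(s, False, n, k)] += (1 - p) * mass
        dist = nxt
    return dist

def fraction_advertised(p, n, k):
    """Share of time spent at the threshold under transient loss p."""
    return stationary_distribution(p, n, k)[n * k]

def probes_to_threshold(n, k):
    """Probes needed to reach the threshold after a hard failure."""
    state, count = 0, 0
    while state < n * k:
        state = rs_step(state, True, n, k)
        count += 1
    return count

n, k, interval = 3, 4, 0.3  # made-up parameters
print("reaction time after hard failure:", probes_to_threshold(n, k) * interval)
for p in (0.01, 0.05, 0.10):
    print(p, fraction_advertised(p, n, k))
```

Under these assumptions the reaction time to a hard failure is n probe intervals, and the share of time a healthy but lossy link spends at the threshold grows with p, which is the speed versus stability tension the analysis quantifies.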


Conclusions

  • Can achieve sub-second reactions on most links with reasonable stability

    • Congested links increase reaction time

    • Can react well on most internet links

  • Trade-off between overhead and reaction speed

  • Lossy links worsen reaction time

    • Hard to react quickly, stably if all paths have >10% loss.

  • Future work:

    • Improve scalability with route aggregation

    • Extend evaluation of system parameters

    • Consider wider range of topologies, cross traffic, offered loads

