
RRAPID: Real-time Recovery based on Active Probing, Introspection, and Decentralization

Takashi Suzuki, Matthew Caesar. Motivation: today’s Internet core has bursty losses. Backbones have low average loss rates (<0.2%) but experience large bursts of loss.

Presentation Transcript


  1. RRAPID: Real-time Recovery based on Active Probing, Introspection, and Decentralization. Takashi Suzuki, Matthew Caesar

  2. Motivation
    • Today’s Internet core has bursty losses
      • Backbones have low average loss rates (<0.2%) but experience large bursts of loss
      • Loss durations range from 10 ms to 33.72 s
      • 6 of 7 providers experienced large outages of 10–220 s once or twice per day
    • Multimedia applications have difficulty recovering from repeated loss (e.g., with FEC)
    • Commonly used restoration techniques are insufficient
      • Link-layer recovery and MPLS are not yet uniformly deployed
      • RON is too slow (20 s) and not scalable
    ⇒ Real-time recovery is desired
    • “Assessment of VoIP Quality over Internet Backbones,” Markopoulou, Tobagi, Karam (INFOCOM 2002)

  3. Approach
    • RRAPID: Real-time Recovery based on Adaptive Probing, Introspection, and Dampening
    • Technique: overlay-based, real-time recovery
      • Use link-state routing
      • Determine link cost from packet receipt delay
      • Adaptively dampen route advertisements
    • Desirable properties:
      • Speed: low end-to-end failure time
      • Stability: few route oscillations
      • Accuracy: avoid reacting to transient failures
      • Scalability: low probing/communication overhead
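
The slides say RRAPID uses link-state routing over costs derived from packet receipt delay, but they do not show the route computation. The sketch below assumes the standard link-state approach, in which each overlay node runs Dijkstra's algorithm over the advertised link costs. The node names and cost values are hypothetical; a large cost stands in for a link whose advertised metric has risen after a detected failure.

```python
import heapq
from typing import Dict, List, Tuple

# Link-state database: node -> {neighbor: advertised link cost}.
# Hypothetical illustration; costs stand in for RRAPID's advertised link metric.
LinkState = Dict[str, Dict[str, float]]


def shortest_paths(lsdb: LinkState, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Dijkstra over the advertised link costs, as a link-state node would run it."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, cost in lsdb.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def path_to(prev: Dict[str, str], source: str, dest: str) -> List[str]:
    """Walk predecessor pointers back from dest to source."""
    path = [dest]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return list(reversed(path))


lsdb = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0},
}
_, prev = shortest_paths(lsdb, "A")
print(path_to(prev, "A", "B"))  # direct path A -> B

# Hypothetical: the A-B link's advertised cost rises after a detected failure.
lsdb["A"]["B"] = lsdb["B"]["A"] = 100.0
_, prev = shortest_paths(lsdb, "A")
print(path_to(prev, "A", "B"))  # traffic now detours through C
```

Because every node recomputes routes locally from the flooded costs, recovery speed is bounded by how quickly a failed link's cost is detected and advertised, which is what the remaining layers address.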

  4. System Architecture: Reaction Mechanism
    • Route Stabilization (RS):
      • Dampens route flaps
    • Adaptive Tracking (AT):
      • Filters noise
      • Reacts quickly to changes
    • Link Cost Estimation (LCE):
      • Estimates failure probability from packet loss
      • “Delay-deficit algorithm”
    [Figure: layered architecture with RS, AT, and LCE layers]
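
The slides name the three layers but give no formulas, and the “delay-deficit algorithm” is not described. The following is a minimal sketch of how such a stack could be wired together, under loudly stated assumptions: `lce_metric` is a hypothetical stand-in that treats m overdue probes as a failure with probability 1 − p^m (p being an assumed transient loss rate), `AdaptiveTracker` is a two-gain EWMA, and `RouteStabilizer` swaps in plain hysteresis for RRAPID's adaptive dampening. All names and parameter values are illustrative.

```python
import math
from typing import Optional


def lce_metric(delay_s: float, probe_interval_s: float, loss_rate: float) -> float:
    """Hypothetical LCE stand-in (NOT the slides' delay-deficit algorithm).

    Counts how many probes are overdue and returns the probability that a gap
    that long is not explained by independent transient losses.
    """
    missed = math.floor(delay_s / probe_interval_s)
    return 1.0 - loss_rate ** missed


class AdaptiveTracker:
    """Hypothetical AT layer: EWMA whose gain jumps when the input moves sharply."""

    def __init__(self, slow_gain: float = 0.1, fast_gain: float = 0.8, jump: float = 0.5):
        self.slow_gain = slow_gain
        self.fast_gain = fast_gain
        self.jump = jump
        self.estimate = 0.0

    def update(self, sample: float) -> float:
        error = sample - self.estimate
        # Small deviations are treated as noise; large ones as a real change.
        gain = self.fast_gain if abs(error) > self.jump else self.slow_gain
        self.estimate += gain * error
        return self.estimate


class RouteStabilizer:
    """Hypothetical RS layer: hysteresis band that suppresses route flaps."""

    def __init__(self, down_above: float = 0.9, up_below: float = 0.5):
        self.down_above = down_above
        self.up_below = up_below
        self.link_down = False

    def update(self, tracked: float) -> Optional[str]:
        """Return an advertisement ("down"/"up") only when the link state changes."""
        if not self.link_down and tracked > self.down_above:
            self.link_down = True
            return "down"
        if self.link_down and tracked < self.up_below:
            self.link_down = False
            return "up"
        return None


if __name__ == "__main__":
    at, rs = AdaptiveTracker(jump=0.2), RouteStabilizer()
    # Hypothetical receipt delays (s) with a 300 ms probe interval and 10% loss:
    # one isolated overdue probe, then a sustained gap, then recovery.
    for delay in [0.1, 0.35, 0.1, 0.35, 0.65, 0.95, 0.1, 0.1]:
        adv = rs.update(at.update(lce_metric(delay, 0.3, 0.1)))
        if adv:
            print(f"delay={delay:.2f}s -> advertise {adv}")
```

In this run the isolated overdue probe is absorbed by the tracker, while the sustained gap yields a single `down` advertisement and the recovery a single `up`; the separation of estimation, filtering, and stabilization is the point, not the particular formulas.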

  5. Simulation Results: Layered Control
    [Figure: time series of the LCE, AT, and RS outputs]
    • Show detailed actions of the layers
      • LCE output: metric representing the probability that the link has failed
      • AT output: metric with noise filtered
      • RS output: advertised value for the link
      • Red spikes result from back-to-back packet losses
    • Setup
      • Link failure at t = 150–170 s
      • Probe every 300 ms, 10% loss
    • Results
      • First detection in 0.92 s, next at 5.42 s
      • Several false positives due to cold start; stabilizes within 100 s
      • 0.92 s corresponds to 3 lost probes (3 × 300 ms) plus a propagation delay of 0.02 s
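
Written out, the 0.92 s first-detection time decomposes as the slide states, with I the 300 ms probe interval (the same symbol used in the analysis slide) and d_prop the 0.02 s propagation delay:

$$
t_{\text{detect}} = 3I + d_{\text{prop}} = 3 \times 0.30\,\text{s} + 0.02\,\text{s} = 0.92\,\text{s}
$$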

  6. Simulation Results: Reaction Speed
    • Reaction speed
      • Probing faster improves speed
      • Probing every <400 ms can give ~1 s reaction times
      • Loss increases reaction time
    • Overhead
      • Probing intervals above 50 ms give reasonable overhead
    • Effect of packet loss
      • Increasing packet loss decreases accuracy
      • Advertisements and probes are dropped
      • Sub-second reactions even at 5% loss
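
The speed/overhead tension on this slide can be made concrete with a back-of-the-envelope model. This is a sketch, not the simulator's cost model: it assumes (hypothetically) 5 overlay neighbors per node and 64-byte probes, and it reuses the layered-control rule of thumb that detection takes 3 lost probes plus 20 ms of propagation delay. It ignores probe and advertisement loss, which only slows reactions further.

```python
# Hypothetical parameters (not from the slides).
NEIGHBORS = 5        # overlay neighbors probed by each node
PROBE_BYTES = 64     # size of one probe packet
# Detection rule taken from the layered-control results: 3 lost probes + 20 ms.
LOST_PROBES_TO_DETECT = 3
PROP_DELAY_S = 0.02


def probe_overhead_bps(interval_s: float) -> float:
    """Probe traffic each node sends, in bits per second."""
    return NEIGHBORS * PROBE_BYTES * 8 / interval_s


def reaction_time_s(interval_s: float) -> float:
    """Rough loss-free reaction time under the 3-lost-probe rule."""
    return LOST_PROBES_TO_DETECT * interval_s + PROP_DELAY_S


for interval in (0.05, 0.1, 0.2, 0.3, 0.4):
    print(
        f"I={interval * 1000:4.0f} ms  "
        f"overhead={probe_overhead_bps(interval) / 1000:6.1f} kb/s  "
        f"reaction~{reaction_time_s(interval):.2f} s"
    )
```

Halving the interval roughly halves the reaction-time estimate and doubles probe traffic. At a 400 ms interval the estimate is about 1.2 s, in line with the “~1 s” figure above.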

  7. Simulation Results: Comparison
    • Compared RRAPID, RON, and “Oracle-based” routing
    • Results:
      • RON requires 4–10× more advertisements than RRAPID
      • RON’s overhead increases exponentially with probe speed; RRAPID’s increases linearly
      • Packet loss has an extreme effect on RON and a moderate effect on RRAPID

  8. Emulation Results: Real Internet Workload
    • Method
      • Measured performance on a real Internet workload
      • Traces acquired between UIUC and Stanford
      • Emulated a 2-path overlay topology, one trace per path
      • One natural failure at t = 123.4–133.7 s; two failures introduced at t = 40–50 s and t = 60–70 s
    • Result
      • Stable, sub-second reactions
    [Figure: overlay paths 1 and 2; number of flows on link #1 and on link #2 over time]

  9. Analysis
    • Simplified model of the system
      • Modeled the RS layer as MIAD
        • Increase by 1, decrease by 1/k
        • Advertisement threshold limited to n
      • Ignored AT-layer effects
      ⇒ n·k-state Markov chain
    • Given:
      • Probe loss probability p
      • Number of paths N
      • Probe interval I
    • We can determine:
      • Speed: average reaction time
      • Overhead: average advertisement rate
    • Found best-case expected overhead and reaction time for varying transient loss rates
    • Results
      • Can react quickly and stably under fairly large amounts of transient packet loss
      • Overhead and reaction time increase super-linearly with loss rate
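
The chain can be built directly from the stated rules: the RS threshold moves up by 1 or down by 1/k and is capped at n, giving n·k states (here the threshold values 1/k, 2/k, up to n). The slides do not say which events drive the transitions, so this sketch assumes, purely for illustration, that each lost probe (probability p) raises the threshold and each received probe lowers it; the real analysis also depends on N and I. The stationary distribution is found by power iteration.

```python
from typing import List


def transition_matrix(n: int, k: int, p: float) -> List[List[float]]:
    """Build the n*k-state chain; state i means threshold (i + 1) / k.

    ASSUMPTION (illustrative only): a lost probe (prob. p) raises the
    threshold by 1, a received probe (prob. 1 - p) lowers it by 1/k.
    """
    size = n * k
    P = [[0.0] * size for _ in range(size)]
    for i in range(size):
        up = min(i + k, size - 1)  # increase by 1 (k steps), capped at n
        down = max(i - 1, 0)       # decrease by 1/k, floored at 1/k
        P[i][up] += p
        P[i][down] += 1.0 - p
    return P


def stationary(P: List[List[float]], iters: int = 10_000, tol: float = 1e-12) -> List[float]:
    """Power iteration: repeatedly apply pi <- pi P from a uniform start."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def mean_threshold(pi: List[float], k: int) -> float:
    """Expected dampening threshold under the stationary distribution."""
    return sum(prob * (i + 1) / k for i, prob in enumerate(pi))


if __name__ == "__main__":
    n, k = 5, 4  # hypothetical cap and step count
    for p in (0.01, 0.05, 0.1, 0.2, 0.3):
        pi = stationary(transition_matrix(n, k, p))
        print(f"p={p:.2f}  mean threshold={mean_threshold(pi, k):.2f}")
```

Under this assumption the expected drift per probe is p − (1 − p)/k, which is zero at p = 1/(k + 1). With k = 4 the threshold stays near its floor for small p and piles up toward n once loss approaches 20%, a simple illustration of why costs grow quickly with loss rate.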

  10. Conclusions
    • Can achieve sub-second reactions on most links with reasonable stability
      • Congested links increase reaction time
    • Can react well on most Internet links
    • Trade-off between overhead and reaction speed
      • Lossy links worsen reaction time
      • Hard to react quickly and stably if all paths have >10% loss
    • Future work:
      • Improve scalability with route aggregation
      • Extend evaluation of system parameters
      • Consider a wider range of topologies, cross traffic, and offered loads
