
Enabling Flow-level Latency Measurements across Routers in Data Centers

Parmjeet Singh, Myungjin Lee

Sagar Kumar, Ramana Rao Kompella

Latency-critical applications in data centers
  • Guaranteeing low end-to-end latency is important
    • Web search (e.g., Google’s instant search service)
    • Retail advertising
    • Recommendation systems
    • High-frequency trading in financial data centers
  • Operators want to troubleshoot latency anomalies
    • End-host latencies can be monitored locally
    • Detection, diagnosis, and localization across the network are hard: routers/switches have no native support for latency measurements
Prior solutions
  • Lossy Difference Aggregator (LDA)
    • Kompella et al. [SIGCOMM ’09]
    • Aggregate latency statistics
  • Reference Latency Interpolation (RLI)
    • Lee et al. [SIGCOMM ’10]
    • Per-flow latency measurements

RLI is more suitable for our purpose due to its finer-grained (per-flow) measurements

Deployment scenario of RLI
  • Upgrading all switches/routers in a data center network
  • Pros
    • Provides the finest granularity of latency-anomaly localization
  • Cons
    • Significant deployment cost
    • Possible downtime of entire production data centers
  • In this work, we consider partial deployment of RLI
    • Our approach: RLI across Routers (RLIR)
Overview of RLI architecture
  • Goal
    • Latency statistics on a per-flow basis between interfaces
  • Problem setting
    • Storing a timestamp for each packet at ingress and egress is infeasible due to high storage and communication costs
    • Regular packets do not carry timestamps

[Figure: a router with ingress interface I and egress interface E]

Overview of RLI architecture


  • Premise of RLI: delay locality (packets that are close together in time experience similar delays)
  • Approach

1) The injector sends reference packets regularly

2) Reference packet carries ingress timestamp

3) Linear interpolation: compute per-packet latency estimates at the latency estimator

4) Per-flow estimates by aggregating per-packet estimates
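
Steps 3 and 4 can be illustrated with a minimal sketch (not the authors' implementation); it assumes each reference packet's delay is already measured at the egress from its carried ingress timestamp:

    # A minimal sketch of RLI-style linear interpolation. Each reference
    # packet carries its ingress timestamp, so its one-way delay is known at
    # the egress; a regular packet's delay is interpolated between the two
    # reference packets that bracket its arrival.

    from collections import defaultdict

    def interpolate_delay(t, ref_before, ref_after):
        """Estimate the delay of a regular packet arriving at egress time t.
        ref_before / ref_after: (egress_arrival_time, measured_delay)."""
        t1, d1 = ref_before
        t2, d2 = ref_after
        if t2 == t1:                          # degenerate case
            return d1
        return d1 + (t - t1) / (t2 - t1) * (d2 - d1)

    # Step 4: aggregate per-packet estimates into per-flow statistics.
    flow_stats = defaultdict(lambda: [0, 0.0])   # flow -> [count, delay sum]

    refs = [(0.000, 100e-6), (0.010, 140e-6)]    # two reference packets
    packets = [("flowA", 0.004), ("flowA", 0.006), ("flowB", 0.009)]
    for flow, t in packets:
        d = interpolate_delay(t, refs[0], refs[1])
        flow_stats[flow][0] += 1
        flow_stats[flow][1] += d

    for flow, (n, total) in flow_stats.items():
        print(f"{flow}: mean delay {total / n * 1e6:.1f} us over {n} packets")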

[Figure: delay vs. time at the egress. Reference packets R have measured delays; regular packets L receive interpolated delays on the line between successive reference packets. The reference packet injector sits at ingress I, the latency estimator at egress E.]

Full vs. Partial deployment
  • Full deployment: 16 RLI sender-receiver pairs
  • Partial deployment: 4 RLI senders + 2 RLI receivers
  • 81.25% deployment cost reduction
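
Counting each sender and receiver as one instrumented element: full deployment uses 16 + 16 = 32 elements, partial deployment uses 4 + 2 = 6, so the cost reduction is 1 − 6/32 = 81.25%.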

[Figure: a line of six switches (Switch 1–6); under partial deployment, RLI senders (reference packet injectors) and RLI receivers (latency estimators) are placed on only a subset of the switches.]

Case 1: Presence of cross traffic
  • Issue: inaccurate link-utilization estimation at the sender leads to an overly high reference packet injection rate
  • Approach
    • We do not actively address this issue
    • Evaluation shows little impact on the packet loss rate
    • Details in the paper
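
A hypothetical sketch of why this matters: if the sender sizes its injection rate from the idle capacity it estimates, cross traffic it cannot see makes the estimate too low and the injection rate too high (the policy and numbers below are illustrative, not RLI's actual ones):

    # Hypothetical sketch: the sender injects reference packets as a share of
    # the idle capacity it *estimates*; unseen cross traffic lowers the
    # estimated utilization, inflating the injection rate. The 'share'
    # parameter and the formula are illustrative only.

    def injection_rate(capacity_pps, estimated_util, share=0.01):
        """Reference packets/sec as a share of the estimated idle capacity."""
        idle_fraction = max(0.0, 1.0 - estimated_util)
        return share * idle_fraction * capacity_pps

    print(injection_rate(1_000_000, estimated_util=0.9))  # 1000.0 pps (true util)
    print(injection_rate(1_000_000, estimated_util=0.5))  # 5000.0 pps (underestimate)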

[Figure: six-switch topology; cross traffic merges onto a bottleneck link while the RLI sender performs link-utilization estimation at Switch 1 without seeing the cross traffic.]

Case 2: RLI Sender side
  • Issue: Traffic may take different routes at an intermediate switch
  • Approach: Sender sends reference packets to all receivers

[Figure: six-switch topology; traffic from the RLI sender can take different routes at an intermediate switch, so reference packets are sent toward all RLI receivers.]

Case 3: RLI Receiver side
  • Issue: Hard to associate reference packets and regular packets that traversed the same path
  • Approaches
    • Packet marking: requires native support from routers
    • Reverse ECMP computation: ‘reverse engineer’ intermediate routes using the switches' ECMP hash function
    • IP prefix matching (applicable only in limited situations)
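
A minimal sketch of the reverse-ECMP idea, assuming the receiver knows the hash function the intermediate switch applies to the 5-tuple (CRC32 below is purely illustrative; real switches use vendor-specific hashes):

    # Sketch of reverse ECMP computation. Knowing the upstream switch's hash
    # function, the receiver recomputes which next hop that switch gave each
    # regular packet, and thus which reference-packet stream (path) the
    # packet should be associated with.

    import zlib

    def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
        """Recompute the upstream switch's ECMP choice for a 5-tuple."""
        key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
        return zlib.crc32(key) % num_paths

    # Associate a regular packet with the path (and reference stream) it took.
    path_id = ecmp_next_hop("10.0.0.1", "10.0.1.5", 4242, 80, 6, num_paths=2)
    print(f"packet hashed to upstream path {path_id}")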

[Figure: six-switch topology; regular packets arriving at the RLI receiver over different upstream paths must be matched to the reference packets that traversed the same path.]

Deployment example in fat-tree topology

[Figure: fat-tree topology with RLI senders (reference packet injectors) and RLI receivers (latency estimators); packet-to-reference association uses reverse ECMP computation or IP prefix matching, depending on the receiver's position.]

Evaluation
  • Simulation setup
    • Trace: regular traffic (22.4M pkts) + cross traffic (70M pkts)
    • Simulator
  • Results
    • Accuracy of per-flow latency estimates

[Figure: simulation setup. A traffic divider splits the packet trace into regular traffic and cross traffic (replayed by a cross-traffic injector); the RLI sender injects reference packets at a 10% or 1% rate, and traffic passes through Switch 1 and Switch 2 to the RLI receiver.]

Accuracy of per-flow latency estimates
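
Relative error here presumably means |estimated latency − true latency| / true latency, computed per flow.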

[Figure: CDFs of relative error of per-flow latency estimates at 67% and 93% bottleneck link utilization, each with 10% and 1% reference packet injection rates; annotated relative errors: 1.2%, 4.5%, 18%, and 31%.]

Summary
  • Low latency applications in data centers
    • Localization of latency anomaly is important
  • RLI provides flow-level latency statistics, but full deployment (i.e., at all routers/switches) is expensive
  • Proposed a solution enabling partial deployment of RLI
    • Little loss in localization granularity (anomalies are still localized to every other router)