
Every Microsecond Counts: Tracking Fine-Grain Latencies with a Lossy Difference Aggregator






  1. Every Microsecond Counts: Tracking Fine-Grain Latencies with a Lossy Difference Aggregator
  Ramana Rao Kompella, Kirill Levchenko, Alex C. Snoeren, George Varghese

  2. Low latency networks
  • Networks with end-to-end microsecond latency guarantees are important for many applications
    • Automated Trading – lose arbitrage opportunities
    • High Performance Computing – lose parallelism
    • Cluster Computing, Storage – lose performance
  SIGCOMM 2009

  3. Obtaining fine-grain measurements
  “When considering how to reduce latency, the first step is to measure it.”
  -- Joanne Kinsella, Head of Portfolio, British Telecom
  SIGCOMM 2009

  4. Obtaining fine-grain measurements
  How can we obtain fine-grain measurements using simple, low-cost hardware primitives?
  • Native router support: SNMP, NetFlow
    • Coarse counters, per-flow statistics
    • No latency measurements
  • Active probes
    • Measuring microseconds requires too many probes
    • Wastes bandwidth and interferes with regular traffic
  • State of the art: expensive high-fidelity measurement boxes
    • The London Stock Exchange uses Corvil boxes
  SIGCOMM 2009

  5. Lossy Difference Aggregator (LDA)
  • Computes loss rate, average delay, delay variance, and loss distribution
  • Uses only a small amount of hardware (registers and hashing)
  • Measures real traffic, with no injected probes
  SIGCOMM 2009

  6. Measurement model
  [Figure: packets travel from Sender S through a Router to Receiver R]
  SIGCOMM 2009

  7. Measurement model
  [Figure: Sender S and Receiver R on either side of the path, each maintaining state (DS at S, DR at R)]
  • Packets always travel from S to R
    • R-to-S traffic is considered separately
  • Time is divided into equal bins (measurement intervals)
    • The interval depends on the granularity required (typically sub-second)
  • Both S and R maintain some state D about the packets they see
    • State is updated as each packet departs S or arrives at R
  • At the end of each interval, S transmits DS to R
  • R computes the required measurement as f(DS, DR) (see the sketch below)
  SIGCOMM 2009
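A minimal Python sketch of this sender/receiver pattern, assuming a generic per-packet state interface; the names MeasurementState and end_of_interval are illustrative, not from the paper:

```python
# Generic measurement model: both ends update state D per packet;
# at the end of each interval, R combines the two states.

class MeasurementState:
    """State D maintained at each end of the path."""

    def on_packet(self, pkt_id, timestamp):
        """Update D as a packet departs (at S) or arrives (at R)."""
        raise NotImplementedError


def end_of_interval(D_S, D_R, f):
    """S transmits D_S to R; R computes the measurement f(D_S, D_R)."""
    return f(D_S, D_R)
```

The later slides instantiate this pattern with progressively richer state: a single counter, a timestamp sum, and finally the LDA.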

  8. Computing loss
  [Figure: S counts 3 departing packets, R counts 2 arriving packets (one lost in transit); Loss = 3 − 2 = 1]
  • Loss rates, however small, are trivial to obtain (see the sketch below):
    • Store a packet counter at S and at R
    • S sends its counter value to R periodically
  SIGCOMM 2009
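A minimal sketch of the counter scheme, using the slide's numbers; PacketCounter and the variable names are illustrative:

```python
# One counter per end; the difference at the end of the interval is the loss.

class PacketCounter:
    def __init__(self):
        self.count = 0

    def on_packet(self):
        self.count += 1

S, R = PacketCounter(), PacketCounter()
for _ in range(3):               # S transmits 3 packets...
    S.on_packet()
for _ in range(2):               # ...but only 2 reach R
    R.on_packet()

# At the end of the interval, S sends its count to R, which computes:
loss = S.count - R.count         # 3 - 2 = 1
```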

  9. Computing delay (naïve)
  [Figure: departure times 10, 12, 15 at S; arrival times 23, 26, 35 at R; per-packet delays 13, 14, 20; Avg. Delay = 47/3 ≈ 15.7]
  Observation: computing delay is trivial if there is no loss.
  • A naïve first cut: timestamps (see the sketch below)
    • Store a timestamp for each packet at S and R
    • After every interval, S sends its packet timestamps to R
    • R computes the individual delays and averages them
  • Problem: high communication cost
    • 5 million packet timestamps require ~25,000 packets
    • Sampling reduces communication, but also reduces accuracy
  SIGCOMM 2009
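A sketch of the naïve timestamp scheme with the slide's example values, assuming FIFO ordering and no loss:

```python
# Both ends log every packet's timestamp; S ships its whole log to R
# at the end of the interval.

send_ts = [10, 12, 15]           # recorded at S on departure
recv_ts = [23, 26, 35]           # recorded at R on arrival (FIFO, no loss)

delays = [r - s for s, r in zip(send_ts, recv_ts)]   # [13, 14, 20]
avg_delay = sum(delays) / len(delays)                # 47 / 3 ≈ 15.7
```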

  10. Estimating delay under no loss
  [Figure: S accumulates 10 + 12 + 15 = 37; R accumulates 23 + 26 + 35 = 84; Avg. delay = (84 − 37) / 3 ≈ 15.7]
  Works great, as long as no packets are ever lost…
  • Observation: aggregation can reduce cost (see the sketch below)
    • Store the sum of the timestamps at S and R in a timestamp accumulator
    • After every interval, S sends its sum CS to R
    • R computes the average delay as (CR − CS) / N
    • Only one counter to keep and one packet to send
  SIGCOMM 2009
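A sketch of the timestamp-accumulator scheme, again with the slide's numbers; TimestampAccumulator is an illustrative name:

```python
# Each side keeps only a running sum and a count, so S sends a single
# value CS instead of N timestamps.

class TimestampAccumulator:
    def __init__(self):
        self.sum = 0       # running timestamp sum (CS at S, CR at R)
        self.count = 0     # number of packets seen (N)

    def on_packet(self, ts):
        self.sum += ts
        self.count += 1

S, R = TimestampAccumulator(), TimestampAccumulator()
for ts in (10, 12, 15):
    S.on_packet(ts)                      # CS = 37
for ts in (23, 26, 35):
    R.on_packet(ts)                      # CR = 84

assert S.count == R.count                # valid only when nothing was lost
avg_delay = (R.sum - S.sum) / R.count    # (84 - 37) / 3 ≈ 15.7
```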

  11. Delay in the presence of loss
  [Figure: a single accumulator is unusable when the packet counts do not match. The LDA stores a synopsis instead: packets hash into buckets, each holding a packet count and a timestamp sum. A bucket with matching counts (2 at both ends, sums 25 at S and 52 at R) contributes 52 − 25 = 27 over 2 packets; a bucket that absorbed a lost packet has mismatched counts and is discarded. Avg. delay = 27/2 = 13.5]
  • A (much) better idea: the Lossy Difference Aggregator (LDA) (see the sketch below)
    • A hash table of per-bucket packet counts and timestamp sums
    • Loss is spread across several buckets
    • Consistent hashing ensures packets hash to the same bucket at S and R
    • Buckets containing lost packets are discarded
  SIGCOMM 2009
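A single-bank LDA sketch without sampling, assuming CRC32 over invariant packet contents as the consistent hash and 1024 buckets; both choices are illustrative, not prescribed by the paper:

```python
import zlib

# Each bucket holds a (timestamp sum, packet count) pair. Buckets whose
# counts disagree between S and R absorbed a loss and are discarded.

class LDA:
    def __init__(self, num_buckets=1024):
        self.ts_sum = [0] * num_buckets
        self.count = [0] * num_buckets

    def on_packet(self, pkt_id: bytes, ts: int):
        # Hashing invariant packet contents sends each packet to the
        # same bucket at both S and R.
        b = zlib.crc32(pkt_id) % len(self.count)
        self.ts_sum[b] += ts
        self.count[b] += 1


def average_delay(S, R):
    delay_sum, pkt_count = 0, 0
    for b in range(len(S.count)):
        if S.count[b] == R.count[b]:   # no loss landed in this bucket
            delay_sum += R.ts_sum[b] - S.ts_sum[b]
            pkt_count += R.count[b]
    return delay_sum / pkt_count if pkt_count else None
```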

  12. Tuning LDA
  • Problem 1: high loss corrupts many buckets
    • Solution: sampling (see the sketch below)
    • Control the sampling rate so that not too many buckets are corrupted
    • For a given loss rate, we can analytically derive an optimal sampling probability that maximizes the number of delay samples
  • Problem 2: the loss rate is unpredictable
    • Solution: multiple banks tuned to different loss rates
    • Logarithmically many copies suffice in theory
    • Fewer are needed in practice (a two-bank LDA was sufficient in our evaluation)
  SIGCOMM 2009
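A sketch combining both tuning ideas, with hash-based sampling so that S and R make identical keep/drop decisions per packet; the bank sizes and sampling rates shown are illustrative, not the analytically optimal values derived in the paper:

```python
import zlib

# A bank is an LDA that processes only a sampled subset of packets.
# Using hash bits (rather than random coins) for sampling keeps the
# two ends consistent.

class LDABank:
    def __init__(self, num_buckets, sample_prob):
        self.ts_sum = [0] * num_buckets
        self.count = [0] * num_buckets
        self.sample_prob = sample_prob

    def on_packet(self, pkt_id: bytes, ts: int):
        h = zlib.crc32(pkt_id)
        # Low 16 hash bits decide sampling; both ends agree per packet.
        if (h & 0xFFFF) / 0x10000 >= self.sample_prob:
            return
        b = (h >> 16) % len(self.count)
        self.ts_sum[b] += ts
        self.count[b] += 1

# A two-bank LDA: one bank keeps every packet (effective when loss is
# low), the other samples 1 in 16 (effective when loss is high).
banks = [LDABank(1024, 1.0), LDABank(1024, 1.0 / 16)]
```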

  13. Comparison with active probes
  [Plot: relative error (log scale) vs. loss rate]
  SIGCOMM 2009

  14. Relative error with known loss rate
  [Plot: relative error vs. loss rate]
  SIGCOMM 2009

  15. Multiple bank LDA
  [Plot: relative error vs. loss rate]
  SIGCOMM 2009

  16. Computing variance using LDA
  • The aggregation idea used for average delay does not work here directly
  • Idea: keep a “plus-minus” counter (see the sketch below)
    • Easy to implement based on a packet hash [AMS96]
    • Each timestamp is added (or subtracted) with probability ½
    • Cross products cancel in expectation, since positive and negative terms are equally likely
  • Results: average std. deviation around 5%
  SIGCOMM 2009
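A sketch of the plus-minus estimator, using one bit of a CRC32 packet hash to choose the sign so that S and R agree per packet; a single signed counter is shown for simplicity, and the structural details are illustrative:

```python
import zlib

# AMS-style signed accumulator: each timestamp is added or subtracted
# according to a hash bit that both ends compute identically.

class SignedAccumulator:
    def __init__(self):
        self.signed_sum = 0
        self.count = 0

    def on_packet(self, pkt_id: bytes, ts: int):
        sign = 1 if zlib.crc32(pkt_id) & 1 else -1
        self.signed_sum += sign * ts
        self.count += 1


def variance_estimate(S, R, mean_delay):
    # R.signed_sum - S.signed_sum = sum_i sign_i * delay_i. Squaring it
    # leaves sum_i delay_i^2 in expectation (the cross terms cancel,
    # since each sign is +1 or -1 with equal probability), giving an
    # estimate of the second moment of the delay.
    second_moment = (R.signed_sum - S.signed_sum) ** 2 / R.count
    return second_moment - mean_delay ** 2
```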

  17. Summary
  • Low latency networks matter for automated trading and data centers
    • Active probes require extremely high probing frequencies
    • Specialized boxes are too expensive
  • LDA is simple to implement in modern routers
    • Requires 0.13 mm² (<1% of a 65 nm ASIC)
    • Uses only counters plus increment/decrement operations
    • Hash functions implemented using XOR arrays
    • Exploits FIFO ordering and fine-grain time synchronization
  • Compared with active probes
    • 25–500x lower relative error for a fixed communication budget
    • 50–60x lower overhead for a target error rate
  • Future work: scalable per-flow latency measurements
  SIGCOMM 2009

  18. Thanks! Questions?
  SIGCOMM 2009
