
EyeQ: (An engineer's approach to) Taming network performance unpredictability in the Cloud


Presentation Transcript


  1. EyeQ: (An engineer's approach to) Taming network performance unpredictability in the Cloud. Vimal, Mohammad Alizadeh, Balaji Prabhakar, David Mazières, Changhoon Kim, Albert Greenberg.

  2. What are we depending on? Many customers don't even realise they have network issues: the advice is just "spin up more VMs!", which makes the app even more network dependent. From "5 Lessons We've Learned Using AWS": "… in the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency." http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html Netflix's answer: overhaul apps to deal with variability.

  3. Cloud: a Warehouse-Scale Computer. Multi-tenancy increases cluster utilisation. Provisioning the warehouse covers CPU, memory, disk, and the network. http://research.google.com/people/jeff/latency.html

  4. Sharing the Network. • Policy: the sharing model. • Mechanism: computing rates, and enforcing rates on entities, per-VM (multi-tenant) or per-service (search, map-reduce, etc.). Can we achieve a VM spec like 2GHz VCPU, 15GB memory, 1Gbps network? Customer X specifies only the thickness of each pipe; there is no traffic matrix (the Hose Model). (Figure: Tenant X's virtual switch connecting VM1 … VMn, and Tenant Y's connecting VM1 … VMi.)
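Under the hose model above, admission control is simple: a VM's bandwidth guarantee fits on a server as long as the guarantees already placed there leave room on the NIC. A minimal sketch in Python (the helper name and the 10Gbps default are illustrative, not from the EyeQ code):

```python
def can_place(server_guarantees_gbps, new_vm_gbps, nic_capacity_gbps=10.0):
    """Hose-model admission check: a new VM fits if its guarantee plus the
    guarantees already placed on this server stay within NIC capacity."""
    return sum(server_guarantees_gbps) + new_vm_gbps <= nic_capacity_gbps

# A server already hosting VMs with 4 and 3 Gbps guarantees:
print(can_place([4.0, 3.0], 2.0))  # True: 9 <= 10
print(can_place([4.0, 3.0], 4.0))  # False: 11 > 10
```

The point of the hose model is that this per-pipe check is all the tenant-facing spec requires; no pairwise traffic matrix is ever declared.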

  5. Why is it hard? (1) • The default policy is insufficient: 1 vs. many TCP flows, UDP, etc. • Traditional QoS mechanisms scale poorly. • Bandwidth demands can be random and bursty, and short: few-millisecond requests, from 10–100KB up to 10–100MB. • Timescales matter! Guarantees are needed on the order of a few RTTs (milliseconds).

  6. Seconds: an Eternity. (Figure: one long-lived TCP flow and a bursty UDP session, ON for 5ms and OFF for 15ms, sharing a 10G pipe through a single switch.)
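The ON/OFF numbers above show why second-scale averages hide the problem. A quick worked calculation (the line rate of 10Gbps during bursts is an assumption from the shared-pipe figure):

```python
# Bursty UDP source: sends at line rate for 5 ms, idles for 15 ms.
on_ms, off_ms, line_rate_gbps = 5, 15, 10.0

# Time-averaged offered load over one ON/OFF cycle.
avg_gbps = line_rate_gbps * on_ms / (on_ms + off_ms)
print(avg_gbps)  # 2.5
```

Averaged over a second, the source looks like a modest 2.5Gbps flow, yet every 5ms burst saturates the full 10G pipe, starving the TCP flow at millisecond timescales.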

  7. Under the hood. (Figure: inside the shared switch.)

  8. Why is it hard? (2) • The switch sees contention, but lacks VM state. • The receiver host has VM state, but does not see contention. Two complications: (1) drops in the network mean servers don't see true demand; (2) TCP's back-off makes true demand even harder to detect.

  9. Key Idea: Bandwidth Headroom. • Bandwidth guarantees come down to managing congestion. • Congestion: link utilisation reaches 100%, at millisecond timescales. • So don't allow 100% utilisation: with 10% headroom, the receiver can detect congestion early. (Figure: on a single switch, TCP and UDP share an N x 10G pipe limited to 9G.) Headroom handles a single switch; what about a network?
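The 10% headroom makes receiver-side detection trivial: the RX host just compares its measured arrival rate against the headroom threshold over a millisecond interval. A minimal sketch, assuming a 10G link with the slide's 10% headroom (function name and parameters are illustrative, not the EyeQ API):

```python
def is_congested(bytes_rx, interval_s, line_rate_gbps=10.0, headroom=0.10):
    """RX-side congestion detector: flag congestion when the measured
    utilisation exceeds the headroom threshold (9 Gbps on a 10G link)."""
    rate_gbps = bytes_rx * 8 / interval_s / 1e9
    return rate_gbps > line_rate_gbps * (1 - headroom)

# 1.2 MB received in 1 ms is ~9.6 Gbps, above the 9 Gbps threshold:
print(is_congested(1_200_000, 0.001))  # True
# 1.0 MB in 1 ms is 8 Gbps, below it:
print(is_congested(1_000_000, 0.001))  # False
```

Because the threshold sits below line rate, congestion is signalled before queues build and drops occur, which is exactly what makes millisecond-timescale reaction possible.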

  10. Network design: the old: over-subscription. http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/

  11. Network design: the new: (1) uniform capacity across racks; (2) over-subscription only at the Top-of-Rack. http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/

  12. Mitigating Congestion in a Network. Each server connects to the fabric through a 10Gbps pipe. Load balancing (ECMP, etc.) spreads traffic across the fabric; admissibility comes from end-to-end congestion control (EyeQ). If the aggregate rate into a pipe stays below 10Gbps, the fabric is congestion free; if it exceeds 10Gbps, the fabric gets congested. Load balancing + admissibility = a hotspot-free network core. [VL2, FatTree, Hedera, MicroTE]
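The admissibility condition above can be stated as a simple invariant: with good load balancing, the core stays hotspot-free as long as the aggregate rate into every destination pipe stays under its capacity. A sketch of that check (the flow representation is hypothetical, chosen just to illustrate the invariant):

```python
def fabric_congestion_free(flows, capacity_gbps=10.0):
    """flows: list of (src, dst, rate_gbps) tuples. With load balancing
    (ECMP, etc.) spreading traffic across core paths, the fabric is
    congestion free iff the aggregate rate into each destination's
    access pipe stays below its capacity."""
    into = {}
    for _, dst, rate in flows:
        into[dst] = into.get(dst, 0.0) + rate
    return all(r < capacity_gbps for r in into.values())

print(fabric_congestion_free([("a", "x", 6.0), ("b", "x", 3.0)]))  # True: 9 < 10
print(fabric_congestion_free([("a", "x", 6.0), ("b", "x", 5.0)]))  # False: 11 > 10
```

EyeQ's job is to enforce this invariant continuously from the edge, without any state in the core switches.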

  13. EyeQ Platform. Untrusted VMs on each host sit behind a software vswitch. The TX vswitch runs adaptive rate limiters; the RX vswitch runs congestion detectors. End-to-end flow control operates vswitch-to-vswitch across the data centre fabric: the RX component detects congestion and returns congestion feedback, and the TX component reacts. (Figure: two senders rate-limited to 3Gbps and 6Gbps.)
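The detect/react loop above can be sketched as a rate-limiter update rule driven by RX feedback. EyeQ's actual controller is RCP-inspired; this sketch substitutes a simpler AIMD-style rule, and every name and constant here is illustrative rather than taken from the EyeQ code:

```python
def update_rate(rate_gbps, congested, line_rate_gbps=10.0,
                beta=0.5, step_gbps=0.1):
    """TX-side adaptive rate limiter: cut the limit multiplicatively on
    RX congestion feedback, otherwise probe additively toward line rate.
    (AIMD stand-in for EyeQ's RCP-like controller.)"""
    if congested:
        return max(rate_gbps * beta, step_gbps)
    return min(rate_gbps + step_gbps, line_rate_gbps)

r = 8.0
r = update_rate(r, congested=True)   # feedback arrived: 8.0 -> 4.0
r = update_rate(r, congested=False)  # no feedback: 4.0 -> 4.1
```

Because both the detector (RX) and the limiter (TX) live in the vswitches, the loop closes in a few RTTs with no per-VM state in the fabric.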

  14. Does it work? With EyeQ, TCP gets 6Gbps and UDP gets 3Gbps: EyeQ improves utilisation and provides protection. (Figure: throughput without EyeQ vs. with EyeQ.)

  15. State: only at the edge. EyeQ makes the fabric behave like one big switch.

  16. Thanks! jvimal@stanford.edu. EyeQ: load balancing + bandwidth headroom + admissibility at millisecond timescales = the network as one big switch = bandwidth sharing at the edge. Linux and Windows implementations for 10Gbps; ~1700 lines of C code. http://github.com/jvimal/perfiso_10g (Linux kmod). No documentation, yet.
