Presentation Transcript

EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud

Vimal

Mohammad Alizadeh

Balaji Prabhakar

David Mazières

Changhoon Kim

Albert Greenberg

What are we depending on?

Many customers don’t even realise network issues:

Just “spin up more VMs!”

But that makes the app more network-dependent.

5 Lessons We’ve Learned Using AWS

“… in the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency.”

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html

Overhaul apps to deal with variability

Cloud: Warehouse Scale Computer

Multi-tenancy: To increase cluster utilisation

Provisioning the Warehouse

CPU, memory, disk

Network

http://research.google.com/people/jeff/latency.html

Sharing the Network
  • Policy
    • Sharing model
  • Mechanism
    • Computing rates
    • Enforcing rates on entities…
      • Per-VM (multi-tenant)
      • Per-service (search, map-reduce, etc.)
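The per-entity rate enforcement listed above is commonly built on a token bucket. The sketch below is illustrative only, not the EyeQ code; all names and constants are made up:

```c
#include <stdint.h>

/* Illustrative token-bucket rate limiter for enforcing a per-VM
 * (or per-service) transmit rate. */
typedef struct {
    uint64_t rate_bps;   /* allowed rate, bits per second */
    uint64_t burst_bits; /* bucket depth: max burst, in bits */
    uint64_t tokens;     /* current tokens, in bits */
    uint64_t last_ns;    /* timestamp of last refill, ns */
} tbf_t;

static void tbf_init(tbf_t *t, uint64_t rate_bps, uint64_t burst_bits,
                     uint64_t now_ns)
{
    t->rate_bps = rate_bps;
    t->burst_bits = burst_bits;
    t->tokens = burst_bits;   /* start with a full bucket */
    t->last_ns = now_ns;
}

/* Returns 1 if a packet of pkt_bits may be sent now, 0 if it must wait. */
static int tbf_admit(tbf_t *t, uint64_t pkt_bits, uint64_t now_ns)
{
    /* Refill tokens in proportion to elapsed time, capped at the burst. */
    uint64_t elapsed_ns = now_ns - t->last_ns;
    t->tokens += t->rate_bps * elapsed_ns / 1000000000ull;
    if (t->tokens > t->burst_bits)
        t->tokens = t->burst_bits;
    t->last_ns = now_ns;

    if (t->tokens >= pkt_bits) {
        t->tokens -= pkt_bits;
        return 1;
    }
    return 0;
}
```

A VM’s packets would pass through `tbf_admit`; packets that return 0 wait until tokens refill, which is how a per-VM cap such as the 1 Gbps pipe of the hose model could be enforced.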

Can we achieve this?

  • 2 GHz vCPU
  • 15 GB memory
  • 1 Gbps network

Customer X specifies the thickness of each pipe. No traffic matrix. (Hose Model)

[Figure: Tenant X’s virtual switch connecting VM1…VMn, and Tenant Y’s virtual switch connecting VM1…VMi]

Why is it hard? (1)
  • Default policy insufficient: 1 vs many TCP flows, UDP, etc.
  • Poor scalability of traditional QoS mechanisms
  • Bandwidth demands can be…
    • Random, bursty
    • Short: few millisecond requests
  • Timescales matter!
    • Need guarantees on the order of few RTTs (ms)

[Figure: bandwidth demands range from 10–100 KB to 10–100 MB]

Seconds: Eternity

[Figure: one long-lived TCP flow and a bursty UDP session (ON: 5 ms, OFF: 15 ms) share a 10G pipe through a switch]

Why is it hard? (2)
  • Switch sees contention, but lacks VM state
  • Receiver-host has VM state, but does not see contention

(1) Drops in network: servers don’t see true demand

(2) Elusive TCP (back-off) makes true demand detection harder


Key Idea: Bandwidth Headroom
  • Bandwidth guarantees: managing congestion
  • Congestion: link utilisation reaches 100%
    • At millisecond timescales
  • Don’t allow 100% util
    • 10% headroom: Early detection at receiver

Single Switch: Headroom

What about a network?

[Figure: TCP and UDP flows cross an N x 10G fabric; the shared pipe is limited to 9G]
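The receiver-side early detection described above, flagging congestion as soon as measured RX throughput eats into the 10% headroom, can be sketched by sampling a byte counter at millisecond intervals. This is a hedged sketch; the names, structure, and sampling scheme are illustrative, not taken from EyeQ:

```c
#include <stdint.h>

#define LINE_RATE_BPS  10000000000ull            /* 10 Gbps NIC */
#define THRESH_BPS     (LINE_RATE_BPS * 9 / 10)  /* 9 Gbps: 10% headroom */

typedef struct {
    uint64_t last_bytes;  /* cumulative RX bytes at last sample */
    uint64_t last_ns;     /* timestamp of last sample */
} rx_detector_t;

/* Called periodically with the cumulative RX byte count.
 * Returns 1 if the measured rate since the last call exceeded
 * the 9 Gbps threshold, i.e. the headroom was consumed. */
static int rx_congested(rx_detector_t *d, uint64_t rx_bytes, uint64_t now_ns)
{
    uint64_t dbytes = rx_bytes - d->last_bytes;
    uint64_t dns = now_ns - d->last_ns;
    d->last_bytes = rx_bytes;
    d->last_ns = now_ns;
    if (dns == 0)
        return 0;
    /* bits per second = bytes * 8 * 1e9 / elapsed_ns */
    uint64_t rate_bps = dbytes * 8ull * 1000000000ull / dns;
    return rate_bps > THRESH_BPS;
}
```

Because the threshold sits below line rate, the detector fires before queues build and drops begin, which is what makes detection at the receiver possible at millisecond timescales.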

Network design: the old

Over-subscription

http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/

Network design: the new

(1) Uniform capacity across racks

(2) Over-subscription only at Top-of-Rack

http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/

Mitigating Congestion in a Network

[Figure: VMs on two servers, each connected to the fabric by a 10 Gbps pipe]

Load balancing: ECMP, etc.

Admissibility: e2e congestion control (EyeQ)

Aggregate rate < 10 Gbps: congestion-free fabric

Aggregate rate > 10 Gbps: fabric gets congested

Load balancing + Admissibility = Hotspot-free network core [VL2, FatTree, Hedera, MicroTE]
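ECMP load balancing, mentioned above, picks a path by hashing the flow 5-tuple, so packets of one flow stay on a single path while different flows spread across the fabric. A toy illustration; real switches use vendor-specific hash functions, and FNV-1a is chosen here only for simplicity:

```c
#include <stdint.h>

/* Toy ECMP path selection: hash the 5-tuple with FNV-1a and pick
 * one of n_paths equal-cost next hops. */
static uint32_t ecmp_path(uint32_t src_ip, uint32_t dst_ip,
                          uint16_t src_port, uint16_t dst_port,
                          uint8_t proto, uint32_t n_paths)
{
    uint32_t h = 2166136261u;  /* FNV-1a offset basis */
    uint32_t fields[3] = { src_ip, dst_ip,
                           ((uint32_t)src_port << 16) | dst_port };
    for (int i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (fields[i] >> (8 * b)) & 0xff;
            h *= 16777619u;    /* FNV prime */
        }
    }
    h ^= proto;
    h *= 16777619u;
    return h % n_paths;
}
```

Hashing keeps a flow’s packets in order (one path per flow) while statistically balancing many flows; admissibility (EyeQ) then ensures the aggregate offered load stays under each pipe’s capacity.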

EyeQ Platform

[Figure: on the TX host, untrusted VMs sit above a software vswitch running adaptive rate limiters; on the RX host, untrusted VMs sit above a software vswitch running congestion detectors; the two hosts are joined by the data centre fabric. TX packets flow at 3 Gbps and 6 Gbps; congestion feedback flows back from RX to TX.]

End-to-end flow control (VSwitch to VSwitch): the RX component detects, the TX component reacts.
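The TX-side reaction to RX feedback could follow the familiar congestion-control pattern: back off multiplicatively when feedback arrives, probe upward gradually otherwise. This is a hedged sketch with made-up constants; the actual EyeQ rate controller differs:

```c
#include <stdint.h>

#define LINK_BPS 10000000000ull  /* 10 Gbps */

/* Illustrative adaptive rate limiter update, run once per control
 * interval. 'congested' is 1 if RX-side feedback arrived. */
static uint64_t adapt_rate(uint64_t rate_bps, int congested)
{
    if (congested) {
        rate_bps = rate_bps / 2;      /* multiplicative decrease */
    } else {
        rate_bps += LINK_BPS / 100;   /* additive increase: +100 Mbps */
        if (rate_bps > LINK_BPS)
            rate_bps = LINK_BPS;      /* never exceed line rate */
    }
    return rate_bps;
}
```

Run at millisecond granularity against the receiver’s headroom-based feedback, a loop like this converges within a few RTTs, which is the timescale the earlier slides argue the guarantees need.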

Does it work?

[Figure: TCP (6 Gbps) and UDP (3 Gbps) shares, without EyeQ vs with EyeQ]

Improves utilisation.

Provides protection.

State: only at edge

[Figure: EyeQ makes the network behave like One Big Switch, with state kept only at the edge]

Thanks!

jvimal@stanford.edu

EyeQ = Load balancing + Bandwidth headroom + Admissibility at millisecond timescales

= Network as one big switch

= Bandwidth sharing at edge

Linux and Windows implementations for 10 Gbps

~1700 lines of C code

http://github.com/jvimal/perfiso_10g (Linux kmod)

No documentation, yet. 
