
EyeQ: (An engineer’s approach to) Taming network performance unpredictability in the Cloud

Vimal

Mohammad Alizadeh

Balaji Prabhakar

David Mazières

Changhoon Kim

Albert Greenberg



What are we depending on?

Many customers don’t even realise network issues:

Just “spin up more VMs!”

Makes the app more network dependent.

5 Lessons We’ve Learned Using AWS

“… in the Netflix data centers, we have a high capacity, super fast, highly reliable network. This has afforded us the luxury of designing around chatty APIs to remote systems. AWS networking has more variable latency.”

http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html

Overhaul apps to deal with variability



Cloud: Warehouse Scale Computer

Multi-tenancy: To increase cluster utilisation

Provisioning the Warehouse

CPU, memory, disk

Network

http://research.google.com/people/jeff/latency.html



Sharing the Network

  • Policy

    • Sharing model

  • Mechanism

    • Computing rates

    • Enforcing rates on entities…

      • Per-VM (multi-tenant)

      • Per-service (search, map-reduce, etc.)

Can we achieve this?

  • 2GHz VCPU

  • 15GB memory

  • 1Gbps network

Customer X specifies the thickness of each pipe.

No traffic matrix.

(Hose Model)

[Figure: Tenant X’s Virtual Switch connecting VM1, VM2, VM3, … VMn; Tenant Y’s Virtual Switch connecting VM1, VM2, VM3, … VMi]
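To make the hose model concrete, the sketch below treats a per-VM guarantee as a single aggregate rate into the tenant’s virtual switch, with no pairwise traffic matrix. The struct, names, and 1Gbps figures are illustrative assumptions, not EyeQ’s actual data structures.

    /* Illustrative hose-model spec: each VM gets one aggregate rate
     * ("pipe thickness") to its tenant's virtual switch; no pairwise
     * traffic matrix is specified. Not EyeQ's actual data structures. */
    #include <stdio.h>
    #include <stdint.h>

    struct vm_guarantee {
        const char *vm_name;
        uint64_t    rate_mbps;  /* thickness of this VM's pipe to the vswitch */
    };

    int main(void)
    {
        /* Hypothetical tenant X, matching the "1Gbps network" instance above. */
        struct vm_guarantee tenant_x[] = {
            { "vm1", 1000 }, { "vm2", 1000 }, { "vm3", 1000 },
        };
        uint64_t aggregate = 0;
        for (size_t i = 0; i < sizeof(tenant_x) / sizeof(tenant_x[0]); i++)
            aggregate += tenant_x[i].rate_mbps;
        printf("Tenant X hose guarantees: %zu VMs, %llu Mbps aggregate\n",
               sizeof(tenant_x) / sizeof(tenant_x[0]),
               (unsigned long long)aggregate);
        return 0;
    }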



Why is it hard? (1)

  • Default policy insufficient: 1 vs many TCP flows, UDP, etc.

  • Poor scalability of traditional QoS mechanisms

  • Bandwidth demands can be…

    • Random, bursty

    • Short: few millisecond requests

  • Timescales matter!

    • Need guarantees on the order of a few RTTs (ms)

[Figure: flow sizes of 10–100MB and 10–100KB]



Seconds: Eternity

[Figure: a shared 10G pipe through a switch carrying one long-lived TCP flow and a bursty UDP session (ON: 5ms, OFF: 15ms)]
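The bursty source in this experiment is easy to reproduce. The sketch below is a minimal ON/OFF UDP sender (5ms on, 15ms off); the destination address, port, and 1400-byte payload are arbitrary assumptions, not the exact traffic generator used here.

    /* Sketch of a bursty UDP sender: ON for 5ms, OFF for 15ms.
     * Destination address/port and packet size are arbitrary. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    static double now_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET,
                                   .sin_port   = htons(9000) };
        inet_pton(AF_INET, "10.0.0.2", &dst.sin_addr);

        char payload[1400];
        memset(payload, 0, sizeof(payload));

        for (;;) {
            double start = now_ms();
            while (now_ms() - start < 5.0)   /* ON: send back-to-back for 5ms */
                sendto(fd, payload, sizeof(payload), 0,
                       (struct sockaddr *)&dst, sizeof(dst));
            usleep(15 * 1000);               /* OFF: stay silent for 15ms */
        }
        return 0;
    }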



Under the hood

[Figure: Switch]



Why is it hard? (2)

  • Switch sees contention, but lacks VM state

  • Receiver-host has VM state, but does not see contention

(1) Drops in network: servers don’t see true demand

(2) Elusive TCP (back-off) makes true demand detection harder

[Figure: Switch]



Key Idea: Bandwidth Headroom

  • Bandwidth guarantees: managing congestion

    • Congestion: link utilisation reaches 100%

    • At millisecond timescales

  • Don’t allow 100% utilisation

    • 10% headroom: Early detection at receiver

Single Switch: Headroom

What about a network?

[Figure: TCP and UDP sources sharing an N x 10G fabric; the shared pipe is limited to 9G]
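A minimal sketch of what the headroom buys: if senders are capped at roughly 90% of line rate, the receiver can compare its measured arrival rate against that cap and flag congestion before queues build in the core. The constants and function name below are assumptions for illustration, not the EyeQ module’s code.

    /* Receiver-side detection with 10% headroom: flag congestion when the
     * measured arrival rate crosses 90% of the 10G line rate. Constants
     * and names are illustrative, not the EyeQ kernel module's. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_RATE_MBPS    10000.0   /* 10G NIC */
    #define HEADROOM_FRACTION 0.10      /* spare capacity left unallocated */

    /* Called once per measurement interval with the bytes received in it. */
    static bool rx_is_congested(uint64_t rx_bytes, uint64_t interval_us)
    {
        double rx_mbps = (rx_bytes * 8.0) / interval_us;  /* bits/us == Mbps */
        return rx_mbps > (1.0 - HEADROOM_FRACTION) * LINE_RATE_MBPS;
    }

    int main(void)
    {
        /* 120KB received in a 100us window is ~9.6Gbps: above the 9G cap. */
        printf("congested: %d\n", rx_is_congested(120 * 1000, 100));
        return 0;
    }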



Network design: the old

Over-subscription

http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/



Network design: the new

(1) Uniform capacity across racks

(2) Over-subscription only at the Top-of-Rack

http://bradhedlund.com/2012/04/30/network-that-doesnt-suck-for-cloud-and-big-data-interop-2012-session-teaser/



Mitigating Congestion in a Network

[Figure: VMs on servers at each end, connected by 10Gbps pipes across the fabric]

Load balancing: ECMP, etc.

Admissibility: e2e congestion control (EyeQ)

Aggregate rate < 10Gbps: congestion-free fabric

Aggregate rate > 10Gbps: fabric gets congested

Load balancing + Admissibility = Hotspot-free network core

[VL2, FatTree, Hedera, MicroTE]
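For the load-balancing half, the fabric’s ECMP-style hashing spreads flows across equal-cost paths. The sketch below shows the generic idea only (an FNV-1a hash of the 5-tuple choosing an uplink); EyeQ itself does not implement ECMP, it relies on the fabric’s existing load balancing.

    /* Generic ECMP-style path selection: hash a flow's 5-tuple and pick one
     * of the equal-cost uplinks. Illustrative only; real switches use their
     * own hash functions. */
    #include <stdint.h>
    #include <stdio.h>

    struct five_tuple {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
        uint8_t  proto;
    };

    static uint32_t ecmp_hash(const struct five_tuple *t)
    {
        /* FNV-1a over the tuple fields. */
        uint32_t words[3] = { t->src_ip, t->dst_ip,
                              ((uint32_t)t->src_port << 16) | t->dst_port };
        const uint8_t *p = (const uint8_t *)words;
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < sizeof(words); i++) {
            h ^= p[i];
            h *= 16777619u;
        }
        h ^= t->proto;
        h *= 16777619u;
        return h;
    }

    int main(void)
    {
        struct five_tuple flow = { 0x0a000001, 0x0a000002, 12345, 80, 6 };
        unsigned n_uplinks = 4;   /* equal-cost paths out of the rack */
        printf("flow hashed to uplink %u of %u\n",
               ecmp_hash(&flow) % n_uplinks, n_uplinks);
        return 0;
    }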



EyeQ Platform

[Figure: EyeQ platform. On each host, untrusted VMs attach to a Software VSwitch; the TX-side VSwitch runs Adaptive Rate Limiters and the RX-side VSwitch runs Congestion Detectors. TX and RX packets (e.g. 3Gbps and 6Gbps flows) cross the Data Centre Fabric, while Congestion Feedback flows back from receiver to sender]

End-to-end flow control (VSwitch to VSwitch)

RX component detects, TX component reacts
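One way to picture the detect/react loop: each control interval, the receiver’s congestion detector compares the measured aggregate arrival rate against its headroom-limited capacity and advertises a corrected rate back to the senders’ adaptive rate limiters. The proportional controller below, with its gain, floor, and constants, is a hypothetical sketch rather than EyeQ’s actual algorithm.

    /* Sketch of an RX-side feedback computation: measure the aggregate
     * arrival rate each interval and advertise a corrected rate back to
     * the TX-side rate limiters. Gain and constants are assumptions. */
    #include <stdio.h>

    #define CAPACITY_MBPS 9000.0   /* 10G link minus 10% headroom */
    #define GAIN          0.5      /* how aggressively to correct the error */

    static double update_feedback_rate(double advertised_mbps, double measured_mbps)
    {
        double spare = CAPACITY_MBPS - measured_mbps;  /* >0: room, <0: overload */
        double next  = advertised_mbps + GAIN * spare;
        if (next < 10.0)          next = 10.0;         /* floor so senders can probe */
        if (next > CAPACITY_MBPS) next = CAPACITY_MBPS;
        return next;
    }

    int main(void)
    {
        double rate = CAPACITY_MBPS;
        /* Simulated arrival rates: senders overshoot, then obey the feedback. */
        double measured[] = { 11000, 9800, 9200, 9000, 8800 };
        for (int i = 0; i < 5; i++) {
            rate = update_feedback_rate(rate, measured[i]);
            printf("interval %d: measured %.0f Mbps -> feedback %.0f Mbps\n",
                   i, measured[i], rate);
        }
        return 0;
    }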



Does it work?

[Figure: TCP (6Gbps) and UDP (3Gbps) throughput, without EyeQ and with EyeQ]

Improves utilisation

Provides protection



State: only at edge

[Figure: EyeQ presents the network as One Big Switch]


Thanks! jvimal@stanford.edu

EyeQ

Load balancing

+ Bandwidth headroom

+ Admissibility at millisecond timescales

= Network as one big switch

= Bandwidth sharing at edge

Linux, Windows implementation for 10Gbps

~1700 lines C code

http://github.com/jvimal/perfiso_10g (Linux kmod)

No documentation, yet. 

