
Towards Predictable Datacenter Networks



Presentation Transcript


  1. Towards Predictable Datacenter Networks Hitesh Ballani, Paolo Costa, Thomas Karagiannis, Ant Rowstron SIGCOMM 2011 Presenter: Lili Sun 2020/1/5

  2. Outline • Motivation and Goals • Virtual Network Abstractions • Oktopus • Evaluation • Conclusion • Discussion Clues

  3. Background • Datacenters • Cloud datacenter • Production datacenter • Interface between tenant and provider • Computing resources • Storage resources [Figure: tenant–provider interface mapping a tenant's virtual network of VMs onto the provider's physical network]

  4. Motivation and Goals • Motivation: network performance variability • Cloud datacenters: varies with system load and VM placement • Production datacenters: variable network bandwidth • Challenges • Unstable application performance • Unpredictable tenant costs • Lost provider revenue • Goals • Guaranteed application performance • Lower, predictable tenant costs • Improved provider revenue

  5. Virtual Network Abstractions • Virtual network abstractions • Virtual cluster (VC) • Virtual oversubscribed cluster (VOC) • Design goals • Tenant suitability: an intuitive way for tenants to reason about network performance • Provider flexibility: providers can multiplex many virtual networks on their physical network

  6. Virtual Cluster • Tenant request: <N, B> — N VMs, each connected to a virtual switch by a link of capacity B • Supports all-to-all traffic patterns • Suitable for data-intensive applications

  7. Virtual Oversubscribed Cluster • Tenant request: <N, B, S, O> — N VMs in groups of size S, with inter-group bandwidth oversubscribed by factor O • Supports local communication patterns • Suitable for applications with localized communication patterns

  8. Oktopus • Tenants can opt for • A virtual cluster • A virtual oversubscribed cluster • No virtual network • Two main components • Management plane: requests and accounts for network resources, maintains bandwidth reservations • Data plane: enforces the bandwidth available to each VM • Network manager goals • Meet the bandwidth demands of accepted tenants • Maximize the number of concurrent tenants

  9. Cluster Allocation • A virtual cluster request r: <N, B> • Topology: tree-like physical network • Bandwidth required on a link L that separates m of the tenant's VMs from the other N − m: min(m, N − m) × B [Figure: example tree with per-link reservations of 100–200 Mbps]
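The per-link reservation rule for a virtual cluster can be sketched in Python (illustrative names, not the paper's implementation):

```python
def vc_link_bandwidth(m, n, b):
    """Bandwidth a virtual cluster <N, B> needs on a link that
    separates m of the tenant's VMs from the other N - m.

    Traffic crossing the link is capped by the smaller side,
    since each VM's virtual link to the switch has capacity B."""
    return min(m, n - m) * b

# Example: a request <4, 100> with 2 VMs on each side of a link
# needs min(2, 2) * 100 = 200 units reserved on that link.
```

Note that when all N VMs end up inside one sub-tree, m = N and the reservation on the outbound link is zero.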

  10. Allocation Algorithm • Allocate VMs to a sub-tree (a machine, a rack, a pod) • Constraints • Number of empty VM slots in the sub-tree • Residual bandwidth on the sub-tree's outbound physical link • Search order • Start from the lowest level: physical machines, then racks, then pods • At the same level, choose the sub-tree with the least residual bandwidth • Goals • Keep greater outbound bandwidth available at higher levels of the tree • Leave room to accommodate more future tenants
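A minimal sketch of this greedy search, assuming each sub-tree is summarized by its empty VM slots and the residual bandwidth of its outbound link (the data shapes are assumptions for illustration, not the paper's code):

```python
def feasible_vm_counts(slots, residual, n, b):
    """Numbers of VMs m that a sub-tree with `slots` empty VM slots can
    host: placing m of the request's n VMs there requires reserving
    min(m, n - m) * b on the sub-tree's outbound link."""
    return [m for m in range(0, min(slots, n) + 1)
            if min(m, n - m) * b <= residual]

def pick_subtree(subtrees, n, b):
    """Among sub-trees at one level, pick one that can host all n VMs,
    preferring the least residual bandwidth (best fit) so that
    better-connected sub-trees stay free for future tenants."""
    candidates = [t for t in subtrees
                  if n in feasible_vm_counts(t['slots'], t['residual'], n, b)]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t['residual'])
```

In the full algorithm the search starts at the machine level and moves up to racks and pods, splitting the request across a sub-tree's children when no single child can host it.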

  11. Oversubscribed Cluster Allocation • An oversubscribed cluster request: <N, B, S, O> • Each group's link to the root has capacity B′ = (S × B) / O • Bandwidth required by group i on a link L with m_i of its S members on one side: min(m_i × B, (S − m_i) × B + B′) • The bandwidth to be reserved on link L for request r is the sum of this quantity across all groups

  12. Allocation Algorithm • Allocating an individual group is similar to allocating a virtual cluster • Reuse the cluster allocation algorithm, one group at a time • The conditional bandwidth needed by the jth group of request r on link L accounts for the groups already placed • The bandwidth required by groups [1, …, j] on L is the sum of their conditional bandwidths • Allocate the VMs of group j to a sub-tree v only if that sum fits within the residual capacity of every link on v's outbound path
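Assuming the VOC model above, where each group's uplink has capacity B′ = (S × B) / O, the per-link reservation can be sketched as follows (names and data shapes are illustrative):

```python
def voc_group_bandwidth(m_i, s, b, o):
    """Bandwidth group i of a VOC request <N, B, S, O> needs on a link
    with m_i of its S members on one side. The m_i VMs send/receive at
    most m_i * B in total; the other side offers (S - m_i) group members
    at B each plus the group's oversubscribed uplink B' = S*B/O."""
    b_prime = s * b / o
    return min(m_i * b, (s - m_i) * b + b_prime)

def voc_link_bandwidth(group_counts, s, b, o):
    """Total reservation on one link: the sum across all groups, where
    group_counts[i] is how many of group i's VMs sit on one side."""
    return sum(voc_group_bandwidth(m, s, b, o) for m in group_counts)
```

Because inter-group traffic is capped at B′ rather than S × B, VOC lets the provider reserve less bandwidth in the core than a plain virtual cluster would.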

  13. Enforcing Virtual Networks • Rate-limiting mechanism • Traditional approach: bandwidth reservations at switches • Oktopus: endhost-based rate enforcement • Design • An enforcement module on each endhost measures the traffic rate to other VMs • A per-tenant controller VM calculates the max-min fair share for each VM pair • Enforcement modules use per-destination-VM rate limiters to enforce the computed rates • Advantages • Per-tenant computation at the controller VM reduces the scale of the problem: rates are computed per virtual network rather than for the whole datacenter • Enforcement modules enable distributed rate limiting [Figure: controller VM exchanging measured and computed rates with enforcement modules EM1 … EMi]
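The controller's fair-share computation can be illustrated with progressive filling over a single shared capacity (a simplified stand-in: the paper's controller computes pairwise rates over the tenant's virtual topology, and all names here are assumptions):

```python
def max_min_shares(demands, capacity):
    """Max-min fair allocation by progressive filling: repeatedly give
    every unsatisfied flow an equal share of the leftover capacity,
    freezing flows whose demand is below the current fair share."""
    shares = {}
    remaining = dict(demands)  # flow -> demanded rate
    cap = capacity
    while remaining:
        fair = cap / len(remaining)
        # flows demanding no more than the fair share are fully satisfied
        satisfied = {f: d for f, d in remaining.items() if d <= fair}
        if not satisfied:
            for f in remaining:  # everyone left is bottlenecked equally
                shares[f] = fair
            break
        for f, d in satisfied.items():
            shares[f] = d
            cap -= d
            del remaining[f]
    return shares
```

For example, demands of 10, 40, and 100 over a 100-unit link yield shares of 10, 40, and 50: the small flows keep their demand and the large flow absorbs the remainder.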

  14. Enforcing Virtual Networks • Tenants without a virtual network • Two-level priorities • Traffic from tenants with a virtual network is high priority • Other traffic is low priority and receives a fair share • Unused virtual-network capacity • Weighted sharing mechanisms distribute unused capacity among all tenants

  15. Design Discussion • NM and routing • The NM assumes the datacenter has a simple tree topology • For topologies with limited path diversity, multiple physical links can be treated as a single aggregate link • For even richer network topologies, the NM can control datacenter routing to build tenant-specific trees • Failures • For failures of physical links and switches, the allocation algorithms can be extended to determine which tenant VMs need to be migrated and reallocated

  16. Evaluation • Simulation setup • Tc: minimum compute time for the job • Tn: time for the last flow to finish • T = max(Tc, Tn): the job completion time • Tenants minimize cost by requesting just enough bandwidth that Tn ≤ Tc • Baseline: purely VM-based resource allocation • Locality-aware allocation algorithm • A flow's bandwidth is calculated according to max-min fairness • Virtual network requests • A baseline request <N> can be expressed as <N, B> or <N, B, S, O> • Simulation breadth • Covers the space of most parameters of interest in today's datacenters: tenant bandwidth requirements, datacenter load, and physical topology oversubscription
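The completion-time model above can be sketched directly (data shapes are assumed for illustration):

```python
def job_completion_time(t_compute, flow_finish_times):
    """Evaluation model: Tn is when the last flow finishes, and the
    job completes at T = max(Tc, Tn)."""
    t_network = max(flow_finish_times)  # Tn
    return max(t_compute, t_network)    # T

# A tenant keeps cost down by requesting just enough bandwidth that
# Tn <= Tc, i.e. the network never extends the job beyond compute time.
```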

  17. Production Datacenter Experiment • Job completion time

  18. Production Datacenter Experiment • Utilization • The baseline suffers because its allocation of VMs does not account for network demands

  19. Production Datacenter Experiment • Diverse communication patterns • Each tenant VM requires a different bandwidth

  20. Cloud Datacenter Experiment • Rejected Requests • tenant dynamics with requests arriving over time • admission control scheme

  21. Cloud Datacenter Experiment • Tenant costs and provider revenue • Tenants are charged based on the time they occupy their VMs

  22. Cloud Datacenter Experiment • Charging for bandwidth • Virtual network abstractions allow explicitly charging for network bandwidth • For a request <N, B> held for time T, the tenant's cost combines a per-VM-time charge with an optional per-bandwidth charge
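Assuming a per-VM-time price k_v and a per-bandwidth-unit price k_b (symbols chosen here for illustration, in the spirit of the paper's pricing discussion), the two charging options can be sketched as:

```python
def cost_vm_only(n, t, k_v):
    """Baseline charging: tenants pay only for occupied VM time."""
    return n * k_v * t

def cost_with_bandwidth(n, b, t, k_v, k_b):
    """Bandwidth-aware charging: a virtual-cluster tenant <N, B>
    additionally pays for the bandwidth reserved on its behalf."""
    return n * (k_v + k_b * b) * t
```

Under bandwidth-aware charging, a tenant that requests less bandwidth pays less, which aligns tenant incentives with efficient use of the network.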

  23. Results and Conclusion • Virtual network abstractions • Practical: they can be implemented efficiently and provide significant benefits • Provide a simple way for tenants and providers to exchange information • Tenants • Expose their network requirements and pick the trade-off between application performance and cost • Providers • Account for network resources and improve their revenue

  24. Discussion Clues • Failure of tenant VMs: For a virtual oversubscribed cluster, if a tenant VM fails, must only the failed VM be migrated and reallocated, or all the VMs in its group? Communication between a reallocated VM and the other VMs increases the bandwidth demanded from the underlying physical infrastructure. • Description of network bandwidth resources: How should network bandwidth requirements be described? There are no datasets describing job bandwidth requirements. • Network security: Compared with a physical switch, a virtual switch has weaker monitoring capability; how can network security be ensured? • Actual bandwidth requirement: Many tenants do not know exactly how much bandwidth their applications need. Unlike computing and storage resources, one tenant's bandwidth use affects other tenants because total bandwidth is limited. Besides the pricing model, how can we ensure a tenant's bandwidth request is appropriate (neither too much nor too little), for example via a monitoring system that reports actual demands to tenants?

  25. Thank you!
