
Storage Quality of Service for Enterprise Workloads






Presentation Transcript


  1. Storage Quality of Service for Enterprise Workloads Tom Talpey Eno Thereska Microsoft

  2. Outline Problem Statement Classification, Control and Costing Controller and Policies Demo Details Q&A

  3. Background: Enterprise data centers • General-purpose applications • An application runs on several VMs, e.g. a multi-tier service • Separate networks for VM-to-VM traffic and VM-to-storage traffic • Storage is virtualized • Resources are shared • File-sharing protocols deployed [Diagram: VMs with virtual disks on hypervisors, connected through NICs/S-NICs and switches to storage servers]

  4. The Storage problem • First things first • In a datacenter, storage is a shared resource • Many tenants (customers) • Contention between tenants in the datacenter • Many workloads • Many backend device types and throughputs • Network resources are also shared • Provisioning is key to providing cost-effective SLAs

  5. More storage problems • Storage throughput (capacity) changes with time • Storage demand also changes with time • SLAs do not • Variety of storage protocols • SMB3, NFS, HTTP, …

  6. SMB3 challenges • Multichannel – Many to Many • Many SMB3 sessions (flows) on one channel • Many SMB3 channels used for each session • SMB Direct • RDMA offload of bulk transfer – stack bypass • Live Migration • Memory-to-memory with time requirements • Starvation of other flows must be avoided

  7. Service Level Agreement (SLA) Examples • Minimum guaranteed (“Not less than…”) • Storage bandwidth • Storage I/Os per second • Maximum allowed (“Not more than…”) • Opportunistic use (“If available…”) • Bandwidth/IOPS can use additional resources when available (and authorized) • All limits apply across each tenant’s/customer’s traffic
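The SLA types above can be sketched as a simple specification. This is an illustrative sketch only; the field names and schema are assumptions, not the prototype's actual format:

```python
# Hypothetical SLA specification for one tenant, combining the three
# limit types from the slide: a guaranteed minimum, a hard maximum,
# and an opportunistic flag. All names and values are illustrative.
sla = {
    "tenant": "Red",
    "min_bandwidth_MBps": 100,   # minimum guaranteed ("not less than...")
    "max_iops": 50_000,          # maximum allowed ("not more than...")
    "opportunistic": True,       # may use spare resources when available
}

# The limits apply across ALL of the tenant's traffic, not per-VM.
print(sla["tenant"], sla["min_bandwidth_MBps"])
```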

  8. Storage SLA Challenges • Diverse operations mix • E.g. SMB3 Create, Read/Write, metadata • Resources, latencies, IO size diversity • Dynamic, diverse workload • E.g. database tier vs web frontend vs E-mail • Small-random, large-sequential, metadata • Bi-directional • Read vs write differing costs, resources

  9. Can’t We Just Use Network Rate Limits? • No. Consider: • The network flow is a 4-tuple which names only source and destination • Does not distinguish: • Read from write • Storage resources such as disk, share, user, etc • Even deep packet inspection doesn't help • Can't go that deep – not all packets contain the full context • Might be encrypted • SMB multichannel and connection sharing • Many-to-many – mixing of adapters, users, sessions, etc • RDMA • Control and data separated, and offloaded

  10. Classification – Storage “Flow” • Classify storage traffic into Flow buckets • Pre-defined “tuple” of higher-layer attributes • Can have any number of elements, and wildcards • Can therefore apply across many connections, users, etc. • For example (SMB3 and virtual machine): • VM identity • \\Servername\Sharename • Filename • Operation type • Identify each operation as member of Flow • Apply policy to Flow
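The classification step above can be sketched in a few lines: match each operation's attributes against flow tuples that may contain wildcards. The function and field names here are illustrative assumptions, not the prototype's API:

```python
# Sketch of flow classification: an IO is matched against flow
# "tuples" of higher-layer attributes, where any element may be a
# wildcard so one flow can span many connections, users, or files.
WILDCARD = "*"

def matches(flow_tuple, io_attrs):
    """True if every non-wildcard element of the flow tuple equals
    the corresponding attribute of this IO operation."""
    return all(want == WILDCARD or io_attrs.get(key) == want
               for key, want in flow_tuple.items())

def classify(io_attrs, flows):
    """Return the name of the first flow that matches this IO, else None."""
    for name, flow_tuple in flows.items():
        if matches(flow_tuple, io_attrs):
            return name
    return None

flows = {
    "vm4-dataset": {"vm": "VM4", "share": r"\\share\dataset",
                    "file": WILDCARD, "op": WILDCARD},
}
io = {"vm": "VM4", "share": r"\\share\dataset",
      "file": "RM79.vhd", "op": "read"}
print(classify(io, flows))  # prints: vm4-dataset
```

Once an operation is identified as a member of a flow, the flow's policy is applied to it.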

  11. Elements of an IO classification • Inside the guest, the share appears as block device Z: (\\share\dataset), and a VHD file maps to drive H:\RM79.vhd • The block device maps to a VHD \\serverX\RM79.vhd • On the server, this maps to device \\device\ssd5 • On the network, it maps to packets with Dst IP: serverX's IP, Dst port: 445 • Stage configuration: "All IOs from VM4_sid to //serverX/AB79.vhd should be rate limited to B" • Simple classifier tuple: { VM4, \\share\dataset }

  12. Control (Rate Limiting) • Apply limits (max) / guarantees (min) to each flow • Rate-control operations to meet these • Control at: • Initiator (client) • Target (server) • Any point in storage stack
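A common building block for this kind of control is a token bucket; the talk does not specify the prototype's actual mechanism, so the following is a minimal illustrative sketch:

```python
# Minimal token-bucket rate limiter sketch (illustrative assumption,
# not the prototype's implementation). Tokens accrue at `rate` bytes
# per second up to a burst cap; an IO is admitted only if enough
# tokens are available, which bounds the flow's long-run throughput.
import time

class TokenBucket:
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps          # refill rate, bytes/second
        self.capacity = burst_bytes   # maximum burst size, bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, nbytes, now=None):
        """Admit an IO of nbytes if enough tokens have accrued."""
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

The same limiter could sit at the initiator, the target, or any point in the storage stack; only where it is attached changes.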

  13. Rate Limiting along the Flow [Diagram: storage points of control and network points of control along the IO path] • Stage configuration: "All IOs from VM4_sid to //serverX/AB79.vhd should be rate limited to B" • Simple classifier tuple: { VM4, \\share\dataset }

  14. Control based on simple Classification? • Again, no • Taking the example SLA • <VM 4, \\share\dataset> --> Bandwidth B • With contention from other VMs • Comparing three strategies • By “payload” – actual request size • By “bytes” – read and write size • By “IOPS” – operation rate

  15. Classification-only Rate Limit Fairness? [Diagram: VMs 1–4 issuing 64KB reads, 8KB reads and 8KB writes to three storage servers] • By IOPS – large beats small by dominating bandwidth • By payload – reads beat writes by dominating the queue • By bytes – writes beat reads by dominating (e.g. SSD) expense

  16. Solution - Cost • Compute a cost to each operation within a flow, and to a flow itself • “Controller” function in the network constructs empirical cost models based on: • Device type (e.g. RAM, SSD, HDD) • Server property • Workload characteristics (r/w ratio, size) • “Normalization” for large operations • Client property • Bandwidths and latencies • Network and storage property • Any other relevant attribute • Cost model is assigned to each rate limiter • Cost is managed globally across flows by distributed rate limiters
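A per-operation cost function of this kind might look as follows. The slide says the controller builds cost models empirically; the linear fixed-plus-per-byte model, the constants, and the write penalty below are purely illustrative assumptions:

```python
# Sketch of a per-operation cost function (assumed linear model; the
# actual controller constructs empirical models per device, workload,
# and client/network property). All numbers are made up.
DEVICE_COSTS = {
    # device type: (fixed cost per op, cost per byte)
    "ssd": (1.0, 0.002),
    "hdd": (5.0, 0.001),
}

def op_cost(device, op, size_bytes, write_penalty=1.5):
    """Charge one IO: a fixed per-operation cost plus a per-byte cost
    ("normalization" for large operations), with writes weighted
    higher (e.g. SSD write expense)."""
    fixed, per_byte = DEVICE_COSTS[device]
    cost = fixed + per_byte * size_bytes
    if op == "write":
        cost *= write_penalty
    return cost
```

Each rate limiter would then meter flows in these cost units rather than raw IOPS or bytes, so the three unfairness cases on the previous slide are charged comparably.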

  17. Cost of operations along the Flow [Diagram: cost functions applied at each stage of the flow – $Client, $Net, $Server, $$SSD] • Stage configuration: "All IOs from VM4_sid to //serverX/AB79.vhd should be rate limited to B" • Simple classifier tuple: { VM4, \\share\dataset }

  18. Outline Problem Statement Classification, Control and Costing Controller and Policies Demo Details Q&A

  19. Demo of storage QoS prototype • Four tenants: Red, Blue, Yellow, Green • Each tenant • Rents 30 virtual machines on 10 servers that access shared storage • Pays the same amount • Should achieve the same IO performance • Tenants have different workloads • The Red tenant is aggressive: it generates more requests/second

  20. Demo setup • 120 one-core VMs across 10 Hyper-V servers • Each VM runs IoMeter: 64K reads from a 3GB VHD • Mellanox ConnectX-3 40GbE RNIC (RoCE)

  21. Things to look for • Enforcing storage QoS across 4 competing tenants • Aggressive tenant under control • Inter-tenant work conservation • Bandwidth released by idle tenant given to active tenants • Intra-tenant work conservation • Bandwidth of tenant’s idle VMs given to its active VMs

  22. Demo: Red tenant dominates! • By being aggressive, the red tenant gets better performance [Graph: MegaBytes/second vs. Time (seconds)] • Tenant performance is not proportional to its payment

  23. Research Prototype: With Storage QoS • Tenants get equal performance • The algorithm notices the red tenant's performance [Graph: MegaBytes/second vs. Time (seconds)] • Tenant performance is proportional to its payment

  24. Demo

  25. Behind the scenes: how did it work? • Data-plane queues + centralized controller • Programmable queues along the IO path • The controller translates a high-level SLA into queue settings via a control API [Diagram: queues (Q) along the IO path, configured by the Controller through the Control API]

  26. Data plane queues • 1. Classification [IO Header -> Queue] • 2. Queue servicing [Queue -> <rate, priority, size>]

  27. Controller: Distributed, dynamic enforcement • Example SLA: <{Red VMs 1-4}, *, * //share/dataset> --> Bandwidth 40 Gbps • The SLA needs per-VM enforcement • Need to control the aggregate rate of VMs 1–4, which reside on different physical machines • Static partitioning of bandwidth is sub-optimal [Diagram: VMs on two hypervisors sharing a 40Gbps link to the storage server]

  28. Work-conserving solution • VMs with traffic demand should be able to send it as long as the aggregate rate does not exceed 40 Gbps • Solution: max-min fair sharing [Diagram: VMs on two hypervisors sending to the storage server]

  29. Max-min fair sharing (intuition) • A well-accepted notion of allocating a shared resource • Basic idea: maximize the minimum allocation • Formally, VMs [1..n] share rate B, with demand Di for VM i • The max-min fair share f ensures that: • the rate allocated to VM i is Ri = min(f, Di) • no VM is given more than its demand • the total allocated rate (the sum of the Ri) does not exceed B

  30. Distributed rate limiting algorithm • Allocate rate to VMs based on their demand • No VM gets a rate higher than its demand • VMs with unmet demand get the same rate • Worked example (aggregate rate = 10; demands: VM1 = 1, VM2 = 2, VM3 = 5, VM4 = 6) • First pass: unallocated rate = 10, unallocated VMs = 4, current fair share = 10/4 = 2.5; VM1 and VM2 are fully satisfied (1 and 2) • Second pass: unallocated rate = 7, unallocated VMs = 2, current fair share = 7/2 = 3.5; VM3 and VM4 each get 3.5 • The algorithm achieves a max-min fair allocation
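The allocation pass described above is standard waterfilling and can be sketched directly; the function name is illustrative, and the example reproduces the slide's numbers (demands 1, 2, 5, 6; capacity 10):

```python
# Waterfilling sketch of max-min fair allocation: repeatedly give the
# current fair share to all still-active VMs, fully satisfying any VM
# whose demand fits under that share, then redistribute the leftover.
def max_min_allocate(demands, capacity):
    """Return per-VM rates Ri = min(f, Di) with sum(Ri) <= capacity."""
    alloc = {vm: 0.0 for vm in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = {vm for vm in active if demands[vm] <= share}
        if not satisfied:
            # Every remaining VM wants more than the fair share:
            # split the remaining rate evenly and stop.
            for vm in active:
                alloc[vm] = share
            return alloc
        for vm in satisfied:
            alloc[vm] = float(demands[vm])
            remaining -= demands[vm]
        active -= satisfied
    return alloc

print(max_min_allocate({"VM1": 1, "VM2": 2, "VM3": 5, "VM4": 6}, 10))
# prints: {'VM1': 1.0, 'VM2': 2.0, 'VM3': 3.5, 'VM4': 3.5}
```

Note that VM1 and VM2 get exactly their demands while VM3 and VM4, whose demands are unmet, get the same rate (3.5), matching the slide's definition.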

  31. Max-min fair sharing (recap) • A well-studied problem in networks • Existing solutions are distributed • Each VM varies its rate based on congestion • They converge to max-min sharing • Drawbacks: complex, and requires a congestion signal • But we have a centralized controller • This reduces to a simple algorithm at the controller

  32. Controller decides where to enforce • Goal: minimize the number of times an IO is queued, and distribute the rate-limiting load • SLA constraints • Queues where resources are shared • Bandwidth enforced close to the source • Priority enforced end-to-end • Efficiency considerations • Data-plane overhead ~ number of queues • Important at 40+ Gbps [Diagram: VMs on two hypervisors and a storage server, with candidate queueing points]

  33. Centralized vs. decentralized control • A centralized controller allows for simple algorithms that focus on SLA enforcement rather than on distributed-systems challenges • Analogous to the benefits of centralized control in software-defined networking (SDN)

  34. Summary and takeaways • Storage QoS building blocks • Traffic classification • Rate limiters • Logically centralized controller

  35. References / Other Work • Predictable Data Centers • http://research.microsoft.com/datacenters/ • Storage QoS - IOFlow (SOSP 2013) • http://research.microsoft.com/apps/pubs/default.aspx?id=198941 • Network + storage QoS (OSDI 2014) • http://research.microsoft.com/apps/pubs/default.aspx?id=228983 • SMB3 Rate Limiter (Jose Barreto’s blog) • http://smb3.info • http://blogs.technet.com/b/josebda/archive/2014/08/11/smb3-powershell-changes-in-windows-server-2012-r2-smb-bandwidth-limits.aspx
