
QoS, QoS Baby


Presentation Transcript


  1. QoS, QoS Baby OpenStack Barcelona 2016

  2. Speakers • Anne McCormick – Software Engineer, Cisco – @amccormi4 • Robert Starmer – CTO & Principal, Kumulus Technologies – @rstarmer • Alka Sathnur – Ops QA, Cisco – @alkasat12

  3. Topics • Traditional QoS Concepts • Current/Future OpenStack QoS • Beyond the Network: Compute/Storage Bottlenecks and Differentiation • Use Case • Q/A

  4. Traditional QoS Concepts

  5. Quality of service (QoS) is the overall performance of a telephony or computer network, particularly the performance seen by the users of the network. - Wikipedia

  6. What is Quality of Service? • Network-centric view of resource availability and reliability • Provides a model for understanding and manipulating network impact on service delivery • Jitter, latency, and loss are all important aspects of a communications channel

  7. “QoS” in OpenStack

  8. QoS in the “physical” Network • Initial QoS was managed by routers • Committed Information Rate (CIR) • Routers matched bandwidth between different networks • Handling contention led to QoS policies or classes • Priority • Multi-queue • And multiple models of handling those queues • FIFO • WRED
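
A Committed Information Rate is typically enforced with a token bucket: tokens accrue at the committed rate, and a packet is forwarded only if enough tokens are available. A minimal illustrative sketch (the rate and burst values below are arbitrary assumptions, not figures from the talk):

```python
import time

class TokenBucket:
    """Toy token-bucket rate limiter illustrating a Committed Information Rate."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec   # committed rate (CIR)
        self.capacity = burst_bytes      # committed burst size
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes):
        """True if the packet conforms to the CIR, False if it should be queued/dropped."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False

# Example: a 1 Mbit/s CIR with a 32 KB burst allowance (illustrative values)
bucket = TokenBucket(rate_bytes_per_sec=125_000, burst_bytes=32_768)
print(bucket.allow(1500))   # a 1500-byte packet conforms while tokens remain
```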

  9. QoS in Layer 2 Networks • L2 networks tend to try to avoid ever storing packets • Less chance to manage different flows of traffic • But L2 networks really aren’t L2 any more • So we can classify traffic and, if necessary, queue it • Really helps when you have multiple types of traffic, such as storage and voice or video, on the same network

  10. Current/Future OpenStack QoS

  11. QoS in the early days • RXTX Factor • nova-network based “sharing” algorithm • Based on nova flavor metadata • Neutron Mitaka • ML2 extension • SR-IOV, OVS, Linux Bridge “bandwidth” limitations (e.g. rx/tx factor) • Neutron Newton • As with Mitaka • Adds DSCP marking ← This is a big deal
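
As a hedged illustration of the Mitaka-era Neutron QoS extension, something like the following openstacksdk calls creates a bandwidth-limit policy and attaches it to a port. The cloud name, policy name, rate values, and port UUID are assumptions for the example; check the method names against your SDK version:

```python
import openstack

# Connect using a cloud defined in clouds.yaml (the name "mycloud" is an assumption)
conn = openstack.connect(cloud="mycloud")

# Create a QoS policy and add a bandwidth-limit rule (values are illustrative)
policy = conn.network.create_qos_policy(name="limit-1mbps", shared=False)
conn.network.create_qos_bandwidth_limit_rule(policy, max_kbps=1000, max_burst_kbps=100)

# Attach the policy to an existing Neutron port (the UUID is a placeholder)
conn.network.update_port("PORT_UUID", qos_policy_id=policy.id)
```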

  12. Rate Limiting • Seems like a straightforward approach: • Like non-oversubscribed processors • Sharing fixed IOPS limits on a storage array • Rate limiting flows or specific services can have unintended consequences: • Dramatic impact to “goodput” vs. “throughput” • Particularly bursty applications can become unstable

  13. DSCP Marking • Let’s help the network out • Mark packets so that the network infrastructure has better information to go on • Execute marking at the application/OS level (in the VM), or • Execute marking at the switch input • Not a panacea • May still have “goodput” impact • At least provides a better interaction for determining who gets access to the available bandwidth resources
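
In Newton and later, a DSCP marking rule can be added to the same kind of Neutron QoS policy so that upstream switches and routers can classify the traffic. A hedged sketch, reusing the policy name from the previous example; the cloud name and the DSCP value 26 (AF31) are illustrative assumptions:

```python
import openstack

conn = openstack.connect(cloud="mycloud")            # cloud name is an assumption

# Look up the earlier policy and add a DSCP marking rule (Newton+);
# 26 corresponds to AF31 and is only an illustrative marking value.
policy = conn.network.find_qos_policy("limit-1mbps")
conn.network.create_qos_dscp_marking_rule(policy, dscp_mark=26)
```

Ports or networks carrying the policy then have their egress traffic marked by the agent, which is what lets the physical network prioritize it rather than treating all tenant traffic equally.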

  14. Beyond the Network So you got the traffic there faster… now what? Compute and storage bottlenecks!

  15. Compute Bottlenecks … and how to alleviate them

  16. [Diagram: nova-scheduler on the controller node placing workloads across Compute1, Compute2 … ComputeN]

  17. [Diagram: nova-scheduler places two VeryImportant™ VMs on a node alongside a CPU hog, contending for the same cores]

  18. [Diagram: the same placement with the CPU hog removed; the VeryImportant™ VMs no longer share their node with it]

  19. Cost of CPU Sharing/Context Switching • Ran a simple OpenStack multicast iperf test • Network highly optimized for multicast (SR-IOV port, multiple rx queues with maximum queue size, RSS, ARFS, QoS) • iperf receiver on a tenant VM, receiving a steady 800 Mbits/sec multicast stream • When context switching, the receiver experienced up to 0.2% packet loss, particularly when switching across NUMA nodes (as opposed to switching within the same node)

  20. Compute Resource Differentiation/Prioritization • Host aggregates – define separate groups of compute hosts • Flavors – define hardware needs such as number of cores, CPU capabilities/limits, affinity/anti-affinity, etc., via host filters • CPU pinning/NUMA awareness – pin VMs to dedicated cores to prevent context switches across NUMA nodes (see the sketch below)
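
A hedged openstacksdk sketch of the aggregate/flavor/pinning combination above. The cloud name, aggregate name, host name, and flavor sizing are assumptions, and the SDK method names should be verified against your openstacksdk release; the `hw:cpu_policy` and `hw:numa_nodes` keys are standard Nova extra-spec conventions:

```python
import openstack

# Connect via a cloud entry in clouds.yaml (the name "mycloud" is an assumption)
conn = openstack.connect(cloud="mycloud")

# Group the dedicated hosts into their own aggregate (host name is a placeholder)
agg = conn.compute.create_aggregate(name="pinned-hosts")
conn.compute.add_host_to_aggregate(agg, "compute-01")

# Flavor for the VeryImportant(TM) VMs: dedicated (pinned) CPUs and a single
# NUMA node, so the guest never context-switches across NUMA boundaries.
flavor = conn.compute.create_flavor(name="vip.pinned", ram=8192, vcpus=4, disk=40)
conn.compute.create_flavor_extra_specs(flavor, {
    "hw:cpu_policy": "dedicated",   # pin guest vCPUs to host cores
    "hw:numa_nodes": "1",           # keep the guest on one NUMA node
})
```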

  21. Storage Bottlenecks … and how to alleviate them

  22. [Diagram: a VeryImportant™ VM and an I/O hog on compute nodes driving I/O traffic to shared backends Storage1, Storage2 … StorageN]

  23. Cost of Storage Contention • Ran a simple OpenStack read/write I/O test • Two VMs running on the same host, different volumes • 3 Ceph nodes, active/active/active • When reading simultaneously, both VMs experienced an 80 MB/s drop in read rate • When writing simultaneously, both experienced a 100 MB/s drop in write rate

  24. Storage Resource Differentiation/Prioritization • Host aggregates – define separate groups/clusters of storage servers • Flavors – define I/O bandwidth limits for VMs (outbound traffic) • Differentiate at the storage backend • Cinder has QoS specs, volume types, and priority (more IOPS to particular volumes); see the sketch below • Ceph has storage types and the ability to limit IOPS if needed • AFAIK, Swift does not have the ability to differentiate/prioritize storage resources at the backend
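
For the Cinder side, a hedged sketch with python-cinderclient: a front-end QoS spec capping IOPS is created and associated with a volume type. The auth values, volume-type name, and IOPS cap are illustrative assumptions, not settings from the talk:

```python
from keystoneauth1 import session as ks_session
from keystoneauth1.identity import v3
from cinderclient import client as cinder_client

# Auth values are placeholders for a real deployment
auth = v3.Password(auth_url="http://keystone:5000/v3",
                   username="admin", password="secret",
                   project_name="admin",
                   user_domain_name="Default", project_domain_name="Default")
cinder = cinder_client.Client("3", session=ks_session.Session(auth=auth))

# Volume type for lower-priority tenants, capped via a front-end QoS spec
vtype = cinder.volume_types.create("bronze")
qos = cinder.qos_specs.create("bronze-iops",
                              {"consumer": "front-end",
                               "total_iops_sec": "500"})   # illustrative IOPS cap
cinder.qos_specs.associate(qos, vtype.id)
```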

  25. Conclusion • Network QoS is only a partial solution • To guarantee resources for mission-critical applications and data, a solution across all cloud resources (network, compute, storage) must be used • It is complicated to get this right across all resources, but it can be done

  26. Use Case

  27. Real World Use Case Bringing an existing Content Delivery Network composed of bare-metal cache nodes onto an OpenStack platform

  28. Content Delivery Network • A Delivery Service is a software structure in OMD that maps an origin source to Traffic Servers by Fully Qualified Domain Name (FQDN); the FQDN is in the Request URI from the client media player • Cache groups can belong to a single Delivery Service or to multiple Delivery Services • A Cache Group is a logical grouping for HA; each cache is typically in a different location to provide site-level redundancy, and each cache in a cache group has a single set of geo coordinates • Enables: multiple content sources, per-content-source cache/storage, and intelligent load balancing [Diagram: Origin Servers feeding a mid-tier cache of Traffic Servers and edge cache groups of Traffic Servers, with Orchestration, Control Server, CDN Monitor, and CDN Analytics components]

  29. Content Delivery Network: Edge Cache Groups and Storage Clusters [Diagram: the same CDN topology as slide 28, with an orchestration Director and edge cache groups backed by storage clusters; the Delivery Service and Cache Group description repeats from the previous slide]

  30. Use Case Summary Dynamically expanding a Content Delivery Network is possible, provided the Orchestrator ensures that network, compute and storage give top priority to the application traffic.

  31. ¿Preguntas? (Questions?)

  32. ¡Gracias! (Thank you!)
