
  1. Resource Management in the Virtual World Singapore, Q1 2013

  2. Topics • How Resource Management works in vSphere 5 • Server Pool • Storage Pool • Network Pool • Architecting pools of resources in a large environment • Server Pool • Storage Pool • Monitoring pools of resources in a large environment • Performance monitoring • Compliance monitoring

  3. Resource Pool: CPU and RAM The “Resource Pool” that most of us know.

  4. Server Resource Pool: Quick Intro

  5. Server Resource Pool: Quick Intro

  6. Server Resource Pool • A cluster means you no longer need to think of individual ESXi hosts • No longer need to map 1000 VMs to 100 ESXi hosts • What it is • A grouping of ESXi CPU/RAM in a cluster, as if they were 1 giant computer. • They are not, obviously, as a VM can't span 2 hosts at a given time. • A few apps might be ESXi-aware and do their own coordination. An example is vFabric EM4J (Elastic Memory for Java). But this is a separate topic altogether • A logical grouping of CPU and RAM only • No Disk and Network • The cluster must be DRS-enabled to create resource pools • What it is not • A way to organise VMs. Use folders for this. • A way to segregate admin access to VMs. Use folders for this. Example: a cluster has 8 ESXi hosts, each with 2 cores (at 3 GHz per core), so the total is 48 GHz
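To make the arithmetic of the pooled capacity concrete, here is a minimal sketch in plain Python; the host count, core count, clock speed and RAM figures are illustrative, matching the 48 GHz example above:

```python
# Minimal sketch: aggregate per-host capacity into a cluster-level "pool".
# All numbers are illustrative (8 hosts, 2 cores each at 3 GHz = 48 GHz).

hosts = [
    {"name": f"esxi-{i:02d}", "cores": 2, "ghz_per_core": 3.0, "ram_gb": 96}
    for i in range(1, 9)
]

total_ghz = sum(h["cores"] * h["ghz_per_core"] for h in hosts)
total_ram_gb = sum(h["ram_gb"] for h in hosts)

print(f"Cluster capacity: {total_ghz:.0f} GHz CPU, {total_ram_gb} GB RAM")
# -> Cluster capacity: 48 GHz CPU, 768 GB RAM
# Note: a single VM still cannot span two hosts; the pool is an
# accounting construct, not one physical machine.
```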

  7. Child Resource Pools • A slice of the parent RP • Child RP can exceed the capacity of the root resource pool • Used to allocate capacity to different consumers and to enable delegated administration

  8. RP Settings • Can control CPU and RAM only • Disk is done at the per-VM level. • Network is done at the per vDS port group level. • Shares is mandatory • Can't set it to blank • Shares is always relative • Relative to other VMs in the same Resource Pool or Cluster • Reservation • Impacts the cluster Slot Size. Use sparingly. • Can't overcommit. Notice the triangle • Take note of "MHz" • Not aware of CPU generation • A 2 GHz Xeon 5600 is considered the same speed as a 2 GHz Xeon 5100. • No such thing as "unlimited" in Limit • A VM can't go beyond its Configured value. • A VM with 2 GB RAM won't run as if it has 128 GB (assuming the ESXi host has 128 GB)

  9. Configuration, Reservation, Limit • Configured • "Configured" = the amount configured for the VM • The amount presented to the BIOS of the VM. • Hence a VM will never exceed its configured amount, as it can't see beyond it. The ESXi host's RAM is irrelevant. • Take a Windows VM configured with 8 GB: Windows will start swapping to its own swap file on its NTFS drive when it reaches 8 GB. • Limit • A virtual property. It does not exist in a physical server. • Not visible to the VM. • Can be used to force a VM to slow down. ESXi does not clock down the CPU; it just gives the VM fewer CPU cycles. • Reservation • Defines the minimum amount of a resource that a consumer is guaranteed to receive – if asked for • Reserved capacity that is not used is available to other consumers for them to use – but not reserve • If a consumer asks for reserved capacity that has been "loaned" to another consumer, it is reclaimed and given back to satisfy the reservation

  10. VM-level Reservation • CPU reservation: • Guarantees a certain level of resources to a VM • Influences admission control (PowerOn) • CPU reservation isn't as bad as often referenced: • CPU reservation doesn't claim the CPU when the VM is idle (it is refundable) • CPU reservation caveats: CPU reservation does not always equal priority • If other VMs' threads are running on the processors the "reserved VM" is claiming, the reserved VM has to wait until those threads / tasks are finished • Active threads can't be forcibly "de-scheduled"; doing so would mean a Blue Screen / Kernel Panic • Memory reservation • Guarantees a certain level of resources to a VM • Influences admission control (PowerOn) • Memory reservation is as bad as often referenced: "non-refundable" once allocated. • Windows zeroes out every bit of memory during startup… • Memory reservation caveats: • Will drop the consolidation ratio • May waste resources (idle memory can't be reclaimed) • Introduces higher complexity (capacity planning)

  11. Resource Pool shares are not "cascaded" down to each VM. • The more VMs you put into a Resource Pool, the less each gets. • The shares are not per VM; they apply to the entire pool. • The only way to give each VM a guarantee is to set it per VM. This has admin overhead, as it's not easily visible. (Diagram: VM1 to VM6 inside a single resource pool.)

  12. Resource Pool: a common mistake… • A Sys Admin creates 3 resource pools called Tier 1, Tier 2, Tier 3. • They follow the relative High, Normal, Low shares. • So Tier 1 gets 4x the shares of Tier 3. • Place 10 VMs in each Tier. • 30 VMs total in the cluster. • Everything is fine for now. • Tier 1 does get 4x the shares. • Since Tier 1 performs better, place 10 more VMs in Tier 1. • So Tier 1 now has 20 VMs • Result: Tier 1 performance drops. • The 20 VMs are fighting over the same pool of shares. The above problem only happens if there is contention. If the physical ESXi hosts have enough resources to satisfy all 40 VMs, then Shares do not kick in.
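A back-of-the-envelope calculation shows why the mistake hurts. The sketch below is plain Python and assumes the default resource-pool CPU share values of 8000/4000/2000 for High/Normal/Low (the 4:2:1 ratio referenced above); it computes the effective slice of the cluster each VM can claim during contention:

```python
# Sketch: effective per-VM slice when the pool, not the VM, carries the shares.
# Assumed default RP CPU shares: High = 8000, Normal = 4000, Low = 2000.

def per_vm_share(pool_shares, vm_count, pools):
    total = sum(shares for shares, _ in pools)
    return (pool_shares / total) / vm_count   # pool's slice, split among its VMs

# Before: 10 VMs in each tier
pools = [(8000, 10), (4000, 10), (2000, 10)]
print(f"Tier 1, 10 VMs: {per_vm_share(8000, 10, pools):.1%} of cluster per VM")  # ~5.7%
print(f"Tier 3, 10 VMs: {per_vm_share(2000, 10, pools):.1%} of cluster per VM")  # ~1.4%

# After: 10 more VMs land in Tier 1 because it "performs better"
pools = [(8000, 20), (4000, 10), (2000, 10)]
print(f"Tier 1, 20 VMs: {per_vm_share(8000, 20, pools):.1%} of cluster per VM")  # ~2.9%
# The pool's total slice did not change; each Tier 1 VM's share was halved.
```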

  13. Implications of a poorly designed resource pool The cluster has 2 resource pools and a few VMs outside these 2 resource pools. The "Test 1" resource pool is given 4x the shares, but it has 8 VMs, so 26% / 8 = ~3% per VM.

  14. Per-VM settings The screen is based on vSphere 5 and VM hardware version 8

  15. Shares Value and Shares The Shares level can be "Normal" but the actual value can differ from VM to VM. Use a script to set all the values to an identical amount.
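The "use a script" advice could look roughly like the pyVmomi sketch below. This is an illustrative outline rather than the deck's own tooling: the vCenter host name, credentials and the share value of 1000 are placeholders, certificate validation and task/error handling are omitted, and it assumes pyVmomi is installed:

```python
# Sketch (assumptions: pyVmomi installed, vCenter reachable, placeholder creds).
# Sets one identical custom CPU share value on every VM found in the inventory.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local", pwd="********")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    # "custom" level with a fixed value, so every VM carries identical shares.
    shares = vim.SharesInfo(level="custom", shares=1000)
    spec = vim.vm.ConfigSpec(
        cpuAllocation=vim.ResourceAllocationInfo(shares=shares))
    vm.ReconfigVM_Task(spec)   # returns a task; waiting and error handling omitted

Disconnect(si)
```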

  16. Example (Diagram: VM1, VM2 and VM3 on an ESXi host with 6 GB of physical RAM.) Total configured for the 3 VMs = 10 GB, but the ESXi host only has 6 GB. VM3 will get 2 GB, as it has a reservation. The host has 4 GB left. VM1 has 3000 of the 4000 remaining shares, so it gets 3/4 * 4 GB = 3 GB. VM2 has 1000 of 4000, so it gets 1/4 * 4 GB = 1 GB. VM2's performance drops; VM3's performance is not affected at all.
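The same arithmetic can be written out as a small sketch in plain Python, using the slide's hypothetical numbers: reservations are carved out first, then the remainder is split in proportion to shares. (Real ESXi entitlement calculations also weigh active memory and overhead; this only mirrors the simplified example.)

```python
# Sketch: split 6 GB of physical RAM under contention.
# Reservations are honoured first; the leftover is divided by share ratio.

host_ram_gb = 6
vms = [
    {"name": "VM1", "shares": 3000, "reservation_gb": 0},
    {"name": "VM2", "shares": 1000, "reservation_gb": 0},
    {"name": "VM3", "shares": 0,    "reservation_gb": 2},  # satisfied by reservation
]

leftover = host_ram_gb - sum(v["reservation_gb"] for v in vms)   # 6 - 2 = 4 GB
total_shares = sum(v["shares"] for v in vms)                     # 4000

for v in vms:
    granted = v["reservation_gb"] + leftover * v["shares"] / total_shares
    print(f'{v["name"]}: {granted:.0f} GB')
# VM1: 3 GB, VM2: 1 GB, VM3: 2 GB -- matching the slide's numbers
```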

  17. Resource Pool: Best Practices • For a Tier 1 cluster, where all the VMs are critical to the business • Architect for Availability first, Performance second. • Translation: do not over-commit. • So resource pools, reservations, etc. are immaterial, as there is enough for everyone. • But size each VM accordingly. No oversizing, as it might actually slow the VM down. • For a Tier 3 cluster, use resource pools carefully, or don't use them at all. • Tier 3 = overcommit. • So use Reservation sparingly, even at the VM level. • It guarantees resources, so it impacts the cluster slot size. • Naturally, you can't boot additional VMs if your guarantee is fully used • Take note of the extra complexity in performance troubleshooting. • Use a resource pool as a mechanism to reserve at the "group of VMs" level. • If Department A pays for half the cluster, then creating an RP with 50% of the cluster resources will guarantee them those resources in the event of contention. They can then put in as many VMs as they need. • But as a result, you cannot overcommit at the cluster level, as you have made a guarantee at the RP level. • Do not configure high CPU or RAM and then use Limit • E.g. configure 4 vCPUs, then use a limit to make it behave like "2" vCPUs • It can result in unpredictable performance, as the Guest OS does not know about the limit. • A high CPU or high RAM configuration also has higher overhead. • Limit is used when you need to force a VM to slow down. Using Shares won't achieve the same result • Don't put VMs and RPs as "siblings" at the same level

  18. Resource Pool: Disk and Network The "Resource Pool" that most of us don't give enough attention to.

  19. Disk is set at the individual VM, not the Resource Pool The default Shares value is 1000. This is at the Datastore level, which may span across clusters. You can set a Limit, but not a Reservation. An NFS Datastore can even span across vCenters (use case: read-only templates and ISO images)

  20. Reviewing the Disk Resource Pool Shares are at the Datastore level. Just like the "Server" Resource Pool, the more VMs you put in, the less each VM gets. You can view at the Cluster level (which gives a view across the datastores mounted by this single cluster). This does not tell the whole picture, as the datastores may span across clusters. You cannot view at the individual ESXi level if the host is part of a cluster

  21. Viewing at the Datastore level Shares are at the Datastore level. Just like the "Server" Resource Pool, the more VMs you put in, the less each VM gets. You can view at the Cluster level (which gives a view across the datastores mounted by this single cluster). This does not tell the whole picture, as the datastores may span across clusters. Do not span a datastore across "data centers", as you can only see 1 DC at a time. You cannot view at the individual ESXi level if the host is part of a cluster.

  22. Pre-requisite: Storage I/O Control As a Datastore is just a logical construct, it has no physical limit by itself. The limit is on the underlying LUN or path. To enable sharing, enable Storage I/O Control

  23. Enabling Storage I/O Control Not enabled by default

  24. Storage DRS • Finally, a "cluster" for storage • Differences • VM disks won't move to another DS in the event of a datastore or LUN failure • Has a concept of storage tiering. • Similarities • No need to specify an individual datastore • Affinity and Anti-Affinity rules • Load balancing among datastores, although over hours/days and not every 5 minutes. • New feature in vSphere 5

  25. Network Resource Pool (Diagram: Network I/O Control on the vSphere Distributed Switch. Traffic types such as Tenant 1 VMs, Tenant 2 VMs, VR, vMotion, FT, Mgmt, NFS and iSCSI pass through distributed port groups with a teaming policy; Load Based Teaming applies a shaper and schedulers; limit enforcement is per team, shares enforcement is per uplink.)

  26. Network Resource Pool

  27. Network Resource Pool • New feature in vSphere 5. • Can set Shares and a Limit, but not a Reservation. • Unlike CPU/RAM, there is no reservation for Disk and Network • Network & Disk are not completely controlled by ESXi. • The array serves multiple ESXi hosts or clusters, and even non-ESXi workloads. • The network has switches, routers, firewalls, etc., which will impact performance.

  28. Sample Architecture This shows an example of a Cloud for ~2000 VMs. It also uses Active/Passive data centers.

  29. Sample Architecture

  30. The need for an IT Cluster • A special-purpose cluster • Running all the IT VMs used to manage the virtual DC or provide core services • The Central Management will reside here too • Separated for ease of management & security. This separation keeps the Business Cluster clean, "strictly for business".

  31. 3-Tier Server resource pool • Create 3 clusters • The hosts can be identical. • Each project then "leases" vCPUs and GB • Not GHz, as speeds may vary. • Not using Resource Pools, as we can't control the number of VMs in the pool

  32. 3-Tier pools of storage • Create 3 Tiers of Storage. • This becomes the type of Storage Pool provided to VMs • Paves the way for standardisation • Choose 1 size for each Tier. Keep it consistent. • Keep 20% free capacity for VM swap files, snapshots, logs, thin volume growth, and Storage vMotion (inter-tier). • Use Thin Provisioning at the array level, not the ESXi level. • Separate Production and Non-Production • VMDKs larger than 1 TB will be provisioned as RDM, using virtual-compatibility mode.
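As a quick illustration of the 20% headroom rule, here is a plain-Python sketch; the datastore sizes are made-up examples, not the deck's actual standard:

```python
# Sketch: usable capacity per datastore tier, keeping 20% free for
# VM swap files, snapshots, logs, thin growth and Storage vMotion.
# Datastore sizes below are hypothetical examples.

tiers_tb = {"Tier 1": 2.0, "Tier 2": 4.0, "Tier 3": 8.0}   # datastore size in TB
headroom = 0.20

for tier, size_tb in tiers_tb.items():
    usable_tb = size_tb * (1 - headroom)
    print(f"{tier}: {size_tb:.0f} TB datastore -> {usable_tb:.1f} TB usable for VMDKs")
```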

  33. Mapping: Cluster - Datastore • Always know which cluster mounts which datastores • Keep the diagram simple. Not too much info. The idea is to have a mental picture that you can remember. • If your diagram has too many lines, too many datastores, or too many clusters, then it may be too complex. Create a Pod when that happens. Modularisation can be good.

  34. Performance counters: CPU The same counters are shown for the other periods, because there are no real-time counters here. It does not make sense to look at real time.

  35. Performance counters: RAM Counters not shown: Memory Capacity Usage

  36. Memory: Consumed vs Active • Consumed = how much physical RAM a VM has allocated to it • It does not mean the VM is actively using it. It can be idle pages. • Two types of memory overcommitment • "Configured" memory overcommitment • (Sum of VMs' configured memory size) / host's mem.capacity.usable* • This is what is usually meant by "memory overcommitment" • "Active" memory overcommitment • (Sum of VMs' mem.capacity.usage*) / host's mem.capacity.usable* • Impact of overcommitment • "Configured" memory overcommitment > 1 • zero to negligible VM performance degradation • "Active" memory overcommitment ≈ 1 • very high likelihood of VM performance degradation! *Only available in vSphere 5.0, but the net effect is the same. (Diagram labels: mapped to pRAM, hypervisor consumed.)
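The two ratios above translate directly into a small plain-Python sketch; the host size and per-VM figures are made up for illustration:

```python
# Sketch: configured vs active memory overcommitment for one host.
# Values are illustrative; on vSphere 5.0 the per-VM active figure
# corresponds to the mem.capacity.usage counter.

host_usable_gb = 128                                     # host's mem.capacity.usable
vms = [(16, 3), (32, 6), (24, 4), (48, 10), (64, 12)]    # (configured GB, active GB)

configured_oc = sum(c for c, _ in vms) / host_usable_gb
active_oc = sum(a for _, a in vms) / host_usable_gb

print(f"Configured overcommitment: {configured_oc:.2f}x")   # ~1.44x: usually fine
print(f"Active overcommitment:     {active_oc:.2f}x")       # ~0.27x: healthy
if active_oc >= 1:
    print("Active memory exceeds physical RAM: expect VM performance degradation")
```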

  37. Configured Memory Overcommitment (Diagram: VM1, VM2 and VM3, each with free, idle and active memory, on one hypervisor.) Parts of idle and free memory are not in physical RAM due to reclamation, but all VMs' active memory stays resident in physical RAM, allowing for maximum VM performance. Entitlement >= demand for all VMs [good]

  38. Active Memory Overcommitment (Diagram: VM1, VM2 and VM3, each with only active memory, on one hypervisor.) There is no idle or free memory in physical RAM. Some VM active memory is not in physical RAM, which will lead to VM performance degradation! Entitlement < demand for one or more VMs [bad]

  39. Example • Notice that Active is lower than Consumed and Limit. • The VM was doing fine, even though the VM is fighting with ESXi for memory. (Chart lines: Active, Limit, Consumed.)

  40. vSphere and RAM • Below is a typical picture. • Most VMware Admins will conclude that the ESXi host is running out of RAM. • Time to buy new RAM • This is misleading. It is showing the memory.consumed counter, not memory.active.

  41. vCenter Operations and RAM • Same ESXi host. vCenter Ops shows 26%. • vCenter Ops is showing the right data

  42. Performance Monitoring

  43. Global view

  44. Thank You And have fun in the pool!
