Understanding performance monitoring in vmware vi3 based on vmworld 2007 ta64
This presentation is the property of its rightful owner.
Sponsored Links
1 / 35

Understanding Performance Monitoring in VMware VI3 (based on VMworld 2007 – TA64) PowerPoint PPT Presentation


  • 86 Views
  • Uploaded on
  • Presentation posted in: General

Understanding Performance Monitoring in VMware VI3 (based on VMworld 2007 – TA64). Sources of Performance Problems. Bottom Line: virtual machines share physical resources Over-commitment of resources can lead to poor performance

Download Presentation

Understanding Performance Monitoring in VMware VI3 (based on VMworld 2007 – TA64)

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Understanding performance monitoring in vmware vi3 based on vmworld 2007 ta64

Understanding Performance Monitoring in VMware VI3(based on VMworld 2007 – TA64)


Sources of performance problems

Sources of Performance Problems

  • Bottom Line: virtual machines share physical resources

  • Over-commitment of resources can lead to poor performance

  • Imbalances in the system can also lead to slower-than-expected performance

  • Configuration issues can also contribute to poor performance


Tools for performance monitoring analysis

Tools for Performance Monitoring/Analysis

  • VirtualCenter client (VI client): per-host stats and per-cluster statistics

  • Esxtop: per-host statistics

  • SDK: allows users to collect only the statistics they want

  • All tools use same mechanism to retrieve data (special vmkernel calls)


Vi client screenshot

VI Client screenshot

Chart Type

Real-time vs. Historical

Object

Counter type

Stats type

Rollup


Real time vs historical stats

Real-time vs. Historical stats

  • VirtualCenter stores statistics at different granularities

Samples are “rolled up” (averaged) for next time interval


Stacked vs line charts

Stacked vs. Line charts

  • Line

    • Each instance shown separately

  • Stacked

    • Graphs are stacked on top of each other

    • Only applies to certain kinds of charts, e.g.:

      • Breakdown of Host CPU MHz by Virtual Machine

      • Breakdown of Virtual Machine CPU by VCPU


Esxtop

Esxtop

Host information

Per-VM/world information

Per-host statistics

Shows CPU/Memory/Disk/Network on separate screens

Sampling frequency (refresh every 5s by default)

Batch mode (can look at data offline with perfmon)


Presentation 1252232

SDK

  • Use the VIM API to access statistics relevant to a particular user

  • Can only access statistics that are exported by the VIM API (and thus are accessible via esxtop/VI client)


Shares example

Shares example

  • Change shares for VM

  • Dynamic reallocation

  • Add VM, overcommit

  • Graceful degradation

  • Remove VM

  • Exploit extra resources


Esx cpu scheduling

ESX CPU Scheduling

  • World states (simplified view):

    • ready = ready-to-run but no physical CPU free

    • run = currently active and running

    • wait = blocked on I/O

  • Multi-CPU Virtual Machines => gang scheduling

    • Co-run (latency to get vCPUs running)

    • Co-stop (time in “stopped” state)


Esx virtualcenter and resource pools

ESX, VirtualCenter, and Resource pools

  • Resource Pool extends proportional-share scheduling to groups of hosts (a cluster)

  • VirtualCenter can VMotion VMs to provide resource balance (DRS)


Tools for monitoring cpu performance vi client

Tools for monitoring CPU performance: VI Client

  • Basic stuff

    • CPU usage (percent)

    • CPU ready time (but ready time by itself can be misleading)

  • Advanced stuff

    • CPU wait time: time spent blocked on IO

    • CPU extra time: time given to virtual machine over reservation

    • CPU guaranteed: min CPU for virtual machine

  • Cluster-level statistics

    • Percent of entitled resources delivered

    • Utilization percent

    • Effective CPU resources: MHz for cluster


  • Vi client cpu screenshot

    VI Client CPU screenshot

    Note CPU milliseconds and percent are on the same chart but use different axes


    Cluster level information in the vi client

    Cluster-level information in the VI Client

    • Utilization % describes available capacity on hosts (here: CPU usage low, memory usage medium)

    % Entitled resources delivered: best if all 90-100+.


    Cpu performance analysis esxtop

    CPU performance analysis: esxtop

    • PCPU(%): CPU utilization

    • Per-group stats breakdown

      • %USED: Utilization

      • %RDY: Ready Time

      • %TWAIT: Wait and idling time

    • Co-Scheduling stats (multi-CPU Virtual Machines)

      • %CRUN: Co-run state

      • %CSTOP: Co-stop state

    • Nmem: each member can consume 100% (expand to see breakdown)

    • Affinity

    • HTSharing


    Esxtop cpu screenshot

    esxtop CPU screenshot

    2-CPU box, but 3 active VMs (high %used)

    High %rdy + high %used can imply CPU overcommitment


    Vi client and ready time

    VI Client and Ready Time

    Ready time < used time

    Used time

    Used time ~ ready time: may signal contention. However, might not be overcommitted due to workload variability

    In this example, we have periods of activity and idle periods: CPU isn’t overcommitted all the time

    Ready time

    ~ used time


    Memory

    Memory

    • ESX must balance memory usage for all worlds

      • Virtual machines, Service Console, and vmkernel consume memory

      • Page sharing to reduce memory footprint of Virtual Machines

      • Ballooning to relieve memory pressure in a graceful way

      • Host swapping to relieve memory pressure when ballooning insufficient

    • ESX allows overcommitment of memory

      • Sum of configured memory sizes of virtual machines can be greater than physical memory if working sets fit

    • VC adds the ability to create resource pools based on memory usage


    Vi client

    VI Client

    • Main page shows “consumed” memory (formerly “active” memory)

    • Performance charts show important statistics for virtual machines

      • Consumed memory

      • Granted memory

      • Ballooned memory

      • Shared memory

      • Swapped memory

        • Swap in

        • Swap out


    Virtual machine memory metrics

    Virtual Machine Memory Metrics


    Vi client vm list summary

    VI Client: VM list summary

    Host CPU: avg. CPU utilization for Virtual Machine

    Host Memory: consumed memory for Virtual Machine

    Guest Memory: active memory for guest


    Esxtop for memory host information

    Esxtop for memory: Host information

    PMEM: Total physical memory breakdown

    VMKMEM: Memory managed by vmkernel

    COSMEM: Service Console memory breakdown

    SWAP: Swap breakdown

    MEMCTL: Balloon information


    Vi client memory example for virtual machine

    VI Client: Memory example for Virtual Machine

    Increase in swap activity

    No swap activity

    Swap in

    Balloon & target

    Swap out

    Consumed & granted

    Active memory

    Swap usage


    Presentation 1252232

    Disk

    • Disk performance is dependent on many factors:

      • Filesystem performance

      • Disk subsystem configuration (SAN, NAS, iSCSI, local disk)

      • Disk caching

      • Disk formats (thick, sparse, thin)

  • ESX is tuned for Virtual Machine I/O

  • VMFS clustered filesystem => keeping consistency imposes some overheads


  • Esx storage stack

    ESX Storage Stack

    • Different latencies for local disk vs. SAN (caching, switches, etc.)

    • Queuing within kernel and in hardware


    Vi client disk statistics

    VI Client disk statistics

    • Mostly coarse-grained statistics

    • Disk bandwidth

      • Disk read rate, disk write rate (KB/s)

      • Disk usage: sum of read BW and write BW (KB/s)

    • Disk operations during sampling interval (20s for real-time)

      • Disk read requests, disk write requests, commands issued

      • Bus resets, command aborts

    • Per-LUN and aggregated statistics


    Esxtop disk statistics

    K: Kernel, D: Device, G: Guest

    Esxtop Disk Statistics

    • Aggregated statistics like VI Client

      • READS/s, WRITES/s, MBREAD/s, MBWRITE/s

  • Latency statistics

    • KAVG/cmd, DAVG/cmd, GAVG/cmd

  • Queuing information

    • Adapter (AQLEN), LUN (LQLEN), vmkernel (QUED)


  • Disk performance example vi client

    Disk performance example: VI Client

    Throughput with

    Cache (good)

    Throughput w/o

    Cache (bad)


    Disk performance example esxtop

    Latency seems high

    Disk performance example: esxtop

    After enabling cache,

    latency much better


    Network

    Guest OS

    TCP/IP

    virtual driver

    Virtual Device

    Virtual I/O stack

    physical

    driver

    Network

    • Guest to vmkernel

      • Address space switch

      • Virtual interrupts

  • Virtual I/O stack

    • Packet copy

    • Packet routing


  • Vi client networking statistics

    VI Client Networking Statistics

    • Mostly high-level statistics

      • Bandwidth

        • KBps transmitted, received

        • Network usage (KBps): sum of TX, RX over all NICs

      • Operations/s

        • Network packets received during sampling interval (real-time: 20s)

        • Network packets transmitted during sampling interval

    • Per-adapter and aggregated statistics

    • Per VM Stacked Graphs


    Esxtop networking statistics

    Esxtop Networking Statistics

    • Bandwidth

      • Receive (MbRX/s), Transmit (MbRX/s)

    • Operations/s

      • Receive (PKTRX/s), Transmit (PKTTX/s)

    • Configuration info

      • Duplex (FDUPLX), speed (SPEED)

    • Errors

      • Packets dropped during transmit (%DRPTX), receive (%DRPRX)


    Esxtop network output

    Esxtop network output

    Setup A (10/100):

    Physical configuration

    Performance

    Setup B (GigE):


    Approaching performance issues

    Approaching Performance Issues

    • Make sure it is an apples-to-apples comparison

    • Check guest tools & guest processes

    • Check host configurations & host processes

    • Check VirtualCenter client for resource issues

    • Check esxtop for obvious resource issues

    • Examine log files for errors

    • If no suspects, run microbenchmarks (e.g., Iometer, netperf) to narrow scope

    • Once you have suspects, check relevant configurations

    • If all else fails…contact VMware


  • Login