Applying Benchmark Data To A Model for Relative Server Capacity

CMG 2013

Joseph Temple, LC-NS Consulting

John J Thomas, IBM

Relative Server Capacity
  • “How do I compare machine capacity?”
  • “What platform is best fit to deliver a given workload?”
    • Simple enough questions, but difficult to answer!
  • Establishing server capacity is complex
    • Different platform design points
    • Different machine architectures
    • Continuously evolving platform generations
  • “Standard” benchmarks (SPECint, TPC-C, etc.) and composite metrics (RPE2, QPI, etc.) help, but may not be sufficient
    • Some platforms do not support these metrics
    • May not be sufficient to decide best fit for a given workload
  • We need a model to address Relative Server Capacity
    • See “Alternative Metrics for Server RFPs” [J. Temple]
[Figure labels: platforms System z, System x, Power; factors Local Factors / Constraints, Cost Models, Non-Functional Requirements, Strategic Direction, Workload Fit, Technology Adoption, Reference Architectures]

Fit for Purpose Workload Types

Mixed Workload – Type 1

  • Scales up
  • Updates to shared data and work queues
  • Complex virtualization
  • Business intelligence with heavy data sharing and ad hoc queries

Highly Threaded – Type 2

  • Scales well on large SMP
  • Web application servers
  • Single instance of an ERP system
  • Some partitioned databases

Parallel Data Structures – Type 3

  • Scales well on clusters
  • XML parsing
  • Business intelligence with structured queries
  • HPC applications

Small Discrete – Type 4

  • Limited scaling needs
  • HTTP servers
  • File and print
  • FTP servers
  • Small end user apps

Factors: Application Function, Data Structure, Usage Pattern, SLA, Integration, Scale

Black are design factors; blue are local factors

Fitness Parameters in Machine Design

Can be customized to machines of interest. Need to know specific comparisons desired

These parameters were chosen to represent the ability to handle parallel, serial and bulk data traffic.

This is based on Greg Pfister’s work on workload characterization in “In Search of Clusters”.

Key Aspects Of The Theoretical Model
  • Throughput (TP)
    • Common concept: Units of Work Done / Units of Time Elapsed
    • Theoretical model defines TP as a function of Thread Speed: TP = Thread Speed x Threads
      • Thread Speed is calculated as clock rate x Threading multiplier / Threads per Core. Threading multiplier is the increase in throughput due to multiple threads per core
  • Thread Capacity (TC)
    • Throughput (TP) gives us an idea of instantaneous peak throughput rate
      • In order to sustain this rate the load has to keep all threads of the machine busy
    • In the world of dedicated systems, TP is the parameter of interest because it tells us the peak load the machine can handle without causing queues to form
    • However in the world of virtualized/consolidated workloads, we are stacking multiple workloads on threads of the machine
      • Thread capacity is an estimator of how deep these stacks can be
    • Theoretical model defines TC as: TC = Thread Speed x Cache per Thread
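
The definitions above can be sketched in a few lines of Python. This is only an illustration of the three formulas (Thread Speed, TP, TC) as stated on this slide, not the authors' tool. The `Machine` class and its field names are made up for the example; the clock, core count and SMT-4 multiplier of 2 echo figures quoted elsewhere in the deck, while the cache-per-thread value is a pure placeholder.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical container for the inputs the theoretical model needs."""
    name: str
    clock_ghz: float
    cores: int
    threads_per_core: int
    threading_multiplier: float   # throughput gain from multiple threads per core
    cache_per_thread_mb: float    # placeholder value, not a published spec

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def thread_speed(self) -> float:
        # Thread Speed = clock rate x Threading multiplier / Threads per Core
        return self.clock_ghz * self.threading_multiplier / self.threads_per_core

    @property
    def throughput(self) -> float:
        # TP = Thread Speed x Threads
        return self.thread_speed * self.threads

    @property
    def thread_capacity(self) -> float:
        # TC = Thread Speed x Cache per Thread
        return self.thread_speed * self.cache_per_thread_mb


# Illustrative inputs: 16 cores at 3.6 GHz in SMT-4 mode with an assumed
# threading multiplier of 2; 2.5 MB cache per thread is an invented number.
example = Machine("example SMT-4 blade", 3.6, 16, 4, 2.0, 2.5)
print(f"thread speed = {example.thread_speed:.2f}")
print(f"TP = {example.throughput:.1f}, TC = {example.thread_capacity:.2f}")
```

TP and TC are only meaningful as ratios between machines computed with the same units, so the absolute numbers printed here carry no significance on their own.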
Throughput, Saturation, Capacity

[Figure labels: TP, Measured ITR, Capacity]

  • TP: Pure Parallel CPU
  • ITR: Other resources and Serialization
  • ETR: Load and Response Time

Single Dimension Metrics Do Not Reflect True Capacity

Common Metrics:

ITR  TP

ETR  ITR

Power advantaged

z is not price competitive

Consolidation:

ETR << ITR unless loads are consolidated

Consolidation accumulates working sets Power and z advantaged

Cache can also mitigate “Saturation”

The “standard metrics” do not leverage cache.

This leads to the pure ITR view of relative capacity on the right.

Bridging Two Worlds - I
  • There appears to be a disconnect between “common benchmark metrics” and “theoretical model metrics” like TP
  • Does this mean metrics like TP are invalid? No
    • We see the effect of TP/TC in real world deployments
      • A machine performs either better or worse than a common benchmark metric would have suggested
  • Does this mean benchmark metrics are useless? No
    • They provide valuable data points
  • A better approach would be to try and bridge these two worlds in a meaningful way
Bridging Two Worlds - II
  • Theoretical model calculates TP and TC using estimated values for thread speed
    • Based on machine specifications
  • Example: TP calculation for POWER7
    • A key factor in TP calculation is Thread Speed, which in turn depends on the value of the thread multiplier
      • But this factor is only an estimate.
      • We estimated the thread multiplier for POWER7 in SMT-4 mode to be 2
    • However, using an estimate for thread speed assumes common path length and linear scaling
    • An inherent problem here – these estimates are not measured or specified using any common metric across platforms
      • As an example, should the thread multiplier be the same for POWER7 in SMT-2 mode as Intel running with HyperThreading?
  • Recommendation: Refine factors in the theoretical model with benchmark results
    • Instead of using theoretical values for thread speed, pathlength etc., plug in benchmark observations
Two Common Categories Of Benchmarks
  • Stress tests
    • Measure raw throughput
      • Measure the maximum throughput that can be driven through a system, focusing all system resources to this particular task
  • VM density tests
    • Consolidation ratios (VM density) that can be achieved on a platform
    • Usually do not try to maximize throughput of a system
      • They usually look at how multiple workloads can be stacked efficiently to share the resources on a system, while delivering steady throughput
  • Adjusting Thread Speed affects both TP and TC
Example of a Stress Test, A Misleading One If Used In Isolation!
  • This benchmark result is quite misleading: it suggests a z core yields only 15% better ITR, but we know that z has much higher “capacity”
  • What is wrong here?
    • System z design point is to run multiple workloads together, not a single atomic application under stress
    • This particular application doesn’t seem to leverage many of z’s capabilities (cache, IO etc.)
  • Can this benchmark result be used to compare capacity?

TradeLite workload: online trading WAS ND workload driven as a stress test

  • 2ch/16co Intel 2.7GHz Blade: Peak ITR 3467 tps
  • Linux on System z, 16 IFLs: Peak ITR 3984 tps

Use Benchmark Data To Refine Relative Capacity Model
  • Calculate Effective thread speed from measured values
    • What is the benchmarked thread speed?
    • Normalizing thread speed and clock to a platform allows us to calculate pathlength for a given platform
    • This in turn allows us to calculate Effective thread speed
  • Doing this affects both TP and TC
  • Plug Effective thread speed values into the Relative Capacity calculation model
Use Benchmark Data To Refine Relative Capacity Model - Results

Benchmarked thread speed = ITR / Threads

Pathlength = Clock ratio / Threadspeed ratio

Relative Capacity = Effective Threadspeed * Total Threads * Cache/Thread

In this case, System z ends up with a 13.5x Relative Capacity factor, relative to Intel
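
As a rough sketch of how the three expressions on this slide (ITR / Threads; Clock ratio / Threadspeed ratio; Effective Threadspeed * Total Threads * Cache/Thread) might be chained together, the Python below works through the stress-test numbers quoted earlier. Several inputs are assumptions and are not in the slides: the thread counts (one thread per core on both machines), the 5.5 GHz clock for System z, and the reading of effective thread speed as clock divided by relative pathlength. The cache-per-thread values are neutral placeholders, so the output does not reproduce the 13.5x figure; the real cache sizes are what would move the result.

```python
def benchmarked_thread_speed(itr_tps: float, threads: int) -> float:
    """Benchmarked thread speed = ITR / Threads."""
    return itr_tps / threads


def relative_pathlength(clock_ratio: float, thread_speed_ratio: float) -> float:
    """Pathlength relative to the reference = Clock ratio / Threadspeed ratio."""
    return clock_ratio / thread_speed_ratio


def capacity_score(effective_thread_speed: float, total_threads: int,
                   cache_per_thread: float) -> float:
    """Effective Threadspeed * Total Threads * Cache/Thread."""
    return effective_thread_speed * total_threads * cache_per_thread


# Measured values quoted on the stress-test slide.
INTEL_ITR_TPS, Z_ITR_TPS = 3467.0, 3984.0
INTEL_CLOCK_GHZ = 2.7

# Assumptions for illustration only (not stated in the slides).
INTEL_THREADS = 16           # assumes one thread per core
Z_THREADS = 16               # assumes one thread per IFL
Z_CLOCK_GHZ = 5.5            # assumed clock rate
INTEL_CACHE_PER_THREAD = 1.0  # neutral placeholder
Z_CACHE_PER_THREAD = 1.0      # neutral placeholder

intel_speed = benchmarked_thread_speed(INTEL_ITR_TPS, INTEL_THREADS)
z_speed = benchmarked_thread_speed(Z_ITR_TPS, Z_THREADS)

# Normalize to Intel as the reference platform.
thread_speed_ratio = z_speed / intel_speed
z_pathlength = relative_pathlength(Z_CLOCK_GHZ / INTEL_CLOCK_GHZ, thread_speed_ratio)

# Assumed interpretation: effective thread speed = clock / relative pathlength.
intel_effective = INTEL_CLOCK_GHZ / 1.0
z_effective = Z_CLOCK_GHZ / z_pathlength

relative_capacity = (
    capacity_score(z_effective, Z_THREADS, Z_CACHE_PER_THREAD)
    / capacity_score(intel_effective, INTEL_THREADS, INTEL_CACHE_PER_THREAD)
)
print(f"thread speed ratio = {thread_speed_ratio:.3f}")
print(f"z relative pathlength = {z_pathlength:.3f}")
print(f"relative capacity with placeholder cache = {relative_capacity:.3f}")
```

With equal placeholder cache values the ratio collapses to the raw ITR ratio (about 1.15), which is the point the slides make: the cache-per-thread term is what separates a capacity comparison from a stress-test comparison.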

Example of a VM Density Test: Consolidating Standalone VMs With Light CPU Requirements

Light workloads: online banking WAS ND workloads, each driving 22 transactions per second with light I/O

  • Common x86 hypervisor, 2ch/16co Intel 2.7GHz Blade: 48 VMs per IPAS Intel blade
  • PowerVM, 2ch/16co POWER7+ 3.6GHz Blade: 68 VMs per IPAS POWER7+ blade
  • z/VM on zEC12, 16 IFLs: 100 VMs per 16-way z/VM

Consolidation ratios derived from IBM internal studies. Results will vary based on workload profiles/characteristics.

Use Benchmark Data To Refine Relative Capacity Model - Results
  • Follow a similar exercise to calculate effective thread speed
    • Each VM is driving a certain fixed throughput
      • This test used a constant injection rate
      • If throughput varies (for example, holding a constant think time), need to adjust for that
    • Calculate benchmarked thread speed
    • Normalize to a platform to get path length
    • Calculate effective thread speed
    • Plug into relative server capacity calculation

In this case, System z ends up with a 22.2x Relative Capacity factor relative to Intel

Math Behind Consolidation

Roger’s Equation:

$U_{avg} = \dfrac{1}{1 + HR_{avg}}$

Where

$HR_{avg} = \dfrac{k\,c}{N^{1/2}}$

For consolidation, N is the number of loads (VMs)

k is a design parameter (Service Level)

c is the variability of the initial load
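
A small Python sketch of this equation follows, reading the headroom term as k·c/√N, which is the form that reproduces the 6x, 3.5x, 2.25x and 1.42x peak-to-average figures on the slides that follow. The function names are invented for the example. The value k = 2 is an assumption: it corresponds to two standard deviations of a normal distribution, which matches the 97.7% service level quoted later, and c = 2.5 and a mean of 10/sec per workload are taken from those same slides.

```python
import math


def headroom_ratio(k: float, c: float, n: int) -> float:
    """HR(avg) = k * c / sqrt(N): headroom relative to average demand."""
    return k * c / math.sqrt(n)


def average_utilization(k: float, c: float, n: int) -> float:
    """U(avg) = 1 / (1 + HR(avg))."""
    return 1.0 / (1.0 + headroom_ratio(k, c, n))


def required_capacity(mean_per_load: float, k: float, c: float, n: int) -> float:
    """Capacity needed for N pooled loads = total average demand * (1 + HR)."""
    return n * mean_per_load * (1.0 + headroom_ratio(k, c, n))


K = 2.0      # assumed: two standard deviations, about a 97.7% service level
C = 2.5      # coefficient of variation of a single workload
MEAN = 10.0  # average demand per workload, per second

for n in (1, 4, 16, 144):
    print(
        f"N={n:3d}  peak/avg={1 + headroom_ratio(K, C, n):.2f}  "
        f"capacity={required_capacity(MEAN, K, C, n):.0f}/sec  "
        f"utilization={average_utilization(K, C, n):.0%}"
    )
```

For N = 144 the formula gives roughly 2040/sec; the 2045/sec shown on the later slide appears to come from rounding the ratio to 1.42 before multiplying.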

Larger Servers With More Resources Make More Effective Consolidation Platforms
  • Most workloads experience variance in demand

  • When you consolidate workloads with variance on a virtualized server, the variability of the sum, relative to its mean, is less (statistical multiplexing)
  • The more workloads you can consolidate, the smaller the relative variability of the sum
  • Consequently, bigger servers with capacity to run more workloads can be driven to higher average utilization levels without violating service level agreements, thereby reducing the cost per workload
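
The statistical multiplexing effect described above can be checked with a quick simulation. The sketch below is illustrative only: it assumes the workloads are independent and draws each one from a gamma distribution chosen to have a mean of 10 and a coefficient of variation of 2.5; neither assumption comes from the slides. Under independence, the coefficient of variation of the pooled demand should fall as c/√n.

```python
import math
import random
import statistics


def pooled_cv(n_loads: int, mean: float = 10.0, cv: float = 2.5,
              trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the coefficient of variation of the sum of n independent loads."""
    rng = random.Random(seed)
    shape = 1.0 / cv ** 2   # gamma shape giving the requested CV
    scale = mean / shape    # gamma scale giving the requested mean
    totals = [
        sum(rng.gammavariate(shape, scale) for _ in range(n_loads))
        for _ in range(trials)
    ]
    return statistics.pstdev(totals) / statistics.fmean(totals)


for n in (1, 4, 16):
    simulated = pooled_cv(n)
    predicted = 2.5 / math.sqrt(n)
    print(f"n={n:2d}  simulated CV={simulated:.2f}  predicted c/sqrt(n)={predicted:.2f}")
```

Correlated workloads (for example, loads that all peak at the same time of day) would pool less favourably than this independent case suggests.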
A Single Workload Requires a Machine Capacity Of 6x the Average Demand

  • Average demand: m = 10/sec
  • Server capacity required: 60/sec (6x peak to average)
  • Server utilization = 17%

Assumes coefficient of variation = 2.5, required to meet 97.7% SLA

Consolidation Of 4 Workloads Requires Server Capacity Of 3.5x Average Demand

  • Average demand: 4*m = 40/sec
  • Server capacity required: 140/sec (3.5x peak to average)
  • Server utilization = 28%

Assumes coefficient of variation = 2.5, required to meet 97.7% SLA

Consolidation Of 16 Workloads Requires Server Capacity Of 2.25x Average Demand

  • Average demand: 16*m = 160/sec
  • Server capacity required: 360/sec (2.25x peak to average)
  • Server utilization = 44%

Assumes coefficient of variation = 2.5, required to meet 97.7% SLA

Consolidation Of 144 Workloads Requires Server Capacity Of 1.42x Average Demand

  • Average demand: 144*m = 1440/sec
  • Server capacity required: 2045/sec (1.42x peak to average)
  • Server utilization = 70%

Assumes coefficient of variation = 2.5, required to meet 97.7% SLA

Let’s Look At Actual Customer Data
  • Large US insurance company
  • 13 Production POWER7 frames
    • Some large servers, some small servers
  • Detailed CPU utilization data
    • 30 minute intervals, one whole week
    • For each LPAR on the frame
    • For each frame in the data center
  • Measure peak, average, variance
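
One way such measurements might be summarized is sketched below in Python. The slides do not describe the authors' actual procedure, so this is only an assumed approach: sum the per-LPAR samples in each interval to get a frame-level series, then take its peak, average and variance. The function name and the four-interval sample data are invented for illustration; a real week at 30-minute intervals would have 336 samples per LPAR.

```python
import statistics
from typing import Dict, List


def frame_stats(lpar_samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Summarize frame-level CPU use from per-LPAR interval samples.

    Each list holds one value per interval (for example, cores consumed),
    and all lists are assumed to cover the same intervals.
    """
    # Frame utilization in each interval is the sum over its LPARs.
    frame_series = [sum(values) for values in zip(*lpar_samples.values())]
    average = statistics.fmean(frame_series)
    peak = max(frame_series)
    return {
        "peak": peak,
        "average": average,
        "variance": statistics.pvariance(frame_series),
        "peak_to_average": peak / average,
    }


# Invented sample data: two LPARs, four intervals.
samples = {
    "lpar_a": [2.0, 4.0, 6.0, 4.0],
    "lpar_b": [1.0, 1.0, 2.0, 4.0],
}
stats = frame_stats(samples)
print(stats)
```

Comparing `peak_to_average` across frames that host different numbers of LPARs is the kind of check that would show whether larger pools need less headroom.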
Customer Data Confirms Theory

Servers with more workloads have less variance in their utilization and lower headroom requirements

Consolidation Observations
  • There is a benefit to large scale servers
    • The headroom required to accommodate variability goes up only by sqrt(n) when n workloads are pooled
    • The larger the shared processor pool is, the more statistical benefit you get
    • Large scale virtualization platforms are able to consolidate large numbers of virtual machines because of this
  • Servers with capacity to run more workloads can be driven to higher average utilization levels without violating service level agreements
Summary
  • We need a theoretical model for relative server capacity comparisons
  • Purely theoretical models need to be grounded in reality
  • Atomic benchmarks can sometimes be quite misleading in terms of overall system capability
  • Refine theoretical models with benchmark measurements
  • Real world (customer) data trumps everything!
    • Validates or negates models
    • Customer data validates sqrt(n) model for consolidation