180 likes | 247 Views
Learn how to optimize system balance to achieve the best cost-performance ratio for your applications. Explore performance and cost models, composite figures of merit, and optimal system balance calculations.
E N D
Optimum System Balance for Systems of Finite Price John D. McCalpin, Ph.D. IBM Corporation Austin, TX SuperComputing 2004 Pittsburgh, PA November 10, 2004
Overview • The HPC Challenge Benchmark was announced last year at SuperComputing’2003 • The HPC Challenge Benchmark consists of • LINPACK (HPL) • STREAM • PTRANS (transposing the array used by HPL) • RandomAccess (random read/modify/write) • FFT • and some low-level MPI latency & BW measurements • No single figure of merit is defined
Overview (continued) • Q: What is a “balanced” system? • My answer:“A balanced system is one for which the primary applications are limited in performance by the most expensive component(s) of the system.”
The Two Questions • We need to understand both performance and cost in the context of low-level component metrics • Performance • What performance model? • Use a harmonically weighted, time-based model • Cost • What cost model? • Simple linear additive cost model
Performance Model • Composite Figures of Merit for Performance must be based on “time” rather than “rate” • i.e., weighted harmonic means of rates • Why? • Combining “rates” in any other way fails to have a “Law of Diminishing Returns” • Time = Work/Rate • Repeat for each component: Ti = Wi/Ri
A Simple Composite Model • Assume the time to solution is composed of a compute time proportional to peak GFLOPS plus a memory transfer time proportional to sustained memory bandwidth • Assume “x Bytes/FLOP” to get: • Target SPECfp_rate2000 as the workload
Optimized Model Results • Results rounded to nearby round values: • Bytes/FLOP for large caches === 0.16 • Bytes/FLOP for small caches === 0.80 • Size of asymptotically large cache === ~12 MB • Coefficient of best fit === ~6.4 • The units of the coefficient are SPECfp_rate2000 / Effective GFLOPS
Cost Model • Assume simple linear additive model • FLOPS cost some amount • Sustained BW costs a different amount • Define:b = Rmem / Rcpug = Wmem / Wcpud = ($/BW) / ($/FLOPS) System characteristics Application characteristics Technology characteristics
How to Optimize? • For a given application g (Wmem/Wcpu), what is the optimum system balance b? • Many people seem to believe that the system should be “balanced” to the application: • boptimal = g i.e., • Rmem/Rcpu = Wmem/Wcpu • This does not optimize price/performance
The Correct Optimization • This is actually an easy optimization problem • Minimize cost/performance • Same as minimizing cost * time • Optimum cost/performance occurs at • b = sqrt(g/d) • Definitely not intuitive!
Example: High BW, expensive BW g = 3 relatively high BW d = 3 relatively expensive BW Optimum price/performance is at b=1, not b=3 Improvement in price/performance is ~30%
High BW, very expensive memory g = 3 relatively high BW d = 10 very expensive BW Optimum price/performance is at b=0.58, not b=3 Improvement in price/performance is ~50%
Low-BW, expensive BW g = 0.1 low BW application d = 3 moderately expensive BW Optimum price/performance is at b=0.18, not b=0.1 Improvement in price/performance is ~5% More BW helps here even though it is expensive, because the application does not need much.
Medium BW, expensive BW g = 1 modest BW d = 3 moderately expensive BW Optimum price/performance is at b=0.58, not b=1 Improvement in price/performance is ~10%
Summary • Balance is important to cost/performance • You must understand performance • You must understand cost • Optimum cost-performance is not intuitive!