This presentation addresses workload design for microprocessor simulation, with an emphasis on selecting representative program-input pairs. Because simulation time limits the size of a workload, the choice of programs and inputs matters. The study applies multivariate data analysis techniques, namely principal components analysis (PCA) and cluster analysis, to reduce the dimensionality of the workload characterization. The goal is to measure the impact of input data sets on program behavior and to use that insight to select representative program-input pairs while limiting redundancy and simulation time.
Workload Design: Selecting Representative Program-Input Pairs
Lieven Eeckhout, Hans Vandierendonck, Koen De Bosschere
Ghent University, Belgium
PACT 2002, September 23, 2002
Introduction
• Microprocessor design: simulation of a workload = set of programs + inputs
  • constrained in size due to time limitations
  • taken from suites, e.g., SPEC, TPC, MediaBench
• Workload design:
  • which programs?
  • which inputs?
  • representative: large variation in behavior
  • benchmark-input pairs should be “different”
Main idea
• Workload design space is a p-D space
  • with p = number of relevant program characteristics
  • p is too large for understandable visualization
  • the p characteristics are correlated
• Idea: reduce the p-D space to a q-D space
  • with q small (typically 2 to 4)
  • without losing important information
  • no correlation between the q dimensions
  • achieved by multivariate data analysis techniques: PCA and cluster analysis
Goal
• Measuring the impact of input data sets on program behavior
  • “far away” or weak clustering: different behavior
  • “close” or strong clustering: similar behavior
• Applications:
  • selecting representative program-input pairs
    • e.g., one program-input pair per cluster
    • e.g., take the program-input pair with the smallest dynamic instruction count
  • getting insight into the influence of input data sets
  • profile-guided optimization
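The selection rule on this slide (one program-input pair per cluster, preferring the smallest dynamic instruction count) can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function name `pick_representatives`, the pair names, the cluster labels and the instruction counts are all hypothetical.

```python
from collections import defaultdict

def pick_representatives(pairs):
    """pairs: list of (name, cluster_id, dynamic_instruction_count).

    Returns {cluster_id: name of the shortest-running pair in that cluster}.
    """
    by_cluster = defaultdict(list)
    for name, cluster_id, insn_count in pairs:
        by_cluster[cluster_id].append((insn_count, name))
    # min() over (count, name) tuples picks the smallest count; ties break by name.
    return {cid: min(members)[1] for cid, members in sorted(by_cluster.items())}

# Hypothetical cluster labels and instruction counts (in billions), for illustration only.
pairs = [
    ("progA-train", 0, 3.0),
    ("progA-ref", 0, 90.0),
    ("progB-small", 1, 2.0),
    ("progB-ref", 1, 60.0),
    ("progC-ref", 2, 25.0),
]
print(pick_representatives(pairs))
```

Choosing the shortest-running member only makes sense when the cluster is tight, i.e., when its members really do behave alike.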
Overview
• Introduction
• Workload characterization
• Data analysis
  • Principal components analysis (PCA)
  • Cluster analysis
• Evaluation
• Discussion
• Conclusion
Workload characterization (1)
• Instruction mix
  • int, logic, shift&byte, load/store, control
• Branch prediction accuracy
  • bimodal (8K*2 bits), gshare (8K*2 bits) and hybrid (meta: 8K*2 bits) branch predictors
• Data and instruction cache miss rates
  • five caches with varying size and associativity
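To make the branch prediction characteristic concrete, here is a minimal sketch of a bimodal predictor with 8K two-bit saturating counters, measured over a branch trace. The indexing by `pc % entries`, the initial counter value and the synthetic trace are assumptions made for illustration; the slides do not give these details.

```python
def bimodal_accuracy(trace, entries=8192):
    """trace: iterable of (branch_pc, taken) pairs. Returns the prediction accuracy."""
    # One 2-bit saturating counter per entry; start at 1 (weakly not-taken, an assumption).
    counters = [1] * entries
    correct = total = 0
    for pc, taken in trace:
        idx = pc % entries                 # assumed indexing: low bits of the branch address
        predicted_taken = counters[idx] >= 2
        correct += (predicted_taken == taken)
        total += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        if taken:
            counters[idx] = min(3, counters[idx] + 1)
        else:
            counters[idx] = max(0, counters[idx] - 1)
    return correct / total if total else 0.0

# Synthetic trace: a loop branch taken nine times, then not taken, repeated 100 times.
trace = [(0x400, i % 10 != 9) for i in range(1000)]
print(bimodal_accuracy(trace))
```

A gshare predictor would differ only in the index, which combines the branch address with a global history of recent outcomes.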
Workload characterization (2)
• Number of instructions between two taken branches
• Instruction-level parallelism
  • IPC of an infinite-resource machine with only read-after-write dependencies
• In total: p = 20 variables
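The ILP characteristic can be illustrated with a small dataflow-limit calculation: with infinite resources, an instruction is delayed only by the producers of its source registers. The sketch below assumes unit latency for every instruction and a register-only trace format of `(dest, sources)` tuples; both are simplifications chosen here, not details from the slides.

```python
def dataflow_ipc(trace):
    """trace: list of (dest_reg, [src_regs]) in program order.

    Returns instructions per cycle when only read-after-write dependencies
    constrain issue (unit latency assumed).
    """
    ready = {}        # register -> cycle in which its most recent value is produced
    last_cycle = 0
    for dest, srcs in trace:
        # Issue one cycle after the latest producer; cycle 1 if there are no producers.
        cycle = 1 + max((ready.get(r, 0) for r in srcs), default=0)
        if dest is not None:
            ready[dest] = cycle   # overwriting models renaming: no WAR/WAW stalls
        last_cycle = max(last_cycle, cycle)
    return len(trace) / last_cycle if last_cycle else 0.0

# Hypothetical six-instruction trace with a dependency chain of length three.
example = [
    ("r1", []), ("r2", []), ("r3", ["r1", "r2"]),
    ("r4", ["r3"]), ("r5", []), ("r6", ["r5"]),
]
print(dataflow_ipc(example))
```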
PCA
• Many program characteristics (variables) are correlated
• PCA computes new variables
  • p principal components PCi
  • each a linear combination of the original characteristics
  • uncorrelated
  • together contain the same total variance over all benchmarks
  • Var[PC1] ≥ Var[PC2] ≥ Var[PC3] ≥ …
  • most have near-zero variance (nearly constant)
• Reduce the dimension of the workload space to q = 2 to 4
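The properties listed above can be checked on a small example. The following is a minimal pure-Python sketch of PCA using power iteration with deflation on the covariance matrix. It only mean-centers the data; whether to also scale each characteristic to unit variance is left open here. The data rows and the function names are hypothetical.

```python
import math

def covariance(rows):
    """Mean-center the rows; return (covariance matrix, centered rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    return cov, centered

def principal_components(cov, q, steps=500):
    """Return the q leading (variance, direction) pairs, largest variance first."""
    p = len(cov)
    work = [row[:] for row in cov]
    result = []
    for _ in range(q):
        v = [1.0 + 0.1 * k for k in range(p)]     # arbitrary non-zero start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(steps):
            w = [sum(work[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(p) for j in range(p))
        result.append((lam, v))
        for i in range(p):                        # deflate: remove the found component
            for j in range(p):
                work[i][j] -= lam * v[i] * v[j]
    return result

def project(centered, direction):
    """Scores of every observation along one principal component."""
    return [sum(c * d for c, d in zip(row, direction)) for row in centered]

# Hypothetical data: 4 program-input pairs, 3 characteristics (the 2nd is twice the 1st).
rows = [(1, 2, 0), (2, 4, 1), (3, 6, 0), (4, 8, 1)]
cov, centered = covariance(rows)
pcs = principal_components(cov, 3)
print([round(lam, 4) for lam, _ in pcs])
```

Because the second characteristic is perfectly correlated with the first, the third component carries essentially no variance, which is the situation in which dropping components loses little information.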
PCA: Interpretation
• Principal components (PCs) lie along the main axes of the ellipse
• Var(PC1) > Var(PC2) > ...
• PC2 is less important for explaining the variation over program-input pairs
• Reduce the number of PCs: throw out PCs with negligible variance
[Figure: ellipse of data points plotted against Variable 1 and Variable 2, with PC 1 and PC 2 drawn along its main axes]
Cluster analysis
• Hierarchical clustering
• Based on the distance between program-input pairs
• Can be represented by a dendrogram
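A minimal sketch of agglomerative hierarchical clustering is given below. The slides do not state which linkage rule was used, so the default here (complete linkage, via `max`) is an assumption; passing `min` gives single linkage. The point names and coordinates are hypothetical. The returned merge list, with its linkage distances, is the information a dendrogram draws.

```python
import math

def agglomerate(points, linkage=max):
    """points: dict name -> coordinate tuple.

    Returns the merges in order as (members_a, members_b, linkage_distance).
    """
    clusters = [(name,) for name in sorted(points)]
    merges = []

    def dist(a, b):
        # Cluster distance = linkage rule applied to all pairwise Euclidean distances.
        return linkage(math.dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical program-input pairs in a 2-D principal-component space.
points = {"a": (0, 0), "b": (0, 1), "c": (5, 0), "d": (5, 2)}
for left, right, distance in agglomerate(points):
    print(left, right, round(distance, 3))
```

Pairs that merge at a small linkage distance behave similarly; a merge at a large distance signals different behavior.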
Methodology
• Benchmarks
  • SPECint95
    • inputs from SPEC: train and ref
    • inputs from the web (ijpeg)
    • reduced inputs (compress)
  • TPC-D on postgres v6.3
  • compiled with -O4 on Alpha
  • 79 program-input pairs
• ATOM
  • instrumentation
  • measuring characteristics
• STATISTICA
  • statistical analysis
GCC: principal components
• 2 PCs: 96.9% of total variance
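A figure such as "2 PCs: 96.9% of total variance" is the cumulative share of variance carried by the leading components. The sketch below shows how the number of retained components q could be chosen from that share; the function name, the target threshold and the variance values are hypothetical and are not the measured gcc values.

```python
def components_needed(variances, target=0.85):
    """Return (q, fraction): the smallest number of leading components whose
    cumulative share of the total variance reaches the target."""
    total = sum(variances)
    running = 0.0
    for q, var in enumerate(sorted(variances, reverse=True), start=1):
        running += var
        if running / total >= target:
            return q, running / total
    return len(variances), 1.0

# Hypothetical per-component variances, for illustration only.
example_variances = [12.0, 6.0, 1.0, 0.5, 0.5]
print(components_needed(example_variances))
```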
GCC
[Figure: plot of the gcc inputs. Labels recovered from the plot: "7 inputs"; annotations "High I-cache miss rates", "High branch prediction accuracy", "High D-cache miss rates", "Many control & shift insn", "Many LD/STs and ILP"; point labels explow, recog, toplev, emit-rtl, protoize, cp-decl, expr, insn-emit, varasm, insn-recog, reload1, dbxout, print-tree]
Workload space
• 4 PCs -> 93.1%
• ijpeg, compress and go are isolated
  • go: low branch prediction accuracy
  • compress: high data cache miss rate
  • ijpeg: high LD/ST rate, low control operation rate
Workload space
[Figure: workload space, with an annotation marking "strong clustering"]
Small versus large inputs
• Vortex:
  • train: 3.2B instructions
  • ref: 92.5B instructions
  • similar behavior: linkage distance ~ 1.4
• Not for m88ksim
  • linkage distance ~ 4
• Reference input for compress can be reduced without significantly impacting behavior: 2B vs. 60B instructions
Impact of input on behavior
• For TPC-D queries:
  • weak clustering
  • large impact
  • I-cache behavior
• In general: variation between programs is larger than the variation between input sets for the same program
• However: there are exceptions where the input has a large impact on behavior, e.g., TPC-D and perl
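One way to quantify the comparison made on this slide is to contrast the mean distance between inputs of the same program with the mean distance between pairs from different programs. This is a sketch of that idea under assumed data, not the analysis performed in the study; the program names, input names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def mean_distances(points):
    """points: list of (program, input_name, coords).

    Returns (within, between): the mean Euclidean distance between pairs of the
    same program and between pairs of different programs.
    """
    within, between = [], []
    for (prog_a, _, xa), (prog_b, _, xb) in combinations(points, 2):
        (within if prog_a == prog_b else between).append(math.dist(xa, xb))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)

# Hypothetical positions in a 2-D principal-component space.
sample = [
    ("p", "in1", (0, 0)), ("p", "in2", (0, 1)),
    ("q", "in1", (4, 0)), ("q", "in2", (4, 1)),
]
within, between = mean_distances(sample)
print(round(within, 3), round(between, 3))
```

A program whose within-program distance approaches the between-program distance would be one of the exceptions where the input has a large impact.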
Conclusion
• Workload design
  • representative
  • not long running
• Principal components analysis (PCA) and cluster analysis help in detecting input data sets that result in similar or different behavior of a program
• Applications:
  • workload design: representativeness while taking simulation time into account
  • impact of input data sets on program behavior
  • profile-guided optimizations