A modeling approach for estimating execution time of long-running Scientific Applications

Seyed Masoud Sadjadi (1), Shu Shimizu (2), Javier Figueroa (1,3), Raju Rangaswami (1), Javier Delgado (1), Hector Duran (4), Xabriel J. Collazo-Mojica (5)
Presented by: Xabriel J. Collazo-Mojica (5)
(1) Florida International University (FIU), Miami, Florida, USA
(2) IBM Tokyo Research Laboratory, Tokyo, Japan
(3) University of Miami, Coral Gables, Florida, USA
(4) University of Guadalajara, CUCEA, Mexico
(5) University of Puerto Rico, Mayagüez Campus, Puerto Rico
Miami, Florida – April 2008

Presentation Outline
• Motivation
• Research Approach
• Research Validation
• Related Work
• Concluding Remarks
• Future Research
HPGC '08 - April 14 - LA Grid

Motivation
• The impact of hurricanes is devastating
• The Weather Research and Forecasting (WRF) model
  • Most popular
  • It is computationally and storage intensive
• We need higher-resolution and more precise forecasts
• Many organizations are willing to share resources
  • But these resources are dynamic and unpredictable

Motivation
• When a hurricane approaches, we need to act fast
  • What resources should we allocate?
  • We need to finish within a strict deadline (i.e., in time for the hurricane forecast)
• We need to make the allocation decision within seconds
• We need to model the execution time of WRF based on the target resources
  • In our case: clusters with different parameters

Approach to Modeling Resource Usage
[Diagram: WRF and the candidate resource parameters (network latency, CPU speed, hard disk I/O, number of nodes, network bandwidth, FSB bandwidth, RAM size, L2 cache) feeding the Application Resource Usage Model]

Approach to Modeling Execution Parallelism
• Platform heterogeneity
  • We assume that individual resources have identical computation, communication, and storage power.
• Execution scale
  • We add a parameter to model the number of nodes utilized during execution.
[Diagram: nodes 1, 2, 3, …, N]

Application Resource Usage Model
• Characterize applications according to their resource usage characteristics (i.e., application "profiles")
• Assumptions:
  • Execution time is based on contributors
  • Product of contributors determines total execution time
  • Computation nodes are homogeneous (e.g., Beowulf cluster)
  • Non-ad-hoc application characteristics

Application Resource Usage Model - Contributors
• The model aims to allow as many contributors as necessary
• This paper focuses on two contributors
• First contributor: Parallelism
  • Ppara = degree of parallelism
  • α0 = constant contribution
  • α1 = variable contribution
• Second contributor: CPU Performance
  • Pclock = clock speed of compute node
  • β0 = constant contribution related to CPU performance
  • β1 = variable contribution related to CPU performance
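The slide equations did not survive in this transcript, so the following is only a minimal sketch of how the two contributors might combine. It relies on the stated assumption that the product of contributors determines total execution time. The functional form of each contributor (constant part plus variable part divided by the resource parameter) and the function names are assumptions made for illustration, not the authors' confirmed formulas.

```python
def parallelism_contributor(p_para, alpha0, alpha1):
    """Assumed form: a constant part plus a part that shrinks as nodes are added."""
    if p_para <= 0:
        raise ValueError("degree of parallelism must be positive")
    return alpha0 + alpha1 / p_para


def cpu_contributor(p_clock, beta0, beta1):
    """Assumed form: a constant part plus a part that shrinks as clock speed grows."""
    if p_clock <= 0:
        raise ValueError("clock speed must be positive")
    return beta0 + beta1 / p_clock


def estimated_time(p_para, p_clock, alpha0, alpha1, beta0, beta1):
    """Total execution time as the product of the two contributors."""
    return (parallelism_contributor(p_para, alpha0, alpha1)
            * cpu_contributor(p_clock, beta0, beta1))
```

With made-up coefficients, `estimated_time(4, 2.0, alpha0=1.0, alpha1=8.0, beta0=0.5, beta1=3.0)` multiplies a parallelism term of 3.0 by a CPU term of 2.0. In practice the four coefficients would be fitted per application from profiling runs.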

Experimental Approach - Environment
• GCB cluster: Rocks ver. 4.0, 8 nodes, each with 32-bit x86 Intel 3.0 GHz processors, 1 GB of main memory, and a gigabit network connection
• Mind cluster: Rocks ver. 4.0, 16 nodes, each with dual Xeon 3.6 GHz processors, 2 GB of main memory, and a gigabit network connection
• CPU vs. number of nodes: CPU percentages from 100% down to 10% in steps of 10%
  • We use CPULimit to set the CPU percentage
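The experiment grid described above can be sketched as follows. The slide gives the CPU percentages but not the node counts used, so running every node count from 1 up to the cluster size is an assumption. Treating a CPU cap as a proportionally slower effective clock is also an assumption, and `experiment_grid` is a hypothetical helper.

```python
from itertools import product


def experiment_grid(max_nodes, base_clock_ghz):
    """Enumerate (node count, CPU percentage) pairs for one cluster.

    Assumes node counts 1..max_nodes and that capping the CPU at pct percent
    behaves like a clock of base_clock_ghz * pct / 100.
    """
    cpu_percentages = range(100, 0, -10)  # 100, 90, ..., 10
    grid = []
    for nodes, pct in product(range(1, max_nodes + 1), cpu_percentages):
        grid.append({
            "nodes": nodes,
            "cpu_pct": pct,
            "effective_ghz": base_clock_ghz * pct / 100,
        })
    return grid


# Under these assumptions: 8 node counts x 10 CPU levels for the GCB cluster.
gcb_grid = experiment_grid(max_nodes=8, base_clock_ghz=3.0)
```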

Experimental Approach - Monitoring and Prediction
• Two tools were used
• Amon: a monitoring tool
  • Daemon-like application that collects and reports exploratory variables
• Aprof: a profiling tool
  • Statistical prediction program
  • Listens to Amon reports from compute nodes
  • Stores collected data as a matrix for each application

Experimental Approach - Monitoring and Prediction
[Figure not preserved in the transcript]

Application Resource Usage Model - Validation
• Intuitive assumption: execution time decreases linearly with the inverse of total computational power.
• Predictions within a cluster (e.g., GCB to GCB)
  • GCB - FE 5.34% ME 5.86%
  • Mind - FE 5.66% ME 3.80%
• Predictions across clusters
  • GCB to Mind - FE 9.97% ME 5.86%
  • Mind to GCB - FE 5.83% ME 4.13%
• These results validate our simple model.
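The linearity assumption above can be illustrated with a minimal sketch that fits execution time against the inverse of total computational power. Taking total power as nodes times clock speed is an assumption. The error function is a generic mean relative error and is not necessarily how the FE and ME figures on the slide were computed. All names are hypothetical.

```python
def inverse_power(nodes, clock_ghz):
    """Inverse of total computational power, assumed to be nodes * clock speed."""
    return 1.0 / (nodes * clock_ghz)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def mean_relative_error(actual, predicted):
    """Average of |predicted - actual| / actual over all runs."""
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def predict(a, b, nodes, clock_ghz):
    """Predicted execution time for a target configuration."""
    return a + b * inverse_power(nodes, clock_ghz)
```

A cross-cluster prediction in this sketch would fit `a` and `b` on runs from one cluster and call `predict` with the node count and clock speed of the other, then compare against measured times with `mean_relative_error`.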

Application Resource Usage Model - Mind to GCB prediction
[Figure not preserved in the transcript]

Concluding Remarks
• We have proposed a new approach for modeling the resource usage and execution time of a distributed application
• Experimental results using WRF on two different clusters show good accuracy: errors are within 10% for cross-cluster predictions
  • Using only two parameters: CPU speed and number of nodes
• For WRF specifically, we are one step closer to a complete solution for our goal of higher-resolution weather predictions and simulations.

Related Work
• S. Shimizu, R. Rangaswami, and H. A. Duran-Limon. "Platform-independent Modeling and Prediction of Application Resource Usage Characteristics."
  • Basis for prediction model
  • It is limited to one node
• D. M. Swany and R. Wolski. "Multivariate Resource Performance Forecasting in the Network Weather Service."
  • High-accuracy prediction model
  • They emphasize latency and bandwidth

Related Work
• R. Badia, F. Escale, E. Gabriel, J. Gimenez, R. Keller, J. Labarta, and M. S. Müller. "Performance Prediction in a Grid Environment."
  • Offline prediction
  • Requires linking their library to the application being profiled

Future Research
• Extend our parallelism model to address heterogeneous resources.
• Include more resource parameters in the model
• Started joint research with Barcelona Supercomputing Center
  • We acknowledge that Amon and Aprof have limitations
  • We will integrate our tools with their simulation application, DIMEMAS

Acknowledgements
• National Science Foundation
  • REU Grant # IIS-0552555
  • PIRE Grant # OISE-0730065
  • CREST Grant # HRD-0317692
  • GCB Grant # OCI-0636031
• IBM Research
• LA Grid
• FIU SCIS

Questions?
