SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS

Sukhdeep Sodhi (Microsoft Corp), Jaspal Subhlok (University of Houston)

Presentation Transcript


  1. SKELETON BASED PERFORMANCE PREDICTION ON SHARED NETWORKS. Sukhdeep Sodhi (Microsoft Corp), Jaspal Subhlok (University of Houston)

  2. Resource Selection for Network/Grid Applications. Where on the network does the application get the best performance? [Figure: an application with components Model, Data, GUI, Sim 1, Pre, and Stream, to be placed on a network]

  3. Current Approaches to Node Selection
     • 1. Measure and model network properties, such as available bandwidth and CPU loads (with tools like NWS)
     • 2. Find the “best” nodes for execution based on network status
     But application performance estimated from measured resource status may not be accurate:
     • it depends on application characteristics, which are hard to model
     • measurements must be translated, e.g., unused bandwidth vs. expected throughput
     • data may be stale, as frequent measurements are expensive

  4. Our Approach: predict application performance by running a small program that is representative of the actual distributed application. [Figure: application components Model, Data, GUI, Sim 1, Pre, and Stream mapped onto a network]

  5. Performance Skeleton. A performance skeleton is a synthetic, short-running program whose execution characteristics mirror those of the application it represents. An application and its skeleton have similar
     • communication pattern
     • CPU usage
     • memory usage
     • synchronization pattern
     Goal: the performance of a skeleton is directly related to the performance of the application under any condition
     • e.g., a skeleton executes in 0.1% of the time the application takes to execute on any part of a shared network

  6. Central Contribution of This Paper: a framework for automatic construction of performance skeletons. [Figure: CREATE SKELETON turns the application (Model, Data, GUI, Sim 1, Pre, Stream) into its skeleton]

  7. Automatic Construction of Skeletons. [Figure: CREATE SKELETON turns the application into its skeleton]
     • Record execution trace
     • Compress execution trace into execution signature
     • Construct skeleton program from execution signature

  8. Automatic Construction of Skeletons (diagram repeated): record execution trace, compress execution trace into execution signature, construct skeleton program from execution signature

  9. Recording of Execution Trace
     • Implemented for MPI applications
     • Link the MPI application with a PMPI-based profiling library
       • no source code modification or analysis required
     • Execute on a dedicated testbed
     • Records all MPI function calls
       • call name, start time, stop time, parameters passed
     • Timing done to microsecond granularity
     • CPU busy time = time between two consecutive MPI calls
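The last bullet above can be sketched in a few lines of Python. This is only an illustration, not the authors' profiling library: the `MpiCall` record, its field names, and the sample timestamps are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class MpiCall:
    """One traced MPI call (hypothetical record layout for this sketch)."""
    name: str       # MPI function name, e.g. "MPI_Send"
    start_us: int   # call entry time, in microseconds
    stop_us: int    # call exit time, in microseconds


def compute_gaps(trace):
    """Return the CPU-busy time (microseconds) between consecutive MPI calls."""
    return [nxt.start_us - cur.stop_us for cur, nxt in zip(trace, trace[1:])]


# Made-up timestamps, purely for illustration.
trace = [
    MpiCall("MPI_Send", 100, 150),
    MpiCall("MPI_Recv", 400, 420),
    MpiCall("MPI_Barrier", 1000, 1100),
]
gaps = compute_gaps(trace)  # 400 - 150 and 1000 - 420
```

The time spent inside each call (`stop_us - start_us`) would be attributed to communication, and the gaps to computation.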

  10. Automatic Construction of Skeletons (diagram repeated): record execution trace, compress execution trace into execution signature, construct skeleton program from execution signature

  11. Generation of Execution Signature …1
      Application execution typically follows cyclic patterns.
      Goal: determine cyclic patterns and form a loop structure by identifying repeating execution behavior.
      • Repeating patterns should be broadly similar
      Step 1: Execution trace to symbol strings
      • Cluster similar execution events
      • Replace all events in a cluster by the average event
      • Each cluster is then assigned a unique symbol
      • Execution trace is replaced by a string of symbols [symbol glyphs lost in extraction]
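Step 1 can be illustrated with a small Python sketch. The slides do not say which clustering method or similarity measure is used, so everything here is an assumption: events are reduced to `(call_name, message_size)` pairs, two events are "similar" if they share a call name and their sizes differ by at most a relative tolerance `tol`, and clustering is a simple greedy pass.

```python
import string


def symbolize(events, tol=0.1):
    """Greedily cluster (call_name, size) events and map each cluster to a symbol.

    Returns (symbols, averages): the trace as a list of symbols, and a dict
    mapping each symbol to its cluster's average event. The sketch supports
    at most 26 clusters because symbols are single lowercase letters.
    """
    clusters = []  # one dict per cluster: {"name": str, "sizes": [float, ...]}
    labels = []    # cluster index assigned to each event, in trace order
    for name, size in events:
        for idx, cluster in enumerate(clusters):
            ref = cluster["sizes"][0]  # first member acts as the representative
            if cluster["name"] == name and abs(size - ref) <= tol * max(ref, 1):
                cluster["sizes"].append(size)
                labels.append(idx)
                break
        else:
            clusters.append({"name": name, "sizes": [size]})
            labels.append(len(clusters) - 1)
    letters = string.ascii_lowercase
    symbols = [letters[i] for i in labels]
    averages = {
        letters[i]: (c["name"], sum(c["sizes"]) / len(c["sizes"]))
        for i, c in enumerate(clusters)
    }
    return symbols, averages


# Made-up events, purely for illustration.
events = [("MPI_Send", 1000), ("MPI_Recv", 1000), ("MPI_Send", 1040),
          ("MPI_Recv", 980), ("MPI_Send", 4000)]
symbols, averages = symbolize(events)
```

Raising `tol` merges more events into the same cluster, which shortens the alphabet and makes repeated patterns easier to find.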

  12. Generation of Execution Signature …2
      Step 2: Compress the string by identifying cycles
      • Similar to the longest substring matching problem
      • Algorithm builds the loop structure recursively from symbol strings, e.g., the symbol string is replaced by [,,]4,[,[]2,]2 (symbol glyphs lost in extraction; the number after each bracket is a repetition count)
      • Typically the signature is multiple orders of magnitude smaller than the trace
      Step 3: Adaptively increase the degree of clustering
      • until the signature is compact enough
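A minimal Python sketch of Step 2 follows. It is a simple greedy loop-folding routine written for illustration, not the algorithm from the paper. The input string uses hypothetical symbols `a` to `f`, chosen only so that the result has the same bracket shape as the example on the slide.

```python
def fold(seq):
    """Greedily fold repeated runs in a symbol string into nested loops.

    Returns a list whose items are either a single symbol or a
    (body, count) pair, where body is itself a folded list and count >= 2.
    """
    out, i, n = [], 0, len(seq)
    while i < n:
        best_p, best_c = 0, 1
        # Try every period p that could repeat at least twice from position i.
        for p in range(1, (n - i) // 2 + 1):
            c = 1
            while seq[i + c * p: i + (c + 1) * p] == seq[i: i + p]:
                c += 1
            # Keep the repeat covering the most symbols (shortest period on ties).
            if c > 1 and c * p > best_c * best_p:
                best_p, best_c = p, c
        if best_p:
            out.append((fold(seq[i: i + best_p]), best_c))  # fold the body too
            i += best_p * best_c
        else:
            out.append(seq[i])
            i += 1
    return out


def render(items):
    """Print a folded structure in the bracket notation used on the slide."""
    return ",".join(
        f"[{render(it[0])}]{it[1]}" if isinstance(it, tuple) else it
        for it in items
    )


signature = render(fold("abcabcabcabcdeefdeef"))
print(signature)  # [a,b,c]4,[d,[e]2,f]2
```

Step 3 would wrap this in a loop: if the folded signature is still too long, re-cluster the events with a looser similarity threshold and fold again.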

  13. Automatic Construction of Skeletons (diagram repeated): record execution trace, compress execution trace into execution signature, construct skeleton program from execution signature

  14. Generate Performance Skeleton Program
      Goal: execution time of the performance skeleton should be a fixed factor K less than the application execution time
      Reduce the iterations of each loop by a factor K
      • Add remainder iterations to the events outside of all loops
      Process events outside loops as follows:
      • Reduce execution time of compute operations by a factor K
      • Reduce execution time of message exchanges by reducing the bytes exchanged by a factor K
        • Communication operations are not scaled linearly due to latency
        • Considering latency would make the approach architecture-specific
      Replace symbols by C language statements
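The scaling rules above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the authors' generator: the `Event` and `Loop` types are invented for the example, loops are one level deep, and the final step of emitting C statements is omitted.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    kind: str      # "compute" or "message" (hypothetical event kinds)
    amount: float  # microseconds for compute, bytes for message


@dataclass(frozen=True)
class Loop:
    body: tuple    # tuple of Event (no nested loops in this sketch)
    count: int     # number of iterations


def scale_event(ev, k):
    """Shrink a compute time or a message size by the factor k."""
    return replace(ev, amount=ev.amount / k)


def build_skeleton(signature, k):
    """Scale a flat signature so it runs roughly k times faster."""
    skeleton = []
    for item in signature:
        if isinstance(item, Loop):
            full, rem = divmod(item.count, k)
            if full:
                skeleton.append(Loop(item.body, full))  # body kept, fewer iterations
            # Remainder iterations are unrolled and treated like events
            # outside loops, so each of their events is scaled by k.
            for _ in range(rem):
                skeleton.extend(scale_event(ev, k) for ev in item.body)
        else:
            skeleton.append(scale_event(item, k))
    return skeleton


def compute_time(items):
    """Total compute microseconds in a signature or skeleton."""
    total = 0.0
    for it in items:
        if isinstance(it, Loop):
            total += it.count * compute_time(it.body)
        elif it.kind == "compute":
            total += it.amount
    return total


# Made-up signature, purely for illustration.
body = (Event("compute", 200.0), Event("message", 4096.0))
signature = [Event("compute", 1000.0), Loop(body, 25)]
skeleton = build_skeleton(signature, 10)
```

With K = 10, the 25-iteration loop becomes 2 full iterations plus 5 unrolled iterations whose events are each scaled down by 10, so total compute time drops from 6000 to 600 microseconds. Message time would not shrink by exactly K because of per-message latency, as the slide notes.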

  15. Experimental Validation
      Skeletons constructed for Class B NAS MPI benchmarks are executed in the following sharing scenarios:
      • Competing processes on one node
      • Competing processes on all nodes
      • Competing traffic on one link
      • Competing traffic on all links
      • Competing process and traffic on one node and link
      Skeleton execution time is used to predict application execution time.
      Setup: Intel Xeon dual-CPU 1.7 GHz nodes running Linux 2.4.7. Gigabit crossbar switch. iproute used to simulate link sharing.

  16. Prediction Accuracy
      Graph shows the error between predicted and measured application execution time.
      • Skeleton execution time is 1/10th of the application execution time
      • Average error: 6%; maximum error: 18%
      • Error is higher for scenarios with competing traffic
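The slides do not spell out the prediction formula. One plausible reading, shown here purely as an assumption, is that the slowdown of the skeleton under sharing is applied to the application's execution time on the dedicated testbed, and that error is the relative difference from the measured time. All numbers in the example are made up.

```python
def predict_app_time(app_dedicated_s, skel_dedicated_s, skel_shared_s):
    """Assumed model: the application slows down by the same ratio as its skeleton."""
    slowdown = skel_shared_s / skel_dedicated_s
    return app_dedicated_s * slowdown


def percent_error(predicted_s, measured_s):
    """Relative prediction error, as a percentage of the measured time."""
    return abs(predicted_s - measured_s) / measured_s * 100.0


# Hypothetical inputs: a 100 s application whose 10 s skeleton takes 15 s
# under sharing, and a measured application time of 160 s under sharing.
predicted = predict_app_time(100.0, 10.0, 15.0)
error = percent_error(predicted, 160.0)
```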

  17. Comparison with Other Methods
      • Average Prediction: the average slowdown of the entire benchmark suite is used to predict the execution time of each program
      • Class S Prediction: Class S benchmark programs (~1 s) are used as skeletons for Class B benchmarks (30-900 s)

  18. Preliminary Conclusions
      • Performance estimation with skeletons has high accuracy
      • Need to incorporate memory access patterns and fine-grained CPU behavior for execution across architectures
      • Implementation is limited to MPI applications
        • the basic approach should work for other paradigms
      • Skeletons may have other uses as a fast way of estimating application performance
        • e.g., on a slow simulated future system

  19. Questions. Contact: jaspal@uh.edu, ssodhi@microsoft.com
