Presentation Transcript

  1. Dynamic Tuning of Master/Worker Applications. Paradyn/Condor Week 2005, March 2005. Anna Morajko, Paola Caymes Scutari, Tomàs Margalef, Eduardo Cesar, Joan Sorribes and Emilio Luque, Universitat Autònoma de Barcelona (UAB)

  2. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  4. Introduction Application performance • The main goal of parallel/distributed applications: solve a given problem as fast as possible • Performance is one of the most important issues • Developers must optimize application performance to provide efficient and useful applications

  5. Introduction (II) • Finding bottlenecks and determining their solutions is difficult for parallel/distributed applications • Many tasks cooperate with each other • Application behavior may change with the input data or the environment • A difficult task, especially for non-expert users

  6. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  7. Problem / Solution MATE • Monitoring, Analysis and Tuning Environment • Dynamic automatic tuning of parallel/distributed applications (Figure: a closed loop from application development, where the user writes the source, through application execution; the tool monitors events and performance data, performs the performance analysis, and applies modifications through DynInst instrumentation)

  8. MATE (II) • Application Controller - AC • Dynamic Monitoring Library - DMLib • Analyzer (Figure: on Machines 1 and 2, an AC and a pvmd run alongside Task1, Task2 and Task3, each task instrumented through its DMLib; events flow to the Analyzer on Machine 3, which sends modifications back)

  9. MATE (II) • Analyzer • Carries out the application performance analysis • Detects problems “on the fly” and requests changes

  10. MATE (II) • Application Controller (AC) • Controls the execution of the application • Has a Monitor module to manage instrumentation via DynInst and gather execution information • Has a Tuner module to perform tuning via DynInst

  11. MATE (II) • Dynamic Monitoring Library (DMLib) • Facilitates instrumentation and data collection • Responsible for registering events

  12. MATE (III) • Automatic performance analysis on the fly • Finds bottlenecks among the events by applying a performance model • Finds solutions that overcome the bottlenecks • The Analyzer is provided with application knowledge about performance problems • The information related to one problem is called a tuning technique • A tuning technique describes a complete performance optimization scenario

  13. MATE (IV) • Each tuning technique is implemented in MATE as a “tunlet” • A tunlet is a C/C++ library dynamically loaded into the Analyzer process • Measure points – what events are needed • Performance model – how to determine bottlenecks and solutions • Tuning actions/points/synchronization – what to change, where, and when (Figure: a tunlet plugged into the Analyzer, supplying its performance model, measure points, and tuning point, action and synchronization)
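The three ingredients above suggest a small contract a tunlet must fulfil. The sketch below is purely illustrative: MATE's real tunlets are C/C++ libraries, and every name here is invented.

```python
# Illustrative sketch of the tunlet contract described on this slide.
# All names are invented; MATE's actual tunlet API is C/C++.
from dataclasses import dataclass


@dataclass
class MeasurePoint:
    function: str   # instrumented function whose events are needed
    place: str      # "entry" or "exit"


@dataclass
class TuningAction:
    variable: str   # tuning point: variable to modify in the process
    new_value: int  # action: value to write
    sync: str       # synchronization: when it is safe to apply it


class Tunlet:
    """One tuning technique: measure points + performance model + actions."""

    def measure_points(self):
        """Declare which events the Analyzer must collect."""
        raise NotImplementedError

    def evaluate(self, events):
        """Apply the performance model; return a TuningAction or None."""
        raise NotImplementedError
```

A concrete tunlet would subclass `Tunlet`, return its measure points, and emit actions such as changing a worker-count variable.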

  14. MATE (V) (Figure: Analyzer internals. An Event Collector thread receives events from the DMLibs and metadata from the ACs via TCP/IP and stores them in an Event Repository; a Controller uses an AC Proxy to send instrumentation requests to the monitors and tuning requests to the tuners via TCP/IP; tunlets access the application model through the DTAPI)

  15. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  16. Number of Workers • Master/Worker paradigm • Easy to understand, but with some bottlenecks • Example: an inadequate number of workers • Too few workers  the master sits idle waiting for results • Too many workers  more communication overhead (Figure: one master connected to several workers)
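The trade-off on this slide can be made concrete with a toy cost model. All constants below are invented for illustration; they are not measurements from the talk.

```python
# Toy model of the trade-off above: each worker added costs one more
# message latency, while the computation shrinks as the work is split.
# All constants are invented for illustration.
def iteration_time(n, total_work=1_000_000, tl=1e-3, lam=1e-7,
                   cost_per_unit=1e-6):
    send = n * tl + lam * total_work          # n messages, V bytes in total
    compute = total_work * cost_per_unit / n  # work divided evenly over n
    return send + compute

times = {n: iteration_time(n) for n in (1, 2, 4, 8, 16, 32, 64, 256)}
best_n = min(times, key=times.get)            # the sweet spot in between
```

With these constants the time first falls as workers absorb the computation, then rises again as send latency dominates, which is exactly the bottleneck the slide names.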

  17. Number of Workers (II) • Execution trace of a homogeneous Master/Worker application (homogeneous in message size and in the workers' execution time) (Figure: timeline of the master sending a task of vi bytes to each worker) Where... tl = latency, λ = inverse bandwidth, vi = size of tasks sent to worker i, in bytes, n = current number of workers in the application

  18. Number of Workers (II) • Execution trace of a homogeneous Master/Worker application (homogeneous in message size and in the workers' execution time) (Figure: timeline highlighting the computation phase tci of each worker) Where... tci = time that worker i spends processing a task

  19. Number of Workers (II) • Execution trace of a homogeneous Master/Worker application (homogeneous in message size and in the workers' execution time) (Figure: timeline highlighting each result message, of cost tl + λ·vm) Where... tl = latency, λ = inverse bandwidth, vm = size of the results sent back to the master

  20. Number of Workers (III)

  21. Number of Workers (IV)
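Slides 20 and 21 carried only formulas that the transcript did not capture. The following is a reconstruction, not the original slides' content: it is one standard derivation that is consistent with the variables defined on slides 17 to 19, assuming the total task volume V and total computation time Tc are split evenly among the n homogeneous workers.

```latex
% Per-iteration time: distribute V bytes over n workers, compute, return results
T(n) = \sum_{i=1}^{n} \left( t_l + \lambda v_i \right) + t_{c_i} + t_l + \lambda v_m
     = n\,t_l + \lambda V + \frac{T_c}{n} + t_l + \lambda v_m ,
\qquad v_i = \frac{V}{n}, \quad t_{c_i} = \frac{T_c}{n}

% Minimizing over n:
\frac{dT}{dn} = t_l - \frac{T_c}{n^2} = 0
\quad \Longrightarrow \quad
N_{opt} = \sqrt{\frac{T_c}{t_l}}
```

This matches the behavior discussed later: the send terms grow linearly with n while the compute term shrinks, and Nopt can be extremely high when the per-message latency tl is small relative to Tc.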

  22. Number of Workers: Tunlet • Measure points: • The amount of data sent to the workers and received by the master • The total computational time of the workers • The network overhead and bandwidth (Figure: timelines of Machine A (master) and Machine B (worker) with timestamped send/receive entry and exit events)
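The measure points above can be derived from the timestamped entry/exit events shown in the figure. The sketch below uses an invented `(name, place, time, bytes)` event tuple, not MATE's real event format.

```python
# Sketch: derive data volumes and communication time from send/receive
# entry/exit events. The (name, place, time, bytes) format is invented
# for illustration; MATE's real events differ.
def comm_stats(events):
    sent = received = 0   # bytes to workers / bytes back to master
    comm_time = 0.0       # total time spent inside send/receive calls
    entry_time = {}
    for name, place, t, size in events:
        if place == "entry":
            entry_time[name] = t
        else:  # "exit": close the matching entry and account for it
            comm_time += t - entry_time.pop(name)
            if name == "send":
                sent += size
            else:
                received += size
    return sent, received, comm_time
```

Together with the workers' total computation time, these quantities are what the performance function on the next slide consumes.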

  23. Number of Workers: Tunlet (II) • Performance function: • Calculation of the optimal number of workers • Tuning actions: • Change the value of “numworkers” to add or remove as many workers as needed

  24. Experimentation • Example application • Forest Fire Propagation simulator – Xfire • Compute-intensive Master/Worker application • Simulates the propagation of the fireline • Calculates the next position of the fireline considering the current fireline position, weather factors, vegetation, etc. • Platform • Cluster of Pentium 4, 1.8 GHz, SuSE Linux 8.0, connected by a 100 Mb/s network

  25. Experimentation (II) • Load in the system • We designed different external load patterns • They simulate the system's time-sharing • They allow us to reproduce experiments • Case studies • Xfire executed with different fixed numbers of workers, without any tuning, introducing external loads • Xfire executed under MATE, introducing external loads

  26. Experimentation (III) (Figure: execution time in seconds for the case studies, Xfire with fixed numbers of workers from 1 to 26 versus Xfire+MATE, which starts with 1 worker and adapts it) • Note that... • The execution time of Xfire under MATE is close to the best execution times obtained • Resources devoted to the application under MATE are used only when they are really needed

  27. Experimentation (IV) • Statically, the model fits • Dynamically, there are some problems • Nopt could be extremely high • The computing power added or removed may not be significant compared with the existing computing power • Solution • Finding a “reasonable” number of workers that defines a trade-off between resource utilization and execution time
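The "reasonable" number of workers proposed above can be sketched as: accept the smallest worker count whose predicted time is within some tolerance of the optimum, giving up a little speed for far fewer machines. The cost function and tolerance below are invented for illustration.

```python
# Sketch of the trade-off above: instead of Nopt, pick the smallest
# worker count predicted to be within `tolerance` of the best time.
# The toy cost model and tolerance values are invented.
def reasonable_workers(time_fn, candidates, tolerance=0.10):
    best = min(time_fn(n) for n in candidates)
    for n in sorted(candidates):
        if time_fn(n) <= best * (1 + tolerance):
            return n

# toy convex cost: linear send overhead plus perfectly divided compute
cost = lambda n: 0.001 * n + 1.0 / n
```

Loosening the tolerance trades a little execution time for noticeably fewer workers, which is exactly the resource-versus-time balance the slide asks for.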

  28. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  29. Data Distribution • Imbalance problem: • Heterogeneous computing and communication powers • Varying amount of distributed work (Figure: master/worker timelines comparing an unbalanced iteration with a balanced iteration)

  30. Data Distribution (II) • Goal: • Minimize idle time by balancing the work among the processes, taking the efficiency of the machines into account • Performance model • Factoring scheduling method • The work is divided into different-sized tuples according to the factor
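The Factoring method named above can be sketched as follows: in each round the remaining work R is split into n equal tuples of size R/(f·n), so early tuples are large and later ones shrink, letting the tail absorb imbalance. The integer rounding and minimum tuple size below are illustrative choices.

```python
# Sketch of Factoring scheduling as described on this slide: each round
# hands out n_workers equal tuples sized remaining/(factor*n_workers),
# so tuple sizes decay geometrically toward the end of the work.
def factoring_tuples(total_work, n_workers, factor=2.0):
    tuples, remaining = [], total_work
    while remaining > 0:
        size = max(1, int(remaining / (factor * n_workers)))
        for _ in range(n_workers):
            take = min(size, remaining)
            if take == 0:
                break
            tuples.append(take)
            remaining -= take
    return tuples
```

With 1000 work units, 4 workers and factor 2, the first round hands out tuples of 125 units and later rounds progressively smaller ones; tuning the factor is what the next slide's tunlet does.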

  31. Data Distribution: Tunlet • Measure points: • The work-unit processing time • The latency and bandwidth • Performance function: • Calculation of the factor • The Analyzer simulates the execution for different factors and then chooses the best one • Currently we are working on an analytical model to determine the factor • Tuning actions: • Change the value of “TheFactorF”

  32. Experimentation • Example application • Forest Fire Propagation simulator – Xfire • Platform • Cluster of Pentium 4, 1.8 GHz, SuSE Linux 8.0, connected by a 100 Mb/s network

  33. Experimentation (II) • Load in the system • We designed different external load patterns • They simulate the system's time-sharing • They allow us to reproduce experiments • Case studies • Xfire executed without any tuning • Xfire, introducing controlled variable external loads • Xfire executed under MATE, introducing variable external loads

  34. Experimentation (III) (Figure: execution time in seconds versus number of workers, from 1 to 30, for Xfire, Xfire+Load, and Xfire+Load+MATE) • Note that… • Introducing an extra load increases the execution time • Execution under MATE corrects the factor value to improve the execution time

  35. Outline • Introduction • MATE • Number of workers • Data distribution • Conclusions

  36. Conclusions and open lines • Conclusions • The prototype environment MATE automatically monitors, analyses and tunes running applications • Practical experiments with MATE on parallel/distributed applications show that it automatically adapts application behavior to the existing conditions at run time • In particular, MATE is able to tune Master/Worker applications and overcome two possible bottlenecks: the number of workers and the data distribution • Dynamic tuning works, and is applicable, effective and useful under certain conditions

  37. Conclusions and open lines • Open Lines • Determining the “reasonable” number of workers. • Considering interaction between different tunlets. • Providing the system with other tuning techniques.

  38. Thank you…