Cross Cluster Migration - PowerPoint PPT Presentation

Presentation Transcript

  1. Cross Cluster Migration: Remote Access Support • Adianto Wibisono • Supervised by: Dr. Dick van Albada and Kamil Iskra, M.Sc.

  2. Outline • Introduction • Dynamic load balancing, process migration • Dynamite • Problem formulation, Dynamite towards the Grid • GASS • How it works, operations supported • Design Considerations • Testing and Measurements • Stability test, simple performance measurements • Summary

  3. Dynamic Load Balancing • Load balancing is needed for parallel applications in a dynamically changing environment • Re-allocation of load through process migration [Diagram: task load (static, dynamic) against resource load (static, dynamic); cell labels: static task allocation, predictable reallocation, dynamic reallocation]

  4. Process Migration • Moving a process between two machines during execution • Extract the state on the source node (checkpointing) • Transfer the state to the destination • Update connections with other processes [Diagram: migrating process (source instance), migrating process (destination instance), communicating process]
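
The three steps on this slide can be illustrated with a toy Python sketch. It is not Dynamite's mechanism: the `Process` class, the `routing` table, and the node names are all hypothetical, and only a counter is "checkpointed" rather than a full process image.

```python
import pickle
from dataclasses import dataclass


@dataclass
class Process:
    """Toy stand-in for a running task (hypothetical, for illustration)."""
    pid: int
    node: str
    counter: int = 0


def extract_state(proc: Process) -> bytes:
    # Step 1: checkpoint, i.e. serialize the state on the source node.
    return pickle.dumps({"pid": proc.pid, "counter": proc.counter})


def restore_state(blob: bytes, destination: str) -> Process:
    # Step 2: the transferred bytes are rebuilt into a process
    # on the destination node.
    state = pickle.loads(blob)
    return Process(pid=state["pid"], node=destination, counter=state["counter"])


def update_connections(routing: dict, proc: Process) -> None:
    # Step 3: communicating processes look up a peer's node in this table,
    # so rewriting the entry redirects their messages.
    routing[proc.pid] = proc.node


def migrate(proc: Process, destination: str, routing: dict) -> Process:
    blob = extract_state(proc)
    moved = restore_state(blob, destination)
    update_connections(routing, moved)
    return moved


routing = {1: "node-a"}
source = Process(pid=1, node="node-a", counter=41)
moved = migrate(source, "node-b", routing)
```

After the call, `moved` carries the same counter on the new node and `routing` points the process id at `node-b`.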

  5. Dynamite (Dynamic Task Migration Environment) • User-level checkpointing, implemented in the ELF dynamic loader • Transparent: the application only needs to be re-linked • Straightforward support for shared libraries • Function calls can be wrapped and redirected • Uses a signal to trigger the checkpoint • Migration methods • File migration • Writes the entire process state to a checkpoint file • Destination node reads the checkpoint file • Socket migration (Jinghua Wang, 2000) • Enables cross cluster migration • Application loses access to open files on the initial cluster: remote access support is needed.
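
A minimal Python sketch of a signal-triggered checkpoint followed by file migration, under loud assumptions: Dynamite checkpoints the whole process inside the ELF dynamic loader, whereas this sketch saves only an application-level step counter with `pickle`. The signal used is SIGUSR1, which is the one named in the migration diagrams of this deck; the function names and the self-signalling are hypothetical stand-ins for an external scheduler. It requires a POSIX system.

```python
import os
import pickle
import signal
import tempfile

checkpoint_requested = False


def on_sigusr1(signum, frame):
    # Keep the handler minimal: only record that a checkpoint was requested.
    global checkpoint_requested
    checkpoint_requested = True


signal.signal(signal.SIGUSR1, on_sigusr1)


def run(total_steps, checkpoint_path, resume=False, signal_at=None):
    """Count to total_steps and return the step reached.

    If a checkpoint was requested, write the state to checkpoint_path
    and stop early, as the source instance would.
    """
    global checkpoint_requested
    step = 0
    if resume:
        # Destination side of file migration: read the checkpoint file.
        with open(checkpoint_path, "rb") as f:
            step = pickle.load(f)["step"]
    while step < total_steps:
        if checkpoint_requested:
            checkpoint_requested = False
            with open(checkpoint_path, "wb") as f:
                pickle.dump({"step": step}, f)
            return step
        step += 1
        if step == signal_at:
            # Stand-in for an external scheduler signalling this process.
            os.kill(os.getpid(), signal.SIGUSR1)
    return step


path = os.path.join(tempfile.mkdtemp(), "task.ckpt")
stopped_at = run(1000, path, signal_at=3)    # source node: interrupted
finished_at = run(1000, path, resume=True)   # destination node: resumes
```

The first call stops early and leaves a checkpoint file; the second call picks up from that file and runs to completion.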

  6. Dynamite towards the Grid • Cluster Computing • Improve application performance using idle local resources • Improve system utilization • Cross Cluster Migration • More computing power from outside sources • Geographically Distributed Computing • Grid Computing Trends • Sharing of computational power across multiple organizations • Globus Toolkit - Major infrastructure component

  7. Global Access to Secondary Storage (GASS) • Designed for High Performance Applications • Not a full Distributed File System • Supports Default Data Movement • Read only access to an entire (constant) file • Unprotected, multiple write access: last thing written remains • Append only access with output required in real-time • Read and write access with no other concurrent accesses

  8. GASS Default Data Movement [Diagram: combinations of read, write, and append access, grouped into supported and not supported]

  9. Design • Transparency • User need not modify the application • File system calls wrapped and redirected to GASS library • Minimizing Residual Dependency • Tasks leave no mirrors or proxies • GASS server on each file system • Possible scenarios • Initial to Remote • Remote to Remote • Remote to Initial
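
The idea of wrapping file calls and redirecting them through a cache can be sketched in Python as follows. This is a sketch under assumptions, not the GASS API: plain directory copies stand in for GASS transfers, and `RemoteFileWrapper`, `_WriteBack`, and the `migrated` flag are hypothetical names. Reads fetch the file into a local cache after migration; writes go to the cache and are sent back on close.

```python
import os
import shutil
import tempfile


class _WriteBack:
    """File handle that copies the cached file back to its origin on close."""

    def __init__(self, handle, cached, origin):
        self._handle = handle
        self._cached = cached
        self._origin = origin

    def write(self, data):
        return self._handle.write(data)

    def close(self):
        self._handle.close()
        # Stand-in for sending the file back to the initial cluster.
        shutil.copyfile(self._cached, self._origin)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


class RemoteFileWrapper:
    """Hypothetical wrapper around open(); directory copies replace GASS."""

    def __init__(self, initial_dir, cache_dir):
        self.initial_dir = initial_dir  # file system of the initial cluster
        self.cache_dir = cache_dir      # local cache on the current node
        self.migrated = False           # set to True once the task has moved

    def open(self, name, mode="r"):
        origin = os.path.join(self.initial_dir, name)
        if not self.migrated:
            # Before migration the file is local, so no redirection is needed.
            return open(origin, mode)
        cached = os.path.join(self.cache_dir, name)
        if "w" not in mode and os.path.exists(origin):
            # Stand-in for fetching the file into the local cache.
            shutil.copyfile(origin, cached)
        handle = open(cached, mode)
        if any(flag in mode for flag in "wa+"):
            return _WriteBack(handle, cached, origin)
        return handle


initial = tempfile.mkdtemp()
cache = tempfile.mkdtemp()
with open(os.path.join(initial, "input.txt"), "w") as f:
    f.write("hello")

files = RemoteFileWrapper(initial, cache)
files.migrated = True  # pretend the task now runs on a remote cluster

with files.open("input.txt") as f:
    data = f.read()
with files.open("output.txt", "w") as f:
    f.write("result")
with open(os.path.join(initial, "output.txt")) as f:
    written_back = f.read()
```

The application code keeps calling `open` in the same way before and after migration, which is the transparency goal stated on the slide.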

  10. Initial to Remote (read) [Diagram. Components: Application, GASS Server, GASS Cache, File. Labels: SIGUSR1, Migration, Read operation, GASS request file, GASS read file, GASS send file, GASS write cache, Resume reading]

  11. Initial to Remote (write) [Diagram. Components: Application, GASS Server, GASS Cache, File. Labels: SIGUSR1, Migration, Write operation, Resume writing, Finish writing, GASS read cache, GASS send file back, GASS write file]

  12. Remote to Remote (read) [Diagram. Components: Application, two GASS Caches, GASS Server, File. Labels: SIGUSR1, Migration, Read operation, GASS request file, GASS read file, GASS send file, GASS write cache, Resume reading]

  13. Remote to Remote (write) [Diagram. Components: Application, two GASS Caches, GASS Server, File. Labels: SIGUSR1, Migration, Write operation, Resume writing, Finish writing, GASS read cache, GASS send back cache, GASS write file]

  14. Remote to Initial (read) [Diagram. Components: Application, GASS Server, GASS Cache, File. Labels: SIGUSR1, Migration, Read operation (cache), Resume, Read operation]

  15. Remote to Initial (write) [Diagram. Components: Application, GASS Server, GASS Cache, File. Labels: SIGUSR1, Migration, Write operation (cache), GASS read cache, GASS send file back, GASS write file, Resume, Write operation]

  16. Testing & Measurement • Performed between DAS-2 clusters • Sequential test • Checkpoint taken after the application opens several files for writing/reading • Checkpoint file is migrated manually to remote locations • Shows good stability • Parallel test • Simple master/slave PVM applications whose tasks perform file operations • Each task writes to an output or reads from an input • Stable only for the first several migrations

  17. Testing & Measurement • Simple performance measurement with a sequential application performing remote file access: • Memory to file operation • File to memory operation • File to file operation • Increasing file size • From 1 KB to 32 MB • Increased by doubling the size of the file • Performed with Fs4 (Utrecht) as the initial cluster • Each measurement is repeated 10 times
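
The measurement plan on this slide (file sizes doubling across the stated range, 10 repetitions each) can be sketched in Python. Several things here are assumptions: the slide's size range is read as kilobytes and megabytes with 1 KB = 1024 bytes, the function names are hypothetical, and the write goes to a local temporary file rather than through remote file access, so the timings say nothing about the deck's results.

```python
import os
import tempfile
import time

KB = 1024          # assumption: binary units
MB = 1024 * KB


def doubling_sizes(smallest=1 * KB, largest=32 * MB):
    """File sizes from smallest to largest, doubling at each step."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= 2
    return sizes


def time_memory_to_file(size, repetitions=10):
    """Mean wall-clock seconds to write `size` bytes from memory to a file."""
    payload = b"x" * size
    elapsed = []
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "out.bin")
        for _ in range(repetitions):
            start = time.perf_counter()
            with open(path, "wb") as f:
                f.write(payload)
            elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


sizes = doubling_sizes()
# Time only the smaller sizes here to keep the demonstration quick.
results = {size: time_memory_to_file(size) for size in sizes if size <= 64 * KB}
```

Doubling from 1 KB to 32 MB gives 16 sizes, so a full run of one operation type would take 160 timed transfers.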

  18. Memory to File

  19. File to Memory

  20. File to File

  21. Latency & Bandwidth [Figure: latency (sec) and bandwidth (MBps), for files <= 128 KB and files >= 4 MB]

  22. Summary • The prerequisite of a shared file system for cross cluster migration is eliminated • The GASS library from the Globus Toolkit can be used to support remote file access in Dynamite • The additional support maintains the transparency of Dynamite and minimizes residual dependency • Sequential tests show good stability; parallel tests still have some limitations.

  23. Future Work • Other possible features from the Globus Toolkit • Resource discovery • Heartbeat monitor (monitoring the state of processes) • Additional support for MPI • More widely used in the Grid community • No PVM job manager is supported in the Globus Toolkit • Re-implementation with a newer library • The checkpoint library of Dynamite still uses glibc 2.0 • Difficult to re-link with applications that use a later version

  24. Thank You For Your Attention

  25. Higher-Level View of How Dynamite Works [Diagram. Labels: Application, Decompose, Initial Placement, Place, Run, Load Monitor, Capacity/Node, Scheduler/Decider, New Placement, Migrate]

  26. File to File

  27. File to File

  28. File to Memory

  29. File to Memory

  30. Memory to File

  31. Memory to File