
Profile Guided MPI Protocol Selection for Point-to-Point Communication Calls


Presentation Transcript


  1. Profile Guided MPI Protocol Selection for Point-to-Point Communication Calls
  Aniruddha Marathe, David K. Lowenthal, Department of Computer Science, The University of Arizona, Tucson, AZ. {amarathe,dkl}@cs.arizona.edu
  Zheng Gu, Matthew Small, Xin Yuan, Department of Computer Science, Florida State University, Tallahassee, FL. {zgu,small,xyuan}@cs.fsu.edu

  2. Motivation
  • Need for an on-line protocol selection scheme: the optimal protocol for a communication routine is application and architecture specific
  • Existing approaches:
    • Off-line: protocol selection at program compilation time
    • Static: one protocol per application
    • Both are difficult to adapt to a program's runtime characteristics

  3. Contributions
  • On-line protocol selection algorithm
  • Protocol cost model: employed by the on-line protocol selection algorithm to estimate the total execution time per protocol
  • Sender-initiated Post-copy protocol: a novel protocol to complement the existing set of protocols

  4. On-line Protocol Selection Algorithm
  • Selects the optimal communication protocol for a communication phase dynamically
  • Protocol selection algorithm split into two phases:
    • Phase 1: execution time estimation per protocol
    • Phase 2 (optimization): buffer usage profiling
  • System works with four protocols

  5-12. On-line Protocol Selection Algorithm: Phase 1 (Estimating Execution Times)
  [Animation over eight slides: n tasks (Rank 1 ... Rank n) execute a sample phase of m MPI calls each. At the start of the phase each rank begins estimating; for every MPI call 1..m the cost model charges an estimated time t_protocol on each rank; at the end of the phase the per-protocol totals t are compared and the optimal protocol = min(t) is selected.]
  • Execution time of the estimation is linear in the number of MPI calls per phase (a sketch of the selection step follows below)
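The selection step lends itself to a short sketch. The following is a minimal illustration in C; est_call_time() is a hypothetical stand-in for the per-call estimate produced by the cost model introduced on slide 30, and in the real system this loop runs inside the MPI library rather than in application code.

    /* Minimal sketch of phase-1 protocol selection. est_call_time() is a
     * hypothetical stand-in for the cost model's per-call estimate. */
    #include <float.h>

    enum protocol { PRE_COPY, POST_COPY, SENDER_RENDEZVOUS,
                    RECEIVER_RENDEZVOUS, NUM_PROTOCOLS };

    extern double est_call_time(enum protocol p, int call_id);

    enum protocol select_protocol(int m /* MPI calls in this phase */)
    {
        double best_time = DBL_MAX;
        enum protocol best = PRE_COPY;
        for (int p = 0; p < NUM_PROTOCOLS; p++) {
            double t = 0.0;
            for (int call = 0; call < m; call++)   /* linear in #calls */
                t += est_call_time((enum protocol)p, call);
            if (t < best_time) { best_time = t; best = (enum protocol)p; }
        }
        return best;   /* optimal protocol = min(t) */
    }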

  13. Point-to-Point Protocols
  • Our system uses the following protocols:
    • Existing protocols (Yuan et al. 2009): Pre-copy, Sender-initiated Rendezvous, Receiver-initiated Rendezvous
    • New protocol: Post-copy
  • Protocols are categorized based on message size and the arrival patterns of the communicating tasks (one illustrative mapping is sketched below)
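As a purely illustrative example of this categorization, one plausible mapping might look like the following, reusing the enum protocol from the earlier sketch. The threshold and the pairing of protocols to arrival orders are assumptions made for exposition; the paper's actual decision rule is the cost model itself.

    /* Hypothetical illustration only: the real choice is made by the cost
     * model; this threshold and mapping are invented for exposition. */
    #include <stddef.h>

    enum arrival { SENDER_EARLY, RECEIVER_EARLY };

    enum protocol categorize(size_t msg_bytes, enum arrival order)
    {
        if (msg_bytes <= 8 * 1024)   /* small messages: copy-based protocols */
            return (order == SENDER_EARLY) ? POST_COPY : PRE_COPY;
        /* large messages: rendezvous, initiated by the earlier-arriving side */
        return (order == SENDER_EARLY) ? SENDER_RENDEZVOUS
                                       : RECEIVER_RENDEZVOUS;
    }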

  14-21. Pre-copy Protocol
  [Animation over eight slides: a sender/receiver timeline annotated with MPI calls and data operations.]
  • Sender: MPI_Send performs a local buffer copy, then an RDMA Write of the request, and reaches its MPI_Barrier
  • Receiver: MPI_Recv picks up the request, performs an RDMA Read of the data, RDMA Writes an ACK back, and reaches its MPI_Barrier
  • The sender is idle from posting the request until the ACK arrives (a code sketch follows below)
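Expressed as code, the exchange above might look like this. A sketch only: the rdma_* and staging helpers are hypothetical stand-ins for the library's verbs-level operations, not a real API.

    /* Pre-copy sketch (hypothetical helpers; error handling omitted). */
    #include <string.h>
    #include <stddef.h>

    extern void *get_registered_buffer(size_t len);  /* pre-registered memory */
    extern void  rdma_write_request(int peer, void *buf, size_t len);
    extern void  wait_for_ack(int peer);
    struct request { void *remote_addr; size_t len; };
    extern struct request wait_for_request(int peer);
    extern void  rdma_read(int peer, void *dst, void *remote_addr, size_t len);
    extern void  rdma_write_ack(int peer);

    void precopy_send(const void *user_buf, size_t len, int peer)
    {
        void *staging = get_registered_buffer(len);
        memcpy(staging, user_buf, len);          /* local buffer copy */
        rdma_write_request(peer, staging, len);  /* RDMA Write: request */
        wait_for_ack(peer);                      /* sender idles until the ACK */
    }

    void precopy_recv(void *user_buf, size_t len, int peer)
    {
        struct request r = wait_for_request(peer);
        rdma_read(peer, user_buf, r.remote_addr, len);  /* RDMA Read: data */
        rdma_write_ack(peer);                           /* RDMA Write: ACK */
    }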

  22-29. Post-copy Protocol
  [Animation over eight slides: the same sender/receiver timeline for the new protocol.]
  • Sender: MPI_Send issues a single RDMA Write carrying the request together with the data, then proceeds to its MPI_Barrier
  • Receiver: MPI_Recv picks up the request, performs a local buffer copy out of the staging area, then returns an ACK
  • The sender spends significantly less idle time than under Pre-copy (a code sketch follows below)
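The same kind of sketch for Post-copy, reusing the hypothetical helpers from the Pre-copy sketch, makes the difference visible: the data rides along with the request, so the sender never waits for the receiver to pull it.

    /* Post-copy sketch (hypothetical helpers, as above). */
    extern void  rdma_write_request_and_data(int peer, const void *buf,
                                             size_t len);
    extern void  send_ack(int peer);
    extern void *staging_of(struct request r);

    void postcopy_send(const void *user_buf, size_t len, int peer)
    {
        /* one RDMA Write carries the request and the data together */
        rdma_write_request_and_data(peer, user_buf, len);
        /* sender proceeds immediately; no idle wait for an ACK here */
    }

    void postcopy_recv(void *user_buf, size_t len, int peer)
    {
        struct request r = wait_for_request(peer);
        memcpy(user_buf, staging_of(r), len);  /* local buffer copy */
        send_ack(peer);                        /* lets the sender recycle the slot */
    }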

  30. Protocol Cost Model
  • Supports five basic MPI operations: MPI_Send, MPI_Recv, MPI_Isend, MPI_Irecv, MPI_Wait
  • Important terms:
    • t_memreg: buffer registration time
    • t_memcopy: buffer memory copy time
    • t_rdma_read: buffer RDMA Read time
    • t_rdma_write: buffer RDMA Write time
    • t_func_delay: constant book-keeping time

  31-32. Post-copy Protocol Cost Model: Sender Early
  [Timeline: the sender executes MPI_Isend (t_memreg, t_rdma_write, t_func_delay) and MPI_Wait (t_func_delay); the receiver executes MPI_Irecv (t_func_delay), the local copy (t_memcopy), and MPI_Wait (t_func_delay).]
  • Sender total time = t_memreg + t_rdma_write + 2 × t_func_delay
  • Receiver total time = t_memcopy + 2 × t_func_delay

  33-34. Post-copy Protocol Cost Model: Receiver Early
  [Timeline: the receiver executes MPI_Irecv (t_func_delay) and then blocks in MPI_Wait (t_wait_delay) until the sender's MPI_Isend (t_memreg, t_rdma_write, t_func_delay) delivers the data, after which it performs the local copy (t_memcopy); each MPI_Wait adds t_func_delay.]
  • Sender total time = t_memreg + t_rdma_write + 2 × t_func_delay
  • Receiver total time = t_wait_delay + t_memcopy + 2 × t_func_delay
  (These totals are turned into code in the sketch below.)
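The totals above translate directly into code. A sketch: the parameter names follow slide 30 and the arithmetic follows slides 32 and 34, but the struct and function names are mine.

    /* Post-copy cost estimates, per the cost model above. */
    struct cost_params {
        double t_memreg;      /* buffer registration time */
        double t_memcopy;     /* buffer memory copy time */
        double t_rdma_read;   /* buffer RDMA Read time (used by Pre-copy) */
        double t_rdma_write;  /* buffer RDMA Write time */
        double t_func_delay;  /* constant book-keeping time per call */
        double t_wait_delay;  /* time blocked in MPI_Wait (receiver early) */
    };

    /* Sender total is the same in both arrival orders (slides 32 and 34). */
    double postcopy_sender_time(const struct cost_params *c)
    {
        return c->t_memreg + c->t_rdma_write + 2.0 * c->t_func_delay;
    }

    double postcopy_receiver_time(const struct cost_params *c,
                                  int receiver_early)
    {
        double t = c->t_memcopy + 2.0 * c->t_func_delay;
        if (receiver_early)
            t += c->t_wait_delay;  /* MPI_Wait blocks until the data arrives */
        return t;
    }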

  35. Optimization: Buffer Usage Profiling
  • Example code snippet (note that buff1 is reused by the last receive):

      ...
      MPI_Send(buff1, ...);
      MPI_Recv(buff2, ...);
      MPI_Send(buff3, ...);
      MPI_Recv(buff1, ...);
      ...

  36-39. Optimization: Buffer Usage Profiling, Phase 2 (Buffer usage profiling)
  [Animation over four slides: each rank (Rank 1 ... Rank n) records the buffer used by every call in the phase: MPI_Send(Buff 1), MPI_Recv(Buff 2), MPI_Send(Buff 3), MPI_Recv(Buff 1).]
  • The profile reveals that Buff 1 is not touched again until the final MPI_Recv, so the first send need not complete until that point

  40-41. Optimization: Buffer Usage Profiling
  • Conversion of synchronous calls to asynchronous calls, guided by the buffer usage profile (expanded into compilable form below):

  Before:
      ...
      MPI_Send(buff1, ...);
      MPI_Recv(buff2, ...);
      MPI_Send(buff3, ...);
      MPI_Recv(buff1, ...);
      ...

  After:
      ...
      MPI_Isend(buff1, ..., req1);
      MPI_Recv(buff2, ...);
      MPI_Send(buff3, ...);
      MPI_Wait(req1, ...);
      MPI_Recv(buff1, ...);
      ...
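Expanded into compilable form, the transformation looks like this; the counts, tags, ranks, and datatypes are placeholders for whatever the application actually uses.

    #include <mpi.h>

    /* The profile shows buff1 is not reused until the final receive, so the
     * first blocking send becomes MPI_Isend, with the matching MPI_Wait
     * placed just before buff1 is overwritten. */
    void phase(double *buff1, double *buff2, double *buff3,
               int n, int peer, MPI_Comm comm)
    {
        MPI_Request req1;
        MPI_Isend(buff1, n, MPI_DOUBLE, peer, 0, comm, &req1);
        MPI_Recv(buff2, n, MPI_DOUBLE, peer, 1, comm, MPI_STATUS_IGNORE);
        MPI_Send(buff3, n, MPI_DOUBLE, peer, 2, comm);
        MPI_Wait(&req1, MPI_STATUS_IGNORE);  /* complete before buff1 reuse */
        MPI_Recv(buff1, n, MPI_DOUBLE, peer, 3, comm, MPI_STATUS_IGNORE);
    }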

  42. Performance Evaluation
  • Test cluster:
    • 16 nodes, each with 64-bit Intel Xeon processors (8 cores, 2.33 GHz) and 8 GB of system memory
    • InfiniBand interconnect
  • Software: MVAPICH2
  • Benchmarks: Sparse Matrix, CG, Sweep3D, microbenchmarks

  43. Performance Evaluation
  [Figure: results with a single communication phase per application.]

  44. Performance Evaluation
  [Figure: the system chose the optimal protocol for each phase dynamically.]

  45. Performance Evaluation
  [Figure: real and modeled execution times for the Sparse Matrix application.]
  • Modeling accuracy: 95% to 99%
  • Modeling overhead: less than 1% of total execution time

  46. Summary
  • Our system for on-line protocol selection was successfully tested on real applications and microbenchmarks
  • The protocol cost model achieves high accuracy with negligible overhead
  • The sender-initiated Post-copy protocol was successfully implemented

  47. Questions?

  48. Thank You!
