A comparison of CC-SAS, MP and SHMEM on SGI Origin2000

cynara

Presentation Transcript


  1. A comparison of CC-SAS, MP and SHMEM on SGI Origin2000

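  2. Three Programming Models • CC-SAS (cache-coherent shared address space) • A single linear address space over shared memory; processes communicate through ordinary loads and stores • MP (message passing) • Processes communicate with one another explicitly through a message-passing interface • SHMEM • One-sided communication through get and put primitives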
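
To make the difference in data movement concrete, here is a minimal single-process simulation in Python. It is an illustration, not real CC-SAS, MPI, or SHMEM code: the function names cc_sas_sum, shmem_sum and mp_sum are hypothetical, the copy counts are written in by hand to match the comments, and the two-copy MP path assumes a buffered send/receive implementation.

```python
from collections import deque

# Hypothetical single-process simulation: process 1 needs the sum of a block
# owned by process 0. Each function returns (result, software_copies).

def cc_sas_sum(shared):
    # CC-SAS: the block lives in one shared address space, so process 1 reads
    # it with ordinary loads; cache-coherence hardware moves the lines.
    return sum(shared), 0

def shmem_sum(remote):
    # SHMEM: process 1 issues a one-sided get from process 0's symmetric
    # buffer into a local buffer; process 0 does not take part.
    local = [0.0] * len(remote)
    local[:] = remote              # get: remote buffer -> local buffer
    return sum(local), 1

def mp_sum(private):
    # MP: process 0 sends and process 1 receives; in a buffered
    # implementation the payload passes through an intermediate buffer.
    channel = deque()
    channel.append(list(private))  # send: user buffer -> message buffer
    local = [0.0] * len(private)
    local[:] = channel.popleft()   # receive: message buffer -> user buffer
    return sum(local), 2

owner_data = [float(i) for i in range(1, 9)]  # block owned by process 0

for name, fn in (("CC-SAS", cc_sas_sum), ("SHMEM", shmem_sum), ("MP", mp_sum)):
    total, copies = fn(owner_data)
    print(f"{name:7s} sum={total} software copies={copies}")
```

All three return the same result; what differs is who participates and how many times the payload is copied in software.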

  3. Platforms: • Tightly-coupled multiprocessors • SGI Origin2000: a cache-coherent distributed shared-memory machine • Less tightly-coupled clusters • A cluster of workstations connected by Ethernet

  4. Purpose • Compare the three programming models on the Origin2000, a modern 64-processor, hardware cache-coherent machine • We focus on scientific applications that access data regularly or predictably

  5. Questions to be answered • Can parallel algorithms be structured the same way for good performance in all three models? • If performance differs substantially across the three models, where are the key bottlenecks? • Do we need to change the data structures or algorithms substantially to remove those bottlenecks?

  6. Applications and Algorithms • FFT • All-to-all communication (regular) • Ocean • Nearest-neighbor communication • Radix • All-to-all communication (irregular) • LU • One-to-many communication
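
The communication patterns above can be sketched by listing which processes each rank talks to in one phase. This is a hypothetical Python illustration: it assumes Ocean is partitioned over a square grid of processes, counts partners rather than data volume (so it does not capture Radix's irregular, key-dependent message sizes), and leaves out LU because the slides do not give its decomposition.

```python
import math

def all_to_all_partners(rank, p):
    # FFT (transpose) and Radix (key permutation): every process exchanges
    # data with every other process in each communication phase.
    return [r for r in range(p) if r != rank]

def nearest_neighbor_partners(rank, p):
    # Ocean: assumes the grid is partitioned over a q x q array of processes
    # (p a perfect square); each process exchanges boundary data with its
    # north, south, west and east neighbors when they exist.
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square")
    row, col = divmod(rank, q)
    partners = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < q and 0 <= c < q:
            partners.append(r * q + c)
    return partners

def messages_per_phase(partner_fn, p):
    # Total point-to-point messages if every process sends one message
    # to each of its partners.
    return sum(len(partner_fn(rank, p)) for rank in range(p))

p = 64  # the Origin2000 in the slides has 64 processors
print("all-to-all:", messages_per_phase(all_to_all_partners, p))
print("nearest-neighbor:", messages_per_phase(nearest_neighbor_partners, p))
```

The gap between the two counts is why all-to-all applications stress the communication layer much more than nearest-neighbor ones.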

  7. Performance Results

  8. Question: • Why is MP much worse than CC-SAS and SHMEM?

  9. Analysis: • Execution time = BUSY + LMEM + RMEM + SYNC, where • BUSY: CPU computation time • LMEM: CPU stall time on local cache misses • RMEM: CPU stall time for sending/receiving remote data • SYNC: CPU time spent at synchronization events
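
The breakdown can be expressed as a small record that sums the four components and reports which one dominates. A minimal Python sketch; the class name TimeBreakdown and the example numbers are made up for illustration and are not measurements from the study.

```python
from dataclasses import dataclass

@dataclass
class TimeBreakdown:
    busy: float  # CPU computation time
    lmem: float  # CPU stall time on local cache misses
    rmem: float  # CPU stall time for sending/receiving remote data
    sync: float  # CPU time spent at synchronization events

    def total(self) -> float:
        # Execution time = BUSY + LMEM + RMEM + SYNC
        return self.busy + self.lmem + self.rmem + self.sync

    def fractions(self) -> dict:
        # Share of total execution time taken by each component.
        t = self.total()
        return {k: getattr(self, k) / t for k in ("busy", "lmem", "rmem", "sync")}

    def dominant(self) -> str:
        # Component that accounts for the most time: the first place to look.
        f = self.fractions()
        return max(f, key=f.get)

# Made-up numbers (seconds) for illustration only; not measurements.
example = TimeBreakdown(busy=5.0, lmem=1.0, rmem=2.5, sync=1.5)
print(example.total(), example.fractions(), example.dominant())
```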

  10. Where does the time go in MP?

  11. Improving MP performance • Remove the extra data copy • Allocate all data involved in communication in the shared address space • Reduce SYNC time • Use lock-free queue management for communication instead of lock-based queues
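
Both fixes can be sketched together, assuming (hypothetically) that messages are passed as small descriptors pointing into communication buffers in the shared address space, and that each sender/receiver pair uses a single-producer/single-consumer ring. The slides do not say how the improved MP queues were actually built; SpscRing, send and recv are illustrative names, and this Python version shows only the structure of the lock-free protocol.

```python
class SpscRing:
    """Bounded single-producer/single-consumer queue without a lock.

    `tail` is written only by the producer and `head` only by the consumer,
    so neither side needs a lock. A C version would need atomic loads/stores
    with release/acquire ordering on these indices; this Python sketch relies
    on CPython's GIL and only shows the structure.
    """

    def __init__(self, capacity):
        self.slots = [None] * (capacity + 1)  # one spare slot distinguishes full from empty
        self.head = 0  # next slot to read (consumer-owned)
        self.tail = 0  # next slot to write (producer-owned)

    def try_push(self, item):
        nxt = (self.tail + 1) % len(self.slots)
        if nxt == self.head:           # queue full
            return False
        self.slots[self.tail] = item   # fill the slot first ...
        self.tail = nxt                # ... then publish it by advancing tail
        return True

    def try_pop(self):
        if self.head == self.tail:     # queue empty
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        return item


# Hypothetical region standing in for communication buffers allocated in the
# shared address space; only small (offset, length) descriptors are queued.
shared_heap = bytearray(64)
ring = SpscRing(capacity=4)

def send(offset, length):
    # The producer has already written its data into shared_heap in place.
    return ring.try_push((offset, length))

def recv():
    desc = ring.try_pop()
    if desc is None:
        return None
    offset, length = desc
    return memoryview(shared_heap)[offset:offset + length]  # view, no copy

shared_heap[0:4] = b"abcd"   # producer computes directly into shared memory
send(0, 4)
print(bytes(recv()))
```

Because the receiver gets a view rather than a copy, the payload is never duplicated, and because each index has a single writer, the queue needs no lock.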

  12. Speedups under Improved MP

  13. Why does CC-SAS perform best?

  14. Why does CC-SAS perform best? • Extra packing/unpacking operations in MP and SHMEM • Extra packet-queue management in MP • …
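
Packing is needed when the data to be communicated is not contiguous in memory, for instance a column of a row-major array. A hypothetical Python sketch of the extra passes MP and SHMEM pay, compared with a CC-SAS consumer that reads the column in place; the function names and the 3 x 4 matrix are illustrative assumptions.

```python
def pack_column(a, nrows, ncols, col):
    # MP/SHMEM: gather a strided column of a row-major nrows x ncols matrix
    # into one contiguous buffer so it can be sent (or put) as one message.
    return [a[r * ncols + col] for r in range(nrows)]

def unpack_column(buf, a, ncols, col):
    # Receiver side: scatter the contiguous buffer back into a column.
    for r, v in enumerate(buf):
        a[r * ncols + col] = v

def sas_column_sum(a, nrows, ncols, col):
    # CC-SAS: the consumer reads the column where it lives; no pack or unpack.
    return sum(a[r * ncols + col] for r in range(nrows))

nrows, ncols = 3, 4
src = list(range(nrows * ncols))          # row-major 3 x 4 matrix
buf = pack_column(src, nrows, ncols, 1)   # extra pass on the sender
dst = [0] * (nrows * ncols)
unpack_column(buf, dst, ncols, 1)         # extra pass on the receiver
print(buf, dst, sas_column_sum(src, nrows, ncols, 1))
```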

  15. Speedups for Ocean

  16. Speedups for Radix

  17. Speedups for LU

  18. Conclusions • Good algorithm structures are portable across programming models. • MP performs much worse than CC-SAS and SHMEM on a hardware cache-coherent machine. However, it can achieve similar performance once the extra data copy and queue synchronization overheads are removed. • Something about programmability

  19. Future work • What about applications that do have irregular, unpredictable, and naturally fine-grained data access and communication patterns? • What about machines with software-based coherence (e.g., clusters)?
