A comparison of CC-SAS, MP and SHMEM on SGI Origin2000
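Three Programming Models
• CC-SAS (cache-coherent shared address space)
  • Processes share one linear address space and communicate through ordinary loads and stores
• MP (message passing)
  • Processes communicate explicitly through a message-passing interface
• SHMEM
  • Processes communicate through one-sided get and put primitives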
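To make the contrast concrete, here is a minimal single-process sketch of how one data transfer looks under each model. It is only an illustration: `Node`, `sas_write`, `mp_send`, `mp_recv`, `shmem_put` and `shmem_get` are hypothetical names invented for this sketch, not the actual MPI or SHMEM library calls, and no real parallelism is involved.

```python
from collections import deque


class Node:
    """Hypothetical stand-in for one process: private memory plus an inbox."""

    def __init__(self, size):
        self.mem = [0] * size
        self.inbox = deque()


# CC-SAS: one array is visible to every process; a plain store communicates.
shared = [0] * 4


def sas_write(index, value):
    shared[index] = value


# MP: the sender copies data into a message, and the receiver must take part
# by posting a matching receive that copies the data out again.
def mp_send(dst, payload):
    dst.inbox.append(list(payload))  # copy into a message buffer


def mp_recv(node, offset):
    payload = node.inbox.popleft()
    node.mem[offset:offset + len(payload)] = payload  # copy out of the buffer


# SHMEM: one-sided put/get; only the initiating process takes part.
def shmem_put(dst, offset, payload):
    dst.mem[offset:offset + len(payload)] = payload


def shmem_get(src, offset, count):
    return src.mem[offset:offset + count]
```

In this sketch the MP path touches the data twice (once into the message, once out of it), while the put writes directly into the destination memory.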
Platforms
• Tightly-coupled multiprocessors
  • SGI Origin2000: a cache-coherent distributed shared memory machine
• Less tightly-coupled clusters
  • A cluster of workstations connected by Ethernet
Purpose
• Compare the three programming models on the Origin2000, a modern 64-processor hardware cache-coherent machine
• Focus on scientific applications whose data access patterns are regular or predictable
Questions to be answered
• Can parallel algorithms be structured in the same way for good performance in all three models?
• If performance differs substantially among the three models, where are the key bottlenecks?
• Do the data structures or algorithms need to change substantially to remove those bottlenecks?
Applications and Algorithms
• FFT
  • All-to-all communication (regular)
• Ocean
  • Nearest-neighbor communication
• Radix
  • All-to-all communication (irregular)
• LU
  • One-to-many communication
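The communication patterns named above can be sketched as functions that return, for a given process rank, the ranks it sends to. This is a simplified illustration under stated assumptions: the function names are hypothetical, the nearest-neighbor case assumes processes laid out on a 2D grid, and the one-to-many case assumes a single sending rank (`root`) per step. The regular versus irregular distinction is not captured here; it concerns how much data goes where, not which partners exist.

```python
def all_to_all(rank, nprocs):
    """Every process exchanges data with every other process."""
    return [r for r in range(nprocs) if r != rank]


def nearest_neighbors(rank, rows, cols):
    """Neighbors of `rank` on a rows x cols process grid (no wraparound)."""
    r, c = divmod(rank, cols)
    partners = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            partners.append(nr * cols + nc)
    return partners


def one_to_many(rank, root, nprocs):
    """Only `root` sends; every other process just receives."""
    if rank != root:
        return []
    return [r for r in range(nprocs) if r != root]
```

With 16 processes on a 4 x 4 grid, an interior process has four nearest-neighbor partners but fifteen all-to-all partners, which is one reason the all-to-all applications stress the communication layer harder.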
Question
• Why is MP much worse than CC-SAS and SHMEM?
Analysis
Execution time = BUSY + LMEM + RMEM + SYNC
where
• BUSY: CPU computation time
• LMEM: CPU stall time for local cache misses
• RMEM: CPU stall time for sending/receiving remote data
• SYNC: CPU time spent at synchronization events
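As a small worked example of this decomposition, the sketch below sums the four components and reports each one's share of the total. The helper name `breakdown` and the input numbers are made up purely for illustration; they are not measurements from the study.

```python
def breakdown(busy, lmem, rmem, sync):
    """Return total execution time and each component's fraction of it."""
    parts = {"BUSY": busy, "LMEM": lmem, "RMEM": rmem, "SYNC": sync}
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("total execution time must be positive")
    shares = {name: t / total for name, t in parts.items()}
    return total, shares


# Hypothetical per-processor times in seconds (illustrative values only).
total, shares = breakdown(busy=6.0, lmem=1.0, rmem=2.0, sync=1.0)
for name, share in shares.items():
    print(f"{name}: {share:.0%} of {total:.1f} s")
```

Comparing the shares across models shows which component grows, for example whether a slower model is losing time in RMEM or in SYNC.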
Improving MP performance
• Remove the extra data copy
  • Allocate all data involved in communication in the shared address space
• Reduce SYNC time
  • Use lock-free queue management for communication
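The slides do not say which lock-free queue design was used. One common possibility, shown here only as an assumption, is a single-producer single-consumer ring buffer: the producer alone advances the tail index and the consumer alone advances the head index, so neither side needs a lock. The class name `SpscQueue` is hypothetical, and this Python sketch shows only the index discipline; a real implementation on shared-memory hardware would also need the appropriate memory ordering.

```python
class SpscQueue:
    """Single-producer, single-consumer ring buffer (illustrative sketch)."""

    def __init__(self, capacity):
        # One slot stays empty so that "full" and "empty" are distinguishable.
        self._buf = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_push(self, item):
        """Producer side: return False instead of blocking when full."""
        nxt = (self._tail + 1) % len(self._buf)
        if nxt == self._head:
            return False
        self._buf[self._tail] = item
        self._tail = nxt  # publish only after the slot has been filled
        return True

    def try_pop(self):
        """Consumer side: return (True, item), or (False, None) when empty."""
        if self._head == self._tail:
            return False, None
        item = self._buf[self._head]
        self._head = (self._head + 1) % len(self._buf)
        return True, item
```

Because each index has exactly one writer, enqueue and dequeue never contend for a lock, which is the kind of SYNC overhead the slide aims to remove.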
Why does CC-SAS perform best?
• Extra packing/unpacking operations in MP and SHMEM
• Extra packet queue management in MP
• …
Conclusions
• Good algorithm structures are portable among programming models.
• MP is much worse than CC-SAS and SHMEM on a hardware-coherent machine. However, we can achieve similar performance once the extra data copy and the queue synchronization overhead are removed.
• Something about programmability
Future work
• What about applications that genuinely have irregular, unpredictable and naturally fine-grained data access and communication patterns?
• What about software-based coherent machines (e.g., clusters)?