Compares running 1 process per node against 2 processes per node for the parallel scaling of parsparsecircuit3.c. Looks at performance traces, including the MPI calls and the prevalence of waitany, and presents timings.
Parallel Scaling of parsparsecircuit3.c Tim Warburton
1 Process Per Node • In these tests we use only one of the two processors on each node.
blackbear: 16 processors, 16 nodes. Apart from the MPI_Allreduce calls, this is an almost perfect picture of parallelism.
2 Processes Per Node • We use both processors on each node.
blackbear: 8 nodes, 16 processes. Notice the prevalence of waitany. Clearly this code is not working as well as it does when running with 1 process per node.
blackbear: 8 nodes, 16 processes (zoom in). I suspect that the threaded MPI communicators for the non-blocking isend and irecv are competing for CPU time with the user code. Also, there could be competition for the memory bus and the network bus between the processors.
Timings for Two Processes Per Node on Los Lobos. Timings courtesy of Zhaoxian Zhou.