A Performance Comparison of DSM, PVM, and MPI



  1. A Performance Comparison of DSM, PVM, and MPI. Paul Werstein, Mark Pethick, Zhiyi Huang.

  2. Introduction Relatively little is known about the performance of Distributed Shared Memory (DSM) systems compared to message passing systems. We compare the performance of the TreadMarks DSM system with two popular message passing systems: MPICH (MPI) and PVM.

  3. Introduction Three applications are compared: Mergesort, Mandelbrot set generation, and a backpropagation neural network. Each application represents a different class of problem.

  4. TreadMarks DSM • Provides locks and barriers as primitives. • Uses Lazy Release Consistency. • Granularity of sharing is a page. • Creates page differentials to avoid the false sharing effect. • Version 1.0.3.3
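To make the programming model concrete, here is a minimal sketch of a TreadMarks-style program in C using the lock and barrier primitives listed above. The names (Tmk_startup, Tmk_malloc, Tmk_distribute, Tmk_lock_acquire, Tmk_barrier, Tmk_proc_id) follow the published TreadMarks API, but the header name and exact signatures may differ across versions; treat this as an illustration, not the authors' benchmark code.

    #include <stdio.h>
    #include "Tmk.h"                 /* assumed header name from the TreadMarks distribution */

    int *shared_sum;                 /* lives in DSM space once distributed */

    int main(int argc, char **argv)
    {
        Tmk_startup(argc, argv);     /* join the DSM system */

        if (Tmk_proc_id == 0) {
            shared_sum = (int *) Tmk_malloc(sizeof(int));
            *shared_sum = 0;
            Tmk_distribute((char *) &shared_sum, sizeof(shared_sum));
        }
        Tmk_barrier(0);              /* every process now sees shared_sum */

        Tmk_lock_acquire(0);         /* lazy release consistency: updates reach */
        *shared_sum += Tmk_proc_id;  /* the next process at its acquire         */
        Tmk_lock_release(0);

        Tmk_barrier(1);
        if (Tmk_proc_id == 0)
            printf("sum = %d\n", *shared_sum);

        Tmk_exit(0);
        return 0;
    }

Note that the program contains no explicit sends or receives: the runtime detects writes to the shared page and propagates differentials at synchronisation points, which is exactly the machinery the conclusion identifies as a source of overhead.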

  5. Parallel Virtual Machine • Provides the concept of a virtual parallel machine. • Exists as a daemon on each node. • Inter-process communication is mediated by the daemons. • Designed for flexibility. • Version 3.4.3.
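For comparison, a minimal PVM exchange in C: the master spawns a worker, and every message is packed into a send buffer and routed through the pvmd daemons. The calls (pvm_spawn, pvm_initsend, pvm_pkint, pvm_send, pvm_recv) are the standard PVM 3 interface; the executable name "pvm_demo" is a hypothetical placeholder.

    #include <stdio.h>
    #include "pvm3.h"

    #define MSG_TAG 1

    int main(void)
    {
        int n, child;
        pvm_mytid();                        /* enrol with the local pvmd daemon */
        int parent = pvm_parent();

        if (parent == PvmNoParent) {        /* master task */
            n = 21;
            pvm_spawn("pvm_demo", NULL, PvmTaskDefault, "", 1, &child);
            pvm_initsend(PvmDataDefault);   /* pack into a typed send buffer */
            pvm_pkint(&n, 1, 1);
            pvm_send(child, MSG_TAG);       /* routed via the daemons */
            pvm_recv(child, MSG_TAG);
            pvm_upkint(&n, 1, 1);
            printf("master got %d back\n", n);
        } else {                            /* spawned worker task */
            pvm_recv(parent, MSG_TAG);
            pvm_upkint(&n, 1, 1);
            n *= 2;
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&n, 1, 1);
            pvm_send(parent, MSG_TAG);
        }
        pvm_exit();                         /* leave the virtual machine */
        return 0;
    }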

  6. MPICH - MPI • Standard interface for developing message passing applications. • Primary design goal is performance. • Primarily defines communication primitives. • MPICH is a reference implementation of the MPI standard. • Version 1.2.4.
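The equivalent exchange in MPI, sketched below, uses only the core primitives MPI_Send and MPI_Recv. Unlike PVM there are no daemons to enrol with: processes are started together and identified by rank within a communicator.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, n;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            n = 21;
            MPI_Send(&n, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* to rank 1 */
            MPI_Recv(&n, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
            printf("rank 0 got %d back\n", n);
        } else if (rank == 1) {
            MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            n *= 2;
            MPI_Send(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }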

  7. System • 32-node Linux cluster • 800 MHz Pentium with 256 MB RAM • Red Hat 7.2 • 100 Mbit Ethernet • Results determined for 1, 2, 4, 8, 16, 24, and 32 processes.

  8. Mergesort • Parallelisation strategy used is divide and conquer. • Synchronisation between pairs of nodes. • Loosely synchronous class problem. • Coarse-grained synchronisation. • Irregular synchronisation points. • Alternate phases of computation and communication.
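The slides do not include the benchmark source, but divide and conquer with pairwise synchronisation can be sketched as a merge tree: each process sorts its chunk locally, then in log2(P) steps half the remaining processes send their sorted runs to a partner, which merges. The following self-contained MPI illustration is ours, not the authors' code; the merge helper and random test data are illustrative.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* Merge two sorted int arrays into a newly allocated one. */
    static int *merge(const int *a, int na, const int *b, int nb)
    {
        int *out = malloc((na + nb) * sizeof(int));
        int i = 0, j = 0, k = 0;
        while (i < na && j < nb)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < na) out[k++] = a[i++];
        while (j < nb) out[k++] = b[j++];
        return out;
    }

    static int cmp_int(const void *x, const void *y)
    {
        return (*(const int *)x > *(const int *)y) - (*(const int *)x < *(const int *)y);
    }

    int main(int argc, char **argv)
    {
        int rank, nprocs, mylen = 4;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each process starts with a locally sorted chunk (random here). */
        int *data = malloc(mylen * sizeof(int));
        srand(rank + 1);
        for (int i = 0; i < mylen; i++) data[i] = rand() % 100;
        qsort(data, mylen, sizeof(int), cmp_int);

        /* Pairwise merge tree: at step s, ranks that are multiples of 2s
           merge the run held by rank+s; log2(P) communication phases. */
        for (int step = 1; step < nprocs; step *= 2) {
            if (rank % (2 * step) == 0 && rank + step < nprocs) {
                MPI_Status st;
                int incoming;
                MPI_Probe(rank + step, 0, MPI_COMM_WORLD, &st);
                MPI_Get_count(&st, MPI_INT, &incoming);
                int *buf = malloc(incoming * sizeof(int));
                MPI_Recv(buf, incoming, MPI_INT, rank + step, 0,
                         MPI_COMM_WORLD, &st);
                int *merged = merge(data, mylen, buf, incoming);
                free(data); free(buf);
                data = merged;
                mylen += incoming;
            } else if (rank % (2 * step) == step) {
                MPI_Send(data, mylen, MPI_INT, rank - step, 0, MPI_COMM_WORLD);
                break;  /* run handed off; this process is done merging */
            }
        }

        if (rank == 0) {
            for (int i = 0; i < mylen; i++) printf("%d ", data[i]);
            printf("\n");
        }
        free(data);
        MPI_Finalize();
        return 0;
    }

The irregular synchronisation the slide mentions is visible here: each phase synchronises a different pairing of nodes, and half the processes drop out of the computation at every step.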

  9. Mergesort Results (1)

  10. Mergesort Results (2)

  11. Mandelbrot Set • Strategy used is data partitioning. • A work pool is used because the computation time of sections differs. • Work pool size >= 2 * number of processes. • Embarrassingly parallel class problem. • May involve complex computation, but there is very little communication. • Gives an indication of performance under ideal conditions.
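A work pool of this kind can be sketched as a master/worker program: the master hands out blocks of rows and refills whichever worker reports back first, so faster workers automatically receive more work. The image dimensions, block size, and tags below are illustrative choices, not values from the paper.

    #include <stdio.h>
    #include <mpi.h>

    #define WIDTH    600
    #define HEIGHT   600
    #define BLOCK    10          /* rows per work unit */
    #define TAG_WORK 1
    #define TAG_DONE 2
    #define TAG_STOP 3

    /* Escape-time iteration count for one point. */
    static int mandel(double cr, double ci)
    {
        double zr = 0.0, zi = 0.0;
        int it = 0;
        while (zr * zr + zi * zi < 4.0 && it < 255) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            it++;
        }
        return it;
    }

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        if (rank == 0) {                    /* master: owns the work pool */
            int next = 0, active = 0;
            int result[BLOCK * WIDTH];
            MPI_Status st;
            for (int w = 1; w < nprocs; w++) {        /* prime each worker */
                if (next < HEIGHT) {
                    MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                    next += BLOCK; active++;
                } else {
                    MPI_Send(&next, 0, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
                }
            }
            while (active > 0) {
                MPI_Recv(result, BLOCK * WIDTH, MPI_INT, MPI_ANY_SOURCE,
                         TAG_DONE, MPI_COMM_WORLD, &st);
                /* a real program would copy result into the image here */
                if (next < HEIGHT) {        /* refill the fastest worker */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next += BLOCK;
                } else {
                    MPI_Send(&next, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                    active--;
                }
            }
        } else {                            /* worker: compute row blocks */
            int row, result[BLOCK * WIDTH];
            MPI_Status st;
            for (;;) {
                MPI_Recv(&row, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP)
                    break;
                for (int y = 0; y < BLOCK; y++)
                    for (int x = 0; x < WIDTH; x++)
                        result[y * WIDTH + x] =
                            mandel(-2.0 + 3.0 * x / WIDTH,
                                   -1.5 + 3.0 * (row + y) / HEIGHT);
                MPI_Send(result, BLOCK * WIDTH, MPI_INT, 0, TAG_DONE,
                         MPI_COMM_WORLD);
            }
        }
        MPI_Finalize();
        return 0;
    }

With 60 blocks for up to 32 processes, the pool satisfies the "at least twice the number of processes" rule of thumb from the slide, which keeps all workers busy despite the uneven cost of different regions of the set.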

  12. Mandelbrot Set Results

  13. Neural Network (1) • Strategy is data partitioning. • Each processor trains the network on a subsection of the data set. • Changes are summed and applied at the end of each epoch. • Requires large data sets to be effective.
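The end-of-epoch summation maps naturally onto a global reduction. The sketch below uses MPI (the slides do not show which primitives the authors used): each process accumulates weight changes over its slice of the data, then MPI_Allreduce sums the changes across all processes so every rank applies identical updates. accumulate_deltas and the network size are hypothetical stand-ins for the real backpropagation step.

    #include <mpi.h>

    #define NWEIGHTS 1024            /* illustrative network size */

    /* Hypothetical stand-in for one backpropagation pass over one
       training example; adds that example's weight changes into delta. */
    void accumulate_deltas(const double *weights, int example, double *delta);

    void train_epoch(double *weights, int my_first, int my_count)
    {
        double local_delta[NWEIGHTS], global_delta[NWEIGHTS];

        for (int i = 0; i < NWEIGHTS; i++)
            local_delta[i] = 0.0;

        /* Each process trains on its own subsection of the data set. */
        for (int e = my_first; e < my_first + my_count; e++)
            accumulate_deltas(weights, e, local_delta);

        /* End-of-epoch synchronisation: sum all processes' changes. */
        MPI_Allreduce(local_delta, global_delta, NWEIGHTS, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);

        for (int i = 0; i < NWEIGHTS; i++)
            weights[i] += global_delta[i];   /* identical on every rank */
    }

This also shows why large data sets are needed for efficiency: the reduction costs the same regardless of how few examples each process trained on, so small partitions leave the communication cost undiluted.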

  14. Neural Network (2) • Synchronous class problem. • Characterised by an algorithm that carries out the same operation on all points in the data set. • Synchronisation occurs at regular points. • Often applies to problems that use data partitioning. • A large number of problems appear to belong to the synchronous class.

  15. Neural Network Results (1)

  16. Neural Network Results (2)

  17. Neural Network Results (3)

  18. Conclusion • In general, the performance of DSM is poorer than that of MPICH or PVM. • Main reasons identified are: • The increased memory use associated with the creation of page differentials. • The false sharing effect due to the page granularity of sharing. • Differential accumulation in the gather operation.
