

Advanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs

Nikolaos Drosinos and Nectarios Koziris

National Technical University of Athens

Computing Systems Laboratory

{ndros,[email protected]

www.cslab.ece.ntua.gr



Overview

  • Introduction

  • Pure MPI Model

  • Hybrid MPI-OpenMP Models

    • Hyperplane Scheduling

    • Fine-grain Model

    • Coarse-grain Model

  • Experimental Results

  • Conclusions – Future Work

EuroPVM/MPI 2003



    Introduction

    • Motivation:

      • SMP clusters

      • Hybrid programming models

        • Mostly fine-grain MPI-OpenMP paradigms

        • Mostly DOALL parallelization


    Introduction

    • Contribution:

      • 3 programming models for the parallelization of nested loops algorithms

        • pure MPI

        • fine-grain hybrid MPI-OpenMP

        • coarse-grain hybrid MPI-OpenMP

      • Advanced hyperplane scheduling

        • minimize synchronization need

        • overlap computation with communication


    Introduction

    Algorithmic Model:

FOR j_0 = min_0 TO max_0 DO

…

FOR j_{n-1} = min_{n-1} TO max_{n-1} DO

Computation(j_0, …, j_{n-1});

ENDFOR

…

ENDFOR

    • Perfectly nested loops

    • Constant flow data dependencies


    Introduction

    Target Architecture: SMP clusters


    Pure MPI Model

    • Tiling transformation groups iterations into atomic execution units (tiles)

    • Pipelined execution

    • Overlapping computation with communication

    • Makes no distinction between inter-node and intra-node communication


    Pure MPI Model

    Example:

FOR j1 = 0 TO 9 DO

FOR j2 = 0 TO 7 DO

A[j1,j2] := A[j1-1,j2] + A[j1,j2-1];

ENDFOR

ENDFOR


Pure MPI Model

[Figure: tiled (j1, j2) iteration space mapped onto 4 MPI nodes (NODE0 and NODE1, each with CPU0 and CPU1); tiles execute in a pipelined fashion]

    Pure MPI Model

tile_0 = nod_0;

…

tile_{n-2} = nod_{n-2};

FOR tile_{n-1} = 0 TO … DO

Pack(snd_buf, tile_{n-1} - 1, nod);

MPI_Isend(snd_buf, dest(nod));

MPI_Irecv(recv_buf, src(nod));

Compute(tile);

MPI_Waitall;

Unpack(recv_buf, tile_{n-1} + 1, nod);

ENDFOR


    Hyperplane Scheduling

• Enables coarse-grain parallelism despite inter-tile data dependencies

    • Tiles are organized into data-independent subsets (groups)

    • Tiles of the same group can be concurrently executed by multiple threads

    • Barrier synchronization between threads


Hyperplane Scheduling

[Figure: tiled (j1, j2) iteration space scheduled on 2 MPI nodes x 2 OpenMP threads; tiles on the same hyperplane (group) are executed concurrently]


    Hyperplane Scheduling

#pragma omp parallel

{

group_0 = nod_0;

…

group_{n-2} = nod_{n-2};

tile_0 = nod_0 * m_0 + th_0;

…

tile_{n-2} = nod_{n-2} * m_{n-2} + th_{n-2};

FOR(group_{n-1}) {

tile_{n-1} = group_{n-1} - …;

if (0 <= tile_{n-1} <= …)

compute(tile);

#pragma omp barrier

}

}


    Fine-grain Model

    • Incremental parallelization of computationally intensive parts

    • Relatively straightforward from pure MPI

• Threads are (re)spawned for each computation phase

    • Inter-node communication outside of multi-threaded part

    • Thread synchronization through implicit barrier of omp parallel directive


    Fine-grain Model

FOR(group_{n-1}) {

Pack(snd_buf, tile_{n-1} - 1, nod);

MPI_Isend(snd_buf, dest(nod));

MPI_Irecv(recv_buf, src(nod));

#pragma omp parallel

{

thread_id = omp_get_thread_num();

if (valid(tile, thread_id, group_{n-1}))

Compute(tile);

}

MPI_Waitall;

Unpack(recv_buf, tile_{n-1} + 1, nod);

}


    Coarse-grain Model

    • SPMD paradigm

    • Requires more programming effort

    • Threads are only spawned once

    • Inter-node communication inside multi-threaded part (requires MPI_THREAD_MULTIPLE)

    • Thread synchronization through explicit barrier (omp barrier directive)


    Coarse-grain Model

#pragma omp parallel

{

thread_id = omp_get_thread_num();

FOR(group_{n-1}) {

#pragma omp master

{

Pack(snd_buf, tile_{n-1} - 1, nod);

MPI_Isend(snd_buf, dest(nod));

MPI_Irecv(recv_buf, src(nod));

}

if (valid(tile, thread_id, group_{n-1}))

Compute(tile);

#pragma omp master

{

MPI_Waitall;

Unpack(recv_buf, tile_{n-1} + 1, nod);

}

#pragma omp barrier

}

}


    Summary: Fine-grain vs Coarse-grain


    Experimental Results

    • 8-node SMP Linux Cluster (800 MHz PIII, 128 MB RAM, kernel 2.4.20)

    • MPICH v.1.2.5 (--with-device=ch_p4, --with-comm=shared)

• Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static)

    • FastEthernet interconnection

    • ADI micro-kernel benchmark (3D)


    Alternating Direction Implicit (ADI)

    • Unitary data dependencies

    • 3D Iteration Space (X x Y x Z)


    ADI – 4 nodes


    ADI – 4 nodes

    • X < Y

    • X > Y


    ADI X=512 Y=512 Z=8192 – 4 nodes


    ADI X=128 Y=512 Z=8192 – 4 nodes


    ADI X=512 Y=128 Z=8192 – 4 nodes


    ADI – 2 nodes


    ADI – 2 nodes

    • X < Y

    • X > Y


    ADI X=128 Y=512 Z=8192 – 2 nodes


    ADI X=256 Y=512 Z=8192 – 2 nodes


    ADI X=512 Y=512 Z=8192 – 2 nodes


    ADI X=512 Y=256 Z=8192 – 2 nodes


    ADI X=512 Y=128 Z=8192 – 2 nodes


    ADI X=128 Y=512 Z=8192 – 2 nodes

[Chart: breakdown into computation and communication time]


    ADI X=512 Y=128 Z=8192 – 2 nodes

[Chart: breakdown into computation and communication time]


    Conclusions

• Nested loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm

• Hybrid models can be competitive with the pure MPI paradigm

• The coarse-grain hybrid model can be more efficient than the fine-grain one, but is also more complicated

• Programming efficiently in OpenMP is not easier than programming efficiently in MPI


    Future Work

    • Application of methodology to real applications and benchmarks

    • Work balancing for coarse-grain model

    • Performance evaluation on advanced interconnection networks (SCI, Myrinet)

• Generalization as a compiler technique


    Questions?

    http://www.cslab.ece.ntua.gr/~ndros
