
Presentation Transcript



Advanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs

Nikolaos Drosinos and Nectarios Koziris

National Technical University of Athens

Computing Systems Laboratory

{ndros,nkoziris}@cslab.ece.ntua.gr

www.cslab.ece.ntua.gr



Overview

  • Introduction

  • Pure MPI Model

  • Hybrid MPI-OpenMP Models

    • Hyperplane Scheduling

    • Fine-grain Model

    • Coarse-grain Model

  • Experimental Results

  • Conclusions – Future Work

EuroPVM/MPI 2003



    Introduction

    • Motivation:

      • SMP clusters

      • Hybrid programming models

        • Mostly fine-grain MPI-OpenMP paradigms

        • Mostly DOALL parallelization


    Introduction

    • Contribution:

      • 3 programming models for the parallelization of nested loops algorithms

        • pure MPI

        • fine-grain hybrid MPI-OpenMP

        • coarse-grain hybrid MPI-OpenMP

      • Advanced hyperplane scheduling

        • minimize synchronization need

        • overlap computation with communication


    Introduction

    Algorithmic Model:

FOR j_0 = min_0 TO max_0 DO
  ...
    FOR j_{n-1} = min_{n-1} TO max_{n-1} DO
      Computation(j_0, ..., j_{n-1});
    ENDFOR
  ...
ENDFOR

    • Perfectly nested loops

    • Constant flow data dependencies


    Introduction

    Target Architecture: SMP clusters


    Pure MPI Model

    • Tiling transformation groups iterations into atomic execution units (tiles)

    • Pipelined execution

    • Overlapping computation with communication

    • Makes no distinction between inter-node and intra-node communication
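The tiling idea above can be sketched in a language-neutral way. The following Python sketch is a hypothetical illustration (not the authors' code): it groups the iterations of a 2D recurrence with flow dependencies (1,0) and (0,1) into rectangular tiles and checks that executing tiles as atomic units, in lexicographic order, produces the same result as the untiled loop. Boundary values are assumed to be 1, since the slides leave them unspecified.

```python
def run_recurrence(n1, n2, tile=None):
    # Border of ones so references outside the iteration space
    # (j1-1 = 0 or j2-1 = 0) are defined.
    A = [[1] * (n2 + 1) for _ in range(n1 + 1)]

    def compute(j1, j2):
        A[j1][j2] = A[j1 - 1][j2] + A[j1][j2 - 1]

    if tile is None:
        # Original (untiled) lexicographic execution.
        for j1 in range(1, n1 + 1):
            for j2 in range(1, n2 + 1):
                compute(j1, j2)
    else:
        t1, t2 = tile
        # Tiled execution: each tile is an atomic unit of iterations.
        # Lexicographic tile order is legal here because every tile's
        # producers lie in lexicographically earlier tiles.
        for b1 in range(1, n1 + 1, t1):
            for b2 in range(1, n2 + 1, t2):
                for j1 in range(b1, min(b1 + t1, n1 + 1)):
                    for j2 in range(b2, min(b2 + t2, n2 + 1)):
                        compute(j1, j2)
    return A
```

Tiled and untiled runs produce identical arrays, e.g. run_recurrence(10, 8, tile=(5, 4)) == run_recurrence(10, 8); in the pipelined MPI version each tile additionally triggers communication of its boundary data.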


    Pure MPI Model

    Example:

FOR j1 = 0 TO 9 DO
  FOR j2 = 0 TO 7 DO
    A[j1,j2] := A[j1-1,j2] + A[j1,j2-1];
  ENDFOR
ENDFOR
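A runnable transcription of this example (in Python for brevity; the boundary values, which the slide leaves unspecified, are assumed to be 1):

```python
def example_loop():
    # A[j1][j2] := A[j1-1][j2] + A[j1][j2-1] over a 10 x 8 space.
    # Indices are shifted by +1 so the assumed border of ones
    # provides the values read when j1-1 or j2-1 falls outside
    # the iteration space.
    A = [[1] * 9 for _ in range(11)]
    for j1 in range(1, 11):
        for j2 in range(1, 9):
            A[j1][j2] = A[j1 - 1][j2] + A[j1][j2 - 1]
    return A
```

With a border of ones, A[j1][j2] equals the binomial coefficient C(j1 + j2, j1), so the final element is C(18, 10) = 43758; the dependencies (1,0) and (0,1) are what force the pipelined (wavefront) execution shown next.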


Pure MPI Model

[Figure: pipelined execution of the (j1, j2) iteration space on 4 MPI nodes: two SMP nodes (NODE0, NODE1) with two CPUs each (CPU0, CPU1)]





    Pure MPI Model

tile_0 = nod_0;
...
tile_{n-2} = nod_{n-2};
FOR tile_{n-1} = 0 TO ... DO
  Pack(snd_buf, tile_{n-1} - 1, nod);
  MPI_Isend(snd_buf, dest(nod));
  MPI_Irecv(recv_buf, src(nod));
  Compute(tile);
  MPI_Waitall;
  Unpack(recv_buf, tile_{n-1} + 1, nod);
ENDFOR


    Hyperplane Scheduling

    • Implements coarse-grain parallelism assuming inter-tile data dependencies

    • Tiles are organized into data-independent subsets (groups)

    • Tiles of the same group can be concurrently executed by multiple threads

    • Barrier synchronization between threads
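A small self-contained sketch of this grouping (a hypothetical Python illustration, not the authors' code): tiles with the same coordinate sum t1 + t2 form one hyperplane group, and because no dependence runs between tiles of a group, they may execute in any order (here: reversed) without changing the result.

```python
def hyperplane_groups(nt1, nt2):
    # Group tiles by the hyperplane t1 + t2 = g.
    groups = {}
    for t1 in range(nt1):
        for t2 in range(nt2):
            groups.setdefault(t1 + t2, []).append((t1, t2))
    return [groups[g] for g in sorted(groups)]

def run(nt1, nt2, t1sz, t2sz, scramble=False):
    # Execute the 2D recurrence tile-wise, group by group; a border
    # of ones stands in for the unspecified boundary values.
    n1, n2 = nt1 * t1sz, nt2 * t2sz
    A = [[1] * (n2 + 1) for _ in range(n1 + 1)]
    for group in hyperplane_groups(nt1, nt2):
        tiles = list(reversed(group)) if scramble else group
        for (t1, t2) in tiles:  # tiles of a group are independent
            for j1 in range(1 + t1 * t1sz, 1 + (t1 + 1) * t1sz):
                for j2 in range(1 + t2 * t2sz, 1 + (t2 + 1) * t2sz):
                    A[j1][j2] = A[j1 - 1][j2] + A[j1][j2 - 1]
    return A
```

Each tile (t1, t2) depends only on tiles in group t1 + t2 - 1, which is why a barrier between groups is the only synchronization the threads need.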


Hyperplane Scheduling

[Figure: the (j1, j2) iteration space mapped onto 2 MPI nodes (NODE0, NODE1) x 2 OpenMP threads per node (CPU0, CPU1)]





    Hyperplane Scheduling

#pragma omp parallel
{
  group_0 = nod_0;
  ...
  group_{n-2} = nod_{n-2};
  tile_0 = nod_0 * m_0 + th_0;
  ...
  tile_{n-2} = nod_{n-2} * m_{n-2} + th_{n-2};
  FOR (group_{n-1}) {
    tile_{n-1} = group_{n-1} - ...;
    if (0 <= tile_{n-1} <= ...)
      compute(tile);
    #pragma omp barrier
  }
}
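The index computation tile_k = nod_k * m_k + th_k (node rank times threads per node, plus thread id) partitions the tiles of a dimension among all threads of all nodes. A quick hypothetical check that this mapping assigns every tile to exactly one (node, thread) pair:

```python
def tile_owner_map(num_nodes, m):
    # tile = nod * m + th: node 'nod' running OpenMP thread 'th'
    # owns tile index nod*m + th along a distributed dimension
    # (m = threads per node in that dimension).
    owners = {}
    for nod in range(num_nodes):
        for th in range(m):
            owners[nod * m + th] = (nod, th)
    return owners
```

Because the map is a bijection onto 0 .. num_nodes*m - 1, no tile is computed twice and none is skipped when each thread tests its own tile index against the current group.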


    Fine-grain Model

• Incremental parallelization of the computationally intensive parts

• Relatively straightforward transition from the pure MPI model

• Threads are (re)spawned for each computation phase

• Inter-node communication occurs outside of the multi-threaded part

• Thread synchronization through the implicit barrier of the omp parallel directive
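The control flow can be mimicked with plain Python threads (a hypothetical stand-in for MPI + OpenMP, shown only to illustrate the structure): communication happens in the single-threaded part, worker threads are re-created for every group, and joining them plays the role of the implicit barrier.

```python
import threading

def fine_grain_pipeline(num_threads=4, num_groups=3):
    comm_log = []              # stand-in for Isend/Irecv/Waitall
    work = [0] * num_threads   # stand-in for Compute(tile)

    def worker(tid):
        work[tid] += 1

    for group in range(num_groups):
        comm_log.append(("send/recv", group))   # single-threaded part
        # Threads are (re)spawned for this group's computation only.
        threads = [threading.Thread(target=worker, args=(t,))
                   for t in range(num_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()            # implicit barrier: all threads end here
        comm_log.append(("waitall", group))     # single-threaded part
    return comm_log, work
```

The repeated spawn/join per group is exactly the overhead that motivates the coarse-grain variant described next.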


    Fine-grain Model

FOR (group_{n-1}) {
  Pack(snd_buf, tile_{n-1} - 1, nod);
  MPI_Isend(snd_buf, dest(nod));
  MPI_Irecv(recv_buf, src(nod));
  #pragma omp parallel
  {
    thread_id = omp_get_thread_num();
    if (valid(tile, thread_id, group_{n-1}))
      Compute(tile);
  }
  MPI_Waitall;
  Unpack(recv_buf, tile_{n-1} + 1, nod);
}


    Coarse-grain Model

    • SPMD paradigm

    • Requires more programming effort

    • Threads are only spawned once

    • Inter-node communication inside multi-threaded part (requires MPI_THREAD_MULTIPLE)

    • Thread synchronization through explicit barrier (omp barrier directive)
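By contrast, the coarse-grain control flow keeps one persistent thread team. A hypothetical Python stand-in (again only to show the structure, not the authors' code): thread 0 plays the omp master doing the communication, and an explicit barrier separates the groups.

```python
import threading

def coarse_grain_pipeline(num_threads=4, num_groups=3):
    barrier = threading.Barrier(num_threads)   # explicit barrier
    comm_log = []      # appended only by thread 0, so race-free
    work = [0] * num_threads

    def worker(tid):
        for group in range(num_groups):
            if tid == 0:                        # omp master analogue
                comm_log.append(("send/recv", group))
            work[tid] += 1                      # Compute(tile) stand-in
            if tid == 0:
                comm_log.append(("waitall", group))
            barrier.wait()                      # all threads sync here

    team = [threading.Thread(target=worker, args=(t,))
            for t in range(num_threads)]        # threads spawned once
    for t in team:
        t.start()
    for t in team:
        t.join()
    return comm_log, work
```

In the real model the MPI calls sit inside the parallel region, which is why, as noted above, an MPI implementation supporting MPI_THREAD_MULTIPLE is required.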


    Coarse-grain Model

#pragma omp parallel
{
  thread_id = omp_get_thread_num();
  FOR (group_{n-1}) {
    #pragma omp master
    {
      Pack(snd_buf, tile_{n-1} - 1, nod);
      MPI_Isend(snd_buf, dest(nod));
      MPI_Irecv(recv_buf, src(nod));
    }
    if (valid(tile, thread_id, group_{n-1}))
      Compute(tile);
    #pragma omp master
    {
      MPI_Waitall;
      Unpack(recv_buf, tile_{n-1} + 1, nod);
    }
    #pragma omp barrier
  }
}


    Summary: Fine-grain vs Coarse-grain




    Experimental Results

    • 8-node SMP Linux Cluster (800 MHz PIII, 128 MB RAM, kernel 2.4.20)

    • MPICH v.1.2.5 (--with-device=ch_p4, --with-comm=shared)

• Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static)

    • FastEthernet interconnection

    • ADI micro-kernel benchmark (3D)


    Alternating Direction Implicit (ADI)

    • Unitary data dependencies

    • 3D Iteration Space (X x Y x Z)


    ADI – 4 nodes


    ADI – 4 nodes

    • X < Y

    • X > Y


    ADI X=512 Y=512 Z=8192 – 4 nodes


    ADI X=128 Y=512 Z=8192 – 4 nodes


    ADI X=512 Y=128 Z=8192 – 4 nodes


    ADI – 2 nodes


    ADI – 2 nodes

    • X < Y

    • X > Y


    ADI X=128 Y=512 Z=8192 – 2 nodes


    ADI X=256 Y=512 Z=8192 – 2 nodes


    ADI X=512 Y=512 Z=8192 – 2 nodes


    ADI X=512 Y=256 Z=8192 – 2 nodes


    ADI X=512 Y=128 Z=8192 – 2 nodes


    ADI X=128 Y=512 Z=8192 – 2 nodes

[Chart: computation vs. communication time breakdown]


    ADI X=512 Y=128 Z=8192 – 2 nodes

[Chart: computation vs. communication time breakdown]


    Conclusions

• Nested loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm

• Hybrid models can be competitive with the pure MPI paradigm

• The coarse-grain hybrid model can be more efficient than the fine-grain one, but is also more complicated

• Programming efficiently in OpenMP is not easier than programming efficiently in MPI


    Future Work

    • Application of methodology to real applications and benchmarks

    • Work balancing for coarse-grain model

    • Performance evaluation on advanced interconnection networks (SCI, Myrinet)

    • Generalization as compiler technique


    Questions?

    http://www.cslab.ece.ntua.gr/~ndros
