
Advanced Hybrid MPI/OpenMP Parallelization Paradigms for Nested Loop Algorithms onto Clusters of SMPs

Nikolaos Drosinos and Nectarios Koziris

National Technical University of Athens

Computing Systems Laboratory

{ndros,nkoziris}@cslab.ece.ntua.gr

www.cslab.ece.ntua.gr


Overview

  • Introduction

  • Pure MPI Model

  • Hybrid MPI-OpenMP Models

    • Hyperplane Scheduling

    • Fine-grain Model

    • Coarse-grain Model

  • Experimental Results

  • Conclusions – Future Work

EuroPVM/MPI 2003


    Introduction

    • Motivation:

      • SMP clusters

      • Hybrid programming models

        • Mostly fine-grain MPI-OpenMP paradigms

        • Mostly DOALL parallelization


    • Contribution:

      • Three programming models for the parallelization of nested loop algorithms

        • pure MPI

        • fine-grain hybrid MPI-OpenMP

        • coarse-grain hybrid MPI-OpenMP

      • Advanced hyperplane scheduling

        • minimize synchronization need

        • overlap computation with communication


    Algorithmic Model:

    FOR j_0 = min_0 TO max_0 DO
      ...
        FOR j_{n-1} = min_{n-1} TO max_{n-1} DO
          Computation(j_0, ..., j_{n-1});
        ENDFOR
      ...
    ENDFOR

    • Perfectly nested loops

    • Constant flow data dependencies


    Target Architecture: SMP clusters

    Pure MPI Model

    • Tiling transformation groups iterations into atomic execution units (tiles)

    • Pipelined execution

    • Overlapping computation with communication

    • Makes no distinction between inter-node and intra-node communication


    Example:

    FOR j1=0 TO 9 DO
      FOR j2=0 TO 7 DO
        A[j1,j2]:=A[j1-1,j2] + A[j1,j2-1];
      ENDFOR
    ENDFOR
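
    As a rough illustration, the example loop can be written in C and compared against a rectangularly tiled traversal of the same iteration space. The boundary value 1 for reads outside the array, the tile sizes, and all function names (`at`, `run_untiled`, `run_tiled`, `tiled_matches_untiled`) are assumptions made for this sketch and do not come from the slides.

```c
#include <assert.h>
#include <string.h>

#define N1 10 /* j1 = 0..9, as in the example */
#define N2 8  /* j2 = 0..7, as in the example */

/* Assumption: reads outside the array (index -1) return the boundary value 1. */
static long at(long a[N1][N2], int j1, int j2) {
    return (j1 < 0 || j2 < 0) ? 1 : a[j1][j2];
}

/* The original loop nest, executed in lexicographic order. */
void run_untiled(long a[N1][N2]) {
    for (int j1 = 0; j1 < N1; j1++)
        for (int j2 = 0; j2 < N2; j2++)
            a[j1][j2] = at(a, j1 - 1, j2) + at(a, j1, j2 - 1);
}

/* The same iterations grouped into t1 x t2 rectangular tiles.
 * Each tile runs to completion before the next one starts. */
void run_tiled(long a[N1][N2], int t1, int t2) {
    assert(t1 > 0 && t2 > 0);
    for (int b1 = 0; b1 < N1; b1 += t1)
        for (int b2 = 0; b2 < N2; b2 += t2)
            for (int j1 = b1; j1 < b1 + t1 && j1 < N1; j1++)
                for (int j2 = b2; j2 < b2 + t2 && j2 < N2; j2++)
                    a[j1][j2] = at(a, j1 - 1, j2) + at(a, j1, j2 - 1);
}

/* Returns 1 if the tiled and untiled orders produce identical arrays. */
int tiled_matches_untiled(int t1, int t2) {
    long x[N1][N2], y[N1][N2];
    memset(x, 0, sizeof x);
    memset(y, 0, sizeof y);
    run_untiled(x);
    run_tiled(y, t1, t2);
    return memcmp(x, y, sizeof x) == 0;
}
```

    Rectangular tiles preserve the result here because both dependence vectors, (1,0) and (0,1), are non-negative, so every value a tile reads comes either from the same tile or from a tile that was executed earlier.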

    [Figure: Pure MPI Model, axes j1 and j2, 4 MPI nodes (NODE0 and NODE1, each with CPU0 and CPU1)]

    tile_0 = nod_0;
    ...
    tile_{n-2} = nod_{n-2};
    FOR tile_{n-1} = 0 TO ... DO
      Pack(snd_buf, tile_{n-1} - 1, nod);
      MPI_Isend(snd_buf, dest(nod));
      MPI_Irecv(recv_buf, src(nod));
      Compute(tile);
      MPI_Waitall;
      Unpack(recv_buf, tile_{n-1} + 1, nod);
    END FOR

    Hyperplane Scheduling

    • Implements coarse-grain parallelism assuming inter-tile data dependencies

    • Tiles are organized into data-independent subsets (groups)

    • Tiles of the same group can be concurrently executed by multiple threads

    • Barrier synchronization between threads
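
    The grouping idea can be sketched on a small 2D grid of tiles. The sketch assumes unit inter-tile dependencies, on (t1-1, t2) and (t1, t2-1), and the common wavefront schedule in which a tile's group is the sum of its coordinates. The slides do not spell out the exact schedule, so the grid size and the names `group_of` and `hyperplane_violations` are hypothetical.

```c
#include <assert.h>

#define T1 4 /* hypothetical number of tiles along j1 */
#define T2 3 /* hypothetical number of tiles along j2 */

/* Group (hyperplane) index of a tile under the assumed schedule g = t1 + t2. */
int group_of(int t1, int t2) {
    assert(t1 >= 0 && t2 >= 0);
    return t1 + t2;
}

/* Walks the tile grid group by group and counts how many tiles would run
 * before one of their predecessors has finished. A correct schedule gives 0. */
int hyperplane_violations(void) {
    int done[T1][T2] = {{0}};
    int violations = 0;
    int last_group = group_of(T1 - 1, T2 - 1);

    for (int g = 0; g <= last_group; g++) {
        /* All tiles visited in this loop belong to group g. They do not
         * depend on each other, so threads could execute them concurrently. */
        for (int t1 = 0; t1 < T1; t1++) {
            int t2 = g - t1;
            if (t2 < 0 || t2 >= T2)
                continue;
            if (t1 > 0 && !done[t1 - 1][t2]) violations++;
            if (t2 > 0 && !done[t1][t2 - 1]) violations++;
        }
        /* Stands in for the barrier: the group counts as finished only
         * after every tile in it has been executed. */
        for (int t1 = 0; t1 < T1; t1++) {
            int t2 = g - t1;
            if (t2 >= 0 && t2 < T2)
                done[t1][t2] = 1;
        }
    }
    return violations;
}
```

    Because a group is marked as finished only after all of its tiles have been visited, a result of zero shows that every predecessor lies in a strictly earlier group, which is what allows tiles of one group to run concurrently.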

    [Figure: Hyperplane Scheduling, axes j1 and j2, 2 MPI nodes x 2 OpenMP threads (NODE0 and NODE1, each with CPU0 and CPU1)]

    #pragma omp parallel
    {
      group_0 = nod_0;
      ...
      group_{n-2} = nod_{n-2};
      tile_0 = nod_0 * m_0 + th_0;
      ...
      tile_{n-2} = nod_{n-2} * m_{n-2} + th_{n-2};
      FOR(group_{n-1}){
        tile_{n-1} = group_{n-1} - ...;
        if(0 <= tile_{n-1} <= ...)
          compute(tile);
        #pragma omp barrier
      }
    }

    Fine-grain Model

    • Incremental parallelization of computationally intensive parts

    • Relatively straightforward from pure MPI

    • Threads (re)spawned at computation

    • Inter-node communication outside of multi-threaded part

    • Thread synchronization through implicit barrier of omp parallel directive


    FOR(group_{n-1}){
      Pack(snd_buf, tile_{n-1} - 1, nod);
      MPI_Isend(snd_buf, dest(nod));
      MPI_Irecv(recv_buf, src(nod));
      #pragma omp parallel
      {
        thread_id=omp_get_thread_num();
        if(valid(tile,thread_id,group_{n-1}))
          Compute(tile);
      }
      MPI_Waitall;
      Unpack(recv_buf, tile_{n-1} + 1, nod);
    }
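
    A single-process sketch of the fine-grain structure follows, with the MPI calls reduced to comments so that it compiles without an MPI library. It uses `#pragma omp parallel for` in place of the slide's explicit `thread_id` test, which is a simplification. The 4 x 4 tile grid, the tile-level recurrence, the boundary value 1, and the names `pred` and `fine_grain` are all assumptions for illustration.

```c
#include <assert.h>

#define T1 4 /* hypothetical tile grid: T1 x T2 tiles */
#define T2 4

/* Assumption: predecessors outside the grid contribute the boundary value 1. */
static long pred(long v[T1][T2], int t1, int t2) {
    return (t1 < 0 || t2 < 0) ? 1 : v[t1][t2];
}

/* Fine-grain structure: a parallel region is opened (threads are re-spawned
 * or re-activated) once per group, and closed again before communication. */
long fine_grain(void) {
    long v[T1][T2] = {{0}};
    assert(T1 > 0 && T2 > 0);

    for (int g = 0; g <= T1 + T2 - 2; g++) {
        /* Pack, MPI_Isend and MPI_Irecv would be issued here,
         * outside the multi-threaded part. */

        #pragma omp parallel for
        for (int t1 = 0; t1 < T1; t1++) {
            int t2 = g - t1;
            if (t2 >= 0 && t2 < T2) /* plays the role of valid(...) */
                v[t1][t2] = pred(v, t1 - 1, t2) + pred(v, t1, t2 - 1);
        } /* implicit barrier at the end of the parallel region */

        /* MPI_Waitall and Unpack would follow here, single-threaded. */
    }
    return v[T1 - 1][T2 - 1];
}
```

    With GCC, for example, `-fopenmp` enables the pragma; without it the pragma is ignored and the loop runs sequentially with the same result.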

    Coarse-grain Model

    • SPMD paradigm

    • Requires more programming effort

    • Threads are only spawned once

    • Inter-node communication inside multi-threaded part (requires MPI_THREAD_MULTIPLE)

    • Thread synchronization through explicit barrier (omp barrier directive)


    #pragma omp parallel
    {
      thread_id=omp_get_thread_num();
      FOR(group_{n-1}){
        #pragma omp master
        {
          Pack(snd_buf, tile_{n-1} - 1, nod);
          MPI_Isend(snd_buf, dest(nod));
          MPI_Irecv(recv_buf, src(nod));
        }
        if(valid(tile,thread_id,group_{n-1}))
          Compute(tile);
        #pragma omp master
        {
          MPI_Waitall;
          Unpack(recv_buf, tile_{n-1} + 1, nod);
        }
        #pragma omp barrier
      }
    }
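
    The coarse-grain structure can be sketched in the same single-process style, again with the MPI calls reduced to comments. Here one parallel region encloses the whole group loop, each thread selects its tiles by thread id, and an explicit barrier separates consecutive groups. The 4 x 4 tile grid, the cyclic assignment of tiles to threads, the boundary value 1, and the names `pred` and `coarse_grain` are assumptions, not the authors' code.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define T1 4 /* hypothetical tile grid: T1 x T2 tiles */
#define T2 4

/* Assumption: predecessors outside the grid contribute the boundary value 1. */
static long pred(long v[T1][T2], int t1, int t2) {
    return (t1 < 0 || t2 < 0) ? 1 : v[t1][t2];
}

/* Coarse-grain structure: threads are spawned once, and every thread
 * runs the whole group loop in SPMD style. */
long coarse_grain(void) {
    long v[T1][T2] = {{0}};
    assert(T1 > 0 && T2 > 0);

    #pragma omp parallel
    {
        int tid = 0, nthreads = 1; /* sequential fallback without OpenMP */
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        for (int g = 0; g <= T1 + T2 - 2; g++) {
            /* The master thread would Pack and post MPI_Isend/MPI_Irecv here. */

            /* Each thread takes the tiles of group g whose t1 index
             * it owns under a cyclic distribution. */
            for (int t1 = tid; t1 < T1; t1 += nthreads) {
                int t2 = g - t1;
                if (t2 >= 0 && t2 < T2)
                    v[t1][t2] = pred(v, t1 - 1, t2) + pred(v, t1, t2 - 1);
            }

            /* The master thread would call MPI_Waitall and Unpack here. */

            /* Explicit barrier: nobody starts group g+1 before g is done. */
            #pragma omp barrier
        }
    }
    return v[T1 - 1][T2 - 1];
}
```

    Every thread executes the same number of group iterations, so all threads reach the barrier the same number of times; this is what makes a barrier inside the loop valid.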

    Summary: Fine-grain vs Coarse-grain

    Experimental Results

    • 8-node SMP Linux Cluster (800 MHz PIII, 128 MB RAM, kernel 2.4.20)

    • MPICH v.1.2.5 (--with-device=ch_p4, --with-comm=shared)

    • Intel C++ compiler 7.0 (-O3 -mcpu=pentiumpro -static)

    • FastEthernet interconnection

    • ADI micro-kernel benchmark (3D)

    Alternating Direction Implicit (ADI)

    • Unitary data dependencies

    • 3D Iteration Space (X x Y x Z)

    [Figures: ADI on 4 nodes, with cases X < Y and X > Y; plots for X=512 Y=512 Z=8192, X=128 Y=512 Z=8192, and X=512 Y=128 Z=8192]


    [Figures: ADI on 2 nodes, with cases X < Y and X > Y; plots for X=128 Y=512 Z=8192, X=256 Y=512 Z=8192, X=512 Y=512 Z=8192, X=512 Y=256 Z=8192, and X=512 Y=128 Z=8192]


    [Figures: ADI on 2 nodes, Computation and Communication, for X=128 Y=512 Z=8192 and X=512 Y=128 Z=8192]


    Conclusions

    • Nested loop algorithms with arbitrary data dependencies can be adapted to the hybrid parallel programming paradigm

    • Hybrid models can be competitive with the pure MPI paradigm

    • The coarse-grain hybrid model can be more efficient than the fine-grain one, but it is also more complicated

    • Programming efficiently in OpenMP is not easier than programming efficiently in MPI

    Future Work

    • Application of methodology to real applications and benchmarks

    • Work balancing for coarse-grain model

    • Performance evaluation on advanced interconnection networks (SCI, Myrinet)

    • Generalization as compiler technique

    Questions?

    http://www.cslab.ece.ntua.gr/~ndros
