
GRID superscalar: a programming model for the Grid

Doctoral Thesis
Computer Architecture Department
Technical University of Catalonia

Raül Sirvent Pardell

Advisor: Rosa M. Badia Sala


Outline

  • Introduction

  • Programming interface

  • Runtime

  • Fault tolerance at the programming model level

  • Conclusions and future work

Outline

  • Introduction

    1.1 Motivation

    1.2 Related work

    1.3 Thesis objectives and contributions

  • Programming interface

  • Runtime

  • Fault tolerance at the programming model level

  • Conclusions and future work

1.1 Motivation

  • The Grid architecture layers

Applications

Grid Middleware

(Job management, Data transfer,

Security, Information, QoS, ...)

Distributed Resources

1.1 Motivation

  • What middleware should I use?


1.1 Motivation

  • Programming tools: are they easy?

Grid UNAWARE

Grid AWARE

VS.

1.1 Motivation

  • Can I run my programs in parallel?

Explicit parallelism

Implicit parallelism

VS.

for (i = 0; i < MSIZE; i++)
   for (j = 0; j < MSIZE; j++)
      for (k = 0; k < MSIZE; k++)
         matmul(A(i,k), B(k,j), C(i,j));

(figure: fork/join graph — drawing it by hand means explicit parallelism)

1.1 Motivation

  • The Grid: a massive, dynamic and heterogeneous environment prone to failures

    • Study different techniques to detect and overcome failures

  • Checkpoint

  • Retries

  • Replication

1.2 Related work

1.3 Thesis objectives and contributions

  • Objective: create a programming model for the Grid

    • Grid unaware

    • Implicit parallelism

    • Sequential programming

    • Allows the use of well-known imperative languages

    • Speed up applications

    • Include fault detection and recovery

1.3 Thesis objectives and contributions

  • Contribution: GRID superscalar

    • Programming interface

    • Runtime environment

    • Fault tolerance features

Outline

  • Introduction

  • Programming interface

    2.1 Design

    2.2 User interface

    2.3 Programming comparison

  • Runtime

  • Fault tolerance at the programming model level

  • Conclusions and future work

2.1 Design

  • Interface objectives

    • Grid unaware

    • Implicit parallelism

    • Sequential programming

    • Allows the use of well-known imperative languages

2.1 Design

  • Target applications

    • Algorithms that can easily be split into tasks

      • Branch and bound computations, divide and conquer algorithms, recursive algorithms, …

    • Coarse grained tasks

    • Independent tasks

      • Scientific workflows, optimization algorithms, parameter sweep

    • Main parameters: FILES

      • External simulators, finite element solvers, BLAST, GAMESS

2.1 Design

  • Application’s architecture: a master-worker paradigm

    • Master-worker parallel paradigm fits with our objectives

    • Main program: the master

    • Functions: workers

      • Function = Generic representation of a task

    • Glue to transform a sequential application into a master-worker application: stubs – skeletons (RMI, RPC, …)

      • Stub: call to runtime interface

      • Skeleton: binary that calls the user function


app.c

app-functions.c

2.1 Design

void matmul(char *f1, char *f2, char *f3)
{
   getBlocks(f1, f2, f3, A, B, C);
   for (i = 0; i < A->rows; i++)
      for (j = 0; j < B->cols; j++)
         for (k = 0; k < A->cols; k++)
            C->data[i][j] += A->data[i][k] * B->data[k][j];
   putBlocks(f1, f2, f3, A, B, C);
}

for (i = 0; i < MSIZE; i++)
   for (j = 0; j < MSIZE; j++)
      for (k = 0; k < MSIZE; k++)
         matmul(A(i,k), B(k,j), C(i,j));

Local scenario

2.1 Design

(figure: app.c runs on the master; several instances of app-functions.c run on the workers, connected through the Grid middleware)

Master-Worker paradigm

2.1 Design

  • Intermediate language concept: assembler code

  • In GRIDSs: the Execute generic interface

    • Instruction set is defined by the user

    • Single entry point to the runtime

    • Allows easy building of programming language bindings (Java, Perl, Shell Script)

      • Easier technology adoption

C, C++, … → Assembler → Processor execution

C, C++, … → Workflow → Grid execution

2.2 User interface

  • Steps to program an application

    • Task definition

      • Identify those functions/programs in the application that are going to be executed in the computational Grid

      • All parameters must be passed in the header (remote execution)

    • Interface Definition Language (IDL)

      • For every task defined, identify which parameters are input/output files and which are input/output scalars

    • Programming API: master and worker

      • Write the main program and the tasks using GRIDSs API

2.2 User interface

  • Interface Definition Language (IDL) file

    • CORBA-IDL like interface:

      • in/out/inout files

      • in/out/inout scalar values

    • The functions listed in this file will be executed in the Grid

interface MATMUL {
   void matmul(in File f1, in File f2, inout File f3);
};

2.2 User interface

  • Programming API: master and worker

app.c

app-functions.c

  • Master side

    GS_On

    GS_Off

    GS_FOpen/GS_FClose

    GS_Open/GS_Close

    GS_Barrier

    GS_Speculative_End

  • Worker side

    GS_System

    gs_result

    GS_Throw

2.2 User interface

  • Task’s constraints and cost specification

    • Constraints: allow specifying the needs of a task (CPU, memory, architecture, software, …)

      • Build an expression in a constraint function (evaluated for every machine)

    • Cost: estimated execution time of a task (in seconds)

      • Useful for scheduling

      • Calculate it in a cost function

      • GS_GFlops / GS_Filesize may be used

      • An external estimator can also be called

other.Mem == 1024

cost = operations / GS_GFlops();

2.3 Programming comparison

  • Globus vs GRIDSs

Grid-aware

int main()
{
   rsl = "&(executable=/home/user/sim)(arguments=input1.txt output1.txt) (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input1.txt /home/user/input1.txt))(file_stage_out=/home/user/output1.txt gsiftp://bscgrid01.bsc.es/path/output1.txt)(file_clean_up=/home/user/input1.txt /home/user/output1.txt)";
   globus_gram_client_job_request("bscgrid02.bsc.es", rsl, NULL, NULL);

   rsl = "&(executable=/home/user/sim)(arguments=input2.txt output2.txt) (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input2.txt /home/user/input2.txt))(file_stage_out=/home/user/output2.txt gsiftp://bscgrid01.bsc.es/path/output2.txt)(file_clean_up=/home/user/input2.txt /home/user/output2.txt)";
   globus_gram_client_job_request("bscgrid03.bsc.es", rsl, NULL, NULL);

   rsl = "&(executable=/home/user/sim)(arguments=input3.txt output3.txt) (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input3.txt /home/user/input3.txt))(file_stage_out=/home/user/output3.txt gsiftp://bscgrid01.bsc.es/path/output3.txt)(file_clean_up=/home/user/input3.txt /home/user/output3.txt)";
   globus_gram_client_job_request("bscgrid04.bsc.es", rsl, NULL, NULL);
}

Explicit parallelism

2.3 Programming comparison

  • Globus vs GRIDSs

void sim(File input, File output)
{
   command = "/home/user/sim " + input + ' ' + output;
   gs_result = GS_System(command);
}

int main()
{
   GS_On();
   sim("/path/input1.txt", "/path/output1.txt");
   sim("/path/input2.txt", "/path/output2.txt");
   sim("/path/input3.txt", "/path/output3.txt");
   GS_Off(0);
}

2.3 Programming comparison

  • DAGMan vs GRIDSs

(figure: workflow DAG — A → B, A → C, B → D, C → D)

Explicit parallelism

int main()
{
   GS_On();
   task_A(f1, f2, f3);
   task_B(f2, f4);
   task_C(f3, f5);
   task_D(f4, f5, f6);
   GS_Off(0);
}

No if/while clauses

JOB A A.condor

JOB B B.condor

JOB C C.condor

JOB D D.condor

PARENT A CHILD B C

PARENT B C CHILD D

2.3 Programming comparison

  • Ninf-G vs GRIDSs

Grid-aware

int main()
{
   grpc_initialize("config_file");
   grpc_object_handle_init_np("A", &A_h, "class");
   grpc_object_handle_init_np("B", &B_h, "class");
   for (i = 0; i < 25; i++)
   {
      grpc_invoke_async_np(A_h, "foo", &sid, f_in[2*i], f_out[2*i]);
      grpc_invoke_async_np(B_h, "foo", &sid, f_in[2*i+1], f_out[2*i+1]);
      grpc_wait_all();
   }
   grpc_object_handle_destruct_np(&A_h);
   grpc_object_handle_destruct_np(&B_h);
   grpc_finalize();
}

Explicit parallelism

int main()
{
   GS_On();
   for (i = 0; i < 50; i++)
      foo(f_in[i], f_out[i]);
   GS_Off(0);
}

2.3 Programming comparison

  • VDL vs GRIDSs

No if/while clauses

DV trans1( @{output:tmp.0}, @{input:filein.0} );
DV trans2( @{output:fileout.0}, @{input:tmp.0} );
DV trans1( @{output:tmp.1}, @{input:filein.1} );
DV trans2( @{output:fileout.1}, @{input:tmp.1} );
...
DV trans1( @{output:tmp.999}, @{input:filein.999} );
DV trans2( @{output:fileout.999}, @{input:tmp.999} );

int main()
{
   GS_On();
   for (i = 0; i < 1000; i++)
   {
      tmp = "tmp." + i;
      filein = "filein." + i;
      fileout = "fileout." + i;
      trans1(tmp, filein);
      trans2(fileout, tmp);
   }
   GS_Off(0);
}

Outline

  • Introduction

  • Programming interface

  • Runtime

    3.1 Scientific contributions

    3.2 Developments

    3.3 Evaluation tests

  • Fault tolerance at the programming model level

  • Conclusions and future work

3.1 Scientific contributions

  • Runtime objectives

    • Extract implicit parallelism in sequential applications

    • Speed up execution using the Grid

  • Main requirement: Grid middleware

    • Job management

    • Data transfer

    • Security


(figure: die photo of a superscalar processor — ISU, FPU, FXU, IDU, LSU, IFU and BXU units, L2 caches and L3 directory/control — set side by side with the Grid)

3.1 Scientific contributions

  • Apply computer architecture knowledge to the Grid (superscalar processor)

ns → seconds/minutes/hours

3.1 Scientific contributions

  • Data dependence analysis: allow parallelism

task1(..., f1) → task2(f1, ...) : Read after Write

task1(f1, ...) → task2(..., f1) : Write after Read

task1(..., f1) → task2(..., f1) : Write after Write

3.1 Scientific contributions

for (i = 0; i < MSIZE; i++)
   for (j = 0; j < MSIZE; j++)
      for (k = 0; k < MSIZE; k++)
         matmul(A(i,k), B(k,j), C(i,j));

(figure: the task graph as it is generated — for i = 0, j = 0, the calls matmul(A(0,0), B(0,0), C(0,0)) (k = 0), matmul(A(0,1), B(1,0), C(0,0)) (k = 1) and matmul(A(0,2), B(2,0), C(0,0)) (k = 2) form a dependence chain on C(0,0); for i = 0, j = 1 an analogous chain updates C(0,1), and so on)

3.1 Scientific contributions

for (i = 0; i < MSIZE; i++)
   for (j = 0; j < MSIZE; j++)
      for (k = 0; k < MSIZE; k++)
         matmul(A(i,k), B(k,j), C(i,j));

(figure: the complete graph — one dependence chain per block C(i,j), over all i and j; the chains are independent of each other and can run in parallel)

3.1 Scientific contributions

  • File renaming: increase parallelism

task1(..., f1) → task2(f1, ...) : Read after Write — unavoidable

task1(f1, ...) → task2(..., f1_NEW) : Write after Read — avoidable by renaming

task1(..., f1) → task2(..., f1_NEW) : Write after Write — avoidable by renaming

3.2 Developments

  • Basic functionality

    • Job submission (middleware usage)

      • Select sources for input files

      • Submit, monitor or cancel jobs

      • Results collection

    • API implementation

      • GS_On: read configuration file and environment

      • GS_Off: wait for tasks, cleanup remote data, undo renaming

      • GS_(F)Open: create a local task

      • GS_(F)Close: notify end of local task

      • GS_Barrier: wait for all running tasks to finish

      • GS_System: translate path

      • GS_Speculative_End: barrier until throw. If throw, discard tasks from throw to GS_Speculative_End

      • GS_Throw: use gs_result to notify it

3.2 Developments

(figure) Task scheduling: the runtime builds a Directed Acyclic Graph of the tasks and submits ready tasks to the middleware

3.2 Developments

  • Task scheduling: resource brokering

    • A resource broker is needed (but not an objective)

    • Grid configuration file

      • Information about hosts (hostname, limit of jobs, queue, working directory, quota, …)

      • Initial set of machines (can be changed dynamically)

<?xml version="1.0" encoding="UTF-8"?>

<project isSimple="yes" masterBandwidth="100000" masterBuildScript="" masterInstallDir="/home/rsirvent/matmul-master" masterName="bscgrid01.bsc.es" masterSourceDir="/datos/GRID-S/GT4/doc/examples/matmul" name="matmul" workerBuildScript="" workerSourceDir="/datos/GRID-S/GT4/doc/examples/matmul">

...

<workers>

<worker Arch="x86" GFlops="5.985" LimitOfJobs="2" Mem="1024" NCPUs="2" NetKbps="100000" OpSys="Linux" Queue="none" Quota="0" deploymentStatus="deployed" installDir="/home/rsirvent/matmul-worker" name="bscgrid01.bsc.es">

3.2 Developments

  • Task scheduling: resource brokering

    • Scheduling policy

      • Estimation of total execution time of a single task

      • FileTransferTime: time to transfer needed files to a resource (calculated with the hosts information and the location of files)

        • Select fastest source for a file

      • ExecutionTime: estimation of the task’s run time in a resource. An interface function (can be calculated, or estimated by an external entity)

        • Select fastest resource for execution

      • Smallest estimation is selected

3.2 Developments

  • Task scheduling: resource brokering

    • Match task constraints and machine capabilities

    • Implemented using the ClassAd library

      • Machine: offers capabilities (from Grid configuration file: memory, architecture, …)

      • Task: demands capabilities

    • Filter candidate machines for a particular task

Machine 1 offers: SoftwareList = BLAST, GAMESS

Task demands: Software = BLAST

Machine 2 offers: SoftwareList = GAMESS (filtered out)


3.2 Developments

(figure) Task scheduling: file locality — a task is preferentially scheduled on the machine that already holds its input files (f1, f2, f3)

3.2 Developments

  • Other file locality exploitation mechanisms

    • Shared input disks

      • NFS or replicated data

    • Shared working directories

      • NFS

    • Erasing unused versions of files (decreases disk usage)

    • Disk quota control (locality increases disk usage and quota may be lower than expected)

3.3 Evaluation


(figure: NAS Grid Benchmark data-flow graphs — a Launch task, BT/SP/LU solver tasks linked by MF filter tasks, and a final Report task)

3.3 Evaluation

  • NAS Grid Benchmarks

HC (Helical Chain), ED (Embarrassingly Distributed), MB (Mixed Bag), VP (Visualization Pipe)

3.3 Evaluation

  • Run with classes S, W, A (2 machines x 4 CPUs)

  • VP benchmark must be analyzed in detail (does not scale up to 3 CPUs)

3.3 Evaluation

  • Performance analysis

    • GRID superscalar runtime instrumented

    • Paraver tracefiles from the client side

    • The lifecycle of all tasks has been studied in detail

  • Overhead of GRAM Job Manager polling interval

3.3 Evaluation

  • VP.S task assignment

    • 14.7% of the transfers when exploiting locality

    • VP is parallel, but its last part is sequentially executed

(figure: VP.S task chains BT → MF → MG → MF → FT, mapped onto the machines Kadesh8 and Khafre; highlighted arcs are remote file transfers)

3.3 Evaluation

  • Conclusion: workflow and granularity are important to achieve speed up

3.3 Evaluation

Two-dimensional potential energy hypersurface for acetone as a function of the φ1 and φ2 angles

3.3 Evaluation

  • Number of executed tasks: 1120

  • Each task between 45 and 65 minutes

  • Speed up: 26.88 (32 CPUs), 49.17 (64 CPUs)

  • Long running test, heterogeneous and transatlantic Grid

(figure: Grid composition — sites contributing 14, 22 and 28 CPUs)



3.3 Evaluation

  • 15 million protein sequences have been compared using BLAST and GRID superscalar

(figure: all-against-all comparison — the 15 million protein sequences derived from the genomes are compared against themselves)
3.3 Evaluation

  • 100,000 tasks in 4000 CPUs (= 1,000 exclusive nodes)

  • “Grid” of 1,000 machines with very low latency between them

    • Stress test for the runtime

  • Avoids the user having to work with the queuing system

  • Saves the queuing system from handling a huge set of independent tasks

GRID superscalar: programming interface and runtime

  • Publications

    Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta, "Automatic Grid workflow based on imperative programming languages", Concurrency and Computation: Practice and Experience, John Wiley & Sons, vol. 18, no. 10, pp. 1169-1186, 2006.

    Rosa M. Badia, Raul Sirvent, Jesus Labarta, Josep M. Perez, "Programming the GRID: An Imperative Language-based Approach", Engineering The Grid: Status and Perspective, Section 4, Chapter 12, American Scientific Publishers, January 2006.

    Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela and Rogeli Grima, "Programming Grid Applications with GRID Superscalar", Journal of Grid Computing, Volume 1, Issue 2, 2003.

GRID superscalar: programming interface and runtime

  • Work related to standards

    R.M. Badia, D. Du, E. Huedo, A. Kokossis, I. M. Llorente, R. S. Montero, M. de Palol, R. Sirvent, and C. Vázquez, "Integration of GRID superscalar and GridWay Metascheduler with the DRMAA OGF Standard", Euro-Par, 2008.

    Raül Sirvent, Andre Merzky, Rosa M. Badia, Thilo Kielmann, "GRID superscalar and SAGA: forming a high-level and platform-independent Grid programming environment", CoreGRID Integration Workshop. Integrated Research in Grid Computing, Pisa (Italy), 2005.

Outline

  • Introduction

  • Programming interface

  • Runtime

  • Fault tolerance at the programming model level

    4.1 Checkpointing

    4.2 Retry mechanisms

    4.3 Task replication

  • Conclusions and future work

4.1 Checkpointing

  • Inter-task checkpointing

  • Recovers sequential consistency in the out-of-order execution of tasks

    • Single version of every file is saved

    • No need to save any data structures in the runtime

  • Drawback: some completed tasks may be lost

    • Application-level checkpoint can avoid this

(figure: tasks 0–6 in sequential order; the checkpoint sits after the last task of the completed prefix)

4.1 Checkpointing

  • Conclusions

    • Low complexity to checkpoint a task

      • ~1% overhead introduced

    • Can deal with both application-level and Grid-level errors

      • Most important when an unrecoverable error appears

    • Transparent for end users

4.2 Retry mechanisms

(figure) Automatic drop of machines: a machine that fails is discarded from the Grid configuration and its tasks are resubmitted to the remaining machines

4.2 Retry mechanisms

(figure) Soft and hard timeouts for tasks: after a soft timeout the runtime checks on the task and may resubmit it through the middleware; reaching the hard timeout is treated as a failure


4.2 Retry mechanisms

(figure) Retry of operations: failed requests to the middleware (e.g. a syscall or job request) are retried until they succeed

4.2 Retry mechanisms

  • Conclusions

    • Keep running despite failures

    • Dynamic: when and where to resubmit

    • Detects performance degradations

    • No overhead when no failures are detected

    • Transparent for end users

4.3 Task replication

(figure: task graph with tasks 0–7; a running task on which many successors wait gets a replica)

Replicate running tasks depending on successors

4.3 Task replication

(figure: task graph with tasks 0–7; replicas of slow running tasks are launched on other machines)

Replicate running tasks to speed up the execution

4.3 Task replication

  • Conclusions

    • Dynamic replication: application level knowledge is used (the workflow)

    • Replication can deal with failures hiding retry overhead

    • Replication can speed up applications in heterogeneous Grids

    • Transparent for end users

    • Drawback: increased usage of resources

4. Fault tolerance features

  • Publications

    Vasilis Dialinos, Rosa M. Badia, Raül Sirvent, Josep M. Pérez and Jesús Labarta, "Implementing Phylogenetic Inference with GRID superscalar", Cluster Computing and Grid2005 (CCGRID 2005), Cardiff, UK, 2005.

    Raül Sirvent, Rosa M. Badia and Jesús Labarta, "Graph-based task replication for workflow applications", Submitted, HPCC 2009.

Outline

  • Introduction

  • Programming interface

  • Runtime

  • Fault tolerance at the programming model level

  • Conclusions and future work

5. Conclusions and future work

  • Grid-unaware programming model

  • Transparent features for users, exploiting parallelism and failure treatment

  • Used in REAL systems and REAL applications

  • Some future research is already ONGOING (StarSs)

5. Conclusions and future work

  • Future work

    • Grid of supercomputers (Red Española de Supercomputación)

    • Higher scale tests (hundreds? thousands?)

    • More complex brokering

      • Resource discovery/monitoring

      • New scheduling policies based on the workflow

      • Automatic prediction of execution times

    • New policies for task replication

    • New architectures for StarSs
