
COMPUTING HW REQUIREMENT

Enzo Papandrea


GEOFIT - MTR

  • With the Geofit approach, measurements from a full orbit are processed simultaneously

  • A Geofit in which P, T and the VMRs of H2O and O3 are retrieved simultaneously increases the computing time


TIME OF SIMULATIONS

Computing time: sequential algorithm

  • We ran some simulations on an AlphaServer ES45 with a 1 GHz CPU (TS = TSEQUENTIAL):

  • H2O: TS = 1h 30m

  • O3: TS = 4h 40m

  • PT: TS = 9h 48m

  • MTR: TS = 10h 30m

…to reduce the time of the simulations we propose a parallel system


PARALLELIZATION

  • The first step will be to parallelize the loop that computes the forward model, because:

  • It is the most time-consuming part of the code.

  • The computation of the forward model for one sequence is independent of the computation for another sequence, so the processors have to communicate data only at the beginning and at the end of the forward model.


PARALLEL TIME

  • The parallel time (TP) is the sequential time divided by the number of CPUs

  • Example: for a system with 8 CPUs, if the algorithm is completely parallel:

TP = TS/8 = 12.5% of the sequential time

This is the best improvement we can reach with 8 CPUs
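As a quick arithmetic check of this ideal case (a minimal sketch; the function name is ours, not part of the Geofit code):

```python
# Ideal parallel time: sequential time divided by the number of CPUs.
def ideal_parallel_time(t_seq_minutes, n_cpus):
    return t_seq_minutes / n_cpus

# With 8 CPUs and a completely parallel algorithm, the parallel time is
# 1/8 of the sequential time, i.e. 12.5% of it.
tp = ideal_parallel_time(90.0, 8)   # e.g. H2O's 1h 30m = 90 minutes
print(tp, tp / 90.0)                # 11.25 minutes, fraction 0.125
```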


FORWARD MODEL PARALL.

If we parallelize only the forward model we can estimate the simulation time with 8 CPUs (example: H2O):

  • TForward model (3 iterations) = 45m

    Sum of the times needed to compute the forward model

  • TP = TForward model/#CPU = 45m/8 ≈ 6m

    Time of the parallelized code

  • T = (TS - TForward model) + TP = (1h 30m - 45m) + 6m = 51m = 56% of TS

    Total time (time of the code that remained sequential plus time of the parallelized code)

H2O
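This estimate is just Amdahl's law for a program in which only the forward model runs in parallel; a minimal sketch with the H2O numbers (times in minutes, names ours):

```python
# Total time when only the forward model is parallelized:
# the rest of the code stays sequential.
def total_time(t_seq, t_fwd, n_cpus):
    return (t_seq - t_fwd) + t_fwd / n_cpus

# H2O: TS = 1h 30m = 90 min, forward model = 45 min, 8 CPUs.
t = total_time(90.0, 45.0, 8)   # 45 + 5.625 = 50.625 min, i.e. ~51 minutes
share = t / 90.0                # 0.5625, i.e. ~56% of the sequential time
print(t, share)
```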


FW MODEL PARALL./1

O3

  • TForward model (2 it) = 4h 10m, TP = 30m

  • T = 60m = 21% of TS

PT

  • TForward model (2 it) = 9h 33m, TP = 1h 12m

  • T = 1h 26m = 15% of TS

MTR

  • TForward model (2 it) = 10h 30m, TP = 1h 11m

  • T = 2h 11m = 20% of TS
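The same formula reproduces the O3 and PT figures up to the slides' rounding of TP to whole minutes (times in minutes, names ours):

```python
# Same partial-parallelization estimate, applied to the other retrievals.
def total_time(t_seq, t_fwd, n_cpus=8):
    return (t_seq - t_fwd) + t_fwd / n_cpus

# O3: TS = 4h 40m = 280 min, forward model = 4h 10m = 250 min.
t_o3 = total_time(280.0, 250.0)   # 61.25 min -> ~22% of TS
                                  # (the slide's 21% uses TP rounded to 30m)
# PT: TS = 9h 48m = 588 min, forward model = 9h 33m = 573 min.
t_pt = total_time(588.0, 573.0)   # 86.625 min -> ~15% of TS
print(t_o3 / 280.0, t_pt / 588.0)
```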

MEMORY CLASSIFICATION

[Diagram: a shared-memory system, where several processors (P) access a single memory (M), and a local-memory system, where each processor has its own memory and the processors are connected by a network]

In order to use a parallel code we need appropriate hardware, which can be classified by its memory organization:

  • Shared Memory: each processor (P) can see the whole memory (M)

  • Local Memory: each processor can see only its own memory; to exchange data we need a network


OPEN-MP VS MPI

With Shared-Memory systems, OpenMP + compiler directives are used:

  • Parallelism is not visible to the programmer (the compiler is responsible for the parallelism)

  • Easy to do

  • Small improvements in performance

With Local-Memory systems, MPI + calls to library routines are used:

  • The header file mpif.h contains the definitions of the MPI constants, types and functions

  • Parallelism is visible to the programmer

  • Difficult to do

  • Large improvements in performance


OPEN-MP EXAMPLE

PROGRAM Matrix
IMPLICIT NONE
INTEGER (KIND=4) :: i, j
INTEGER (KIND=4), PARAMETER :: n = 1000
INTEGER (KIND=4) :: a(n,n)
!$OMP PARALLEL DO &
!$OMP PRIVATE(i,j) &
!$OMP SHARED(a)
DO j = 1, n
  DO i = 1, n
    a(i,j) = i + j
  ENDDO
ENDDO
!$OMP END PARALLEL DO
END

If we compile without the OpenMP flag,

f90 name_program

the compiler treats the lines beginning with !$OMP as comments and the program runs sequentially. If we compile with the flag and set the number of threads,

f90 -omp name_program
setenv OMP_NUM_THREADS 2

the compiler reads these directives and the loop is executed in parallel by 2 threads.


MPI EXAMPLE

POINT-TO-POINT COMMUNICATION: SEND and RECEIVE

MPI_SEND(buf, count, type, dest, tag, comm, ierr)

MPI_RECV(buf, count, type, source, tag, comm, status, ierr)

BUF: array of type TYPE
COUNT: number of elements of BUF to be sent
TYPE: MPI type of BUF
DEST: rank of the destination process (for MPI_RECV, SOURCE: rank of the sending process)
TAG: number identifying the message
COMM: communicator of the sender and the receiver
STATUS: array containing the communication status
IERR: error code (ierr = 0 means no error occurred)


MPI EXAMPLE/1

BROADCAST (ONE-TO-ALL COMMUNICATION): THE SAME DATA ARE SENT FROM THE ROOT PROCESS TO ALL THE OTHERS IN THE COMMUNICATOR

[Diagram: before the broadcast only the root process holds A0; afterwards every process in the communicator holds A0]


MPI COMMUNICATOR

[Diagram: eight processes with ranks 0-7 grouped into a communicator]

  • IN MPI IT IS POSSIBLE TO DIVIDE THE TOTAL NUMBER OF PROCESSES INTO GROUPS, CALLED COMMUNICATORS

  • THE COMMUNICATOR THAT INCLUDES ALL THE PROCESSES IS CALLED MPI_COMM_WORLD


BROADCAST EXAMPLE

PROGRAM Broadcast
IMPLICIT NONE
INCLUDE 'mpif.h'
REAL (KIND=4) :: buffer
INTEGER (KIND=4) :: err, rank, size
CALL MPI_INIT(err)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, err)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, size, err)
IF (rank .EQ. 5) buffer = 24.
CALL MPI_BCAST(buffer, 1, MPI_REAL, 5, MPI_COMM_WORLD, err)
PRINT *, "P:", rank, " after broadcast buffer is ", buffer
CALL MPI_FINALIZE(err)
END

Process 5 sends its real variable buffer to all the processes in the communicator MPI_COMM_WORLD. Output:

P:1 after broadcast buffer is 24.
P:3 after broadcast buffer is 24.
P:4 after broadcast buffer is 24.
P:0 after broadcast buffer is 24.
P:5 after broadcast buffer is 24.
P:6 after broadcast buffer is 24.
P:7 after broadcast buffer is 24.
P:2 after broadcast buffer is 24.


OTHER COLLECTIVE COMMUNICATIONS

ALLGATHER: DIFFERENT DATA SENT FROM EACH PROCESS TO ALL THE OTHERS IN THE COMMUNICATOR

SCATTER: DIFFERENT DATA SENT FROM THE ROOT PROCESS TO ALL THE OTHERS IN THE COMMUNICATOR

GATHER: THE OPPOSITE OF SCATTER

[Diagrams: in the scatter, the root's elements A0, A1, A2, A3 are distributed one per process; in the allgather, every process ends up with all the elements A0, B0, C0, D0]


LINUX CLUSTER

  • We have a Linux cluster with 8 nodes; each node has:

  • CPU Intel P4, 2.8 GHz, Front Side Bus 800 MHz

  • 2 GByte RAM, 333 MHz

  • Hard disk 40 GByte

  • 1 LAN switch (network)


CONCLUSIONS

Linux cluster (Local Memory):

  • Cheap (~€900.00 per node)

  • Unlimited number of CPUs

  • In the past only 32-bit architectures: 2^(32-1) = 2^31 bytes = 2 · 2^30 bytes = 2 GByte of addressable memory

  • Now 64-bit architectures! 2^(64-1) = 2^63 bytes = 8 · 2^60 bytes = 8 EByte

AlphaServer with 2 CPUs (Shared Memory):

  • Very expensive (~€200,000.00)

  • Limited number of CPUs

For readability and simplicity of the code we would like to use Fortran 90
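The 32-bit vs. 64-bit address-space figures quoted above are easy to verify:

```python
# 32-bit architecture: 2^(32-1) addressable bytes = 2 GByte (1 GByte = 2^30 bytes).
assert 2 ** (32 - 1) == 2 * 2 ** 30

# 64-bit architecture: 2^(64-1) addressable bytes = 8 EByte (1 EByte = 2^60 bytes).
assert 2 ** (64 - 1) == 8 * 2 ** 60
print("address-space figures check out")
```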
