
COSMO - Dynamical Core Rewrite

Approach, Rewrite and Status

Tobias Gysi

POMPA Workshop, Manno, 3.5.2011

Supercomputing Systems AG, Technopark 1, 8005 Zürich

Fon +41 43 456 16 00, Fax +41 43 456 16 10

www.scs.ch


Approach



COSMO Dynamical Core Rewrite

Challenge

  • Assuming that the COSMO code will continue to run on commodity processors for the next couple of years, what performance improvement can we achieve by rewriting the dynamical core?

    Boundary Conditions

  • Do not touch the underlying physical model (i.e. equations that are being solved)

    • Formulas must remain as they are

    • The ordering of computations, etc. may change

    • Results must remain ‘identical’ to ‘a very high level of accuracy’

  • Part of an initiative looking at all parts of the COSMO code.

    Support

  • Support from & direct interaction with MeteoSwiss, DWD, CSCS, C2SM



Approach

[Timeline figure: an approximately two-year project running through the phases Feasibility Study, Library Design, Rewrite, Test and Tune, targeting both CPU and GPU, with a "You Are Here" marker indicating the current position.]


Feasibility Study



Feasibility Study - Overview

  • Get to know the code

  • Understand performance characteristics

  • Find the computational motifs

    • Stencil

    • Tri-Diagonal Solver

  • Implement a prototype code

    • Relevant part of the dynamical core (Fast Wave Solver, ~30% of total runtime)

    • Try to optimize for x86

    • No MPI parallelization



Feasibility Study - Prototype

  • Implemented in C++

  • Optimize for memory-bandwidth utilization

    • Avoid pre-computation, do computation on the fly

    • Merge loops accessing common variables

    • Use iterators rather than full index calculations on the 3D grid

    • Store data contiguously in the k-direction (vertical columns), as in the sketch below
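
A minimal C++ sketch of the last two points (the names are illustrative, not the prototype's actual code): with k as the fastest-varying index, a vertical column is a contiguous, unit-stride array that can be walked with a plain pointer instead of recomputing a full (i,j,k) index.

    // Minimal sketch (illustrative names): 3D field stored with k fastest-varying.
    #include <cstddef>
    #include <vector>

    struct Field3D {
        std::size_t ni, nj, nk;
        std::vector<double> data;    // contiguous storage, k is the fastest index

        Field3D(std::size_t ni_, std::size_t nj_, std::size_t nk_)
            : ni(ni_), nj(nj_), nk(nk_), data(ni_ * nj_ * nk_, 0.0) {}

        // Start of column (i,j); the k-loop then runs over unit-stride memory.
        double* column(std::size_t i, std::size_t j) { return &data[(i * nj + j) * nk]; }
        const double* column(std::size_t i, std::size_t j) const { return &data[(i * nj + j) * nk]; }
    };

    // a(k) := b(k) + c(k) over one column, using pointers/iterators instead of
    // recomputing the full 3D index in every iteration.
    void add_column(double* a, const double* b, const double* c, std::size_t nk) {
        for (std::size_t k = 0; k < nk; ++k)
            a[k] = b[k] + c[k];
    }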



Fast Wave Solver - Speedup

The performance difference is NOT due to the programming language but to the code optimizations!



Feasibility Study - Conclusion

  • A performance increase of 2x has been achieved on a representative part of the code

  • Main optimizations identified (for scalar processors)

    • Avoid pre-calculation whenever possible

    • Merge loops

    • Change the storage order to k-first

  • Performance is all about memory bandwidth


Rewrite



Design Targets

Write code that

  • Delivers the right results

    • Dedicated unit-tests & verification framework

  • Applies the performance optimization strategies used in the prototype

  • Can be developed within a year to run on x86 and GPU platforms

    • Mandatory: support three-level parallelism in a very flexible way (illustrated in the sketch after this list)

      • Vector processing units (e.g. SSE)

      • Multi-core node (sub-domain)

      • Multiple nodes (domain) - not part of the SCS project

    • Optional: one single code base that can be compiled for both platforms
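
A rough sketch of the lower two parallelism levels on one node's sub-domain (sizes and names are assumptions, not the project's code); the outermost level, distributing the domain over multiple nodes, would be handled by MPI outside this snippet and outside the SCS project.

    #include <cstddef>
    #include <vector>

    // Illustrative sketch: one node's sub-domain (assumed sizes), k contiguous.
    constexpr std::size_t NI = 64, NJ = 64, NK = 60;
    inline std::size_t idx(std::size_t i, std::size_t j, std::size_t k) {
        return (i * NJ + j) * NK + k;
    }

    void scale_add(const std::vector<double>& x, std::vector<double>& y, double alpha) {
        // Level 2: multi-core node — distribute the columns of the sub-domain
        // over the threads (requires compiling with OpenMP, e.g. -fopenmp).
        #pragma omp parallel for collapse(2)
        for (std::size_t i = 0; i < NI; ++i)
            for (std::size_t j = 0; j < NJ; ++j)
                // Level 3: vector units — a unit-stride inner loop that the
                // compiler can auto-vectorize (e.g. with SSE).
                for (std::size_t k = 0; k < NK; ++k)
                    y[idx(i, j, k)] += alpha * x[idx(i, j, k)];
    }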



Design Targets

Write code that

  • Facilitates future improvements in terms of

    • New models / algorithms

    • Portability to new computer architectures

  • Can and will be integrated by the COSMO consortium into the main branch



Stencil Library - Ideas

  • It is challenging to develop a stencil library

    • There is no big chunk of work that can be hidden behind an API call (e.g. a matrix multiplication)

    • The actual update function of the stencil is heavily application-specific and performance-critical

    • We use a DSEL-like approach (Domain-Specific Embedded Language)

    • A “stencil language” embedded in C++

    • Separate description of the loop logic and the update function

    • Optimized C++ code is generated at compile time (possible thanks to C++ metaprogramming); see the sketch below
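
A minimal sketch of separating the update function from the loop logic (the types and interfaces below are illustrative assumptions, not the library's actual API):

    #include <cstddef>
    #include <vector>

    // Minimal field type for the sketch: operator()(i,j,k), k contiguous.
    struct Field {
        std::size_t ni, nj, nk;
        std::vector<double> data;
        Field(std::size_t ni_, std::size_t nj_, std::size_t nk_)
            : ni(ni_), nj(nj_), nk(nk_), data(ni_ * nj_ * nk_, 0.0) {}
        double& operator()(std::size_t i, std::size_t j, std::size_t k) { return data[(i * nj + j) * nk + k]; }
        double operator()(std::size_t i, std::size_t j, std::size_t k) const { return data[(i * nj + j) * nk + k]; }
    };

    // Application-specific part: the update function at one grid point.
    struct LaplacianUpdate {
        static double apply(const Field& in, std::size_t i, std::size_t j, std::size_t k) {
            return in(i + 1, j, k) + in(i - 1, j, k)
                 + in(i, j + 1, k) + in(i, j - 1, k) - 4.0 * in(i, j, k);
        }
    };

    // Library part: the loop logic (traversal, blocking, parallelization). The
    // update functor is a template parameter, so the compiler inlines and
    // specializes the inner call at compile time — no per-point indirection.
    template <class Update>
    void apply_stencil(Field& out, const Field& in) {
        for (std::size_t i = 1; i + 1 < in.ni; ++i)
            for (std::size_t j = 1; j + 1 < in.nj; ++j)
                for (std::size_t k = 0; k < in.nk; ++k)
                    out(i, j, k) = Update::apply(in, i, j, k);
    }

    // Usage: apply_stencil<LaplacianUpdate>(out, in);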



Stencil Library - Parallelization

  • Parallelization on the node level is done by

    • Splitting the calculation domain into blocks (in the IJ-plane)

    • Parallelizing the work over the blocks

    • Using double buffering to avoid concurrency issues (see the sketch below)
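
A minimal sketch of this scheme (domain and block sizes, field layout and names are assumptions; compiling with OpenMP is assumed for the pragma to take effect):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical sketch of IJ-block parallelization with double buffering.
    constexpr std::size_t NI = 128, NJ = 128, NK = 60;   // assumed domain size
    constexpr std::size_t BI = 8, BJ = 8;                // assumed block size (IJ-plane)

    using Field = std::vector<double>;   // k contiguous: idx = (i*NJ + j)*NK + k
    inline std::size_t idx(std::size_t i, std::size_t j, std::size_t k) { return (i * NJ + j) * NK + k; }

    void smooth_step(const Field& in, Field& out) {
        // Each block is an independent unit of work. Every thread only reads
        // from 'in' and only writes to 'out' (double buffering), so neighbouring
        // blocks never race on the points they share.
        #pragma omp parallel for collapse(2)
        for (std::size_t bi = 1; bi < NI - 1; bi += BI)
            for (std::size_t bj = 1; bj < NJ - 1; bj += BJ)
                for (std::size_t i = bi; i < std::min(bi + BI, NI - 1); ++i)
                    for (std::size_t j = bj; j < std::min(bj + BJ, NJ - 1); ++j)
                        for (std::size_t k = 0; k < NK; ++k)
                            out[idx(i, j, k)] = 0.25 * (in[idx(i + 1, j, k)] + in[idx(i - 1, j, k)]
                                                      + in[idx(i, j + 1, k)] + in[idx(i, j - 1, k)]);
    }

A time-stepping loop would then swap the roles of the two buffers after each call.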



Stencil Library – Loop Merging

  • The library allows the definition of multiple stages per stencil

    • Stages are update functions applied consecutively to one block

    • As a block is typically much smaller than the complete domain, we can leverage the caches of the CPU (see the sketch below)
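
A minimal sketch of the cache effect of merging stages per block (sizes and names are assumptions; both functions compute the same result):

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    constexpr std::size_t N = 1 << 20;      // assumed total number of points
    constexpr std::size_t BLOCK = 4096;     // assumed block size, small enough for cache

    // Unmerged: 'a' is written to memory in the first sweep and read back in the
    // second; for large N it has long been evicted from cache in between.
    void unmerged(const std::vector<double>& b, const std::vector<double>& c,
                  std::vector<double>& a, std::vector<double>& f) {
        for (std::size_t i = 0; i < N; ++i) a[i] = b[i] + c[i];
        for (std::size_t i = 0; i < N; ++i) f[i] = 2.0 * a[i];
    }

    // Merged stages: both update functions are applied to one block before moving
    // on, so the intermediate values of 'a' are reused straight from cache.
    void merged(const std::vector<double>& b, const std::vector<double>& c,
                std::vector<double>& a, std::vector<double>& f) {
        for (std::size_t base = 0; base < N; base += BLOCK) {
            const std::size_t end = std::min(base + BLOCK, N);
            for (std::size_t i = base; i < end; ++i) a[i] = b[i] + c[i];   // stage 1
            for (std::size_t i = base; i < end; ++i) f[i] = 2.0 * a[i];    // stage 2
        }
    }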



Stencil Library – Calculation On The Fly

  • Calculation on the fly is supported using a combination of stages and column buffers

    • Column buffers are block-sized fields local to every CPU core

    • A first stage writes to such a buffer while a second stage consumes the pre-calculated values (see the sketch below)
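
A minimal sketch of a column buffer between two stages (the names and the column length are assumptions):

    #include <cstddef>
    #include <vector>

    constexpr std::size_t NK = 60;   // assumed number of points per vertical column

    // The intermediate 'a' exists only as a small, core-local buffer and never
    // as a full 3D field in main memory.
    void process_column(const double* b, const double* c, const double* g, double* f) {
        // Column buffer: private to the calling thread; a real implementation
        // would allocate it once per thread/block rather than per call.
        std::vector<double> a_buf(NK);

        // Stage 1 pre-calculates into the buffer ...
        for (std::size_t k = 0; k < NK; ++k)
            a_buf[k] = b[k] + c[k];

        // ... and stage 2 consumes the values while they are still in cache.
        for (std::size_t k = 0; k < NK; ++k)
            f[k] = a_buf[k] * g[k];
    }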



Stencil Code – My Toy Example

1. Naive

for k
    a(k) := b(k) + c(k)
end
...
for k
    d(k) := a(k-1)*e(-1) + a(k)*e(0) + a(k+1)*e(+1)
end
...
for k
    f(k) := a(k)*g(k) + d(k)
end



Stencil Code – My Toy Example


2. No pre-calculation

for k
    d(k) := (b(k-1)+c(k-1))*e(-1) + (b(k)+c(k))*e(0) + (b(k+1)+c(k+1))*e(+1)
    f(k) := (b(k)+c(k))*g(k) + d(k)
end



Stencil Code – My Toy Example


3. Pre-calculation with temporary variables

for k
    z := b(k+1) + c(k+1)
    d(k) := x*e(-1) + y*e(0) + z*e(+1)
    f(k) := y*g(k) + d(k)
    x := y
    y := z
end



Stencil Code – My Toy Example

4. Pre-calculation with column buffer

for k
    a(k) := b(k) + c(k)
end
for k
    d(k) := a(k-1)*e(-1) + a(k)*e(0) + a(k+1)*e(+1)
    f(k) := a(k)*g(k) + d(k)
end




Stencil Code – My Toy Example


5. Pre-calculation with stages & column buffer

Stencil
    Stage 1:  a := b + c
    Stage 2:  d := a*e  (k: -1, 0, +1)
    Stage 3:  f := a*g + d

Apply Stencil


Status



Status

  • So far the following stencils have been implemented:

    • Fast wave solver (the w bottom boundary initialization is still missing)

    • Advection

      • 5th order advection

      • Bott 2 advection (cri implementation missing)

    • Complete tendencies

    • Horizontal Diffusion

    • Coriolis

  • The next steps are:

    • Implicit vertical diffusion

    • Put it all together

    • Performance optimization


Discussion

Acknowledgements to all our collaborators at

  • C2SM (Center for Climate Systems Modeling)

  • MeteoSwiss

  • DWD (Deutscher Wetterdienst)

  • CSCS

