Efficient parallelization for amr mhd multiphysics calculations
This presentation is the property of its rightful owner.
Sponsored Links
1 / 71

Efficient Parallelization for AMR MHD Multiphysics Calculations PowerPoint PPT Presentation


  • 49 Views
  • Uploaded on
  • Presentation posted in: General

Efficient Parallelization for AMR MHD Multiphysics Calculations. Implementation in AstroBEAR. Outline. Motivation Better memory management Sweep updates Distributed tree Improved parallelization Level threading Dynamic load balancing. Outline. Motivation – HPC

Download Presentation

Efficient Parallelization for AMR MHD Multiphysics Calculations

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Efficient parallelization for amr mhd multiphysics calculations

Efficient Parallelization for AMR MHD Multiphysics Calculations

Implementation in AstroBEAR


Outline

Outline

  • Motivation

  • Better memory management

  • Sweep updates

  • Distributed tree

  • Improved parallelization

  • Level threading

  • Dynamic load balancing


Outline1

Outline

  • Motivation – HPC

    • Better memory management

      • Sweep updates

      • Distributed tree

    • Improved parallelization

      • Level threading

      • Dynamic load balancing


Efficient parallelization for amr mhd multiphysics calculations

Sweep Method for Unsplit Schemes


Sweep method for unsplit stencils

Sweep Method for Unsplit Stencils

  • At the end of an update we need new values for the fluid quantities Q at time step n+1


Sweep method for unsplit stencils1

Sweep Method for Unsplit Stencils

  • At the end of an update we need new values for the fluid quantities Q at time step n+1

  • The new values for Q will depend on the EMF’s calculated at the cell corners


Sweep method for unsplit stencils2

Sweep Method for Unsplit Stencils

  • At the end of an update we need new values for the fluid quantities Q at time step n+1

  • The new values for Q will depend on the EMF’s calculated at the cell corners

  • As well as the x and y fluxes at the cell faces


Sweep method for unsplit stencils3

Sweep Method for Unsplit Stencils

  • At the end of an update we need new values for the fluid quantities Q at time step n+1

  • The new values for Q will depend on the EMF’s calculated at the cell corners

  • As well as the x and y fluxes at the cell faces

  • The EMF’s themselves also depend on the fluxes at the faces adjacent to each corner.


Sweep method for unsplit stencils4

Sweep Method for Unsplit Stencils

  • At the end of an update we need new values for the fluid quantities Q at time step n+1

  • The new values for Q will depend on the EMF’s calculated at the cell corners

  • As well as the x and y fluxes at the cell faces

  • The EMF’s themselves also depend on the fluxes at the faces adjacent to each corner.

  • Each calculation extends the stencil until we are left with the range of initial values Q at time step n needed for the update of a single cell


Sweep method for unsplit stencils5

Sweep Method for Unsplit Stencils

  • At the end of an update we need new values for the fluid quantities Q at time step n+1

  • The new values for Q will depend on the EMF’s calculated at the cell corners

  • As well as the x and y fluxes at the cell faces

  • The EMF’s themselves also depend on the fluxes at the faces adjacent to each corner.

  • Each calculation extends the stencil until we are left with the range of initial values Q at time step n needed for the update of a single cell


Sweep method for unsplit stencils6

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils7

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils8

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils9

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils10

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils11

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils12

Sweep Method for Unsplit Stencils


Sweep method for unsplit stencils13

Sweep Method for Unsplit Stencils


Efficient parallelization for amr mhd multiphysics calculations

Distributed Tree


Distributed tree

Distributed Tree

Level 0 base grid

Level 0 node


Distributed tree1

Distributed Tree

Parent-Child

Level 1 nodes are children of level 0 node

Level 1 grids nested within parent level 0 grid


Distributed tree2

Distributed Tree

Neighbor

Neighbor

Conservative methods require synchronization of fluxes between neighboring grids as well as the exchanging of ghost zones


Distributed tree3

Distributed Tree

Level 2 nodes are children of level 1 nodes

Level 2 grids nested within parent level 1 grids


Distributed tree4

Distributed Tree

Since the mesh is adaptive there are successive generations of grids and nodes


Distributed tree5

Distributed Tree

New generations of grids need access to data from the previous generation of grids on the same level that physically overlap


Distributed tree6

Distributed Tree

Current generation


Distributed tree7

Distributed Tree

Current generation

Previous generation


Distributed tree8

Distributed Tree

Current generation

Previous generation


Distributed tree9

Distributed Tree

Current generation

Level 1 Overlaps

Previous generation


Distributed tree10

Distributed Tree

Current generation

Level 2 Overlaps

Previous generation


Distributed tree11

Distributed Tree

Current generation

AMR Tree

Previous generation

While the AMR grids and the associated computations are normally distributed across multiple processors, the tree is often not.


Distributed tree12

Distributed Tree

Current generation

Previous generation

Each processor only needs to maintain its local sub-tree with connections to each of its grid’s surrounding nodes (parent, children, overlaps, neighbors)


Distributed tree13

Distributed Tree

For example processor 1 only needs to know about the following sub-tree


Distributed tree14

Distributed Tree

And the sub-tree for processor 2


Distributed tree15

Distributed Tree

And the sub-tree for processor 3


Distributed tree16

Distributed Tree

  • While the memory and communication required for each node is small to that required for each grid, the memory required for the entire AMR tree can overwhelm that required for the local grids when the number of processors becomes large.

  • Consider a mesh in which each grid is 8x8x8 where each cell contains 4 fluid variables. Lets also assume that each node requires 7 values describing its physical location and the host processor. The memory requirement for the local grids becomes comparable to the AMR tree when there are 8x8x8x4/7 = 292 processors.

  • Even if each processor prunes its local tree by discarding nodes not connected to its local grids, the communication required to globally share each new generation of nodes will eventually prevent the simulation from scaling.


Distributed tree17

Distributed Tree

  • When successive generations of level n nodes are created by their level n-1 parent nodes, each processor must update its local sub-tree with the subset of level n nodes that connect to the nodes associated with its local grids.

  • Instead of globally communicating every new level n node to every other processor, each processor determines which other processors need to know about which new level n nodes.

  • Since children are always nested within parent grids, nodes that neighbor or overlap will have parents that neighbor or overlap respectively. This can be used to localize tree communication as follows.


Sub tree construction

Sub-tree construction

Tree starts at a single root node that persists from generation to generation

Processor 2

Processor 3

Processor 1


Sub tree construction1

Sub-tree construction

Root node creates next generation of level 1 nodes

Processor 2

Processor 3

Processor 1


Sub tree construction2

Sub-tree construction

Root node creates next generation of level 1 nodes

It next identifies overlaps between new and old children

Processor 2

Processor 3

Processor 1


Sub tree construction3

Sub-tree construction

Root node creates next generation of level 1 nodes

It next identifies overlaps between new and old children

As well as neighbors among new children

Processor 2

Processor 3

Processor 1


Sub tree construction4

Sub-tree construction

New nodes along with their connections are then communicated to the assigned child processor.

Processor 2

Processor 3

Processor 1


Sub tree construction5

Sub-tree construction

Processor 2

Processor 3

Processor 1


Sub tree construction6

Sub-tree construction

Processor 2

Processor 3

Processor 1


Sub tree construction7

Sub-tree construction

Level 1 nodes then locally create new generation of level 2 nodes

Processor 2

Processor 3

Processor 1


Sub tree construction8

Sub-tree construction

Level 1 nodes then locally create new generation of level 2 nodes

Processor 2

Processor 3

Processor 1


Sub tree construction9

Sub-tree construction

Processor 2

Processor 3

Processor 1

Level 1 grids communicate new children to their overlaps so that new overlap connections on level 2 can be determined


Sub tree construction10

Sub-tree construction

Processor 2

Processor 3

Processor 1

Level 1 grids communicate new children to their neighbors so that new neighbor connections on level 2 can be determined


Sub tree construction11

Sub-tree construction

Processor 2

Processor 3

Processor 1

Level 1 grids and their local trees are then distributed to child processors.


Sub tree construction12

Sub-tree construction

Processor 2

Processor 3

Processor 1

Each processor now has the necessary tree information to update their local grids.


Level threading

Level Threading


Level threading1

Level Threading


Level threading2

Level Threading


Level threading3

Level Threading


Level threading4

Level Threading


Level threading5

Level Threading


Level threading6

Level Threading


Level threading7

Level Threading


Dynamic load balancing

Dynamic Load Balancing


Calculating child workload shares

Calculating Child Workload Shares


Calculating child workload shares1

Calculating Child Workload Shares


Calculating child workload shares2

Calculating Child Workload Shares


Calculating child workload shares3

Calculating Child Workload Shares


Calculating child workload shares4

Calculating Child Workload Shares


Calculating child workload shares5

Calculating Child Workload Shares


Calculating child workload shares6

Calculating Child Workload Shares


Dynamic load balancing1

Dynamic Load Balancing


Dynamic load balancing2

Dynamic Load Balancing


Dynamic load balancing3

Dynamic Load Balancing


  • Login