Data Flow Pattern Analysis of Scientific Applications

Michael Frumkin

Parallel Systems & Applications

Intel Corporation

May 6, 2005

Outline

• Why Data Flow Pattern Analysis?

• CFD Applications

• The NAS Parallel Benchmarks

• The NAS Grid Benchmarks

• Trace File Analysis

• Conclusions

• Scientific applications

• model few natural processes

• new effects are added infrequently

• their influence on the existing data flows is insignificant

• Knowledge of data flow in a program helps with

• program understanding

• building an application performance model

• Time represented as an outer loop

• Iterations over time steps

• Space is represented by structured/unstructured grids

• Important for understanding data locality

• Data access patterns

• Spatial parallelism

• Physics is represented by an operator at each grid point

• Data flow

• Operator level of parallelism/dependence
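The structure described above (time as an outer loop, space as a grid, physics as an operator applied at each grid point) can be sketched in a few lines. This is only an illustration, not code from the talk: the 1-D grid, the three-point operator, the coefficient `alpha`, and the names `apply_operator` and `time_march` are all assumptions chosen to keep the example short.

```python
def apply_operator(u, alpha=0.25):
    """Apply a three-point operator at every interior grid point.

    Boundary values are held fixed. The operator and alpha are
    illustrative assumptions, not taken from the talk.
    """
    new_u = list(u)
    for i in range(1, len(u) - 1):
        # Each update reads only points i-1, i, i+1: this neighbourhood
        # is the data access pattern of the operator.
        new_u[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new_u


def time_march(u0, n_steps):
    """Time is the outer loop; each step sweeps the whole grid."""
    u = list(u0)
    for _ in range(n_steps):
        u = apply_operator(u)
    return u
```

Because every interior update in one sweep reads only the previous time level, the inner loop carries no dependence and could be split across processors, which is the spatial parallelism the slide refers to.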

• Solve the Navier-Stokes equation

$K(u_{i+1}) = L u_i$

• u is a five-dimensional vector

• K is a non-linear operator

• Solver

• RHS computation

• x-solve

• y-solve

• z-solve

• Multilevel parallelism

[Figure: Multipartition; labels: x-solve, y-solve, z-solve]

• Stencil operators (explicit methods)

• At each point of a 3-dimensional mesh apply:

• seven-point stencil

• 27-point stencil
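The two stencils can be written down as sets of index offsets and applied at each interior mesh point. The sketch below is a hedged illustration rather than the benchmark code: it assumes a cubic mesh stored as nested lists, leaves boundary points at zero, and uses hypothetical names (`seven_point_offsets`, `twenty_seven_point_offsets`, `apply_stencil`).

```python
from itertools import product


def seven_point_offsets():
    """The centre point plus its six face neighbours."""
    return [(0, 0, 0)] + [
        tuple(s if a == axis else 0 for a in range(3))
        for axis in range(3)
        for s in (-1, 1)
    ]


def twenty_seven_point_offsets():
    """Every point of the 3 x 3 x 3 cube around the centre."""
    return list(product((-1, 0, 1), repeat=3))


def apply_stencil(u, offsets, weights):
    """Weighted sum over the stencil at each interior point of an n^3 mesh.

    Boundary points of the result are left at 0.0 (an assumption made
    to keep the sketch short).
    """
    n = len(u)
    out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i, j, k in product(range(1, n - 1), repeat=3):
        out[i][j][k] = sum(
            w * u[i + di][j + dj][k + dk]
            for w, (di, dj, dk) in zip(weights, offsets)
        )
    return out
```

For example, with weights `[-6.0] + [1.0] * 6` the seven-point stencil is the standard discrete Laplacian, which evaluates to zero on a constant field.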

Dependence Matrices

$\begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

• Two-dimensional pipeline

• Hyperplane algorithm
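When a point (i, j, k) depends on (i-1, j, k), (i, j-1, k), and (i, j, k-1), that is, on the negative unit vectors, every point on the hyperplane i + j + k = h depends only on points of hyperplane h - 1. Points within one hyperplane can therefore be updated in parallel. The following sketch illustrates that schedule under these assumed dependences; the function names are hypothetical and it is not the benchmark implementation.

```python
from itertools import product

# Assumed dependence vectors: each point needs its three "lower" neighbours.
DEPENDENCES = [(-1, 0, 0), (0, -1, 0), (0, 0, -1)]


def hyperplanes(n):
    """Group the points of an n x n x n mesh by h = i + j + k."""
    planes = [[] for _ in range(3 * (n - 1) + 1)]
    for i, j, k in product(range(n), repeat=3):
        planes[i + j + k].append((i, j, k))
    return planes


def is_valid_schedule(planes):
    """Check that every predecessor lies on a strictly earlier hyperplane."""
    step = {p: h for h, pts in enumerate(planes) for p in pts}
    for (i, j, k), h in step.items():
        for di, dj, dk in DEPENDENCES:
            q = (i + di, j + dj, k + dk)
            if q in step and step[q] >= h:
                return False
    return True
```

For a 3 x 3 x 3 mesh this yields seven hyperplanes, so the sweep takes seven parallel steps instead of 27 sequential updates.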

[Figure: multigrid cycle with stages Projection, Interpolation & Smoothing, and Smoothing]

Data Flow Analysis

[Figure: loop nest do k=1,ksize / do j=1,jsize / do i=1,isize, with graph nodes do_45, do_134, do_330]

Each arc represents an affinity relation.

NAS Parallel Benchmarks

• Application Benchmarks

• CFD

• BT, SP, LU

• Data Intensive

• DC, DT, BTIO

• Computational Chemistry

• UA

• Kernel Benchmarks

• FT, CG, MG, IS

• Verification

• Performance Model

• FORTRAN, C, HPF, Java*

• Serial, MPI, OpenMP, Java* Threads

• Shuffles: sorting, FFT, routing

• Gather/Scatter: MD and FE codes, sparse matrices

• Transpose: FFT, sorting

• Tree: parallel prefix, reduction, sorting
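Two of the patterns listed above, the tree and the transpose, can be made concrete as lists of (sender, receiver) pairs. This is a rough sketch under stated assumptions: ranks are numbered from 0, the tree is a binomial reduction toward rank 0, and the transpose is modelled as an all-to-all exchange. The function names are hypothetical.

```python
def tree_reduction_rounds(n_ranks):
    """Per round, the (sender, receiver) pairs of a binomial-tree reduction to rank 0."""
    rounds = []
    stride = 1
    while stride < n_ranks:
        pairs = [
            (r + stride, r)
            for r in range(0, n_ranks, 2 * stride)
            if r + stride < n_ranks
        ]
        rounds.append(pairs)
        stride *= 2
    return rounds


def transpose_pairs(n_ranks):
    """All-to-all exchange: every rank sends a block to every other rank."""
    return [(s, d) for s in range(n_ranks) for d in range(n_ranks) if s != d]
```

With eight ranks the reduction finishes in three rounds, while the transpose involves 56 point-to-point messages, which shows why the two patterns stress a network so differently.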

HPC Challenge Benchmarks

• HPL*

• DGEMM*

• STREAM*

• PTRANS*

• FFTE*

• RandomAccess*

• Effective Bandwidth b_eff*

Programming With Directed Graphs

• Arc

• Arc* newArc(Node *tail, Node *head)

• AttachArc(DGraph *dg)

• deleArc(Arc *ar)

• Node

• newNode(char *name)

• Node* AttachNode(DGraph *dg)

• deleteNode(Node *nd)

• DGraph

• DGraph* newDGraph(char *name)

• writeGraph(DGraph *dg, char* fname)
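The calls listed above suggest a small create/attach/write workflow. The sketch below mirrors that workflow in Python purely for illustration: the library on the slide is C, and the snake_case names, the simplified signatures, and the text format produced by `format_graph` are all assumptions rather than the library's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str


@dataclass
class Arc:
    tail: Node
    head: Node


@dataclass
class DGraph:
    name: str
    nodes: list = field(default_factory=list)
    arcs: list = field(default_factory=list)


def new_dgraph(name):
    return DGraph(name)


def new_node(name):
    return Node(name)


def new_arc(tail, head):
    return Arc(tail, head)


def attach_node(dg, nd):
    """Add a node to the graph unless this exact object is already attached."""
    if not any(n is nd for n in dg.nodes):
        dg.nodes.append(nd)
    return nd


def attach_arc(dg, arc):
    """Add an arc, attaching its endpoints first if needed."""
    attach_node(dg, arc.tail)
    attach_node(dg, arc.head)
    dg.arcs.append(arc)
    return arc


def format_graph(dg):
    """Render the graph as text, one 'tail -> head' line per arc (assumed format)."""
    lines = [f"graph {dg.name}"]
    lines += [f"  {a.tail.name} -> {a.head.name}" for a in dg.arcs]
    return "\n".join(lines)
```

A typical use would create nodes such as `do_45` and `do_134`, connect them with `attach_arc(dg, new_arc(a, b))`, and pass the result of `format_graph(dg)` to a layout tool.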


• Parse trees

• File Systems

• Device Schematics

Visualization and Layout Tools

• VCG tool

• Edge tool

• Tom Sawyer Software

• Commercial tools

Cart3D*

• Performs CFD analysis on complex geometries

• Uses six executables

• Intersect* – intersects geometry

• Cubes* – produces Cartesian meshes

• Reorder* – reorders meshes

• Mgprep* – coarsens mesh

• flowCart* – convergence acceleration

• Clic* – analyzes the flow

• Executables communicate via files

• Returns relevant forces

• Lift, Drag, Side Force

[Figure: NAS Grid Benchmarks data flow graphs. Panels: Helical Chain (HC), Embarrassingly Distributed (ED), Visualization Pipeline (VP), each running from Launch to Report through BT, SP, LU, MG, and FT nodes. A further graph runs from Launch to Report through nodes LU2, LU4, LU8, MG2, MG4, MG8, FT2, FT8, with the label #steps.]

The NAS Grid Benchmarks

• Contain four patterns

• Embarrassingly Distributed (ED)

• Helical Chain (HC)

• Visualization Pipeline (VP)

• Mixed Bag (MB)
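Two of these patterns are simple enough to write down as arc lists. The sketch below is an illustration only: the task counts, the `CODE.index` node naming, and the function names are assumptions, and the VP and MB graphs are omitted because their exact wiring is not given here.

```python
def embarrassingly_distributed(n_tasks, code="SP"):
    """Launch fans out to n independent tasks, which all feed Report."""
    arcs = []
    for t in range(n_tasks):
        task = f"{code}.{t}"
        arcs.append(("Launch", task))
        arcs.append((task, "Report"))
    return arcs


def helical_chain(n_tasks, codes=("BT", "SP", "LU")):
    """Launch -> BT -> SP -> LU -> BT -> ... -> Report, one task after another."""
    tasks = [f"{codes[t % len(codes)]}.{t}" for t in range(n_tasks)]
    chain = ["Launch"] + tasks + ["Report"]
    return list(zip(chain, chain[1:]))


def critical_path_length(arcs):
    """Number of arcs on the longest path from Launch (graph assumed acyclic)."""
    succ = {}
    for tail, head in arcs:
        succ.setdefault(tail, []).append(head)
    memo = {}

    def depth(node):
        if node not in memo:
            memo[node] = max(
                (1 + depth(h) for h in succ.get(node, [])), default=0
            )
        return memo[node]

    return depth("Launch")
```

With nine tasks, ED has a critical path of 2 arcs and HC of 10, which captures the contrast between a fully parallel and a fully serial data flow.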

Data Dependent Patterns

• Intermittent patterns

• Useful for application performance tuning

• Visualization is important

• Exploits the human eye's ability to detect patterns

• Automatic Pattern Mining

• OLAP approach

• MPI communication patterns
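A first step toward mining MPI communication patterns from a trace is to aggregate messages into a communication matrix and a histogram of rank offsets. The sketch below assumes a hypothetical trace format of `(source, destination, bytes)` tuples; the talk does not specify a format, and this is not the OLAP approach itself, only a simple aggregation in the same spirit.

```python
from collections import Counter


def communication_matrix(trace, n_ranks):
    """m[src][dst] = total bytes sent from src to dst."""
    m = [[0] * n_ranks for _ in range(n_ranks)]
    for src, dst, nbytes in trace:
        m[src][dst] += nbytes
    return m


def offset_histogram(trace, n_ranks):
    """Count messages by (dst - src) mod n_ranks.

    A few dominant offsets suggest a regular pattern such as a ring
    or a nearest-neighbour exchange.
    """
    return Counter((dst - src) % n_ranks for src, dst, _ in trace)
```

For a ring exchange on four ranks, every message has offset 1, so the histogram collapses to a single entry, whereas an all-to-all transpose spreads evenly over all non-zero offsets.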

Data Flow in Applications

• Application Parallelization

• Application Understanding

• Application Mapping

• Application Performance