PaCT-2007: Optimized Parallel Approach for 3D Modelling


Presentation Transcript


  1. Optimized Parallel Approach for 3D Modelling of Forest Fire Behaviour
     G. Accary, O. Bessonov, D. Fougère, S. Meradji, D. Morvan
     Institute for Problems in Mechanics, Moscow, Russia
     Université de la Méditerranée, Marseille, France
     Université Saint-Esprit de Kaslik, Jounieh, Lebanon
     Parallel Computing Technologies (PaCT-2007)

  2. Introduction
     In this work we present methods for parallelizing the 3D CFD forest fire modelling code FIRESTAR 3D on NUMA computers within the OpenMP environment.
     Outline:
     - Numerical model and method
     - Why parallelize?
     - Computer system selected for this development
     - Parallelization models
     - Specifics of OpenMP on NUMA computers
     - How to parallelize for OpenMP on NUMA?
     - Example of OpenMP parallelization, geometric parallelism
     - Current approach to parallelizing FIRESTAR 3D
     - Parallelization results for the benchmark problems
     - Parallelization of radiative transfer (input data parallelism)
     - Conclusion

  3. Numerical model and method
     - Fully physical 3D model of forest fire behaviour
     - Complex unsteady flow in a 3D rectangular domain
     - Solid phases (vegetation) and gas mixture
     - Decomposition mechanisms: drying, pyrolysis, combustion
     - Transfer: convection, diffusion, radiation, turbulence
     - Navier-Stokes equations in the Boussinesq approximation
     - Finite Volume discretization, non-uniform staggered grid
     - Fully implicit segregated SIMPLER-style solution method
     - Linear solvers: BiCGStab (nonsymmetric), CG (symmetric)
     - Explicit-class preconditioners for the linear solvers

  4. Why parallelize?
     3D vs. 2D:
     - much bigger grid (Nx*Ny*Nz grid points vs. Nx*Ny);
     - more complicated discretizations;
     - additional grid compression in problematic areas.
     As a result, total computational complexity increases by at least 2 orders of magnitude.
     Goal: to accelerate the code by a factor of at least about 10 and, together with other optimizations, to reach the speed of 2D simulations.

  5. Computer system selected for this development
     SGI Altix 350 shared-memory system:
     - 20 Itanium 2 processors (1.5 GHz, 4 MB L3 cache)
     - NUMA organization of the system (Non-Uniform Memory Architecture): 10 bi-processor modules (SMP nodes), each with its own local memory, interconnected by a very fast interface
     Current configuration:
     - 8 nodes (16 CPUs) connected to the NUMA switch: "batch" domain for intensive computations
     - 2 nodes (4 CPUs): "interactive" domain for development and debugging

  6. Parallelization models
     2 principal models of parallelization:
     Message passing (MPI):
     - more universal;
     - can be applied to distributed-memory systems (clusters) as well as to shared-memory computers;
     - complicated to program; requires total reorganization of the code and (often) revision of the algorithms.
     Shared memory (OpenMP):
     - looks like an extension of the Fortran and C programming languages;
     - comment-like directives (ignored if compiled without the "-openmp" switch);
     - simple to program; many algorithms can be parallelized easily.

     !$OMP DO
     do K=1,Nz
       do J=1,Ny
         do I=1,Nx
           {processing}
         enddo
       enddo
     enddo
     !$OMP END DO
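
The comment-like nature of the directives can be seen in a small compilable sketch. The module and routine names (omp_intro, scale_field, available_threads) are hypothetical and not taken from FIRESTAR 3D; the slide's `!$OMP DO` is assumed to sit inside a parallel region, which is written here in the combined `PARALLEL DO` form.

```fortran
module omp_intro
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Scale every element of a 3D array by a constant factor.
  ! In a build without OpenMP the !$OMP lines are plain comments
  ! and the loop nest runs serially with the same result.
  subroutine scale_field(a, factor)
    real(dp), intent(inout) :: a(:,:,:)
    real(dp), intent(in)    :: factor
    integer :: i, j, k
    !$OMP PARALLEL DO PRIVATE(i, j)
    do k = 1, size(a, 3)
      do j = 1, size(a, 2)
        do i = 1, size(a, 1)
          a(i, j, k) = factor * a(i, j, k)
        end do
      end do
    end do
    !$OMP END PARALLEL DO
  end subroutine scale_field

  ! Maximum number of threads for a parallel region (1 in a serial build).
  ! Lines starting with "!$ " are compiled only when OpenMP is enabled.
  function available_threads() result(n)
    !$ use omp_lib, only: omp_get_max_threads
    integer :: n
    n = 1
    !$ n = omp_get_max_threads()
  end function available_threads
end module omp_intro
```

Compiled with or without the compiler's OpenMP switch, `call scale_field(a, 2.5_dp)` gives the same array; only the number of threads doing the work changes.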

  7. Specifics of OpenMP on NUMA computers
     Access to memory within a node (local memory) is fast; access to memory in another node (remote memory) is much slower.
     ==> The distribution of the main data arrays across local memories must correspond to the distribution of computational work between processors.
     This is not supported explicitly by OpenMP.
     ==> Special initialization is required (e.g. assignment in a parallel loop).
     Processes must be bound to CPUs (affinity) in order to avoid migration between processors (e.g. with the "dplace" utility).

  8. How to parallelize for OpenMP on NUMA?
     - Usually, geometric parallelism is applied: data elements are split along some dimension.
     - FIRESTAR 3D: most computations are in the CG solvers and the calculation of turbulent quantities => easily and naturally parallelizable in OpenMP.
     - Algorithms with recursive dependences (tridiagonal solvers, line Jacobi/GS preconditioners) are more difficult and not naturally parallelizable (in development).
     - Restriction of OpenMP/NUMA: parallelization in only one spatial direction (~16 CPUs is a limit).
     - Input data parallelism (or event parallelism): used for the radiative transport equation (split by angles).
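
Why recursive dependences are harder can be illustrated with a tridiagonal line solve. This is a minimal sketch under assumptions, not the authors' solver: the names (line_solver, solve_lines_x) and the constant coefficients are hypothetical. Each grid line along I is solved with the Thomas algorithm, whose steps depend on one another, while different lines are independent and can be shared out over K.

```fortran
module line_solver
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Solve, for every (j,k) grid line, the constant-coefficient system
  !   lo*x(i-1) + di*x(i) + up*x(i+1) = rhs(i),  i = 1..nx
  ! with the Thomas algorithm. On entry x holds rhs, on exit the solution.
  subroutine solve_lines_x(x, lo, di, up)
    real(dp), intent(inout) :: x(:,:,:)
    real(dp), intent(in)    :: lo, di, up
    real(dp) :: c(size(x, 1)), denom
    integer  :: i, j, k, nx
    nx = size(x, 1)
    ! Lines with different (j,k) are independent: split them over K.
    !$OMP PARALLEL DO PRIVATE(i, j, c, denom)
    do k = 1, size(x, 3)
      do j = 1, size(x, 2)
        ! Forward elimination: step i needs the result of step i-1.
        c(1) = up / di
        x(1, j, k) = x(1, j, k) / di
        do i = 2, nx
          denom = di - lo * c(i-1)
          c(i) = up / denom
          x(i, j, k) = (x(i, j, k) - lo * x(i-1, j, k)) / denom
        end do
        ! Back substitution: step i needs the result of step i+1.
        do i = nx - 1, 1, -1
          x(i, j, k) = x(i, j, k) - c(i) * x(i+1, j, k)
        end do
      end do
    end do
    !$OMP END PARALLEL DO
  end subroutine solve_lines_x
end module line_solver
```

Lines along I or J fit this pattern. A line along K, the direction that is split between threads, would carry the recursion across thread boundaries, which is the difficult case.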

  9. Example of OpenMP parallelization (geometric parallelism)

     !$OMP DO
     do K=1,Nz
       do J=1,Ny
         do I=1,Nx
           Wo3(I,J,K)=Wo2(I,J,K)+ &
                      beta*Wo3(I,J,K)
         enddo
       enddo
     enddo
     !$OMP END DO

     Every processor computes its own part of the outermost DO loop (do K=1,Nz).
     Iterations of this loop are split evenly between all CPUs.
     Portions of the 3D data arrays must be distributed between local memories accordingly.

  10. Current approach to parallelizing FIRESTAR 3D
      Selection and OpenMP parallelization of the main time-consuming routines:
      1) iterative CG solvers and calculation of turbulent quantities: ~80% of CPU time (in serial execution);
      2) routines for the transport equations (velocity, temperature) and pressure correction: ~20% (in serial execution);
      3) initialization: just an assignment in a parallel DO loop that corresponds to the computational parallel DO loops;
      4) some serial optimizations and transformations of the code (in order to avoid dependencies and side effects between threads).

      !$OMP DO
      do K=0,Nz+1
        do J=0,Ny+1
          do I=0,Nx+1
            Wo2(I,J,K)=0.
            Wo3(I,J,K)=0.
          enddo
        enddo
      enddo
      !$OMP END DO
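
The pairing of the initialization loop with a computational loop can be sketched as a small compilable module. It assumes a first-touch page placement policy and arrays that the caller has not written to before; the module and routine names are hypothetical, while Wo2, Wo3 and beta follow the slides. Both loops use the same static split over K, so each thread first touches roughly the planes it later updates.

```fortran
module numa_first_touch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Parallel initialization: each thread zeroes its own block of K-planes,
  ! including the boundary planes 0 and Nz+1.
  subroutine init_fields(wo2, wo3)
    real(dp), intent(out) :: wo2(0:, 0:, 0:), wo3(0:, 0:, 0:)
    integer :: i, j, k
    !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i, j)
    do k = 0, ubound(wo2, 3)
      do j = 0, ubound(wo2, 2)
        do i = 0, ubound(wo2, 1)
          wo2(i, j, k) = 0.0_dp
          wo3(i, j, k) = 0.0_dp
        end do
      end do
    end do
    !$OMP END PARALLEL DO
  end subroutine init_fields

  ! Computational loop over interior points: Wo3 = Wo2 + beta*Wo3,
  ! split over K with the same static schedule as the initialization.
  subroutine update_direction(wo2, wo3, beta)
    real(dp), intent(in)    :: wo2(0:, 0:, 0:)
    real(dp), intent(inout) :: wo3(0:, 0:, 0:)
    real(dp), intent(in)    :: beta
    integer :: i, j, k
    !$OMP PARALLEL DO SCHEDULE(STATIC) PRIVATE(i, j)
    do k = 1, ubound(wo3, 3) - 1
      do j = 1, ubound(wo3, 2) - 1
        do i = 1, ubound(wo3, 1) - 1
          wo3(i, j, k) = wo2(i, j, k) + beta * wo3(i, j, k)
        end do
      end do
    end do
    !$OMP END PARALLEL DO
  end subroutine update_direction
end module numa_first_touch
```

Because the initialization runs over Nz+2 planes and the update over Nz, the block boundaries of the two loops can differ by a plane or so; the match between data placement and work is close rather than exact.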

  11. Parallelization results for the benchmark problem 60x60x60
      Speed-up is good (problem size 170 MB).
      - 2 processors: limited by the throughput of the local memory (which is shared by the 2 CPUs of a node)
      - 4, 8 processors: superlinear speed-up (thanks to the large 4 MB L3 cache in every CPU)
      - 16 processors: negative effects (grid size not divisible by 16, i.e. load imbalance; problem too small, i.e. influence of large boundaries)
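
The divisibility remark can be made concrete with a little arithmetic. The sketch below assumes contiguous, near-equal blocks of K-planes per thread; the exact split of a static OpenMP schedule is left to the implementation, and the function names are hypothetical. It counts loop iterations only and ignores cache and boundary effects.

```fortran
module static_split
  implicit none
contains
  ! Number of iterations given to thread `tid` (0-based) when `n`
  ! iterations are split into contiguous near-equal blocks over `p` threads.
  pure function block_size(n, p, tid) result(m)
    integer, intent(in) :: n, p, tid
    integer :: m
    m = n / p
    if (tid < mod(n, p)) m = m + 1
  end function block_size

  ! Upper bound on parallel efficiency imposed by the split alone:
  ! ideal work per thread (n/p) divided by the largest block.
  pure function balance_efficiency(n, p) result(e)
    integer, intent(in) :: n, p
    real :: e
    e = (real(n) / real(p)) / real(block_size(n, p, 0))
  end function balance_efficiency
end module static_split
```

For 60 planes on 16 threads, `block_size` gives 4 planes to twelve threads and 3 to the remaining four, so `balance_efficiency(60, 16)` is 0.9375: the threads with 3 planes wait for those with 4.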

  12. Parallelization results: "airflow canopy" problem 96x96x81
      Speed-up is reasonable (problem size 1 GB).
      - 2 processors: limited by the throughput of the local memory (which is shared by the 2 CPUs of a node)
      - 4, 8 processors: no superlinear speed-up (bigger problem)
      - 16 processors: negative effects (load imbalance etc.) are partly compensated by the positive effects of the large L3 cache

  13. Parallelization of radiative transfer (input data parallelism)
      (This work was done in collaboration with the INRA-URFM-PIF team.)
      - The full sphere is split into parts (sectors) corresponding to the number of processors.
      - Equations are integrated independently in each sector (for the full domain), i.e. each processor computes its own set of input data.
      - Afterwards, data from each sector are distributed to subdomains for further processing with geometric parallelism.
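
The split by angles can be sketched as a loop over discrete directions in which each thread treats its directions over the full grid and the partial sums are then combined. This is an illustration under assumptions, not the FIRESTAR 3D implementation: the names (angular_split, incident_radiation, sweep_direction) are hypothetical, and the per-direction transport solve is replaced by a trivial stand-in.

```fortran
module angular_split
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Accumulate g = sum over directions m of w(m) * intensity_m.
  ! Directions are shared out between threads; each thread works on the
  ! FULL grid and the REDUCTION clause combines the per-thread sums.
  subroutine incident_radiation(w, g)
    real(dp), intent(in)  :: w(:)          ! weight of each direction
    real(dp), intent(out) :: g(:,:,:)      ! direction-integrated field
    real(dp) :: intensity(size(g, 1), size(g, 2), size(g, 3))
    integer  :: m
    g = 0.0_dp
    !$OMP PARALLEL DO PRIVATE(intensity) REDUCTION(+:g)
    do m = 1, size(w)
      call sweep_direction(m, intensity)
      g = g + w(m) * intensity
    end do
    !$OMP END PARALLEL DO
  end subroutine incident_radiation

  ! Hypothetical stand-in for the transport solve along direction m:
  ! returns a uniform unit intensity so the result is easy to check
  ! (m is not used by this stub).
  subroutine sweep_direction(m, intensity)
    integer,  intent(in)  :: m
    real(dp), intent(out) :: intensity(:,:,:)
    intensity = 1.0_dp
  end subroutine sweep_direction
end module angular_split
```

After the loop, `g` is an ordinary shared array, so subsequent loops can again be split over K in the geometric manner.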

  14. Conclusion
      • In this work we developed:
        • a strategy of OpenMP parallelization for NUMA computers
        • a parallelization method for a 3D CFD fire modelling code
      • This new method achieves good parallelization efficiency for a moderate number of processors (up to 16).
      • Further work: acceleration of algebraic solvers; development and parallelization of implicit-class preconditioners.
      Acknowledgements
      • This work was supported by the European integrated fire management project (Fire Paradox) and by the Russian Foundation for Basic Research (project # 05-08-18110).
      PaCT-2007, September 2007, Pereslavl-Zalessky, Russia
