
Richard Vondráček Department of Structural Mechanics Faculty of Civil Engineering

The Use of The Sparse Direct Solver in The Engineering Applications of The Finite Element Method (Použití přímého řídkého řešiče v inženýrských aplikacích metody konečných prvků)


Presentation Transcript


  1. The Use of The Sparse Direct Solver in The Engineering Applications of The Finite Element Method (Použití přímého řídkého řešiče v inženýrských aplikacích metody konečných prvků). Richard Vondráček, Department of Structural Mechanics, Faculty of Civil Engineering, Czech Technical University in Prague

  2. Outline
     • Solution of systems of linear equations (SLE)
     • Direct sparse solver
     • Matrix storage formats
     • Ordering algorithms in general
     • Block storage
     • Symbolical factorization
     • Sparse numerical factorization
     • Preliminary results
     • Future goals

  3. Methods for the solution of sparse systems of linear equations
     • Iterative
         • Jacobi, Gauss-Seidel, SOR
         • CG, GMRES
         • Multigrid
     • Direct
         • Factorization type
             • LU, LDL^T, LL^T (left-looking / right-looking)
             • QR
         • Algorithm (matrix storage scheme)
             • Skyline (L1)
             • Frontal (L3)
             • Multifrontal (L3)
             • Sparse (L0..L3)
     • Combined
         • Incomplete factorizations (preconditioned CG)
         • Domain decomposition: FETI, DP-FETI

  4. Sparse direct solver motivation
     • Fast factorization
     • Robust implementation
     • Unsymmetrical, indefinite, poorly conditioned problems
     • Lower memory consumption
     • Enable factorization of bigger tasks
     • Practical with existing FEM solvers
     • Compatible with domain decomposition algorithms
     • Freely available source and documentation

  5. Sparse direct solver
     • Sparse LDL^T, LL^T, LU decomposition
     • Compressed column storage
     • Block storage scheme
         • fewer indices
         • efficiency in floating point operations
         • locality of data is good for memory access caching
     • Symbolic factorization
         • AMD ordering (T.A.Davis, P.Amestoy and I.S.Duff)
         • Directly minimizes the fill-in
         • Quotient graph
         • Pattern of matrix L
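
To make the compressed column storage mentioned above concrete, here is a minimal sketch in plain Python. The function name `lower_to_csc` and the three array names are illustrative choices, not taken from the presentation; only the lower triangle is stored, as one would do for a symmetric LDL^T or LL^T factorization.

```python
def lower_to_csc(dense):
    """Store the lower triangle (including the diagonal) column by column."""
    n = len(dense)
    col_ptr = [0]   # entries of column j live in positions col_ptr[j] .. col_ptr[j+1]-1
    row_idx = []    # row index of every stored entry
    values = []     # numerical value of every stored entry
    for j in range(n):
        for i in range(j, n):
            if dense[i][j] != 0.0:
                row_idx.append(i)
                values.append(dense[i][j])
        col_ptr.append(len(row_idx))
    return col_ptr, row_idx, values


# Small symmetric example matrix (hypothetical data).
A = [
    [4.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 0.0],
    [1.0, 0.0, 5.0, 2.0],
    [0.0, 0.0, 2.0, 6.0],
]
col_ptr, row_idx, values = lower_to_csc(A)
```

Every stored value carries one row index, which is the overhead that the block storage scheme is meant to reduce.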

  6. Matrix ordering algorithms
     • Profile minimizing
         • Cuthill-McKee
         • Sloan
         • Spectral envelope reduction
     • Fill-in minimizing
         • Minimum degree (local → global)
             • AMD
             • MMD
         • Nested Dissection (global → local)
             • Recursive Graph Bisection
             • Recursive Coordinate Bisection
             • Recursive Spectral Bisection
         • Multisection
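
As an illustration of a profile-minimizing ordering, the following is a simplified sketch of reverse Cuthill-McKee in plain Python. It assumes the matrix graph is given as adjacency lists and starts each component from a minimum-degree node, which is a common but not the only choice of start node; the function names are hypothetical.

```python
from collections import deque


def reverse_cuthill_mckee(adj):
    """Return `order`, where order[k] is the old index of the node numbered k."""
    n = len(adj)
    visited = [False] * n
    order = []
    # Handle every connected component, starting from a minimum-degree node.
    for start in sorted(range(n), key=lambda v: (len(adj[v]), v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Number the unvisited neighbours in order of increasing degree.
            for w in sorted(adj[v], key=lambda u: (len(adj[u]), u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" step of reverse Cuthill-McKee
    return order


def bandwidth(adj, order):
    """Largest |i - j| over all off-diagonal nonzeros after renumbering."""
    new_label = {old: new for new, old in enumerate(order)}
    return max(abs(new_label[v] - new_label[w])
               for v in range(len(adj)) for w in adj[v])


# A chain 0-3-1-4-2 with deliberately scrambled node numbers.
adj = [[3], [3, 4], [4], [0, 1], [1, 2]]
natural = list(range(5))
rcm = reverse_cuthill_mckee(adj)
```

For this chain the renumbering brings every off-diagonal entry next to the diagonal, which is the situation a skyline solver benefits from.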

  7. Matrix storage schemes - orderings. Example: 2D plane stress mechanical problem; sparse stiffness matrix.

  8. Matrix storage schemes - orderings. Figure panels: original ordering + fill-in (skyline); original ordering + fill-in (sparse direct).

  9. Matrix storage schemes - orderings. Figure panels: reverse Cuthill-McKee (skyline); AMD + fill-in (sparse direct).

  10. Matrix storage schemes - orderings. Figure panels: Recursive Graph Bisection (sparse direct); principle of the bisection.

  11. Block sparse matrix storage scheme. The block size corresponds to the number of DOFs of a mesh node, so it depends on the physics of the underlying problem.
     • fewer indices
     • efficiency in floating point operations
     • locality of data is good for memory access caching
     • performance is a trade-off (fast arithmetic vs. wasted arithmetic)
     Figure labels: b×n; block sizes 2×2, 3×3, ..., b×b
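
The trade-off between fewer indices and wasted arithmetic can be quantified with a small counting sketch. It assumes that a b×b block is stored in full as soon as it contains at least one nonzero, so zeros inside a block are stored and multiplied; the function name and the dictionary keys are illustrative, not part of the presented solver.

```python
def block_storage_stats(nonzeros, b):
    """Compare scalar and b-by-b block storage for a set of (row, col) nonzeros."""
    # A block is stored as soon as it contains at least one nonzero entry.
    blocks = {(i // b, j // b) for (i, j) in nonzeros}
    stored = len(blocks) * b * b
    return {
        "scalar_indices": len(nonzeros),   # one index per scalar entry
        "block_indices": len(blocks),      # one index per block
        "stored_values": stored,
        "padding_zeros": stored - len(nonzeros),
    }


# Two mesh nodes with 2 DOFs each; the coupling is full except for one
# symmetric pair of zeros (hypothetical pattern).
nonzeros = {(i, j) for i in range(4) for j in range(4)} - {(0, 3), (3, 0)}
stats = block_storage_stats(nonzeros, 2)
```

Here the block format needs 4 indices instead of 14 and pays for it with 2 stored zeros.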

  12. Block sparse matrix storage scheme. Care must be taken that the individual equations of the sparse system fit into the regular block structure. Equations eliminated due to Dirichlet boundary conditions (DBC) should not destroy the block structure. Figure label: b×n
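
One common way to keep the block structure intact is to leave the constrained equations in the system and turn them into trivial identity rows. The sketch below shows this for a small dense matrix; it is an assumption about how the requirement could be met, not necessarily the approach used in the presented solver, and `apply_dirichlet` is a hypothetical helper.

```python
def apply_dirichlet(K, f, prescribed):
    """Impose u[c] = g for every (c, g) in `prescribed` without removing rows or columns."""
    n = len(K)
    K = [row[:] for row in K]   # work on copies, leave the inputs untouched
    f = f[:]
    for c, g in prescribed.items():
        for i in range(n):
            if i != c:
                f[i] -= K[i][c] * g   # move the known value to the right-hand side
                K[i][c] = 0.0         # decouple column c
                K[c][i] = 0.0         # decouple row c (keeps symmetry)
        K[c][c] = 1.0                 # equation c becomes 1 * u[c] = g
        f[c] = g
    return K, f


K = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
f = [0.0, 0.0, 0.0]
K2, f2 = apply_dirichlet(K, f, {0: 1.0})
```

The matrix keeps its size, so every mesh node still owns a complete block of equations.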

  13. Block sparse matrix storage scheme. Indefinite systems which have regular “indefinite contributions” from each mesh node can be a priori preordered to ensure the numerical stability of the decomposition.
      Figure: block of size b coupling mesh nodes a and b (labels u1a, u2a, pa along the top; u1b, u2b, pb along the side):
        k11ab  k12ab  c1ab
        k21ab  k22ab  c2ab
        c1ab   c2ab   0

  14. Sparse solution strategy
     • select DOFs permutation and block size
     • create coarse connectivity matrix
     • symbolical factorization finds the block ordering and L pattern
     • allocate memory for the block sparse matrix L
     • clear matrix L
     • load nonzero matrix A entries into prepared matrix L
     • numerical factorization
     • forward / backward substitutions for different RHSs
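
The substitution step of the strategy reuses one factorization for any number of right-hand sides. The sketch below shows this with a dense unit lower triangular L and diagonal D standing in for the block sparse factor; the function name and the numerical data are illustrative assumptions.

```python
def ldlt_solve(L, D, b):
    """Solve (L D L^T) x = b for a unit lower triangular L and diagonal D."""
    n = len(b)
    x = b[:]
    for i in range(n):                 # forward substitution: L y = b
        for j in range(i):
            x[i] -= L[i][j] * x[j]
    for i in range(n):                 # diagonal scaling: D z = y
        x[i] /= D[i]
    for i in reversed(range(n)):       # backward substitution: L^T x = z
        for j in range(i + 1, n):
            x[i] -= L[j][i] * x[j]
    return x


# Factors of A = [[4, 2, 2], [2, 5, 3], [2, 3, 6]], computed once.
L = [
    [1.0, 0.0, 0.0],
    [0.5, 1.0, 0.0],
    [0.5, 0.5, 1.0],
]
D = [4.0, 4.0, 4.0]

# Two different right-hand sides solved with the same factors.
x1 = ldlt_solve(L, D, [8.0, 10.0, 11.0])
x2 = ldlt_solve(L, D, [4.0, 2.0, 2.0])
```

Each additional right-hand side costs only the three substitution loops, not a new factorization.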

  15. Symbolical factorization
     • Based on a modified Approximate Minimum Degree algorithm
     • AMD ordering (T.A.Davis, P.Amestoy and I.S.Duff) (colamd)
     • Principle of the minimum degree
     • Finds a fill-reducing preordering of the equations (a heuristic; not guaranteed to be optimal)
     • Determines the final memory requirements
     • Gives us the structure of the factored matrix L

  16. Symbolical factorization. Connectivity graph used for symbolic factorization. Figure panels: original graph; elimination graph; quotient graph (after an "eliminate" step).
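
The elimination graph idea can be sketched in a few lines: eliminating a vertex connects all of its remaining neighbours, and every newly created edge is one fill-in entry of L. This sketch works on the plain elimination graph rather than on the more compact quotient graph, and the function name is hypothetical.

```python
def symbolic_elimination(adj, order):
    """Return (sub-diagonal pattern of L per eliminated vertex, number of fill-in entries)."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    l_pattern = {}
    fill = 0
    for v in order:
        nbrs = sorted(graph.pop(v))
        l_pattern[v] = nbrs            # nonzeros below the diagonal in column v
        for u in nbrs:
            graph[u].discard(v)
        # The neighbours of the eliminated vertex become a clique.
        for pos, a in enumerate(nbrs):
            for b in nbrs[pos + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
    return l_pattern, fill


# "Arrow" pattern: vertex 0 is coupled to all others (hypothetical example).
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
_, fill_hub_first = symbolic_elimination(adj, [0, 1, 2, 3])
pattern_hub_last, fill_hub_last = symbolic_elimination(adj, [1, 2, 3, 0])
```

Eliminating the hub first fills the whole remaining matrix, while eliminating it last creates no fill at all, which is why the ordering is chosen before any memory is allocated.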

  17. Approximate minimum degree. Figure: quotient graph in the k-th step of factorization.

  18. Approximate minimum degree. Important sets; true external degree (minimum degree).
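
For orientation, the principle of the minimum degree can be written as a short greedy loop. Note the assumptions: this sketch uses the exact degree on the explicit elimination graph, whereas AMD works on the quotient graph with a cheaper approximate bound on the external degree; ties are broken by the lowest vertex label, which is an arbitrary choice.

```python
def minimum_degree_order(adj):
    """Greedy ordering: always eliminate a vertex of smallest current degree."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while graph:
        # Smallest degree first; ties are broken by the lowest label.
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
            graph[u] |= nbrs - {u}     # neighbours of v become mutually connected
        order.append(v)
    return order


# Arrow pattern again: vertex 0 is coupled to all others.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
order = minimum_degree_order(adj)
```

The hub is postponed until its degree has dropped to that of the remaining vertices, so no fill-in is created for this pattern.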

  19. Numerical factorization
     • Left-looking decomposition
     • Three embedded “for” loops
     • Indirect addressing in the inner loop due to the compressed column format
     • No further memory allocation
     • Preoptimized block multiplications
     • Use of temporary index maps
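
The three nested loops of a left-looking decomposition are easiest to see on a dense matrix. The following sketch computes A = L D L^T without pivoting, so it assumes that no zero pivot occurs; in the sparse block solver the inner loop would run only over stored entries via index maps, which is not reproduced here.

```python
def ldlt_left_looking(A):
    """Dense left-looking LDL^T factorization (no pivoting)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):                       # loop 1: column being computed
        col = [A[i][j] for i in range(n)]    # work column, rows j..n-1 are used
        for k in range(j):                   # loop 2: finished columns to the left
            w = L[j][k] * D[k]
            for i in range(j, n):            # loop 3: update the rows of column j
                col[i] -= L[i][k] * w
        D[j] = col[j]
        for i in range(j + 1, n):
            L[i][j] = col[i] / D[j]
    return L, D


A = [
    [4.0, 2.0, 2.0],
    [2.0, 5.0, 3.0],
    [2.0, 3.0, 6.0],
]
L, D = ldlt_left_looking(A)
```

Column j is only read from and written to once all columns to its left are final, which is what "left-looking" refers to.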

  20. Numerical factorization. Figure: order in which the left-looking factorization updates the matrix entries in the two outer loops.

  21. Numerical factorization. Figure: the purpose of the factorization inner loop in a dense matrix.

  22. Numerical factorization. Figure: computational effort of the skyline solver in the inner loop.

  23. Numerical factorization. Figure: computational effort of the sparse direct solver in the inner loop.

  24. Numerical experiments
     • Regular domains and structured meshes
     • General plate problem (Mindlin theory)
     • General three-dimensional problem

  25. Numerical experiments. Parameters of the cluster used.

  26. Regular domains and structured meshes

  27. Regular domains and structured meshes. Memory demands of the Schur complement method.

  28. Regular domains and structured meshes. Time demands of the Schur complement method in seconds.

  29. Regular domains and structured meshes. Memory demands of the DP-FETI method.

  30. Regular domains and structured meshes. Time requirements of the DP-FETI method in seconds.

  31. General plate problem (Mindlin theory)

  32. General plate problem (Mindlin theory)

  33. General plate problem (Mindlin theory)

  34. General plate problem (Mindlin theory)

  35. General plate problem (Mindlin theory)

  36. General plate problem (Mindlin theory)

  37. General plate problem (Mindlin theory)

  38. General plate problem (Mindlin theory)

  39. General three-dimensional problem

  40. Decomposition into subdomains

  41. General three-dimensional problem

  42. General three-dimensional problem

  43. General three-dimensional problem

  44. General three-dimensional problem

  45. General three-dimensional problem. General domain and unstructured meshes. The Schur complement method.

  46. Preliminary conclusions
     • Competitive with skyline already for small tasks (but not for all)
     • Clear winner for larger tasks
     • If the system matrix fits well into the blocks, we obtain optimal performance
     • RHS solution (forward and backward substitution) always faster than skyline
     • Can be used as an incomplete Cholesky (IC) preconditioner for CG
     • Lower memory consumption
     • Allows bigger tasks without pagefile swapping

  47. Future work and possible research
     • Study and find best practices for
         • influence of block size on performance
         • influence of domain shape on the efficiency of sparse direct vs. skyline
         • performance in nonlinear mechanical problems
         • performance in indefinite problems (incompressible media)
     • Try a variable block size storage scheme
     • Implement an automatic block utilization algorithm
     • LU for primary domain decomposition
     • Implement block skyline and compare performance
     • Combined storage scheme sparse + skyline?

  48. Thank you !
