
After step 2, processors know who owns the data in their assumed partitions— now the assumed partition defines the rende



Presentation Transcript


Scalable Conceptual Interfaces in hypre

Allison Baker
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
(joint work with Rob Falgout, Jim Jones, and Ulrike Meier Yang)

Overview

What is hypre?
• A library of high-performance algorithms for solving large, sparse systems of linear equations on massively parallel computers

Scalability: The key issue for large-scale computing
• Good performance requires scalable algorithms and software

Parallel computing data is in distributed form
• Conceptual interface gets data from application code to hypre
• Each processor knows only its own piece of the linear system
• Problem: Solvers require "nearby" data from other processors, and the interfaces must determine who owns this data efficiently

Goal: Scalable interfaces to solvers!
• LLNL's new BlueGene/L machine: > 100,000 processors!

New Algorithm

Goal: Generate neighborhood information in a scalable manner for large numbers of processors (P)

How? Assume the global partition!
• The new algorithm is a kind of rendezvous algorithm that uses the concept of an assumed partition to answer queries about the global data distribution
• Assumed Partition algorithm:
  1. Assume a global partition of data (N rows) that may be queried by any processor (with O(1) computation and storage cost)
  2. Reconcile assumed rows with actual rows: contact processors regarding rows owned in another's assumed partition
  3. Use the assumed partition to determine send and receive processors

[Figure: the matrix A split into row blocks A1, A2, ..., AP, showing the actual partition next to the assumed partition for processors 1 to 4, with the rows owned by processor 1 that lie in processor 2's assumed partition marked. For a balanced partitioning, one could assume: proc = (row · P) / N.]
After step 2, processors know who owns the data in their assumed partitions; the assumed partition now defines the rendezvous points.
• This assumed partition concept is applicable to all of hypre's conceptual interfaces and a variety of situations in parallel codes

New algorithm costs for P processors
[Table: costs of the new algorithm for P processors; entries not preserved in this transcript.]

IJ Conceptual Interface Description
• hypre's traditional linear-algebraic interface
• Matrix and right-hand side are defined in terms of row and column indices
• Matrices are distributed across P processors by contiguous blocks of rows
• Matrix-vector multiply requires some knowledge of the global partition

Old method for determining neighborhood info
• Each processor sends its range to all other processors
• All processors store the global partition and use it to determine who to receive data from (receive processors)
• Processors discover who to send data to (send processors) via a second communication

Old method costs for P processors
[Table: costs of the old method for P processors; entries not preserved in this transcript.]

Not good enough! As P increases, the algorithm's cost increases!

For scalability, the computation, communication, and storage costs should all depend on P logarithmically or better!

Results

Comparison of the new assumed partition algorithm and the old algorithm for a 3D Laplacian operator with a 27-point stencil
• Each processor owns ~64,000 rows
• Runs on LLNL's MCR Linux cluster
• New algorithm has better scaling properties. This will be important for 100,000 processors!

What's Next?
• Testing on more processors using BlueGene/L: 16K processors coming soon!
• Adapting the assumed partition to the hypre conceptual interface for structured problems (more complicated!)

This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48. UCRL-POST-209333
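For contrast, the old method for determining neighborhood information described above can be sketched the same way. This is a single-process illustration with hypothetical function names, not hypre code: it assumes 0-based rows and contiguous ranges listed in processor order, and it replaces both communication rounds with plain list operations.

```python
from bisect import bisect_right


def global_partition(actual_ranges):
    """Simulated all-to-all exchange: every processor ends up storing all P range starts."""
    return [lo for lo, _ in actual_ranges]


def owner(row, starts):
    """Look up the owner of `row` in the stored global partition by binary search."""
    return bisect_right(starts, row) - 1


def receive_procs(me, needed_rows, starts):
    """Processors that `me` must receive data from, given the off-processor rows it needs."""
    return sorted({owner(r, starts) for r in needed_rows} - {me})


def send_procs(all_receive_lists):
    """Second communication, simulated: p sends to q exactly when q receives from p."""
    sends = [[] for _ in all_receive_lists]
    for q, recv in enumerate(all_receive_lists):
        for p in recv:
            sends[p].append(q)
    return sends


if __name__ == "__main__":
    # Hypothetical partition of 12 rows over 4 processors and the rows each one needs.
    ranges = [(0, 5), (5, 6), (6, 10), (10, 12)]
    needed = [{5, 6}, {4, 6}, {5, 10}, {9}]
    starts = global_partition(ranges)  # length P on every processor
    recv = [receive_procs(p, rows, starts) for p, rows in enumerate(needed)]
    print("receive from:", recv)
    print("send to:", send_procs(recv))
```

The `starts` list has one entry per processor and is held by every processor, so per-processor storage grows linearly with P, which is the growth in cost that the poster calls not good enough.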
