
Parallelism in High-Performance Computing Applications






Presentation Transcript


  1. Parallelism in High-Performance Computing Applications • Exploit parallelism through the entire simulation/computation pipeline, from I/O to visualization. • Existing work has taken isolated approaches to parallel applications, data archival, retrieval, analysis, and visualization. • In addition to our work on parallel computing, we have investigated topics in parallel/distributed visualization, data analysis, and compression.

  2. Scalable Parallel Volume Visualization • A highly optimized shear-warp algorithm forms the basis for the parallelization. • Optimizations include image- and object-space coherence, early termination, and compression. • The parallel (MPI-based) formulation scales to 128 processors on an IBM SP and achieves frame rates in excess of 15 fps for the UNC Brain dataset (256 x 256 x 167).

  3. Parallel Shear-Warp • Data Partitioning: • Sheared volume partitioning • Compositing: • Software compositing / binary aggregation (see the sketch below) • Load Balancing: • Coherence in object movement -- use the previous frame to load-balance the current frame.
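To make the compositing step concrete, here is a minimal mpi4py/NumPy sketch of binary-aggregation compositing. This is not the authors' code: it assumes the number of ranks is a power of two and that rank order matches front-to-back depth order along the view direction.

```python
# Minimal sketch of binary-aggregation compositing (mpi4py + NumPy).
# Assumptions: P (the number of ranks) is a power of two, and rank order
# matches front-to-back depth order along the view direction.
from mpi4py import MPI
import numpy as np

def over(front, back):
    # "Over" operator on premultiplied-alpha RGBA images in [0, 1].
    alpha = front[..., 3:4]
    return front + (1.0 - alpha) * back

def binary_composite(comm, partial):
    # Butterfly exchange: after log2(P) rounds every rank holds the
    # fully composited image of all P partial renderings.
    rank, size = comm.Get_rank(), comm.Get_size()
    img = np.ascontiguousarray(partial, dtype=np.float64)
    group = 1
    while group < size:
        partner = rank ^ group             # pair differs in this bit
        other = np.empty_like(img)
        comm.Sendrecv(img, dest=partner, recvbuf=other, source=partner)
        # Lower rank of the pair is closer to the viewer: composite in front.
        img = over(img, other) if rank < partner else over(other, img)
        group <<= 1
    return img
```

With ranks ordered front-to-back, each round merges adjacent depth blocks, so the (associative but non-commutative) over operator is always applied in the correct order; run with, e.g., mpiexec -n 8 after each rank has rendered its sheared sub-volume into `partial`.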

  4. Data/Computation Partitioning

  5. Performance Notes • Only the scan-lines corresponding to the incremental shear need to be communicated between frames. • Since the relative shear between frames is not large, this communication overhead is small.

  6. Performance Notes • The MPI version was tested on up to 128 processors of an IBM SP (112 MHz PowerPC 604), among other platforms. • Datasets scale from 128 x 128 x 84 to 256 x 256 x 167 (the UNC Brain/Head datasets).

  7. Performance Notes • All rendering times are in milliseconds and include compositing time.

  8. Data Analysis Techniques for Very High Dimensional Data • Datasets from simulations and physical processes can have extremely high dimensionality and large volume. • This data is also typically sparse. • Interpreting it requires scalable techniques for detecting dominant and deviant patterns: • Handling large discrete-valued datasets • Extracting co-occurrences between events • Summarizing data in an error-bounded fashion • Finding concise representations for summary data

  9. Background • Singular Value Decomposition (SVD) [Berry et al., 1995] • Decompose the matrix as A = USV^T • U and V orthogonal matrices, S diagonal with the singular values • Used for Latent Semantic Indexing in Information Retrieval • Truncate the decomposition to compress data (see the sketch below)
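As an illustration, a truncated SVD for compression can be computed with NumPy as follows (hypothetical random data, not tied to the datasets above):

```python
# Truncated SVD as a compression example (hypothetical random data).
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 50))

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10                                    # keep the k largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]  # best rank-k approximation (Frobenius)

print("relative error:", np.linalg.norm(A - A_k) / np.linalg.norm(A))
```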

  10. Background • Semi-Discrete Decomposition (SDD) [Kolda and O'Leary, 1998] • Restrict the entries of U and V to {-1, 0, 1} • Requires a very small amount of storage • Can perform as well as SVD in LSI using less than one-tenth the storage • Effective in finding outlier clusters • Works well for datasets containing a large number of small clusters (see the sketch below)
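A hedged sketch of one rank-1 SDD step in the alternating style of Kolda and O'Leary follows; the helper names, the ones-vector start, and the iteration count are our assumptions, not the paper's code.

```python
# One rank-1 step of Semi-Discrete Decomposition: A ~ d * outer(x, y) with
# x, y in {-1, 0, 1} and a scalar d. Assumes A has at least one nonzero.
import numpy as np

def best_ternary(s):
    # For fixed s (= A @ y or A.T @ x), choose x in {-1,0,1}^m maximizing
    # (x @ s)**2 / (x @ x): signs of the top-J |s_i| for the best prefix J.
    order = np.argsort(-np.abs(s))
    prefix = np.cumsum(np.abs(s)[order])
    J = int(np.argmax(prefix**2 / np.arange(1, s.size + 1))) + 1
    x = np.zeros_like(s)
    x[order[:J]] = np.sign(s[order[:J]])
    return x

def sdd_rank1(A, iters=25):
    y = np.ones(A.shape[1])               # arbitrary nonzero start
    for _ in range(iters):
        x = best_ternary(A @ y)
        y = best_ternary(A.T @ x)
    d = (x @ A @ y) / ((x @ x) * (y @ y)) # optimal scalar for fixed x, y
    return d, x, y
```

Subtracting d * outer(x, y) and repeating yields further terms; storing each term needs only two ternary vectors and one float, which is the source of the storage advantage over SVD.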

  11. Rank-1 Approximations • Approximate A by xy^T • x : presence vector (which rows contain the pattern) • y : pattern vector

  12. Discrete Rank-1 Approximation • Problem: Given a discrete matrix A (m x n), find discrete vectors x (m x 1) and y (n x 1) to minimize ||A - xy^T||^2, the number of non-zeros in the error matrix. • Heuristic: Fix y and solve for x to maximize the error reduction, i.e., set x(i) = 1 iff 2(Ay)(i) > ||y||^2; then fix x and solve for y symmetrically. • Iteratively solve for x and y until no improvement is possible (see the sketch below).
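The heuristic above can be sketched in a few lines of NumPy; the function name, the densest-column seed, and the fixed-point stopping test are our choices, not the paper's code. It assumes A is a signed 0/1 integer matrix.

```python
# Sketch of the alternating heuristic for the binary rank-1 problem.
import numpy as np

def discrete_rank1(A, max_iters=50):
    # Minimize the nonzeros of A - outer(x, y) over binary x, y by
    # alternating the closed-form update: with y fixed, selecting row i
    # changes the error by y@y - 2*(A@y)[i], so x[i] = 1 iff 2*(A@y)[i] > y@y.
    m, n = A.shape
    y = np.zeros(n, dtype=A.dtype)
    y[np.argmax(A.sum(axis=0))] = 1       # seed: densest column of A
    x = np.zeros(m, dtype=A.dtype)
    for _ in range(max_iters):
        x_new = (2 * (A @ y) > y @ y).astype(A.dtype)
        y_new = (2 * (A.T @ x_new) > x_new @ x_new).astype(A.dtype)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                          # fixed point: no improvement left
        x, y = x_new, y_new
    return x, y
```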

  13. Recursive Algorithm • At any step, given the rank-one approximation A ≈ xy^T, split A into A1 and A0 based on rows: • if x(i) = 1, row i goes to A1 • if x(i) = 0, row i goes to A0 • Stop when: • the Hamming radius of A1 (the maximum of the Hamming distances of the rows of A1 to the pattern vector) is less than some threshold, and • all rows of A are present in A1 • If A1 does not satisfy the Hamming radius condition, it can be split further based on Hamming distances (see the sketch below).
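A sketch of the recursive splitting follows, reusing discrete_rank1 from the previous sketch; the radius threshold and the handling of degenerate splits are our assumptions.

```python
# Sketch of the recursive decomposition on a signed 0/1 integer matrix A.
import numpy as np

def hamming_radius(rows, y):
    # Largest Hamming distance from any row to the pattern vector y.
    return int(np.max(np.abs(rows - y).sum(axis=1)))

def decompose(A, radius=2, patterns=None):
    patterns = [] if patterns is None else patterns
    if A.shape[0] == 0:
        return patterns
    x, y = discrete_rank1(A)
    A1, A0 = A[x == 1], A[x == 0]          # split rows on the presence vector
    if len(A0) == 0 and hamming_radius(A1, y) <= radius:
        patterns.append(y)                 # every row is close to y: stop
    elif len(A0) == 0 or len(A1) == 0:
        # Degenerate split: fall back to splitting on Hamming distance to y.
        d = np.abs(A - y).sum(axis=1)
        if np.all(d <= radius) or np.all(d > radius):
            patterns.append(y)             # cannot split further; record y
        else:
            decompose(A[d <= radius], radius, patterns)
            decompose(A[d > radius], radius, patterns)
    else:
        decompose(A1, radius, patterns)
        decompose(A0, radius, patterns)
    return patterns
```

Each recursive call works on strictly fewer rows, so the recursion terminates, and the returned pattern vectors form the error-bounded summary of A.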

  14. Effectiveness of Analysis

  15. Effectiveness of Analysis

  16. Run-time Scalability • Rank-1 approximation requires O(nz(A)) time. • The total run-time at each level of the recursion tree cannot exceed this, since the total number of nonzeros at each level is at most nz(A). • Run-time is linear in nz(A). [Plots: runtime vs. # columns, runtime vs. # rows, runtime vs. # nonzeros]
