
CS 267: Applications of Parallel Computers Final Project Suggestions


Presentation Transcript


1. CS 267: Applications of Parallel Computers
   Final Project Suggestions
   James Demmel
   www.cs.berkeley.edu/~demmel/cs267_Spr06

2. Outline
   • Kinds of projects
     • Evaluating and improving the performance of a parallel application
       • "Application" could be a full scientific application or an important kernel
     • Parallelizing a sequential application
       • Other kinds of performance improvements are possible too, e.g., memory hierarchy tuning
     • Devising a new parallel algorithm for some problem
     • Porting a parallel application or systems software to a new architecture
   • Examples of previous projects (all on-line)
   • Upcoming guest lecturers
     • See their previous lectures, or contact them, for project ideas
   • Suggested projects

3. CS267 Class Projects from 2004
   • BLAST Implementation on BEE2 — Chen Chang
   • PFLAMELET: An Unsteady Flamelet Solver for Parallel Computers — Fabrizio Bisetti
   • Parallel Pattern Matcher — Frank Gennari, Shariq Rizvi, and Guille Díez-Cañas
   • Parallel Simulation in Metropolis — Guang Yang
   • A Survey of Performance Optimizations for Titanium Immersed Boundary Simulation — Hormozd Gahvari, Omair Kamil, Benjamin Lee, Meling Ngo, and Armando Solar
   • Parallelization of oopd1 — Jeff Hammel
   • Optimization and Evaluation of a Titanium Adaptive Mesh Refinement Code — Amir Kamil, Ben Schwarz, and Jimmy Su

4. CS267 Class Projects from 2004 (continued)
   • Communication Savings With Ghost Cell Expansion For Domain Decompositions Of Finite Difference Grids — C. Zambrana Rojas and Mark Hoemmen
   • Parallelization of Phylogenetic Tree Construction — Michael Tung
   • UPC Implementation of the Sparse Triangular Solve and NAS FT — Christian Bell and Rajesh Nishtala
   • Widescale Load Balanced Shared Memory Model for Parallel Computing — Sonesh Surana, Yatish Patel, and Dan Adkins

5. Planned Guest Lecturers
   • Katherine Yelick (UPC, heart modeling)
   • David Anderson (volunteer computing)
   • Kimmen Sjolander (phylogenetic analysis of proteins – SATCHMO – Bonnie Kirkpatrick)
   • Julian Borrill (astrophysical data analysis)
   • Wes Bethel (graphics and data visualization)
   • Phil Colella (adaptive mesh refinement)
   • David Skinner (tools for scaling up applications)
   • Xiaoye Li (sparse linear algebra)
   • Osni Marques and Tony Drummond (ACTS Toolkit)
   • Andrew Canning (computational neuroscience)
   • Michael Wehner (climate modeling)

6. Suggested projects (1)
   • Weekly research group meetings on these and related topics (see J. Demmel and K. Yelick)
   • Contribute to the upcoming ScaLAPACK release (JD)
     • Proposal and talk at www.cs.berkeley.edu/~demmel; ask me for the latest version
     • Performance evaluation of existing parallel algorithms
       • Ex: new eigensolvers based on successive band reduction
     • Improved implementations of existing parallel algorithms
       • Ex: use UPC to overlap communication and computation
     • Many serial algorithms remain to be parallelized
       • See the following slides

7. Missing Drivers in Sca/LAPACK

8. More missing drivers

9. Suggested projects (2)
   • Contribute to sparse linear algebra (JD & KY)
     • Performance tuning to minimize latency and bandwidth costs, both to memory and between processors (sparse => few flops per memory reference or word communicated)
     • Typical methods (e.g., CG = conjugate gradient) do some number of dot products and saxpys for each SpMV, so communication cost is O(# iterations)
     • Our goal: make the latency cost O(1)!
     • Requires reorganizing algorithms drastically, including replacing SpMV by the new kernel [Ax, A^2x, A^3x, ..., A^kx], which can be computed with O(1) messages (a sequential sketch of this kernel follows this slide)
   • Projects
     • Study scalability bottlenecks of current CG on real, large matrices
     • Optimize [Ax, A^2x, A^3x, ..., A^kx] on sequential machines
     • Optimize [Ax, A^2x, A^3x, ..., A^kx] on parallel machines
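As a concrete starting point for the sequential-optimization project above, here is a minimal sketch of what the matrix powers kernel [Ax, A^2x, ..., A^kx] must compute, using a CSR matrix and repeated SpMV. The function names and the 3x3 test matrix are illustrative assumptions, not part of any ScaLAPACK or CS267 code; a communication-avoiding implementation would produce the same k vectors while reading A and moving remote data only once.

```c
/* Minimal sequential sketch of the matrix powers kernel [Ax, A^2x, ..., A^kx].
 * CSR storage and all names here (csr_spmv, matrix_powers) are assumptions
 * for illustration only. */
#include <stdio.h>
#include <stdlib.h>

/* y = A*x for an n x n CSR matrix (rowptr, colind, val). */
static void csr_spmv(int n, const int *rowptr, const int *colind,
                     const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
            sum += val[j] * x[colind[j]];
        y[i] = sum;
    }
}

/* V holds k vectors of length n; row j stores A^(j+1) x.  Here each power is
 * just another SpMV; the research question on the slide is how to get the
 * same k vectors with O(1) messages per processor. */
static void matrix_powers(int n, int k, const int *rowptr, const int *colind,
                          const double *val, const double *x, double *V)
{
    csr_spmv(n, rowptr, colind, val, x, V);                        /* A x     */
    for (int j = 1; j < k; j++)                                    /* A^j+1 x */
        csr_spmv(n, rowptr, colind, val, V + (j - 1) * n, V + j * n);
}

int main(void)
{
    /* 3x3 tridiagonal test matrix [2 -1 0; -1 2 -1; 0 -1 2] in CSR form. */
    int    rowptr[] = {0, 2, 5, 7};
    int    colind[] = {0, 1, 0, 1, 2, 1, 2};
    double val[]    = {2, -1, -1, 2, -1, -1, 2};
    double x[]      = {1, 1, 1};
    int n = 3, k = 3;

    double *V = malloc((size_t)k * n * sizeof *V);
    matrix_powers(n, k, rowptr, colind, val, x, V);

    for (int j = 0; j < k; j++) {
        printf("A^%d x =", j + 1);
        for (int i = 0; i < n; i++) printf(" %g", V[j * n + i]);
        printf("\n");
    }
    free(V);
    return 0;
}
```

The parallel version of the project would replace the k dependent SpMV calls with a scheme that fetches the needed remote rows of A and entries of x up front, so the number of messages stays O(1) regardless of k, as claimed on the slide.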

10. Suggested projects (3)
   • Evaluate new languages on applications (KY)
     • UPC or Titanium
     • UPC for asynchrony, overlapping communication & computation (an overlap sketch follows this slide)
     • ScaLAPACK in UPC
     • Use the UPC-based 3D FFT in your application
     • Optimize the existing 1D FFT in UPC to use 3D techniques
   • Porting and evaluating parallel systems software (KY)
     • Port UPC to RAMP
     • Port GASNet to Blue Gene and evaluate performance
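The first project group above is about using UPC's asynchronous, one-sided operations to hide communication behind computation. UPC itself is not shown here; the sketch below illustrates the same overlap pattern with standard MPI non-blocking point-to-point calls (post the exchange, compute on data you already have, then wait), purely as an assumed analogue of what a UPC project would do with non-blocking bulk copies.

```c
/* Overlap sketch: non-blocking halo exchange around local work.
 * MPI is used here only as an analogue for UPC-style asynchrony. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;   /* periodic 1D neighbors */
    int right = (rank + 1) % size;

    double send = (double)rank, recv = 0.0;
    MPI_Request req[2];

    /* 1. Start the exchange without blocking. */
    MPI_Irecv(&recv, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(&send, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    /* 2. Do "interior" work that needs no remote data while messages fly. */
    double local = 0.0;
    for (int i = 0; i < 1000000; i++)
        local += (double)i * 1e-9;

    /* 3. Only now wait, then do the part that depends on the received value. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    double total = local + recv;

    printf("rank %d: local=%.3f, received from %d=%.1f, total=%.3f\n",
           rank, local, left, recv, total);

    MPI_Finalize();
    return 0;
}
```

Compile with mpicc and run with mpirun; the point of the pattern is simply that the work in step 2 costs nothing extra if the network can progress the messages in the background, which is the same effect UPC's asynchronous copies aim for.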
