Presentation Transcript


  1. CS4960: Parallel Programming. Guest Lecture: Parallel Programming for Scientific Computing. Mary Hall, September 22, 2008 CS4960

  2. Outline • Introduction • The fastest computer in the world today • Large-scale scientific simulations • Why writing fast parallel programs is hard • New parallel programming languages Material for this lecture provided by: Kathy Yelick and Jim Demmel, UC Berkeley; Brad Chamberlain, Cray CS4960

  3. My Research Area: Parallel and Distributed Computing • Limited to supercomputers? • No! Everywhere! • Scientific applications? • These are still important, but also many new commercial applications and new consumer applications are going to emerge. • Programming tools adequate and established? • No! Many new research challenges CS4960

  4. Why is This Course Important? Why now? • We are seeing a convergence of high-end, conventional and embedded computing • On-chip architectures look like parallel computers • Languages, software development and compilation strategies originally developed for high end (supercomputers) are now becoming important for many other domains • Why? • Technology trends • Looking to the future • Parallel computing for the masses demands better parallel programming paradigms • And more people who are trained in writing parallel programs (you!) • How to put all these vast machine resources to the best use! CS4960

  5. The fastest computer in the world today • What is its name? RoadRunner • Where is it located? Los Alamos National Laboratory • How many processors does it have? 18,802 processor chips (~123,284 “processors”) • What kind of processors? AMD Opterons and IBM Cell/BE (in Playstations) • How fast is it? 1.026 Petaflop/second (one quadrillion operations/s, about 1 x 10^15) See http://www.top500.org CS4960

  6. Scientific Simulation: The Third Pillar of Science • Traditional scientific and engineering paradigm: • Do theory or paper design. • Perform experiments or build system. • Limitations: • Too difficult -- build large wind tunnels. • Too expensive -- build a throw-away passenger jet. • Too slow -- wait for climate or galactic evolution. • Too dangerous -- weapons, drug design, climate experimentation. • Computational science paradigm: • Use high performance computer systems to simulate the phenomenon. • Based on known physical laws and efficient numerical methods. CS4960

  7. Some Particularly Challenging Computations • Science • Global climate modeling • Biology: genomics; protein folding; drug design • Astrophysical modeling • Computational Chemistry • Computational Material Sciences and Nanosciences • Engineering • Semiconductor design • Earthquake and structural modeling • Computational fluid dynamics (airplane design) • Combustion (engine design) • Crash simulation • Business • Financial and economic modeling • Transaction processing, web services and search engines • Defense • Nuclear weapons -- test by simulations • Cryptography CS4960

  8. Example: Global Climate Modeling Problem • Problem is to compute: f(latitude, longitude, elevation, time) → temperature, pressure, humidity, wind velocity • Approach: • Discretize the domain, e.g., a measurement point every 10 km • Devise an algorithm to predict weather at time t+dt given t • Uses: • Predict major events, e.g., El Niño • Use in setting air emissions standards Source: http://www.epm.ornl.gov/chammp/chammp.html CS4960

  9. High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL

  10. Some Characteristics of Scientific Simulation • Discretize physical or conceptual space into a grid • Simpler if regular, may be more representative if adaptive • Perform local computations on grid • Given yesterday’s temperature and weather pattern, what is today’s expected temperature? • Communicate partial results between grids • Contribute local weather result to understand global weather pattern. • Repeat for a set of time steps • Possibly perform other calculations with results • Given weather model, what area should evacuate for a hurricane? CS4960
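
To see the structure on this slide in code, here is a minimal serial sketch in C. It is an illustration, not code from the lecture: the grid size, the number of time steps, and the simple neighbor-averaging update are all assumed.

```c
/* Minimal serial sketch of the structure above: discretize space into
   a grid, update each point from its neighbors, repeat for a set of
   time steps. Grid size, step count, and the neighbor-averaging rule
   are illustrative assumptions, not code from the lecture. */
#include <stdio.h>
#include <string.h>

#define N     64      /* grid points per dimension (assumed) */
#define STEPS 100     /* number of time steps (assumed)      */

static double grid[N][N], next[N][N];

int main(void) {
    /* "Yesterday's temperature": some initial condition. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = (i == N / 2 && j == N / 2) ? 100.0 : 0.0;

    for (int t = 0; t < STEPS; t++) {
        /* Local computation on the grid: each interior point becomes
           the average of its four neighbors. */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                next[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        /* A parallel version would exchange boundary rows/columns of
           neighboring subgrids here before the next step. */
        memcpy(grid, next, sizeof grid);
    }
    printf("center value after %d steps: %f\n", STEPS, grid[N / 2][N / 2]);
    return 0;
}
```

A distributed-memory version would split the grid across processors and add a boundary exchange before each update; the skeleton above only shows the loop structure the slide describes.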

  11. More Examples: Parallel Computing in Data Analysis • Finding information amidst large quantities of data • General themes of sifting through large, unstructured data sets: • Has there been an outbreak of some medical condition in a community? • Which doctors are most likely involved in fraudulent charging to Medicare? • When should white socks go on sale? • What advertisements should be sent to you? • Data collected and stored at enormous speeds (Gbyte/hour) • remote sensor on a satellite • telescope scanning the skies • microarrays generating gene expression data • scientific simulations generating terabytes of data • NSA analysis of telecommunications CS4960

  12. Why writing (fast) parallel programs is hard CS4960

  13. Parallel Programming Complexity An Analogy to Preparing Thanksgiving Dinner • Enough parallelism? (Amdahl’s Law) • Suppose you want to just serve turkey • Granularity • How frequently must each assistant report to the chef? • After each stroke of a knife? Each step of a recipe? Each dish completed? • Locality • Grab the spices one at a time? Or collect ones that are needed prior to starting a dish? • Load balance • Each assistant gets a dish? Preparing stuffing vs. cooking green beans? • Coordination and Synchronization • Person chopping onions for stuffing can also supply green beans • Start pie after turkey is out of the oven All of these things make parallel programming even harder than sequential programming. CS4960

  14. Finding Enough Parallelism • Suppose only part of an application seems parallel • Amdahl’s law • let s be the fraction of work done sequentially, so (1-s) is the fraction parallelizable • P = number of processors Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s • Even if the parallel part speeds up perfectly, performance is limited by the sequential part CS4960
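
To make the bound concrete, the small C sketch below evaluates 1/(s + (1-s)/P) for a few values of s and P; the particular numbers are illustrative, not from the lecture.

```c
/* Evaluates the Amdahl's Law bound Speedup(P) <= 1/(s + (1-s)/P)
   from the slide for a few illustrative values of s and P. */
#include <stdio.h>

static double amdahl_bound(double s, int p) {
    return 1.0 / (s + (1.0 - s) / p);
}

int main(void) {
    double fractions[] = { 0.01, 0.05, 0.25 };  /* assumed examples */
    int procs[] = { 4, 16, 256 };

    for (int i = 0; i < 3; i++) {
        double s = fractions[i];
        printf("s = %.2f:", s);
        for (int j = 0; j < 3; j++)
            printf("  P=%d -> <= %.1fx", procs[j], amdahl_bound(s, procs[j]));
        printf("  (limit 1/s = %.0fx)\n", 1.0 / s);
    }
    return 0;
}
```

Even with only 5% of the work sequential (s = 0.05), 256 processors give a bound below 19x, and no processor count pushes it past 1/s = 20x.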

  15. Overhead of Parallelism • Given enough parallel work, this is the biggest barrier to getting desired speedup • Parallelism overheads include: • cost of starting a thread or process • cost of communicating shared data • cost of synchronizing • extra (redundant) computation • Each of these can be in the range of milliseconds (= millions of flops) on some systems • Tradeoff: Algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work CS4960
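
One of these overheads, starting a thread, is easy to measure directly. The C sketch below is an illustration rather than lecture code: it assumes POSIX threads and a made-up tiny task, and simply compares thread create/join against calling the task in a loop.

```c
/* Rough sketch of one parallelism overhead: the cost of creating and
   joining a POSIX thread versus calling the same tiny task directly.
   The trial count and the made-up task are illustrative assumptions.
   Compile with:  cc -O2 -pthread overhead.c */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

static void *tiny_task(void *arg) {
    /* A unit of work far too small to justify its own thread. */
    double *x = arg;
    *x = *x * 1.000001 + 1.0;
    return NULL;
}

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    enum { TRIALS = 1000 };
    double x = 0.0;

    double t0 = seconds();
    for (int i = 0; i < TRIALS; i++) {      /* one thread per tiny task */
        pthread_t th;
        pthread_create(&th, NULL, tiny_task, &x);
        pthread_join(th, NULL);
    }
    double per_thread = (seconds() - t0) / TRIALS;

    double t1 = seconds();
    for (int i = 0; i < TRIALS; i++)        /* same work, no thread */
        tiny_task(&x);
    double per_call = (seconds() - t1) / TRIALS;

    printf("create/join: %.2f us per task, direct call: %.4f us (x=%g)\n",
           per_thread * 1e6, per_call * 1e6, x);
    return 0;
}
```

On typical systems the per-thread cost is orders of magnitude larger than the task itself, which is exactly why the tradeoff in the last bullet pushes toward large units of work.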

  16. Locality and Parallelism • Large memories are slow, fast memories are small • Storage hierarchies are large and fast on average • Parallel processors, collectively, have large, fast cache • the slow accesses to “remote” data we call “communication” • Algorithm should do most work on local data [Figure: conventional storage hierarchy, with each processor's own cache, L2 cache, and L3 cache above its local memory, and potential interconnects joining the memories] CS4960
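
The last bullet ("do most work on local data") shows up even inside a single node. The C sketch below is illustrative (the matrix size is assumed): it sums the same matrix twice, once in cache-friendly order and once in cache-hostile order.

```c
/* Sketch of locality on a single node: the same sum computed twice,
   once walking the matrix row by row (contiguous, cache-friendly) and
   once column by column (stride-N, cache-hostile). The matrix size is
   an assumed value. */
#include <stdio.h>
#include <time.h>

#define N 4096
static double a[N][N];

static double seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    double t0 = seconds(), sum1 = 0.0;
    for (int i = 0; i < N; i++)       /* row-major: unit-stride accesses */
        for (int j = 0; j < N; j++)
            sum1 += a[i][j];
    double row_time = seconds() - t0;

    double t1 = seconds(), sum2 = 0.0;
    for (int j = 0; j < N; j++)       /* column-major: stride-N accesses */
        for (int i = 0; i < N; i++)
            sum2 += a[i][j];
    double col_time = seconds() - t1;

    printf("row order: %.3f s   column order: %.3f s   (sums %g, %g)\n",
           row_time, col_time, sum1, sum2);
    return 0;
}
```

On most machines the column-order pass is several times slower because almost every access misses the cache; remote accesses across an interconnect magnify the same effect, which is what the slide calls communication.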

  17. Load Imbalance • Load imbalance is the time that some processors in the system are idle due to • insufficient parallelism (during that phase) • unequal size tasks • Examples of the latter • adapting to “interesting parts of a domain” • tree-structured computations • fundamentally unstructured problems • Algorithm needs to balance load CS4960
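
A common remedy for unequal-size tasks is to hand out work dynamically instead of splitting it evenly up front. The sketch below is one illustration of that idea, using OpenMP and synthetic task costs that are assumptions, not lecture material.

```c
/* Sketch of balancing unequal-size tasks with dynamic scheduling:
   instead of a fixed, even split, idle threads keep grabbing the next
   task. OpenMP and the synthetic task costs are illustrative choices,
   not from the lecture. Compile with:  cc -O2 -fopenmp balance.c */
#include <omp.h>
#include <stdio.h>

#define NTASKS 1000

static double do_task(int i) {
    /* Unequal-size tasks: later tasks do more work. */
    double x = 0.0;
    for (int k = 0; k < (i + 1) * 1000; k++)
        x += 1.0 / (k + 1.0);
    return x;
}

int main(void) {
    double total = 0.0;
    /* schedule(dynamic) hands out iterations a few at a time, so fast
       threads pick up extra work instead of sitting idle. */
    #pragma omp parallel for schedule(dynamic) reduction(+:total)
    for (int i = 0; i < NTASKS; i++)
        total += do_task(i);
    printf("total = %f (up to %d threads)\n", total, omp_get_max_threads());
    return 0;
}
```

With a static, even split the thread that owns the last chunk of tasks does far more arithmetic and everyone else waits for it; dynamic hand-out trades a little coordination overhead for a balanced load.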

  18. New Parallel Programming Languages for Scientific Computing CS4960

  19. A Brief Look at Chapel (Cray) • History: • Starting in 2002, the Defense Advanced Research Projects Agency (DARPA) funded 5 industry players to investigate a revolutionary, commercially viable high-end computer system for 2010 • Now, two teams (IBM and Cray) are still building systems to be deployed next year • Both have introduced new languages, along with Sun • Chapel (Cray) • X10 (IBM) • Fortress (Sun) • We will look at Chapel for the next few slides CS4960

  20. Summary of Lecture • Scientific simulation discretizes some space into a grid • Perform local computations on grid • Communicate partial results between grids • Repeat for a set of time steps • Possibly perform other calculations with results • Writing fast parallel programs is difficult • Amdahl’s Law: must parallelize most of the computation • Data Locality • Communication and Synchronization • Load Imbalance • Challenge for new productive parallel programming languages • Express data partitioning and parallelism at a high level • Still obtain high performance! CS4960
