Explore the importance of parallel programming in scientific computing, challenges faced in writing fast parallel programs, and the latest trends and advancements in the field. Learn about the fastest computer today, large-scale scientific simulations, and the need for better parallel programming paradigms.
CS4960: Parallel Programming
Guest Lecture: Parallel Programming for Scientific Computing
Mary Hall
September 22, 2008
Outline
• Introduction
• The fastest computer in the world today
• Large-scale scientific simulations
• Why writing fast parallel programs is hard
• New parallel programming languages
Material for this lecture provided by: Kathy Yelick and Jim Demmel, UC Berkeley; Brad Chamberlain, Cray
My Research Area: Parallel and Distributed Computing
• Limited to supercomputers? No! Everywhere!
• Scientific applications? These are still important, but many new commercial and consumer applications are also going to emerge.
• Are programming tools adequate and established? No! Many new research challenges remain.
Why is This Course Important? Why Now?
• We are seeing a convergence of high-end, conventional, and embedded computing
• On-chip architectures look like parallel computers
• Languages, software development, and compilation strategies originally developed for the high end (supercomputers) are now becoming important for many other domains
• Why? Technology trends
• Looking to the future:
• Parallel computing for the masses demands better parallel programming paradigms
• And more people who are trained in writing parallel programs (you!)
• How to put all these vast machine resources to the best use!
The Fastest Computer in the World Today
• What is its name? RoadRunner
• Where is it located? Los Alamos National Laboratory
• How many processors does it have? 18,802 processor chips (~123,284 “processors”)
• What kind of processors? AMD Opterons and IBM Cell/BE (the chip used in PlayStations)
• How fast is it? 1.026 Petaflop/second: about one quadrillion (1 x 10^15) operations per second
See http://www.top500.org
Scientific Simulation: The Third Pillar of Science
• Traditional scientific and engineering paradigm:
• Do theory or paper design.
• Perform experiments or build a system.
• Limitations:
• Too difficult -- build large wind tunnels.
• Too expensive -- build a throw-away passenger jet.
• Too slow -- wait for climate or galactic evolution.
• Too dangerous -- weapons, drug design, climate experimentation.
• Computational science paradigm:
• Use high-performance computer systems to simulate the phenomenon.
• Based on known physical laws and efficient numerical methods.
Some Particularly Challenging Computations
• Science
• Global climate modeling
• Biology: genomics; protein folding; drug design
• Astrophysical modeling
• Computational chemistry
• Computational materials science and nanoscience
• Engineering
• Semiconductor design
• Earthquake and structural modeling
• Computational fluid dynamics (airplane design)
• Combustion (engine design)
• Crash simulation
• Business
• Financial and economic modeling
• Transaction processing, web services, and search engines
• Defense
• Nuclear weapons -- test by simulation
• Cryptography
Example: Global Climate Modeling Problem
• Problem is to compute: f(latitude, longitude, elevation, time) -> temperature, pressure, humidity, wind velocity
• Approach:
• Discretize the domain, e.g., a measurement point every 10 km
• Devise an algorithm to predict the weather at time t+dt given the state at time t
• Uses:
• Predict major events, e.g., El Niño
• Use in setting air emissions standards
Source: http://www.epm.ornl.gov/chammp/chammp.html
High Resolution Climate Modeling on NERSC-3 – P. Duffy, et al., LLNL
Some Characteristics of Scientific Simulation
• Discretize physical or conceptual space into a grid
• Simpler if regular; may be more representative if adaptive
• Perform local computations on the grid
• Given yesterday’s temperature and weather pattern, what is today’s expected temperature?
• Communicate partial results between grids
• Contribute the local weather result to understand the global weather pattern.
• Repeat for a set of time steps
• Possibly perform other calculations with the results
• Given the weather model, what area should evacuate for a hurricane?
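The pattern on this slide can be sketched in a few lines of Python. This is only an illustrative stand-in, not a real climate kernel: the 1-D grid, the neighbor-averaging update, and the example values are all made up for demonstration.

```python
def step(grid):
    """One time step: each interior point becomes the average of itself
    and its two neighbors (a purely local computation on the grid)."""
    new = grid[:]                      # boundary values stay fixed
    for i in range(1, len(grid) - 1):
        new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return new

def simulate(grid, steps):
    """Repeat the local update for a set of time steps."""
    for _ in range(steps):
        grid = step(grid)
    return grid

# Example: a hot spot diffusing toward cool boundaries.
initial = [0.0, 0.0, 9.0, 0.0, 0.0]
print(simulate(initial, 1))
```

In a parallel version, each processor would own a contiguous chunk of the grid and exchange only its boundary values with its neighbors each step — that exchange is the "communicate partial results" bullet above.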
More Examples: Parallel Computing in Data Analysis
• Finding information amidst large quantities of data
• General themes of sifting through large, unstructured data sets:
• Has there been an outbreak of some medical condition in a community?
• Which doctors are most likely involved in fraudulent charging to Medicare?
• When should white socks go on sale?
• What advertisements should be sent to you?
• Data are collected and stored at enormous speeds (Gbytes/hour):
• remote sensors on a satellite
• telescopes scanning the skies
• microarrays generating gene expression data
• scientific simulations generating terabytes of data
• NSA analysis of telecommunications
Parallel Programming Complexity: An Analogy to Preparing Thanksgiving Dinner
• Enough parallelism? (Amdahl’s Law)
• Suppose you want to just serve turkey
• Granularity
• How frequently must each assistant report to the chef?
• After each stroke of a knife? Each step of a recipe? Each dish completed?
• Locality
• Grab the spices one at a time? Or collect the ones that are needed before starting a dish?
• Load balance
• Does each assistant get one dish? Preparing stuffing vs. cooking green beans?
• Coordination and synchronization
• The person chopping onions for the stuffing can also supply green beans
• Start the pie after the turkey is out of the oven
All of these things make parallel programming even harder than sequential programming.
Finding Enough Parallelism
• Suppose only part of an application seems parallel
• Amdahl’s Law:
• let s be the fraction of work done sequentially, so (1-s) is the fraction parallelizable
• let P = number of processors
Speedup(P) = Time(1)/Time(P) <= 1/(s + (1-s)/P) <= 1/s
• Even if the parallel part speeds up perfectly, performance is limited by the sequential part
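The bound on the slide is easy to evaluate numerically. A minimal sketch (the sequential fraction s = 0.05 is an arbitrary example value):

```python
def amdahl_speedup(s, p):
    """Amdahl's Law upper bound on speedup for sequential fraction s
    (0 <= s <= 1) on p processors: 1 / (s + (1 - s) / p)."""
    return 1.0 / (s + (1.0 - s) / p)

# Even a small sequential fraction caps the achievable speedup:
# with s = 0.05, no processor count can exceed 1/s = 20x.
for p in (2, 16, 1024):
    print(p, round(amdahl_speedup(0.05, p), 2))
```

Note how quickly the curve flattens: going from 16 to 1024 processors buys barely a 2x improvement when s = 0.05, which is why the slide stresses that performance is limited by the sequential part.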
Overhead of Parallelism
• Given enough parallel work, this is the biggest barrier to getting the desired speedup
• Parallelism overheads include:
• cost of starting a thread or process
• cost of communicating shared data
• cost of synchronizing
• extra (redundant) computation
• Each of these can be in the range of milliseconds (= millions of flops) on some systems
• Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work
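The granularity tradeoff in the last bullet can be made concrete with a toy cost model. The numbers here (1000 units of work, 4 processors, 1 unit of overhead per task) are invented for illustration; real overheads would be measured, not assumed.

```python
def parallel_time(total_work, tasks, procs, overhead_per_task):
    """Toy model: split total_work evenly into tasks; each task pays a
    fixed overhead (thread startup, communication, synchronization).
    Tasks run in rounds of procs at a time."""
    task_work = total_work / tasks
    rounds = -(-tasks // procs)        # ceil(tasks / procs)
    return rounds * (task_work + overhead_per_task)

# Too few tasks leaves processors idle; too many tasks drowns in overhead.
for tasks in (1, 4, 64, 1024):
    print(tasks, parallel_time(1000.0, tasks, 4, 1.0))
```

In this model one giant task ignores 3 of the 4 processors, while 1024 tiny tasks pay 1024 overheads; a modest number of large tasks sits in between, which is exactly the tradeoff the slide describes.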
Locality and Parallelism: Conventional Storage Hierarchy
• Large memories are slow; fast memories are small
• Storage hierarchies are large and fast on average
• Parallel processors, collectively, have large, fast caches
• the slow accesses to “remote” data are what we call “communication”
• An algorithm should do most of its work on local data
[Figure: conventional storage hierarchy -- each processor with its own cache, L2 cache, and L3 cache, joined by potential interconnects to the memories]
Load Imbalance
• Load imbalance is the time that some processors in the system are idle due to:
• insufficient parallelism (during that phase)
• unequal-size tasks
• Examples of the latter:
• adapting to “interesting parts of a domain”
• tree-structured computations
• fundamentally unstructured problems
• The algorithm needs to balance the load
New Parallel Programming Languages for Scientific Computing CS4960
A Brief Look at Chapel (Cray)
• History:
• Starting in 2002, the Defense Advanced Research Projects Agency (DARPA) funded five industry players to investigate the revolutionary, commercially viable high-end computer system of 2010
• Now, two teams (IBM and Cray) are still building systems to be deployed next year
• Both have introduced new languages, as has Sun:
• Chapel (Cray)
• X10 (IBM)
• Fortress (Sun)
• We will look at Chapel for the next few slides
Summary of Lecture
• Scientific simulation discretizes some space into a grid
• Perform local computations on the grid
• Communicate partial results between grids
• Repeat for a set of time steps
• Possibly perform other calculations with the results
• Writing fast parallel programs is difficult
• Amdahl’s Law: must parallelize most of the computation
• Data locality
• Communication and synchronization
• Load imbalance
• Challenge for new productive parallel programming languages:
• Express data partitioning and parallelism at a high level
• Still obtain high performance!