
Introduction to Research 2011



  1. Introduction to Research 2011 • Ashok Srinivasan, Florida State University, www.cs.fsu.edu/~asriniva • Images: part of the machine room at ORNL; the Cell processor, which powers Roadrunner at LANL; NVIDIA GPUs, which power Tianhe-1A in China (images from ORNL, IBM, NVIDIA)

  2. Outline • Research: High Performance Computing applications and software • Multicore processors • Massively parallel processors • Computational nanotechnology • Simulation-based policy making • Potential Research Topics

  3. Research Areas • High Performance Computing, Applications in Computational Sciences, Scalable Algorithms, Mathematical Software • Current topics: Computational Nanotechnology, HPC on Multicore Processors, Massively Parallel Applications • New Topics: Simulation-based policy analysis • Old Topics: Computational Finance, Parallel Random Number Generation, Monte Carlo Linear Algebra, Computational Fluid Dynamics, Image Compression

  4. Importance of Supercomputing • Fundamental scientific understanding • Nano-materials, drug design • Solution of bigger problems • Climate modeling • More accurate solutions • Automobile crash tests • Solutions with time constraints • Disaster mitigation • Study of complex interactions for policy decisions • Urban planning

  5. Some Applications • Increasing relevance to industry • In 1993, fewer than 30% of the top 500 supercomputers were commercial; now, 57% are • A variety of application areas • Commercial: finance and insurance, medicine, aerospace and automobiles, telecom, oil exploration, shoes (Nike!), potato chips, toys • Scientific: weather prediction, earthquake modeling, epidemic modeling, materials, energy, computational biology, astrophysics

  6. Supercomputing Power • The amount of parallelism is also increasing, with the high end having over 200,000 cores

  7. Geographic Distribution • North America has over half of the top 500 systems • However, Europe and East Asia also have a significant share • China is determined to be a supercomputing superpower • Two of its national supercomputing centers have top-five supercomputers • Japan has the top machine and two in the top five • It is planning a $1.3 billion exascale supercomputer for 2020

  8. Asian Supercomputing Trends

  9. Challenges in Supercomputing • Hardware can be obtained with enough money • But obtaining good performance on large systems is difficult • Some DOE applications ran at 1% efficiency on 10,000 cores • They will have to deal with a million threads soon, and with a billion at the exascale • Don't think of supercomputing as a means of solving current problems faster, but as a means of solving problems we previously thought we could not solve • Another challenge is developing software tools that make these machines easier to use

  10. Architectural Trends • Massive parallelism • 10K-processor systems will be commonplace • The large end already has over 500K processors • Single-chip multiprocessing • All processors will be multicore • Heterogeneous multicore processors • The Cell, used in the PS3 • GPGPU • An 80-core processor from Intel • Processors with hundreds of cores are already commercially available • Distributed environments, such as the Grid • But it is hard to get good performance on these systems

  11. Accelerating Applications with GPUs • Over a hundred cores per GPU • Hide memory latency with thousands of threads • Can accelerate a traditional computer to a teraflop • GPU cluster at FSU • Quantum Monte Carlo applications • Algorithms: linear algebra, FFT, compression, etc.

  12. Small Discrete Fourier Transforms (DFTs) on GPUs • GPUs are effective for large DFTs, but not for small DFTs • However, they can be effective for a large number of small DFTs • Useful for AFQMC • We use the asymptotically slow matrix-multiplication-based DFT for very small sizes (see the sketch below) • We combine it with mixed-radix DFTs for larger sizes • We use asynchronous memory transfer to deal with the host-device data transfer overhead
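
The matrix-multiplication approach has a simple batched form: every small transform of the same size shares one precomputed DFT matrix, so the whole batch reduces to a single matrix product, which is exactly the kind of batched GEMM a GPU executes well. Below is a minimal NumPy sketch of that idea (the function names are ours, for illustration only); it omits the mixed-radix combination and the asynchronous host-device transfers described above.

```python
import numpy as np

def dft_matrix(n):
    # Precompute the n x n DFT matrix F with F[j, k] = exp(-2*pi*1j*j*k/n).
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def batched_small_dft(batch):
    # batch has shape (num_transforms, n); a single matrix product
    # computes all of the small DFTs at once. F is symmetric, so the
    # orientation of the product does not matter.
    n = batch.shape[1]
    return batch @ dft_matrix(n)

# Sanity check: 512 simultaneous size-8 transforms against the FFT.
x = np.random.rand(512, 8) + 1j * np.random.rand(512, 8)
assert np.allclose(batched_small_dft(x), np.fft.fft(x, axis=1))
```

On a GPU, the same product would typically be issued as one large or batched matrix multiplication (for example, through a BLAS library such as cuBLAS), which keeps the many tiny transforms from under-utilizing the device.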

  13. Comparison of DFT Performance • Charts: 512 simultaneous DFTs without host-device data transfer, with panels for 3-D DFTs and 2-D DFTs

  14. Petascale Quantum Monte Carlo • Originally a DOE-funded project involving collaboration between ORNL, UIUC, Cornell, UTK, CWM, and NCSU • Now funded by ORAU/ORNL • Goal: scale Quantum Monte Carlo applications to petascale (one million gigaflops) machines • Load balancing, fault tolerance, and other optimizations

  15. Load Balancing • In current implementations, such as QWalk and QMCPack, cores send excess walkers to cores with fewer walkers • In the new algorithm (the alias method), cores may send more than their excess, and may receive walkers even if they originally had an excess • The load can be balanced with each core receiving from at most one other core (see the sketch below) • The algorithm is also optimal in the maximum number of walkers received • The total number of walkers sent may be up to twice the optimal
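
The balancing step itself fits in a few lines. The following Python function is a minimal serial sketch of the alias-style pairing, under the simplifying assumption that the total walker count divides evenly among the cores (the function name and the transfer-list representation are ours, not QWalk's or QMCPack's). Each deficit core is filled completely by a single sender, and a sender that drops below the target rejoins the deficit list; this is how a core can send more than its excess yet still receive from at most one other core.

```python
def alias_balance(counts):
    # counts[i] is the number of walkers on core i.  This sketch assumes
    # the total divides evenly by the number of cores.  Returns a list of
    # (sender, receiver, num_walkers) transfers in which every core
    # receives from at most one other core.
    n = len(counts)
    total = sum(counts)
    assert total % n == 0, "sketch assumes an evenly divisible total"
    target = total // n

    cur = list(counts)
    deficit = [i for i in range(n) if cur[i] < target]
    surplus = [i for i in range(n) if cur[i] > target]
    transfers = []

    while deficit:
        r = deficit.pop()          # a core below the target
        s = surplus.pop()          # a core above the target
        need = target - cur[r]
        transfers.append((s, r, need))
        cur[r] = target            # r is filled by s alone
        cur[s] -= need             # s may now be below target itself...
        if cur[s] > target:
            surplus.append(s)
        elif cur[s] < target:
            deficit.append(s)      # ...in which case it becomes a receiver
    return transfers

# Example: core 1 sends 4 walkers (more than its excess of 2),
# then receives 2 from core 0, for counts [7, 7, 1, 5] and target 5.
print(alias_balance([7, 7, 1, 5]))  # [(1, 2, 4), (0, 1, 2)]
```

Since a filled core never leaves the target again, each core is a receiver at most once, which gives the at-most-one-sender-per-receiver guarantee from the slide.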

  16. Performance Comparison • Charts comparing against QWalk: mean number of walkers migrated, and maximum number of receives

  17. Process-Node Affinity • Node allocation is not necessarily ideal for minimizing communication • Process-node affinity can, therefore, be important • Figure: allocated nodes for a 12,000-core run on Jaguar

  18. Load Balancing with Affinity • Renumbering the nodes improves load balancing and AllGather time (see the sketch below) • Charts: basic load balancing versus load balancing after renumbering; results on Jaguar
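
One simple way to exploit the allocation is to renumber the allocated nodes by physical position, so that ranks that are consecutive in the new numbering tend to be physically close. The Python sketch below is a hypothetical illustration of that idea, not the exact renumbering used on Jaguar: it sorts nodes lexicographically by their 3-D torus coordinates, and a space-filling-curve ordering would preserve locality even better.

```python
def renumber_nodes(coords):
    # coords maps an original node id to that node's (x, y, z)
    # coordinates in the machine's 3-D torus.  Nodes consecutive in
    # the returned numbering tend to be physically close, shortening
    # the paths used by load balancing and AllGather.
    order = sorted(coords, key=lambda node: coords[node])
    return {node: new_id for new_id, node in enumerate(order)}

# Example with a handful of hypothetical allocated nodes.
allocated = {17: (2, 0, 1), 3: (0, 1, 0), 42: (0, 0, 1), 8: (2, 0, 0)}
print(renumber_nodes(allocated))  # {42: 0, 3: 1, 8: 2, 17: 3}
```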

  19. Potential Research Topics • High Performance Computing on multicore processors • Algorithms, applications, and libraries on GPUs • Applications on massively parallel processors • Quantum Monte Carlo applications • Load balancing and communication optimizations • Simulation-based policy decisions • Combine scientific computing with models of social interactions to help make policy decisions
