
Multicore / Manycore Processors

Joram Benham, April 2, 2012.


Presentation Transcript


  1. Joram Benham, April 2, 2012. Multicore/Manycore Processors

  2. Outline • Introduction • Motivation • Multicore Processors • Overview, CELL • Advantages of CMPs • Throughput, Latency • Challenges • Future of Multicore

  3. Introduction • Multicore processors • Several/many cores on the same chip • Dual/quad core – two/four cores • AKA Chip-multiprocessors (CMPs)

  4. Motivation

  5. Motivation - ILP • Instruction-Level Parallelism • Pipelining – split execution into stages • Superscalar – issue multiple instructions each cycle • Out-of-order execution • Branch prediction • Take advantage of implicit program parallelism – instruction independence

  6. Motivation – ILP Problems • Limited amount of implicit parallelism in sequentially designed/coded programs • Circuitry for pipelining becomes complex after 10-20 stages • Power – circuitry for ILP exploitation results in exponentially more power being used

  7. Intel processor power over time. Power in Watts on y-axis, years on x-axis.

  8. Chip-Multiprocessors AKA Multicore/Manycore Processors

  9. CMPs • Getting harder to build better uniprocessors • CMPs are less difficult • Can reuse/modify old designs • Add modified copies to same chip • Requires a paradigm shift • From Von Neumann model to parallel programming model • Thread-level parallelism + instruction-level parallelism

  10. Basic Uniprocessor Design

  11. Basic CMP Design

  12. Real CMP Example - CELL • CELL CMP – heterogeneous • Developed by Sony, Toshiba, IBM • Built for Sony’s PlayStation 3 • Contains 9 cores • 1 Power Processing Element (PPE) • 8 Synergistic Processing Elements (SPEs)

  13. Advantages of CMPs Throughput, Latency

  14. Improving Throughput • Web-server throughput • Handle many independent service requests • Collections of uniprocessor servers used • Then, multiprocessor systems • CMP approach • Use less power for communication • Reduce clock speeds

  15. Throughput – Servers • General rule: • “The simpler the pipeline, the lower the power.” • Simple cores – less power used • Less speed, but more cores available to handle requests

  16. Comparison of power usage by equivalent narrow issue/in-order processors, and wide-issue/out-of-order processors on throughput-oriented software.

  17. Throughput - Multithreading • Server applications: • High thread-level parallelism • Lower instruction-level parallelism, high cache miss rates • Results in idle processor time on uniprocessors • Hardware multithreading • Coarse-grained: stalls trigger switches • Fine-grained: switch threads continuously • Simultaneous: Run multiple threads using superscalar issuing
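The idle time described on this slide, and how coarse-grained switching hides it, can be sketched with a toy cycle-count model. This is an illustration under stated assumptions, not a model of any real processor: the workload string, the `MISS_PENALTY` value, and both function names are hypothetical.

```python
# Toy model: a thread is a sequence of operations.
#   "c" = one cycle of useful compute
#   "m" = a cache miss that stalls the thread for MISS_PENALTY cycles
MISS_PENALTY = 4  # assumed stall length in cycles (hypothetical)


def run_single(thread):
    """One thread, no multithreading: the core idles during every miss."""
    busy = sum(1 for op in thread if op == "c")
    misses = sum(1 for op in thread if op == "m")
    return busy, busy + MISS_PENALTY * misses


def run_coarse_grained(threads):
    """Coarse-grained multithreading: a stall triggers a switch to a ready thread."""
    pc = [0] * len(threads)        # next operation index per thread
    ready_at = [0] * len(threads)  # cycle at which each thread may run again
    cycle = busy = current = 0
    while any(pc[i] < len(t) for i, t in enumerate(threads)):
        # Unfinished threads whose stall (if any) has ended.
        ready = [i for i, t in enumerate(threads)
                 if pc[i] < len(t) and ready_at[i] <= cycle]
        if not ready:
            cycle += 1             # every thread is stalled: the core idles
            continue
        if current not in ready:
            current = ready[0]     # the running thread stalled or finished
        op = threads[current][pc[current]]
        pc[current] += 1
        if op == "c":
            busy += 1
            cycle += 1
        else:
            ready_at[current] = cycle + MISS_PENALTY
    # Account for a stall that is still outstanding when the last op issues.
    return busy, max(cycle, max(ready_at))


workload = list("cmcmc")
for label, (busy, total) in [
    ("single thread", run_single(workload)),
    ("two threads, coarse-grained", run_coarse_grained([workload, workload])),
]:
    print(f"{label}: {busy} busy cycles out of {total}")
```

In this toy workload the second thread fills part of each stall, so the core does twice the useful work in roughly the same number of cycles. Fine-grained and simultaneous multithreading are not modeled here.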

  18. Throughput – Increase the Cores • More cores = higher total hardware thread count • What kind of cores should be added? • Fewer larger, more complex cores • Individual threads complete faster • Many smaller, simpler cores • Slightly slower – but more cores means more threads, and higher throughput
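The trade-off between a few complex cores and many simple ones can be made concrete with a small calculation. The core counts and request times below are invented for illustration, and the model assumes fully independent requests with one request per core at a time.

```python
def server_config(cores, request_time):
    """Return (latency of one request, requests completed per time unit).

    Assumes independent requests and one request in flight per core.
    """
    return request_time, cores / request_time


# Hypothetical designs under the same chip budget.
big = server_config(cores=2, request_time=1.0)    # fewer, faster cores
small = server_config(cores=8, request_time=1.5)  # more, slower cores

print("big cores:   latency", big[0], "throughput", big[1])
print("small cores: latency", small[0], "throughput", round(small[1], 2))
```

Each request finishes later on the small-core design, but the chip as a whole completes more requests per unit of time.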

  19. Improving Latency • Latency is more important in some programs • E.g. Desktop applications, compilation • Cores in a CMP are closer together on chip – less communication time • Two ways CMPs help with latency • Parallelize the code for responsive applications • Run sequential applications on their own hardware threads – no competition between threads
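How much parallelizing an application can reduce its latency depends on how much of it actually runs in parallel. Amdahl's law is not named in the slides; it is a standard model added here as a sketch, and the function name and sample fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only part of a program can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for fraction in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (2, 4, 8)]
    print(f"parallel fraction {fraction}: speedup on 2/4/8 cores = {speedups}")
```

The serial part bounds the gain: with half the program serial, no number of cores can deliver more than a 2x speedup.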

  20. Multicore Challenges Power and Temperature, Cache Coherence, Memory Access, Paradigm Shift, Starvation

  21. Power and Temperature • In theory: two cores on the same chip = twice as much power + lots of heat • Solutions: • Reduce core clock speeds • Implement a power control unit
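Why reducing clock speeds helps can be seen from the usual approximation for dynamic power in CMOS logic: activity × capacitance × voltage² × frequency. The sketch below uses normalized units and assumes that a 20% lower clock also permits a 20% lower supply voltage. That scaling, and the function name, are assumptions for illustration, not figures from the slides.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Approximate dynamic power: activity * C * V^2 * f (normalized units)."""
    return activity * capacitance * voltage ** 2 * frequency


# One core at full voltage and clock speed.
one_fast_core = dynamic_power(1.0, voltage=1.0, frequency=1.0)

# Two cores, each with clock and voltage reduced by 20% (assumed scaling).
two_slow_cores = 2 * dynamic_power(1.0, voltage=0.8, frequency=0.8)

# Peak throughput relative to one fast core, assuming perfectly parallel work
# and performance proportional to clock frequency.
relative_throughput = 2 * 0.8

print("power, one fast core: ", one_fast_core)
print("power, two slow cores:", round(two_slow_cores, 3))
print("relative throughput:  ", relative_throughput)
```

Under these assumptions the two slower cores draw about the same power as the single fast core while offering more aggregate throughput on parallel work.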

  22. CELL chip-multiprocessor thermal diagram.

  23. Cache Coherence • Multiple cores, independent local caches • Load same block of main memory into cache – may result in data inconsistency • Cache coherence schemes • Snooping: Watch the communication bus • Directory-based: Keep track of which memory locations are being shared in multiple caches
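The snooping approach can be sketched as a simplified write-invalidate scheme with three line states (Modified, Shared, Invalid). This is a toy model for illustration only: the class names are hypothetical, every write goes to the bus even when the line is already Modified, and real protocols handle many cases omitted here.

```python
class Bus:
    """Shared bus that every cache watches (snoops)."""

    def __init__(self):
        self.memory = {}   # main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, sender, kind, addr):
        # Every cache except the sender observes the transaction.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(kind, addr)


class Cache:
    """Per-core cache with simplified M/S/I line states."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # address -> [state, value]
        bus.attach(self)

    def read(self, addr):
        line = self.lines.get(addr)
        if line is None or line[0] == "I":
            # Miss: announce the read so a Modified owner writes back first.
            self.bus.broadcast(self, "read", addr)
            line = ["S", self.bus.memory.get(addr, 0)]
            self.lines[addr] = line
        return line[1]

    def write(self, addr, value):
        # Announce the write so all other copies are invalidated.
        self.bus.broadcast(self, "write", addr)
        self.lines[addr] = ["M", value]

    def snoop(self, kind, addr):
        line = self.lines.get(addr)
        if line is None or line[0] == "I":
            return
        if line[0] == "M":
            self.bus.memory[addr] = line[1]       # write back dirty data
        line[0] = "S" if kind == "read" else "I"  # downgrade or invalidate


bus = Bus()
core0, core1 = Cache(bus), Cache(bus)
core0.write(0x10, 7)
print("core1 reads:", core1.read(0x10))
core1.write(0x10, 9)
print("core0 state after core1's write:", core0.lines[0x10][0])
print("core0 reads:", core0.read(0x10))
```

A directory-based scheme would replace the broadcast with a lookup in a table recording which caches hold each block.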

  24. Memory Issues • We need more memory to share among multicore processors • 64-bit processors help address the issue: more addressable memory • Useless if we cannot access it quickly • Disk speed slows everyone down

  25. Change to Parallel Paradigm • “To use multicore, you really have to use multiple threads. If you know how to do it, it's not bad. But the first time you do it there are lots of ways to shoot yourself in the foot. The bugs you introduce with multithreading are so much harder to find.” • Have to educate programmers • Convince them to make their programs concurrent
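One of the classic ways to "shoot yourself in the foot" is an unsynchronized update to shared data. The sketch below shows the guarded version of a shared counter. The function name and the thread and iteration counts are illustrative.

```python
import threading


def count_with_lock(n_threads, increments):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # "counter += 1" is a read-modify-write sequence; the lock makes
            # it atomic with respect to the other worker threads.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(count_with_lock(n_threads=4, increments=10_000))
```

Without the lock, two threads can read the same old value and each write back old + 1, silently losing an update. Whether that happens on a given run depends on timing and on the runtime, which is exactly what makes such bugs hard to find.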

  26. Starvation • Sequential programs will not use all cores • Some cores “starve” • Shared cache usage • One core evicts another core’s data • Other core has to keep accessing main memory
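The shared-cache effect can be sketched with a small least-recently-used (LRU) cache model. Everything here is hypothetical: the 4-line capacity, the access pattern, and the names. Real shared caches are set-associative and far larger.

```python
from collections import OrderedDict


class SharedLRUCache:
    """Tiny fully associative LRU cache shared by several cores."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # (core, block) -> True, oldest first
        self.misses = {}            # core id -> number of misses

    def access(self, core, block):
        key = (core, block)
        if key in self.lines:
            self.lines.move_to_end(key)         # hit: mark most recently used
            return
        self.misses[core] = self.misses.get(core, 0) + 1
        self.lines[key] = True                  # miss: fetch from main memory
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)      # evict least recently used


def core0_misses(rounds, with_streaming_neighbor):
    """Core 0 reuses two blocks; core 1 optionally streams through new ones."""
    cache = SharedLRUCache(capacity=4)
    next_stream_block = 0
    for _ in range(rounds):
        for block in (0, 1):
            cache.access(0, block)
        if with_streaming_neighbor:
            for _ in range(4):
                cache.access(1, next_stream_block)
                next_stream_block += 1
    return cache.misses.get(0, 0)


print("core 0 misses alone:             ", core0_misses(10, False))
print("core 0 misses next to a streamer:", core0_misses(10, True))
```

Alone, core 0 misses only on its first touch of each block. Next to the streaming core, its two blocks are evicted every round, so every access goes back to main memory.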

  27. Future of Multicore Multicore, Manycore, Hybrids

  28. Future of CMPs • Instruction-level parallelism reaching its limits • CMPs help with throughput and latency • Two main types of CMP will emerge, along with hybrids • “Manycore”: large number of small, simple cores, targeted at servers/throughput • “Multicore”: fewer, faster superscalar cores for very latency-sensitive programs • “Hybrids”: heterogeneous combinations

  29. References • Hammond, L., Laudon, J., Olukotun, K. Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Morgan and Claypool, 2007. • Hennessy, J. L., Patterson, D. A. Computer Architecture: A Quantitative Approach. San Francisco: Morgan Kaufmann Publishers, 2007. • Mashiyat, A. S. “Multi/Many Core Systems.” St. Francis Xavier University course presentation, 2011. • Schauer, B. “Multicore Processors – A Necessity.” Proquest Discovery Guides, September 2008. Web. Accessed April 2, 2012. <http://www.csa.com/discoveryguides/multicore/review.pdf>
