
Multicores, Manycores and Amdahl’s Law



Presentation Transcript


  1. Multicores, Manycores and Amdahl’s Law 2012

  2. Amdahl’s Law – Reminder • Original Amdahl’s Law for n identical cores • f – fraction of parallelizable execution time • (1-f) – fraction of totally sequential execution time • Sequential runs on a single core • Parallel runs on all n cores • Q: What are the hidden assumptions?
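
The law this slide recalls is usually written as Speedup(f, n) = 1 / ((1 - f) + f / n). A minimal Python sketch follows; the function name `amdahl_speedup` is illustrative and does not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Classic Amdahl's Law: the sequential fraction (1 - f) runs on one core,
    the parallel fraction f is spread evenly over n identical cores."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


if __name__ == "__main__":
    # Speedup for a 90% parallel program on 1, 16 and 256 cores.
    for n in (1, 16, 256):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

As n grows, the speedup approaches 1 / (1 - f), so a program with f = 0.9 can never run more than 10 times faster however many cores are added.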

  3. Multicore CPU • Manycore – Tens or hundreds of cores • Why don’t we have a Sandy Bridge with 100 cores? [Figure: Intel’s Sandy Bridge]

  4. Core Performance Constraints • Manufacturing technology • Area (for more logic) • Area = Money; Manufacturing constraints • Power (for more logic, higher frequencies) • Sub-threshold leakage current • More power requires better cooling solutions

  5. So Why Not One Single Core? [Figure: a single core]

  6. Large Core Performance • We have a baseline core (BCE) with area=1, performance=1 (e.g., a simple in-order core) • We can add microarchitectural features (accurate branch prediction, big data caches, uOp cache, OOOE) • New core area is then r (r>1) • Large core is faster, with performance of perf(r) • Q: For which perf(r) function is a large core better than multiple small ones? • So what is perf(r)? [Figure: a BCE grows into a large core]

  7. Area: Pollack’s Rule • An empirical rule: performance grows roughly with the square root of core area • Multicore implications. For example, double the CPU logic and get: • About 40% more performance with a larger single core • For purely parallel code, 100% more performance with a dual-core

  8. Power • Power is usually considered as proportional to area • In this presentation we consider area as the main constraint • Not completely true [Esmaeilzadeh’11] • For simplicity we keep with

  9. Why Multicore/Manycore? • More performance per mm² and per watt for parallel code • Less power (and heat) • Save power by turning each CPU on and off • Run each core at its optimal frequency/power point • Load-balance to distribute heat • Lower die temperatures • New performance constraint: the parallel fraction

  10. Cost Model • To find the best-performing CPU configuration we need a cost model • Basic core: Baseline Core Equivalent (BCE) • Chip is limited to no more than n BCEs • Performance • Performance of each BCE is 1 • Architects can expand the resources of r BCEs to create a powerful core with performance of perf(r) • f – fraction of the parallelizable execution time

  11. Symmetric Multicore Chips • Run the sequential part on one core • Run the parallel part on all cores [Figure: n=16, r=1: sixteen 1-BCE cores; n=16, r=4: four 4-BCE cores]

  12. Symmetric Multicore Chips • n/r identical cores • Each core performance perf(r) • Execution • Sequential part – 1 core; performance - perf(r) • Parallel part – all cores; performance - perf(r) * n/r
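
The execution model on this slide translates directly into a speedup formula: time = (1 - f) / perf(r) + f / (perf(r) * n / r). The sketch below assumes perf(r) = sqrt(r), the choice used later in the deck; the function names are illustrative.

```python
import math


def perf(r: float) -> float:
    """Assumed performance of an r-BCE core: sqrt(r) (Pollack-style scaling)."""
    return math.sqrt(r)


def speedup_symmetric(f: float, n: int, r: float) -> float:
    """Speedup over one BCE for a chip of n BCEs built from n/r identical r-BCE cores."""
    sequential_time = (1.0 - f) / perf(r)       # one core runs the sequential part
    parallel_time = f / (perf(r) * n / r)       # all n/r cores share the parallel part
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # n=16, F=0.9, R=2 from the deck's 16-BCE example.
    print(round(speedup_symmetric(0.9, 16, 2), 1))  # 6.7
```

With r = 1 the formula reduces to the original Amdahl's Law for n cores.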

  13. Symmetric, n=16 • F=0.9, R=2, Cores=8, Speedup=6.7 • As Moore’s Law enables N to go from 16 to 256 BCEs: more core enhancements? More cores? Or both?

  14. Symmetric, n=256 • F→1: R=1 (vs. 1), Cores=256 (vs. 16), Speedup=204 (vs. 16): MORE CORES! • F=0.99: R=3 (vs. 1), Cores=85 (vs. 16), Speedup=80 (vs. 13.9): CORE ENHANCEMENTS & MORE CORES! • F=0.9: R=28 (vs. 2), Cores=9 (vs. 8), Speedup=26.7 (vs. 6.7): CORE ENHANCEMENTS!
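
These optima can be reproduced with a brute-force sweep over integer core sizes. The sketch below assumes perf(r) = sqrt(r) and lets the core count n/r be fractional, as the model does. For the row with F close to 1 it uses f = 0.999, an assumption chosen because it reproduces the listed speedup of 204. Function names are illustrative.

```python
import math


def speedup_symmetric(f: float, n: int, r: int) -> float:
    """Symmetric-chip speedup with perf(r) = sqrt(r) (assumed) and n/r cores."""
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))


def best_symmetric(f: float, n: int) -> tuple[int, float]:
    """Try every integer core size r in 1..n; return (best r, speedup at that r)."""
    best_r = max(range(1, n + 1), key=lambda r: speedup_symmetric(f, n, r))
    return best_r, speedup_symmetric(f, n, best_r)


if __name__ == "__main__":
    n = 256
    for f in (0.999, 0.99, 0.9):
        r, s = best_symmetric(f, n)
        print(f"f={f}: r={r}, cores~{n / r:.0f}, speedup={s:.1f}")
```

The sweep shows the trend the slide highlights: the smaller the parallel fraction, the more area the optimum spends on each core and the fewer cores it keeps.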

  15. Symmetric Multicores • In symmetric multicores with fixed n, perf(r)=sqrt(r), maximum performance is achieved when: • Q1: When will a single core perform better than any symmetric multicore? • Q2: In the optimal configuration, what are the proportions of the execution time between the optimal sequential and parallel parts?
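
The optimum can be derived from the symmetric execution model with perf(r) = sqrt(r). The derivation below is a reconstruction from those definitions, not a copy of the slide's formula, and it treats r as a continuous variable (an assumption). T(r) is the normalized execution time, so the speedup is 1/T(r).

```latex
\begin{align*}
T(r) &= \frac{1-f}{\sqrt{r}} + \frac{f\,r}{\sqrt{r}\,n}
      = \frac{1-f}{\sqrt{r}} + \frac{f\sqrt{r}}{n}, \\
\frac{dT}{dr} &= -\frac{1-f}{2\,r^{3/2}} + \frac{f}{2\,n\sqrt{r}} = 0
\quad\Longrightarrow\quad
r^{*} = \frac{(1-f)\,n}{f}.
\end{align*}
```

Under this reading, r* must still be clamped to the range 1..n. For n=256 it gives r* ≈ 28.4 at f=0.9 and r* ≈ 2.6 at f=0.99, in line with the R=28 and R=3 listed for the 256-BCE chip. Substituting r* back shows that (1-f)/sqrt(r*) equals f·sqrt(r*)/n, so the sequential and parallel parts take equal time at the optimum. A single core (r = n) is best once r* reaches n, which happens when f ≤ 0.5.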

  16. Asymmetric Multicore Chips • Run the sequential part on the big core • Run the parallel part on all cores One 4-BCE core; Twelve 1-BCE cores

  17. Asymmetric Multicore Chips • One large r-BCE core with performance of perf(r) • n-r small 1-BCE cores with performance of 1 • Execution: • Sequential part – 1 core; performance - perf(r) • Parallel part – all cores; performance - perf(r) + n - r
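
As a sketch of this model, the speedup is 1 / ((1 - f) / perf(r) + f / (perf(r) + n - r)). The code assumes perf(r) = sqrt(r); the function name is illustrative.

```python
import math


def speedup_asymmetric(f: float, n: int, r: int) -> float:
    """Speedup over one BCE for one r-BCE core plus (n - r) single-BCE cores.

    Assumes perf(r) = sqrt(r) for the large core.
    """
    big_core_perf = math.sqrt(r)
    sequential_time = (1.0 - f) / big_core_perf          # big core alone
    parallel_time = f / (big_core_perf + (n - r))        # big core plus all small cores
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # F=0.99, n=256, R=41: one 41-BCE core plus 215 small cores (216 cores total).
    print(round(speedup_asymmetric(0.99, 256, 41)))  # 166
```

With r = 1 the chip is just n baseline cores, and the formula again reduces to the original Amdahl's Law.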

  18. Asymmetric, n=256 • Is the potential of the asymmetric architecture greater than that of the symmetric one? • F=0.99: R=41, Cores=216, Speedup=166 (recall the symmetric optimum for F=0.99: Speedup=80)

  19. Dynamic (Composed) Multicore Chips • Combine up to r cores to boost sequential performance • Helper threads • Thread-Level Speculation • Hardware support may be required • Q: Why “up to r cores”?

  20. Dynamic (Composed) Multicore Chips • Execution: • Sequential part – 1 big core; performance - perf(r) • Parallel part – all cores; performance – n
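
In this model the sequential part sees a composed r-BCE core while the parallel part sees all n baseline cores, so the speedup is 1 / ((1 - f) / perf(r) + f / n). The sketch below again assumes perf(r) = sqrt(r); the function name is illustrative.

```python
import math


def speedup_dynamic(f: float, n: int, r: int) -> float:
    """Speedup over one BCE for a chip that fuses r BCEs for sequential code
    and runs parallel code on all n BCEs. Assumes perf(r) = sqrt(r)."""
    if not 1 <= r <= n:
        raise ValueError("r must be between 1 and n")
    sequential_time = (1.0 - f) / math.sqrt(r)   # composed big core
    parallel_time = f / n                        # all n baseline cores
    return 1.0 / (sequential_time + parallel_time)


if __name__ == "__main__":
    # F=0.99, n=256, R=256: every BCE is fused for the sequential part.
    print(round(speedup_dynamic(0.99, 256, 256)))  # 223
```

Because the parallel term does not depend on r, the speedup only improves as r grows, which is why the best configuration under these assumptions uses r = n.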

  21. Dynamic, n=256 • Q: How does the dynamic multicore scale relative to symmetric and asymmetric? • F=0.99: R=256 (vs. 41), Cores=256 (vs. 216), Speedup=223 (vs. 166) • Note: #Cores is always N=256

  22. Manufacturing Technology • New manufacturing technology will not save us

  23. The Future…

  24. Summary • Multicores and manycores are required due to the diminishing returns of large cores • Amdahl’s Law allows us to predict the performance of various architectures • Dynamic (composed) architecture is promising • To take advantage of future CPUs, the parallel fraction of the code must be very high • …and still we are going to have a problem

  25. References • Amdahl’s Law in the Multicore Era [Hill’08] • Thousand Core Chips—A Technology Perspective [Borkar’07] • Dark Silicon and the End of Multicore Scaling [Esmaeilzadeh’11] • Performance, Power Efficiency and Scalability of Asymmetric Cluster Chip Multiprocessors [Morad’05]
