The Paradigm Shift to Multi-Cores: Opportunities and Challenges

In memory of Stamatis. Per Stenstrom, Department of Computer Science & Engineering, Chalmers University of Technology, Sweden.

Presentation Transcript


  1. The Paradigm Shift to Multi-Cores: Opportunities and Challenges
     In memory of Stamatis
     Per Stenstrom, Department of Computer Science & Engineering, Chalmers University of Technology, Sweden

  2. An Unwanted Paradigm Shift
     [Figure: performance growth chart with segments labeled "30% annual performance growth" and "60% annual performance growth"]
     • Clock frequency couldn’t be pushed higher
     • Traditional parallelism exploitation didn’t pay off

  3. The Easy Way Out: Replicate
     • Moore’s Law: 2X cores every 18 months
     • Implication: About a hundred cores in five years
     • BUT: Software can only make use of one!
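The "hundred cores in five years" projection on slide 3 can be checked with a few lines of arithmetic. This is a minimal sketch: the slide does not state the starting core count, so the values 8 and 10 below are assumptions chosen for illustration, and `projected_cores` is a hypothetical helper name.

```python
def projected_cores(start_cores: float, years: float,
                    doubling_months: float = 18.0) -> float:
    """Core count after `years` if the count doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return start_cores * 2.0 ** doublings


# Five years at one doubling per 18 months is 60 / 18 = 3.33 doublings,
# i.e. roughly a tenfold increase.
growth_factor = projected_cores(1, 5)

# Assumed starting points (not given on the slide).
from_eight = projected_cores(8, 5)
from_ten = projected_cores(10, 5)

print(f"growth factor over 5 years: {growth_factor:.2f}")
print(f"starting from 8 cores:  {from_eight:.0f}")
print(f"starting from 10 cores: {from_ten:.0f}")
```

Under these assumptions, a chip with around ten cores today lands at roughly a hundred cores after five years, which is consistent with the slide's order-of-magnitude claim.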

  4. Main Challenges
     • Programmability
     • Scalability
     We want to seamlessly scale up application performance within the power envelope

  5. Multi-core Vision: Multiple Cores = One Processor
     [Figure: layered stack showing application SW (existing and new), system software infrastructure, and a multi-core chip with processors (P) and memory (M)]
     Requires a concerted action across layers: programming model, compiler, architecture

  6. How can Architects Help?
     • What is the best use of the many transistors?
     [Figure: processors (P) and a cache hierarchy, annotated "On-chip cache management" and "Support for enhancing programmability"]

  7. "Inherent" Speculative Parallelism [Islam et al. ICPP 2007]
     • Representative of what is possible today
     • Scaling beyond eight cores will need manual efforts

  8. Three Hard Steps
     • Decomposition. Goal: expose concurrency but beware of thread management overhead
     • Assignment. Goal: balance load & reduce communication
     • Orchestration. Goal: orchestrate threads to reduce communication and synchronization costs
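The three steps on slide 8 can be sketched on a toy problem, summing a list in parallel. This is an illustrative sketch, not code from the talk: `decompose` and `parallel_sum` are hypothetical names, and the chunking policy and worker counts are assumptions. In CPython the global interpreter lock prevents a real speedup for this CPU-bound sum, so the example shows the structure of the steps only.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(data, num_tasks):
    """Decomposition: split the work into tasks (contiguous chunks here).

    More tasks expose more concurrency, but each task adds management
    overhead, so num_tasks is a tuning knob.
    """
    chunk = max(1, -(-len(data) // num_tasks))  # ceiling division
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def parallel_sum(data, num_workers=4, tasks_per_worker=2):
    tasks = decompose(data, num_workers * tasks_per_worker)

    # Assignment: the pool hands each task to whichever worker is idle,
    # which balances load when tasks take unequal time.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(sum, tasks))

    # Orchestration: every task produced a private partial result, so the
    # only communication is this single combining step at the end.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

Creating a few more tasks than workers (the assumed `tasks_per_worker=2`) is one common way to trade a little management overhead for better load balance.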

  9. Transactional Memory
     Transactional memory provides a safety net for data races and hence simplifies coordination
     [Figure: two transactions, T1 and T2, each performing LD A and ST A, with the labels SQUASH and RE-EXECUTE]
     • Research is warranted into high-productivity programming interfaces
     • Transactional memory is a good starting point
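The squash and re-execute behavior pictured on slide 9 can be illustrated with a toy, single-threaded simulation. This is a sketch under stated assumptions, not the mechanism from the talk: `VersionedCell`, `Transaction`, and `run_interleaving` are hypothetical names, conflicts are detected by comparing a version counter at commit time, and real transactional memory systems detect conflicts by other means.

```python
class VersionedCell:
    """A shared location A with a version counter bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    """Optimistic transaction over one cell: buffer the store, validate at commit."""

    def __init__(self, cell):
        self.cell = cell
        self.read_version = None
        self.pending = None

    def load(self):  # LD A
        self.read_version = self.cell.version
        return self.cell.value

    def store(self, value):  # ST A (buffered, not yet visible to others)
        self.pending = value

    def commit(self):
        if self.cell.version != self.read_version:
            return False  # A changed since our load: squash
        self.cell.value = self.pending
        self.cell.version += 1
        return True


def run_interleaving():
    """Interleave T1 and T2 as LD A, LD A, ST A, ST A, each incrementing A."""
    a = VersionedCell(0)
    t1, t2 = Transaction(a), Transaction(a)
    v1 = t1.load()
    v2 = t2.load()
    t1.store(v1 + 1)
    t2.store(v2 + 1)
    t1_ok = t1.commit()
    t2_ok = t2.commit()
    if not t2_ok:
        # SQUASH the stale transaction and RE-EXECUTE it from the start.
        t2 = Transaction(a)
        t2.store(t2.load() + 1)
        t2.commit()
    return t1_ok, t2_ok, a.value


if __name__ == "__main__":
    print(run_interleaving())
```

Without the commit-time check, both stores would write the same value and one increment would be lost; with it, the conflicting transaction is re-executed and both increments take effect.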

  10. Transistors can Help Programmers
     Recall the "hard steps":
     • Decomposition
     • Assignment
     • Orchestration
     Opportunities abound:
     • Low-overhead spawning mechanisms
     • Load balancing supported in HW
     • Communication balancing supported in HW

  11. Processor/Memory Gap
     P-M speed gap: How to bridge it?
     [Figure: three panels, each showing memory, a cache hierarchy, and a group of processors (P)]

  12. Adaptive Shared Caches [Dybdahl & Stenstrom HPCA 2007]
     [Figure: processors P1 and P2 with L1 caches over shared, private, and adaptive hybrid L2 organizations]

                     Shared   Private   Adaptive Hybrid
      Conflicts      ---      +++       +++
      Speed          ---      +++       +++
      Utilization    +++      ---       +++

  13. Scaling-Up Off-chip Bandwidth
     [Figure: three panels, each showing memory, a cache hierarchy, and a group of processors (P), labeled "Off-chip bandwidth bottleneck"]
     BW does not scale with Moore’s law unless optics or other disruptive technologies change the rules

  14. Memory/Cache Link Compression [Thuresson & Stenstrom, IEEE TC to appear]
     Our combined scheme yields a 3X reduction in bandwidth

  15. Summary
     • Multi-cores promise scalable performance under a manageable power envelope, but are hard to program
     • Providing scalable application performance for the future requires research at all levels:
       • Architecture (processor, cache, interconnect)
       • Compiler
       • Programming model
     These topics are dealt with in the FET SARC IP and in the HiPEAC network of excellence