Pushing Performance, Efficiency and Scalability of Microprocessors

CERCS IAB Meeting, Fall 2006

Gabriel Loh


Research Overview

  • Funding from state of GA, Intel, MARCO

  • Currently 2 PhD students, 2 MS

    • Active undergrad research as well

  • Collaborations

    • Universities: PSU, UO, Rutgers

    • Industry: Intel, IBM


Research Focus

  • “Near-term” microprocessor design issues

    • ~ 5-year time scale

    • Power/performance/complexity

    • Traditional uniprocessor performance

    • Multi-core performance

  • “Longer-term”

    • Keeping Moore’s Law alive for the longer term

    • Primarily, 3D integration for now


Scaling Performance and Efficiency

  • Multi-cores are here, but single-thread perf still matters

    • Intel Core 2 Duo is multi-core, but…

    • Single core is more OOO than ever

      • Larger instruction window, improved branch prediction, speculative load-store ordering, wider pipe and decoders

    • But power also really matters

      • Lower clock speeds, different channel length transistors, more uop fusion, …


Research Focus

  • Maximum performance within bounds

    • Bounds = power, area, TDP, …

  • Single-core performance helps multi-core performance, too

    • For future multi-core systems, need to strike a good balance between 1T and MT

  • Most of our research is at the uarch level

    • Caches, branch predictors, instruction schedulers, memory queue design, memory dependence prediction, etc.


Highlight: Traditional Caching [MICRO’06]

  • Well known that different apps respond differently to different replacement policies

  • Previous work in the OS domain has described adaptive replacement with provable bounds on performance

  • Adapted techniques for on-chip caches


Idea…


Adaptive Cache Implementation

  • Theoretical Guarantees

    • Miss rate provably bounded to be within a factor of two of the better algorithm

In practice, it’s much better
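The adaptive idea on this slide can be sketched as a small simulation. This is a minimal illustration, not the MICRO’06 design: it assumes a fully associative cache, uses LRU and FIFO as stand-in component policies, and picks the leader by total miss count. The names `ShadowCache`, `AdaptiveCache`, and `run` are hypothetical, and the sketch does not by itself establish the factor-of-two bound.

```python
from collections import OrderedDict


class ShadowCache:
    """Tag-only model of one fixed policy: LRU if promote_on_hit, else FIFO."""

    def __init__(self, capacity, promote_on_hit):
        self.capacity = capacity
        self.promote_on_hit = promote_on_hit
        self.blocks = OrderedDict()  # oldest entry first
        self.misses = 0

    def access(self, addr):
        if addr in self.blocks:
            if self.promote_on_hit:
                self.blocks.move_to_end(addr)  # LRU: refresh recency on a hit
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the oldest entry
        self.blocks[addr] = None
        return False


class AdaptiveCache:
    """Real cache that imitates whichever shadow policy has missed less (assumed rule)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = ShadowCache(capacity, promote_on_hit=True)
        self.fifo = ShadowCache(capacity, promote_on_hit=False)
        self.blocks = {}  # insertion-ordered, so victim choice is deterministic
        self.misses = 0

    def access(self, addr):
        # Both shadows observe every access so their miss counts stay comparable.
        self.lru.access(addr)
        self.fifo.access(addr)
        if addr in self.blocks:
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            leader = min((self.lru, self.fifo), key=lambda s: s.misses)
            # Evict a block the leader no longer holds. One always exists:
            # the leader now holds addr, which this cache does not.
            victim = next(b for b in self.blocks if b not in leader.blocks)
            del self.blocks[victim]
        self.blocks[addr] = None
        return False


def run(trace, capacity):
    """Return (adaptive, LRU, FIFO) miss counts for a block-address trace."""
    cache = AdaptiveCache(capacity)
    for addr in trace:
        cache.access(addr)
    return cache.misses, cache.lru.misses, cache.fifo.misses
```

Calling `run(trace, capacity)` on a recorded address trace gives the three miss counts side by side, which makes it easy to compare the adaptive cache against each fixed policy.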


Current Research

  • Working on multi-core generalizations of adaptive caching and other ways to manage shared resources

  • Uniprocessor microarchitecture

    • Scalable memory scheduling [MICRO’06]

    • Memory dependence prediction [HPCA’06]

    • Branch prediction […]

    • And more…


Longer-Term Processor Scaling

  • Limitations/Obstacles

    • Wire scaling

      • Latency/performance

      • Power

    • Feature size

      • Lithography, parametric variations

    • Off-chip communication


3D Integration

  • Wire

    • Power/perf.

  • Off-chip

  • Feature size

    • Limitations, variations

[Figure: die/wafer stacking cross-section with labels Active Layer 1, Metal Layers 1, Die-to-Die Vias, Metal Layers 2, Active Layer 2]

Less RC → faster, lower-power
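The "less RC" point can be made concrete with the standard first-order model of an unrepeated distributed RC wire. This is a textbook approximation rather than a result from the talk; the symbols r and c (resistance and capacitance per unit length), L (wire length), and V_dd (supply voltage) are introduced here, and die-to-die via parasitics, drivers, and loads are ignored.

```latex
\[
t_{\text{wire}}(L) \approx 0.38\, r\, c\, L^{2}
\quad\Longrightarrow\quad
t_{\text{wire}}(L/2) \approx \tfrac{1}{4}\, t_{\text{wire}}(L)
\]
\[
E_{\text{wire}}(L) \propto c\, L\, V_{dd}^{2}
\quad\Longrightarrow\quad
E_{\text{wire}}(L/2) \approx \tfrac{1}{2}\, E_{\text{wire}}(L)
\]
```

Under these assumptions, halving a wire's length cuts its intrinsic delay to roughly a quarter and its switched capacitance, and therefore its dynamic energy, to roughly half.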


Example: Caches

Simplified 2D SRAM Array

3D Wordline Stacking

  • Wordline length halved

  • In our studies, WL was critical for latency

3D Bitline Stacking

  • Bitline length halved

  • BL reduction has greater impact on power savings

  • Split decoder → no activity stacking

We’ve studied a wide variety of other CPU building blocks
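The two stacking styles can be illustrated with a small mapping from a 2D array onto stacked dies. This is a sketch under stated assumptions: wordline length is taken to track the number of columns per die and bitline length the number of rows per die, the array is split into contiguous equal parts, and `fold_array` and `place_cell` are hypothetical names rather than anything from the cited papers.

```python
def fold_array(rows, cols, dies, stacking):
    """Return the (rows, cols) of the sub-array on each die after folding."""
    if stacking == "wordline":
        # Columns are split across dies, so each wordline spans fewer cells.
        return rows, cols // dies
    if stacking == "bitline":
        # Rows are split across dies, so each bitline spans fewer cells.
        return rows // dies, cols
    raise ValueError("stacking must be 'wordline' or 'bitline'")


def place_cell(row, col, rows, cols, dies, stacking):
    """Map a 2D cell to (die, local_row, local_col) under the assumed folding."""
    sub_rows, sub_cols = fold_array(rows, cols, dies, stacking)
    if stacking == "wordline":
        return col // sub_cols, row, col % sub_cols
    return row // sub_rows, row % sub_rows, col
```

With two dies, a 256 x 128 array folds to 256 x 64 per die under wordline stacking and to 128 x 128 per die under bitline stacking, matching the "length halved" bullets above.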


Uarch-level 3D design

Example: 4-die significance-partitioned datapath

Smaller footprint → faster and lower-power

Width-based gating → even lower power, close to original power density

Use uarch prediction mechanism for early determination of width

Overall: 47% performance gain at only 2 degree temperature increase
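Width-based gating with early width prediction can be sketched as follows. The slide gives only the 4-die partitioning; the 16 bits per die (a 64-bit datapath), the signed-value width rule, and the last-width predictor are all assumptions made for illustration, and `dies_needed` and `LastWidthPredictor` are hypothetical names, not the mechanism from the cited papers.

```python
SLICE_BITS = 16  # assumed bits of the datapath held on each die
NUM_DIES = 4     # from the slide: 4-die significance-partitioned datapath


def dies_needed(value):
    """Smallest number of 16-bit slices that hold `value` as a signed integer."""
    for k in range(1, NUM_DIES + 1):
        limit = 1 << (SLICE_BITS * k - 1)
        if -limit <= value < limit:
            return k
    raise ValueError("value does not fit in the assumed 64-bit datapath")


class LastWidthPredictor:
    """Predict that an instruction needs as many dies as it did last time."""

    def __init__(self):
        self.table = {}  # pc -> number of dies used by the last result

    def predict(self, pc):
        # Unknown instructions conservatively enable every die.
        return self.table.get(pc, NUM_DIES)

    def update(self, pc, value):
        """Record the actual width; return True if the prediction was safe."""
        actual = dies_needed(value)
        safe = self.predict(pc) >= actual
        self.table[pc] = actual
        return safe
```

A prediction is "safe" when it enables at least as many dies as the result needs; an unsafe prediction would require some recovery action, which this sketch does not model.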


3D Research Summary

  • Circuit-level [ICCD’05,ISVLSI’06,ISCAS’06,GLSVLSI’06]

  • Uarch-level [MICRO’06 (w/ ),HPCA’07]

  • Tutorial papers [JETC’06]

  • Tutorial [MICRO’06]

  • Tools [DATE’06,TCAD’07] w/ GTCAD &

  • Parametric Variations w/ Jim Meindl

  • Funding, equip from ,


Summary

  • [email protected]

  • http://www.cc.gatech.edu/~loh

  • Lots of exciting work going on here
