Pushing Performance, Efficiency and Scalability of Microprocessors


Pushing Performance, Efficiency and Scalability of Microprocessors. CERCS IAB Meeting, Fall 2006 Gabriel Loh. Research Overview. Funding from state of GA, Intel, MARCO Currently 2 PhD students, 2 MS Active undergrad research as well Collaborations Universities: PSU, UO, Rutgers


Pushing Performance, Efficiency and Scalability of Microprocessors

CERCS IAB Meeting, Fall 2006

Gabriel Loh

Research Overview
  • Funding from state of GA, Intel, MARCO
  • Currently 2 PhD students, 2 MS
    • Active undergrad research as well
  • Collaborations
    • Universities: PSU, UO, Rutgers
    • Industry: Intel, IBM
Research Focus
  • “Near-term” microprocessor design issues
    • ~ 5-year time scale
    • Power/performance/complexity
    • Traditional uniprocessor performance
    • Multi-core performance
  • “Longer-term”
    • Keeping Moore’s Law alive for the longer term
    • Primarily, 3D integration for now
Scaling Performance and Efficiency
  • Multi-cores are here, but single-thread perf still matters
    • Intel Core 2 Duo is multi-core, but…
    • Single core is more OOO than ever
      • Larger instruction window, improved branch prediction, speculative load-store ordering, wider pipe and decoders
    • But power also really matters
      • Lower clock speeds, different channel length transistors, more uop fusion, …
Research Focus
  • Maximum performance within bounds
    • Bounds = power, area, TDP, …
  • Single-core performance helps multi-core performance, too
    • For future multi-core systems, need to strike a good balance between 1T and MT
  • Most of our research is at the uarch level
    • Caches, branch predictors, instruction schedulers, memory queue design, memory dependence prediction, etc.
Highlight: Traditional Caching [MICRO’06]
  • Well known that different apps respond differently to different replacement policies
  • Previous work in the OS domain has described adaptive replacement with provable bounds on performance
  • Adapted techniques for on-chip caches
Adaptive Cache Implementation
  • Theoretical Guarantees
    • Miss rate provably bounded to be within a factor of two of the better algorithm

  • In practice, it’s much better
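
The idea of following whichever replacement policy is currently doing better can be sketched in a few lines of Python. This is a minimal illustration, not the MICRO’06 design: the choice of LRU and FIFO as the two component policies, the use of total (rather than recent) miss counts, and all class and function names are assumptions made for the example. No claim is made that this simplified version meets the factor-of-two bound quoted above.

```python
from collections import OrderedDict


class PolicyCache:
    """Tag-only cache of fixed capacity managed by one policy ('lru' or 'fifo')."""

    def __init__(self, capacity, policy):
        self.capacity = capacity
        self.policy = policy
        self.tags = OrderedDict()  # oldest entry first
        self.misses = 0

    def access(self, tag):
        """Return True on a hit; on a miss, insert the tag and evict if full."""
        if tag in self.tags:
            if self.policy == "lru":
                self.tags.move_to_end(tag)  # refresh recency; FIFO ignores hits
            return True
        self.misses += 1
        if len(self.tags) >= self.capacity:
            self.tags.popitem(last=False)  # drop the oldest entry
        self.tags[tag] = True
        return False


class AdaptiveCache:
    """Real cache that imitates whichever shadow policy has missed less so far."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.shadows = [PolicyCache(capacity, "lru"), PolicyCache(capacity, "fifo")]
        self.contents = set()
        self.misses = 0

    def access(self, tag):
        for shadow in self.shadows:
            shadow.access(tag)  # shadow tag arrays see every access
        if tag in self.contents:
            return True
        self.misses += 1
        if len(self.contents) >= self.capacity:
            best = min(self.shadows, key=lambda s: s.misses)
            # Evict a block the better policy no longer holds. The set is never
            # empty: 'best' holds the new tag, so it holds at most capacity - 1
            # of the blocks currently in the real cache.
            candidates = self.contents - set(best.tags)
            self.contents.remove(min(candidates))
        self.contents.add(tag)
        return False


def run(trace, capacity):
    """Return (adaptive, lru, fifo) miss counts for an access trace."""
    cache = AdaptiveCache(capacity)
    for tag in trace:
        cache.access(tag)
    lru, fifo = cache.shadows
    return cache.misses, lru.misses, fifo.misses
```

On the short trace `[1, 2, 1, 3, 1]` with capacity 2, LRU misses three times and FIFO four; the adaptive cache in this sketch follows LRU and also misses three times.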

Current Research
  • Working on multi-core generalizations of adaptive caching and other ways to manage shared resources
  • Uniprocessor microarchitecture
    • Scalable memory scheduling [MICRO’06]
    • Memory dependence prediction [HPCA’06]
    • Branch prediction […]
    • And more…
Longer-Term Processor Scaling
  • Limitations/Obstacles
    • Wire scaling
      • Latency/performance
      • Power
    • Feature size
      • Lithography, parametric variations
    • Off-chip communication
3D Integration
  • Wire
    • Power/perf.
  • Off-chip
  • Feature size
    • Limitations, variations
  • Die/Wafer Stacking
    • Less RC → faster, lower-power

[Figure: cross-section of a two-die stack, labeled Active Layer 1, Metal Layers 1, Die-to-Die Vias, Metal Layers 2, Active Layer 2]

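
Why less RC means both faster and lower-power can be checked with a first-order wire model. The sketch below assumes an unrepeated, distributed RC wire, whose 50% delay is roughly 0.38·R·C, and counts only the energy needed to charge the wire capacitance. The per-millimeter resistance and capacitance, the supply voltage, and the function names are hypothetical values chosen for illustration, not figures from the talk.

```python
def wire_delay(r_per_mm, c_per_mm, length_mm):
    """Approximate 50% delay of an unrepeated distributed RC wire (0.38 * R * C)."""
    total_r = r_per_mm * length_mm
    total_c = c_per_mm * length_mm
    return 0.38 * total_r * total_c


def switching_energy(c_per_mm, length_mm, vdd):
    """Energy drawn from the supply to charge the wire capacitance once (C * Vdd^2)."""
    return c_per_mm * length_mm * vdd ** 2


# Hypothetical wire parameters, for illustration only.
R_PER_MM = 1000.0    # ohms per mm
C_PER_MM = 0.2e-12   # farads per mm
VDD = 1.0            # volts

full_len, half_len = 2.0, 1.0  # mm: a 2D wire versus the same wire halved by stacking

delay_ratio = wire_delay(R_PER_MM, C_PER_MM, half_len) / wire_delay(R_PER_MM, C_PER_MM, full_len)
energy_ratio = switching_energy(C_PER_MM, half_len, VDD) / switching_energy(C_PER_MM, full_len, VDD)

print(f"delay ratio:  {delay_ratio:.2f}")   # delay scales with length squared
print(f"energy ratio: {energy_ratio:.2f}")  # energy scales with length
```

Under these assumptions, halving a wire cuts its delay to about a quarter and its switching energy to about half. Real structures also contain gate and via delays that do not shrink this way, so the overall gain is smaller.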

Example: Caches
  • Simplified 2D SRAM Array [figure]
  • 3D Wordline Stacking
    • Wordline length halved
    • In our studies, WL was critical for latency
  • 3D Bitline Stacking
    • Bitline length halved
    • BL reduction has greater impact on power savings
    • Split decoder → no activity stacking
  • We’ve studied a wide variety of other CPU building blocks


Uarch-level 3D design
  • Example: 4-die significance-partitioned datapath
  • Smaller footprint → faster and lower-power
  • Width-based gating → even lower power, close to original power density
  • Use uarch prediction mechanism for early determination of width
  • Overall: 47% performance gain at only 2 degree temperature increase

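
Width-based gating on a significance-partitioned datapath can be illustrated with a small Python sketch. It assumes a 64-bit two's-complement datapath split into four 16-bit slices, one per die, with upper slices gated off when they hold only sign extension. The slide does not give the slice width, so 16 bits per die is an assumption, as are the function names. The sketch computes the exact width of each value after the fact, whereas the design described above predicts the width early.

```python
def dies_needed(value, bits_per_die=16, num_dies=4):
    """Number of slices (dies) needed to hold a signed value.

    Slices above the returned count would contain only sign-extension bits,
    so in this model they could be gated off.
    """
    for n in range(1, num_dies + 1):
        half_range = 1 << (bits_per_die * n - 1)
        if -half_range <= value < half_range:
            return n
    raise ValueError("value does not fit in the datapath")


def gated_fraction(values, bits_per_die=16, num_dies=4):
    """Fraction of die activations avoided by gating unused upper slices."""
    active = sum(dies_needed(v, bits_per_die, num_dies) for v in values)
    return 1 - active / (num_dies * len(values))


# Hypothetical operand stream: mostly narrow values, a few wide ones.
operands = [5, -1, 40000, 1 << 40]
for v in operands:
    print(v, "->", dies_needed(v), "die(s) active")
print("gated fraction:", gated_fraction(operands))
```

For this hypothetical stream, 7 of the 16 possible die activations are needed, so just over half of the activity is gated.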

3D Research Summary
  • Circuit-level [ICCD’05,ISVLSI’06,ISCAS’06,GLSVLSI’06]
  • Uarch-level [MICRO’06 (w/ ),HPCA’07]
  • Tutorial papers [JETC’06]
  • Tutorial [MICRO’06]
  • Tools [DATE’06,TCAD’07] w/ GTCAD &
  • Parametric Variations w/ Jim Meindl
  • Funding, equip from ,
Summary