eXtreme Virtual Pipelining (XVP): Moving Towards Scalable Multithreaded Processors. Korey Sewell*, Trevor Mudge*, Steven K. Reinhardt*† *Advanced Computer Architecture Laboratory (ACAL), University of Michigan, Ann Arbor †Advanced Micro Devices (AMD). ASPLOS – WACI ’09.
Presentation Transcript

eXtreme Virtual Pipelining (XVP): Moving Towards Scalable Multithreaded Processors

Korey Sewell*, Trevor Mudge*, Steven K. Reinhardt*†

*Advanced Computer Architecture Laboratory (ACAL), University of Michigan, Ann Arbor

†Advanced Micro Devices (AMD)

ASPLOS – WACI ’09

The Comp. Arch. Research Train

P = Processor(s), T = Thread(s)

  • Uniprocessor-Place (1P, 1T)
  • Multithreading-Ville (1P, ~2-4T)
  • Multicore-Estates (2-4P, ~2-4T)
  • Many-Core Mansion (~32-64P, ~2-4T)

Did we miss a stop on the way? What about “Many”-Threading?!

Why “Many-Threading”?
  • CHANGES the way we think about architecture…
    • Moving from 2-4 threads per core to 16, 32 or even 64 threads per core
    • Threads aren’t just Parallel…They’re Adjacent!
  • What would you create if you had “threads to throw away”?
    • Hmmmmmmm…..
WACI “Many”-Threading Possibilities

  • “Coherence-Free” Synchronization & Communication
    • Why suffer from non-deterministic memory latency when so many threads are adjacent (on the same core)?

[Figure: a Memory System and two CPUs, each with threads T0, T1, T2, … TN]

WACI “Many”-Threading Possibilities

  • Extremely Speculative Multithreading
    • Use extra threads during speculative events (e.g. branch misprediction, cache miss)
    • Fast-forward execution by traversing the speculation tree and then switching threads.

[Figure: speculation tree with paths labeled T and F and a point marked “Branch Misprediction”]

WACI “Many”-Threading Possibilities

  • Super Virtual Machines
    • Security: Every application given its own VM?
  • Many-Many Systems!
    • Many Threads, Many Cores
      • ~1000-thread system = 64 cores × 16 threads per core (1,024 threads)
  • Redundant Multithreading
  • This list keeps going… and going… and going!
How do we get to Many-Threading?
  • A design that avoids non-scalable, conventional multithreading pitfalls such as…
    • Replication of per-thread resources
    • Extensive size increases of shared resources
    • Complex resource distribution methods amongst threads
WACI Solution: eXtreme Virtual Pipelining (XVP)

  • Provide each thread the illusion that it has all the processor resources to itself
  • Traditionally, simultaneously executing threads have a shared pipeline view

[Figure: pipelines made up of F, D, R, EXE stages with IQ, RF, ROB, and LSQ resources; one pipeline is marked as shared by T0 - TN, the others are marked per thread (T0, T1, … TN)]

WACI Solution: eXtreme Virtual Pipelining (XVP)
  • Pipeline Virtualization: Resource entries are mapped into each thread’s address space

[Figure: entries 0 … 7 of Resource “X” in the CPU are mapped into MEMORY at BaseT0 + 0 … 7, BaseT1 + 0 … 7, … BaseTN + 0 … 7 for threads T0, T1, … TN]
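
The base-plus-index mapping on this slide can be illustrated with a small sketch. It is not the authors' implementation: the class `VirtualResource`, the helper `assign_bases`, the 16-byte entry size, and the starting address `0x1000` are all assumptions made up for the example. The slide only shows a per-thread base added to an entry index 0 … 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualResource:
    """Hypothetical model of a pipeline resource whose entries live in memory."""
    name: str
    num_entries: int  # the slide's figure shows entries 0 ... 7
    entry_bytes: int  # assumed entry size; not given on the slide

    def region_bytes(self) -> int:
        """Size of the memory region one thread needs for this resource."""
        return self.num_entries * self.entry_bytes

    def entry_address(self, thread_base: int, index: int) -> int:
        """Address of entry `index` for the thread whose region starts at `thread_base`."""
        if not 0 <= index < self.num_entries:
            raise IndexError(f"{self.name} has no entry {index}")
        return thread_base + index * self.entry_bytes


def assign_bases(resource: VirtualResource, num_threads: int, start: int = 0x1000) -> list[int]:
    """Give each thread a disjoint, contiguous region (one simple layout choice)."""
    return [start + t * resource.region_bytes() for t in range(num_threads)]


iq = VirtualResource(name="IQ", num_entries=8, entry_bytes=16)
bases = assign_bases(iq, num_threads=3)
for tid, base in enumerate(bases):
    first = iq.entry_address(base, 0)
    last = iq.entry_address(base, iq.num_entries - 1)
    print(f"T{tid}: entries 0..7 at {first:#x}..{last:#x}")
```

Because every thread indexes the same entries 0 … 7 from its own base, each thread sees a full-size resource while the regions in memory never overlap.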

WACI Solution: eXtreme Virtual Pipelining (XVP)
  • XVP extends the notion of a hardware context to include pipeline resources
    • Add a C-Cache (Context Cache) to avoid D-Cache thrashing and potentially reduce the memory footprint of workloads
  • Each stallable resource is matched with its own “on-demand” Fill-Spill-Unit (FSU)
    • Ex: Spill the IQ on a dependent load miss / Fill when the miss resolves
    • FSU allows resources to dynamically partition themselves for arbitrary workloads
  • Virtualize all stalling processor resources to memory
    • Fetch Buffer, Instruction Queue, Load/Store Queue, Register File, Reorder Buffer

[Figure: pipeline stages F, D, R, EXE with IQ, RF, ROB, and LSQ resources, four FSUs, and a C-Cache]
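
The spill-on-miss, fill-on-resolve behavior described for the FSU can be sketched as a toy software model. This is only an illustration under stated assumptions, not the hardware design from the talk: the class `FillSpillUnit`, its methods, the capacity of 4, and the use of a Python dictionary as a stand-in for the C-Cache/memory backing store are all hypothetical.

```python
from collections import deque


class FillSpillUnit:
    """Toy fill-spill unit for one shared resource (here, an instruction queue)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident = {}  # thread id -> deque of entries held in the resource
        self.spilled = {}   # thread id -> entries saved to the backing store

    def occupancy(self) -> int:
        return sum(len(q) for q in self.resident.values())

    def insert(self, tid: int, entry: str) -> bool:
        """Place an entry in the resource; returns False when the resource is full."""
        if self.occupancy() >= self.capacity:
            return False
        self.resident.setdefault(tid, deque()).append(entry)
        return True

    def spill(self, tid: int) -> int:
        """Move all of a stalled thread's entries out (e.g. on a dependent load miss)."""
        entries = self.resident.pop(tid, deque())
        self.spilled.setdefault(tid, []).extend(entries)
        return len(entries)

    def fill(self, tid: int) -> int:
        """Bring back as many spilled entries as fit (e.g. when the miss resolves)."""
        saved = self.spilled.get(tid, [])
        count = min(self.capacity - self.occupancy(), len(saved))
        queue = self.resident.setdefault(tid, deque())
        # Restored entries are older than anything resident, so they go in front.
        queue.extendleft(reversed(saved[:count]))
        del saved[:count]
        return count


fsu = FillSpillUnit(capacity=4)
for i in range(3):
    fsu.insert(0, f"t0-i{i}")
fsu.insert(1, "t1-i0")           # resource is now full
freed = fsu.spill(0)             # thread 0 stalls: its 3 entries leave the resource
fsu.insert(1, "t1-i1")           # thread 1 uses the freed space
fsu.insert(1, "t1-i2")
restored = fsu.fill(0)           # miss resolves: only 1 slot is free right now
print(freed, restored, list(fsu.resident[0]), fsu.spilled[0])
```

In this model the split of entries between threads is never fixed in advance; it follows from which threads are stalled, which is one way to read the slide's claim that resources partition themselves dynamically.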

WACI Conclusion: eXtreme Virtual Pipelining (XVP)
  • A high number of threads per core opens up interesting multithreading research angles
  • XVP’s pipeline virtualization moves toward scalable many-threads-per-core designs
    • Each thread has the illusion that it has its own pipeline
  • XVP can also benefit single-thread processors…
    • Because XVP’s virtualization provides more resources than are traditionally available.