Implicitly-Multithreaded Processors
Implicitly-Multithreaded Processors, by Il Park, Babak Falsafi, and T. N. Vijaykumar. Presented by: Ashay Rane. Published in: SIGARCH Computer Architecture News, 2003. Agenda: overview (IMT, state of the art), IMT enhancements, key results, critique, relation to term project.

Implicitly-Multithreaded Processors Il Park and Babak Falsafi and T. N. Vijaykumar

Presented by: Ashay Rane

Published in: SIGARCH Computer Architecture News, 2003


Agenda

  • Overview (IMT, state of the art)

  • IMT enhancements

  • Key results

  • Critique

  • Relation to Term Project


Implicitly Multithreaded Processor (IMT)

  • SMT with speculation

  • Optimizations to basic SMT support

  • Average performance improvement of 24%; maximum: 69%


State-of-the-art

  • Pentium 4 HT

  • IBM POWER5

  • MIPS MT


Speculative SMT operation

  • When a branch is encountered, start executing the likely path “speculatively”, i.e. allow for rollback (thread squash) in certain circumstances (misprediction, dependence)

  • Speculation adds cost and overhead, but the savings in execution time and power make it worth the effort

  • Complication: independent threads commit separately, so each thread needs its own buffer. Issue, register renaming, and cache and TLB conflicts are further complications.

  • On a dependence violation, squash the thread and restart its execution


How to buffer speculative data?

  • Load/Store Queue (LSQ)

    • Buffers data (along with its address)

    • Helps enforce dependence checks

    • Makes rollback possible

  • Cache-based approaches
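
The three LSQ roles listed above (buffering, dependence checking, rollback) can be illustrated with a small software model. This is only a sketch under assumptions: the class name `SpeculativeLSQ` and its methods are hypothetical, memory is modeled as a plain dictionary, and real hardware performs these steps with associative lookups rather than Python sets.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeLSQ:
    """Toy load/store queue for one speculative thread (hypothetical model)."""
    stores: dict = field(default_factory=dict)  # address -> buffered value
    loads: set = field(default_factory=set)     # addresses read speculatively

    def store(self, addr, value):
        # Buffer the store with its address; memory is not modified yet.
        self.stores[addr] = value

    def load(self, addr, memory):
        # Record the load so a later store by an older thread can be checked.
        self.loads.add(addr)
        # Forward from this thread's own buffered store if one exists.
        return self.stores.get(addr, memory.get(addr, 0))

    def violates(self, addr):
        # An older thread stored to addr: was it already read here?
        return addr in self.loads

    def squash(self):
        # Rollback: discard all buffered speculative state.
        self.stores.clear()
        self.loads.clear()

    def commit(self, memory):
        # The thread became non-speculative: drain buffered stores to memory.
        memory.update(self.stores)
        self.squash()


memory = {0x10: 1}
spec = SpeculativeLSQ()
spec.store(0x20, spec.load(0x10, memory) + 1)  # speculative work on stale data
violated = spec.violates(0x10)                 # older thread now stores to 0x10
spec.squash()                                  # memory is untouched by the squash
memory[0x10] = 7                               # the older thread's store commits
spec.store(0x20, spec.load(0x10, memory) + 1)  # restart execution
spec.commit(memory)
```

Because stores stay in the queue until commit, squashing the thread only requires clearing the queue, which is what makes rollback cheap in this model.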


IMT: Most significant improvements

  • Assistance from Multiscalar compiler

  • Resource- and dependence-aware fetch policy

  • Multiplexing threads on a single hardware context

  • Overlapping thread startup operations with the previous thread's execution


What does the compiler do?

  • Extracts threads from the program (loops)

  • Generates thread descriptor data about registers read and written and about control flow exits (for rename tables)

  • Annotates instructions with special codes (“forward” & “release”) for dependence checking


Fetch Policy

  • Hardware keeps track of resource utilization

  • Resource requirement prediction from past four execution instances

  • When dependencies exist (detected from compiler-generated data), bias towards non-speculative threads

  • Goal is to reduce number of thread squashes
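
The history-based prediction above might be modeled as follows. This is a sketch, not the hardware design: the class `ResourcePredictor`, keying the history by a thread's start PC, and taking the maximum of the last four instances (a worst-case estimate) are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class ResourcePredictor:
    """Predicts a thread's resource needs from its last few executions."""

    def __init__(self, history=4):
        # One bounded history per thread start PC; old samples fall off.
        self._history = defaultdict(lambda: deque(maxlen=history))

    def record(self, thread_pc, used):
        # Called when an instance of the thread finishes.
        self._history[thread_pc].append(used)

    def predict(self, thread_pc, default=0):
        # Worst case over the remembered instances (assumed policy).
        past = self._history[thread_pc]
        return max(past) if past else default


def should_fetch(predicted, free):
    # Resource-aware gate: fetch a new thread only if it is expected to fit.
    return predicted <= free


predictor = ResourcePredictor()
for used in (10, 12, 30, 8, 9):
    predictor.record(0x400, used)
first = predictor.predict(0x400)   # history holds 12, 30, 8, 9
predictor.record(0x400, 7)
predictor.record(0x400, 6)
second = predictor.predict(0x400)  # history holds 8, 9, 7, 6
```

Gating fetch on the predicted need is one way to avoid starting a thread that would later overflow its buffers and be squashed.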


Multiplexing threads on a single hardware context

  • Observations:

    • Threads are usually short

    • Few hardware contexts are available (2-8)

      Hence frequent switching and less overlap


Multiplexing (contd.)

  • Larger threads can lead to:

    • Speculation buffer overflow

    • Increased dependence mis-speculation

    • Hence thread squashing

  • Each execution context can further support multiple threads (3-6)


Multiplexing: Required Hardware

  • Per context per thread:

    • Program Counter

    • Register rename table

  • LSQ shared among the threads running on one execution context


Multiplexing: Implementation Issues

  • LSQ shared but it needs to maintain loads and stores for each thread separately

  • Therefore, create “gaps” for yet-to-be-fetched instructions / data

  • If space falls short, squash subsequent thread

  • What if threads from one program are mapped to different contexts?

  • IMT searches through the LSQs of the other contexts

  • A separate LSQ per thread in each context would be easier, but is worse for cost and power consumption
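
The “gaps” idea can be sketched as per-thread reservations inside one shared queue. Everything here is illustrative: the class `SharedLSQ`, its method names, and the choice to squash the youngest successor first when space runs out are assumptions, not details taken from the paper.

```python
class SharedLSQ:
    """Toy LSQ shared by the threads multiplexed on one context."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.regions = []  # [thread_id, reserved, used], oldest thread first

    def free(self):
        return self.capacity - sum(region[1] for region in self.regions)

    def admit(self, thread_id, reserved):
        # Reserve a gap for the thread's yet-to-be-fetched loads and stores.
        if reserved > self.free():
            return False
        self.regions.append([thread_id, reserved, 0])
        return True

    def allocate(self, thread_id):
        # Place one load/store; return the ids of threads squashed for room.
        squashed = []
        index = next(i for i, r in enumerate(self.regions) if r[0] == thread_id)
        region = self.regions[index]
        if region[2] == region[1]:  # the reserved gap is exhausted
            while self.free() == 0 and len(self.regions) > index + 1:
                squashed.append(self.regions.pop()[0])  # squash a successor
            if self.free() == 0:
                raise OverflowError("LSQ full and no successor left to squash")
            region[1] += 1  # grow the region by one entry
        region[2] += 1
        return squashed


lsq = SharedLSQ(capacity=8)
admitted = [lsq.admit("T0", 4), lsq.admit("T1", 4), lsq.admit("T2", 2)]
for _ in range(4):
    lsq.allocate("T0")             # fits within T0's reserved gap
victims = lsq.allocate("T0")       # fifth entry: space falls short
```

In this model a thread that stays within its reservation never disturbs its neighbors; only an underestimated reservation forces a squash.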


Register renaming

  • Required because multiple threads may use same registers

  • Separate rename tables

  • Master Rename Table (global)

  • Local Rename Table (per thread)

  • Pre-assign table (per thread)


Register renaming: Flow

  • Thread invocation:

    • Copy from the Master table into the Local table (to reflect current status)

    • Also use the “create” and “use” masks of the thread descriptor (for dependence checks)

  • Before every subsequent thread invocation:

    • Pre-assign rename maps into the Pre-assign table

    • Copy from the Pre-assign table to the Master table and mark the registers as “busy”, so that no successor thread can use them before the current thread writes to them
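
The flow above can be followed in a small model. This is a sketch under assumptions: the class `Renamer`, its method names, and the use of an ever-increasing counter for physical register numbers are hypothetical, and only the “create” mask is modeled.

```python
from itertools import count


class Renamer:
    """Toy model of the master, local, and pre-assign rename tables."""

    def __init__(self, arch_regs):
        self._next_phys = count()
        # Master table (global): architectural -> physical register.
        self.master = {reg: next(self._next_phys) for reg in arch_regs}
        self.busy = set()  # physical registers still awaiting their write

    def invoke(self, create_mask):
        # Thread invocation: the local table snapshots the master table.
        local = dict(self.master)
        # Pre-assign a physical register for each register the thread creates.
        pre_assign = {reg: next(self._next_phys) for reg in sorted(create_mask)}
        # Publish the mappings and mark them busy before any successor starts.
        self.master.update(pre_assign)
        self.busy.update(pre_assign.values())
        return local, pre_assign

    def write(self, pre_assign, reg):
        # The producing thread writes reg: successors may now read it.
        phys = pre_assign[reg]
        self.busy.discard(phys)
        return phys

    def is_ready(self, local, reg):
        return local[reg] not in self.busy


renamer = Renamer(["r1", "r2"])
t0_local, t0_pre = renamer.invoke({"r1"})  # T0 will write r1
t1_local, t1_pre = renamer.invoke(set())   # T1 only reads
before = renamer.is_ready(t1_local, "r1")  # T0 has not written r1 yet
renamer.write(t0_pre, "r1")
after = renamer.is_ready(t1_local, "r1")
```

The successor's local table already points at the register the predecessor will write, so the dependence is enforced by the busy bit alone.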


Hiding thread startup delay

  • Rename tables must be set up before execution begins

  • Setup occupies rename-table bandwidth, so it cannot be done for several threads in parallel

  • Hence overlap the setup of rename tables with the previous thread’s execution
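
A back-of-the-envelope model shows why the overlap helps. It is deliberately simplified and rests on assumptions: threads are treated as running one after another, every thread has the same setup cost, and the cycle counts are made-up numbers rather than measurements.

```python
def serial_time(setup, exec_times):
    # Every thread pays its rename-table setup before it executes.
    return sum(setup + exec_time for exec_time in exec_times)


def overlapped_time(setup, exec_times):
    # Only the first setup is fully exposed. Each later setup runs during
    # the previous thread's execution; any part that does not fit is exposed.
    total = setup
    for i, exec_time in enumerate(exec_times):
        total += exec_time
        if i + 1 < len(exec_times):
            total += max(0, setup - exec_time)
    return total


exec_times = [20, 5, 30]  # hypothetical cycles per thread
setup = 8                 # hypothetical cycles to set up rename tables
serial = serial_time(setup, exec_times)
overlapped = overlapped_time(setup, exec_times)
```

With these numbers the second thread is shorter than the setup time, so part of the third thread's setup remains visible. Short threads are exactly the case where startup delay matters most.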


Load/Store Queue

  • Per context

  • Speculative load / store: Search through current and other contexts for dependence

  • No searching for non-speculative loads

  • Searching can take time, so IMT schedules load-dependent instructions accordingly


Key Results


  • Average improvement: 24%

  • Reduction in data dependence stalls

  • Little overhead of optimizations

  • Not all benchmark programs


  • Assuming 2-3 threads per context, 6-8 LSQ entries per thread.

  • Performance relative to IMT with unlimited resources


  • ICOUNT: Favor the thread with the fewest instructions in flight (fetched but not yet executed)

  • Biased-ICOUNT: Favor non-speculative threads

  • Worst-case resource estimation

  • Reduced thread squashing
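
The difference between the two fetch policies can be sketched as a change in the selection key. The names `Thread`, `icount`, and `biased_icount` are hypothetical, and modeling the bias as a fixed penalty (`bias`) added to speculative threads is an assumption; the slides only say that non-speculative threads are favored.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    in_flight: int      # instructions fetched but not yet executed
    speculative: bool


def icount(threads):
    # Plain ICOUNT: fetch for the thread with the fewest in-flight instructions.
    return min(threads, key=lambda t: t.in_flight)


def biased_icount(threads, bias=8):
    # Biased ICOUNT (assumed form): penalize speculative threads so the
    # non-speculative thread wins unless it is far ahead.
    return min(threads, key=lambda t: t.in_flight + (bias if t.speculative else 0))


threads = [
    Thread("T0", in_flight=10, speculative=False),
    Thread("T1", in_flight=4, speculative=True),
    Thread("T2", in_flight=6, speculative=True),
]
plain_choice = icount(threads).name
biased_choice = biased_icount(threads).name
```

Plain ICOUNT picks the speculative thread T1 here, while the biased variant picks the non-speculative T0, whose progress the speculative threads depend on.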


  • TME (Threaded Multipath Execution): Executes both paths of an unpredictable branch (but such branches are uncommon)

  • DMT (Dynamic Multithreading):

    • Threads are selected in hardware, so it spawns threads on a backward branch or function call instead of on loops.

    • Also spawns threads out of order, which lowers the accuracy of branch prediction.


Critique


Compiler Support

  • Improvement shown for applications compiled using the Multiscalar compiler

  • Suited to scientific computing applications, not desktop applications


LSQ Limitations

  • LSQ size limits the size of a speculative thread

  • Pentium 4 (without SMT): 48 loads, 24 stores

  • Pentium 4 HT: 24 loads, 12 stores per thread

  • IBM POWER5: 32 loads, 32 stores per thread


LSQ Limitations: Alternative

  • Cache-based approach, i.e. partition the cache to support different versions

  • Extra support required, but scalable


Register file size

  • IMT considers register file sizes of 128 and up.

  • Pentium 4 (as well as HT): register file size = 128

  • IBM POWER5: register file size = 80


Searching LSQ

  • Since loads and stores are organized per thread, a search involves all entries of the other threads.

  • If loads and stores were organized by address, there would be fewer entries to search.

  • Can make use of the associativity of a cache
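
The critique above can be made concrete by counting how many entries each organization examines. This is an illustrative sketch: the function names, the queue contents, and the four-set index (standing in for cache sets) are all assumptions.

```python
from collections import defaultdict


def search_per_thread(queues, addr):
    # Per-thread organization: every entry of every queue is examined.
    examined = 0
    matches = []
    for thread_id, entries in queues.items():
        for entry_addr, value in entries:
            examined += 1
            if entry_addr == addr:
                matches.append((thread_id, value))
    return matches, examined


def build_address_index(queues, num_sets=4):
    # Address organization: entries are grouped by set, as in a cache.
    index = defaultdict(list)
    for thread_id, entries in queues.items():
        for entry_addr, value in entries:
            index[entry_addr % num_sets].append((entry_addr, thread_id, value))
    return index


def search_by_address(index, addr, num_sets=4):
    # Only the entries in the matching set are examined.
    bucket = index.get(addr % num_sets, [])
    matches = [(thread_id, value) for a, thread_id, value in bucket if a == addr]
    return matches, len(bucket)


queues = {
    "T1": [(0x10, 1), (0x21, 2), (0x32, 3)],
    "T2": [(0x43, 4), (0x10, 5), (0x55, 6)],
}
linear_matches, linear_cost = search_per_thread(queues, 0x10)
indexed_matches, indexed_cost = search_by_address(build_address_index(queues), 0x10)
```

Both searches find the same two entries, but the address-indexed one examines only the entries that fall in the same set as the requested address.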


So how is performance still high?

  • Assistance from Compiler

  • Resource and dependency-aware fetching

  • Multiple threads on an execution context

  • Overlapping rename table creation with execution


Term project

  • “Cache-based throughput improvement techniques for Speculative SMT processors”

  • Optimizations from IMT

  • Increasing granularity to reduce number of thread squashes


Thank you

