Trace level speculative multithreaded architecture

ICCD´02, Freiburg (Germany) - September 16-18, 2002. Trace-Level Speculative Multithreaded Architecture. Carlos Molina Universitat Rovira i Virgili – Tarragona, Spain [email protected] Antonio González and Jordi Tubella


ICCD´02, Freiburg (Germany) - September 16-18, 2002

Trace-Level Speculative Multithreaded Architecture

Carlos Molina

Universitat Rovira i Virgili – Tarragona, Spain [email protected]

Antonio González and Jordi Tubella

Universitat Politècnica de Catalunya – Barcelona, Spain {antonio,jordit}@ac.upc.es


Outline

  • Motivation

  • Related Work

  • TSMA

  • Performance Results

  • Conclusions


Motivation

  • Two techniques to avoid serialization caused by data dependences

    • Data Value Speculation

    • Data Value Reuse

  • Speculation predicts values based on past behavior

  • Reuse is possible if the same computation has been done in the past

  • Both may be considered at two levels

    • Instruction Level

    • Trace Level


Trace Level Reuse

[Figure: trace-level reuse; panels labeled Static and Dynamic]

  • A set of consecutive instructions can be skipped at once

  • These instructions do not need to be fetched

  • Live input test is not easy to handle
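The live input test can be pictured with a small sketch. This is only an illustration under stated assumptions: the names `ReuseEntry` and `try_reuse`, the register-only state, and the dictionary-based table are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class ReuseEntry:
    """One remembered execution of a trace (hypothetical layout)."""
    live_inputs: dict   # register -> value the trace read before writing it
    live_outputs: dict  # register -> value the trace left behind
    next_pc: int        # where execution continues after the trace


def try_reuse(table, pc, regs):
    """Return the PC after the trace if it can be reused, else None.

    Live input test: every live input recorded for the trace must match
    the current register state. On success the live outputs are written
    and the whole trace is skipped without being fetched.
    """
    entry = table.get(pc)
    if entry is None:
        return None
    if any(regs.get(r) != v for r, v in entry.live_inputs.items()):
        return None
    regs.update(entry.live_outputs)
    return entry.next_pc


# A trace at PC 0x100 that computed r3 = r1 + r2 and ended at PC 0x140.
table = {0x100: ReuseEntry({"r1": 2, "r2": 5}, {"r3": 7}, 0x140)}
regs_hit = {"r1": 2, "r2": 5, "r3": 0}
regs_miss = {"r1": 9, "r2": 5, "r3": 0}
hit_pc = try_reuse(table, 0x100, regs_hit)
miss_pc = try_reuse(table, 0x100, regs_miss)
```

The comparison of every live input before the trace can be skipped is what makes the test hard to handle in hardware; the sketch ignores memory inputs entirely.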


Trace Level Speculation

[Figure: trace-level speculation; variants labeled With Live Input Test and With Live Output Test]

  • Solves live input test

  • Introduces penalties due to misspeculations

  • Two orthogonal issues

    • microarchitecture support for trace speculation

    • control and data speculation techniques

      • prediction of initial and final points

      • prediction of live output values


Trace Level Speculation with Live Input Test

[Figure: ST and NST timelines. Labels: Live Output Actualization & Trace Speculation; Miss Trace Speculation Detection & Recovery Actions. Legend: Instruction Execution; Not Executed; Live Input Validation & Instruction Execution]


Trace Level Speculation with Live Output Test

[Figure: ST and NST timelines connected through a buffer. Labels: Live Output Actualization & Trace Speculation; Miss Trace Speculation Detection & Recovery Actions. Legend: Instruction Execution; Not Executed; Live Output Validation]


Related Work

  • Trace Level Reuse

    • Basic blocks (Huang and Lilja, 99)

    • General traces (González et al., 99)

    • Traces with compiler support (Connors and Hwu, 99)

  • Trace Level Speculation

    • DIVA (Austin, 99)

    • Slipstream processors (Rotenberg et al., 99)

    • Pre-execution (Sohi et al., 01)

    • Precomputation (Shen et al., 01)

    • Nearby and distant ILP (Balasubramonian et al., 01)


TSMA

[Figure: TSMA block diagram. Components: Fetch Engine, Branch Predictor, I Cache, Decode & Rename, Trace Speculation, ST I Window, NST I Window, ST Ld/St Queue, NST Ld/St Queue, ST Reorder Buffer, NST Reorder Buffer, Functional Units, ST Arch. Register File, NST Arch. Register File, Look Ahead Buffer, Verification Engine, Data Cache, L1SDC, L1NSDC, L2NSDC]


Trace Speculation Engine

  • Two issues must be handled

    • to implement a trace level predictor

    • to communicate trace speculation opportunity

  • Trace level predictor

    • PC-indexed table with N entries

    • Each entry contains

      • live output values

      • final program counter of trace

  • Trace speculation communication

    • INI_TRACE instruction

    • Additional MOVE instructions
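A minimal sketch of the PC-indexed predictor table described above. The class names, the direct-mapped indexing, the 4-byte instruction size, and the `tag` field used to detect aliasing are assumptions made for illustration; the slides only state that each entry holds live output values and the final program counter.

```python
from dataclasses import dataclass, field


@dataclass
class TraceEntry:
    tag: int        # full initial PC, to detect index aliasing (assumption)
    final_pc: int   # final program counter of the trace
    live_outputs: dict = field(default_factory=dict)  # register -> predicted value


class TracePredictor:
    """Direct-mapped, PC-indexed table with N entries (hypothetical layout)."""

    def __init__(self, n_entries):
        self.n = n_entries
        self.table = [None] * n_entries

    def _index(self, pc):
        # Assumes 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) % self.n

    def update(self, initial_pc, final_pc, live_outputs):
        self.table[self._index(initial_pc)] = TraceEntry(
            initial_pc, final_pc, dict(live_outputs)
        )

    def predict(self, pc):
        entry = self.table[self._index(pc)]
        if entry is None or entry.tag != pc:
            return None
        return entry


predictor = TracePredictor(n_entries=16)
predictor.update(0x1000, 0x1080, {"r4": 42})
hit = predictor.predict(0x1000)
alias = predictor.predict(0x1000 + 16 * 4)  # same index, different tag
```

On a hit, the predicted live outputs would be written to the speculative state and fetch would continue from `final_pc`.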


Look Ahead Buffer

  • First-in first-out (FIFO) queue

  • Stores instructions executed by ST

  • The fields of each entry are:

    • Program Counter

    • Operation Type: indicates memory operation

    • Source register Id 1 & source value 1

    • Source register Id 2 & source value 2

    • Destination register Id & destination value

    • Memory address
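The entry layout above maps naturally onto a record type stored in a FIFO. The sketch below is illustrative only: the field names, the string encoding of the operation type, the fixed capacity, and the stall-on-full behavior are assumptions, not details given in the slides.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabEntry:
    pc: int
    op_type: str             # "alu", "load", or "store" (encoding is an assumption)
    src1: Optional[tuple]    # (register id, value) or None
    src2: Optional[tuple]    # (register id, value) or None
    dest: Optional[tuple]    # (register id, value) or None
    mem_addr: Optional[int] = None


class LookAheadBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def push(self, entry):
        """Called for each ST instruction; False means the buffer is full."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(entry)
        return True

    def pop(self):
        """Called by the verification engine; returns the oldest entry."""
        return self.queue.popleft() if self.queue else None

    def flush(self):
        self.queue.clear()


lab = LookAheadBuffer(capacity=2)
ok1 = lab.push(LabEntry(0x10, "alu", ("r1", 1), ("r2", 2), ("r3", 3)))
ok2 = lab.push(LabEntry(0x14, "store", ("r3", 3), ("r4", 0x80), None, mem_addr=0x80))
ok3 = lab.push(LabEntry(0x18, "alu", ("r3", 3), None, ("r5", 3)))
first = lab.pop()
```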


Verification Engine

  • Validates speculated instructions

  • Maintains the non-speculative state

  • Consumes instructions from LAB

  • The test is performed as follows:

    • source values of instruction I are compared with the non-speculative state

    • if they match, the destination value of I may be used to update that state

    • memory operations also check the effective address

    • store instructions update memory; all others update registers

  • Hardware required is minimal
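The validation steps listed above can be sketched as a loop over buffered entries. Everything here is an assumption for illustration: entries are plain dictionaries, a store carries its data as the destination value, and the effective-address check is modeled by comparing a load's value against non-speculative memory at the recorded address. The slides do not specify these details.

```python
def verify(entries, regs, memory):
    """Consume entries in order and update the non-speculative state.

    Returns the index of the first misspeculated entry, or None if
    every entry validates.
    """
    for i, e in enumerate(entries):
        # 1. Source values used speculatively must match the
        #    non-speculative register state.
        for reg, value in e["srcs"]:
            if regs.get(reg) != value:
                return i
        # 2. Loads (assumption): the loaded value is checked against
        #    non-speculative memory at the recorded effective address.
        if e["op"] == "load" and memory.get(e["addr"]) != e["dest"][1]:
            return i
        # 3. Stores update memory; all other instructions update registers.
        if e["op"] == "store":
            memory[e["addr"]] = e["dest"][1]
        else:
            reg, value = e["dest"]
            regs[reg] = value
    return None


regs = {"r1": 10, "r2": 4}
memory = {0x20: 99}
entries = [
    {"op": "alu", "srcs": [("r1", 10), ("r2", 4)], "dest": ("r3", 14), "addr": None},
    {"op": "store", "srcs": [("r3", 14)], "dest": (None, 14), "addr": 0x20},
    {"op": "load", "srcs": [], "dest": ("r5", 14), "addr": 0x20},
    {"op": "alu", "srcs": [("r2", 7)], "dest": ("r6", 14), "addr": None},  # stale r2
]
first_bad = verify(entries, regs, memory)
```

No instruction is re-executed in this sketch: validation is only a comparison followed by a state update, which fits the claim that little hardware is needed.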


Thread Synchronization

  • Handles trace mispredictions

  • Recovery actions involved are:

    • Instruction execution is stopped

    • ST structures are emptied (IW,LSQ,ROB,LAB)

    • Speculative cache and ST register file are invalidated

  • Two types of synchronization

    • Total (Occurs when NST is not executing instructions)

      • Penalty due to refilling the pipeline

    • Partial (Occurs when NST is executing instructions)

      • No penalty

      • NST takes the role of ST
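A rough sketch of the recovery actions and the two synchronization types. The `SpeculativeThread` record and the `refill_cycles` parameter are hypothetical; in particular, the value 10 is a placeholder and not a figure from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeThread:
    running: bool = True
    iw: list = field(default_factory=list)    # instruction window
    lsq: list = field(default_factory=list)   # load/store queue
    rob: list = field(default_factory=list)   # reorder buffer
    lab: list = field(default_factory=list)   # look ahead buffer
    spec_cache: dict = field(default_factory=dict)
    regs: dict = field(default_factory=dict)


def recover(st, nst_executing, refill_cycles=10):
    """Apply the recovery actions; return (kind, penalty_cycles)."""
    st.running = False                      # instruction execution is stopped
    for structure in (st.iw, st.lsq, st.rob, st.lab):
        structure.clear()                   # ST structures are emptied
    st.spec_cache.clear()                   # speculative cache invalidated
    st.regs.clear()                         # ST register file invalidated
    if nst_executing:
        return ("partial", 0)               # NST takes over; no penalty
    return ("total", refill_cycles)         # pipeline must be refilled


st = SpeculativeThread(iw=[1, 2], rob=[1], spec_cache={0x40: 7}, regs={"r1": 3})
partial = recover(st, nst_executing=True)
total = recover(SpeculativeThread(), nst_executing=False)
```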


Memory Subsystem

  • Maintains memory state

    • speculative

    • non-speculative

  • The traditional memory subsystem is supported (L1NSDC, L2NSDC)

  • A small additional first-level cache (L1SDC) is added to maintain the speculative memory state

  • Access rules

    • NST stores update values and allocate space in the NS caches

    • ST stores update values in L1SDC only

    • ST loads get values from L1SDC; on a miss, they get them from the NS caches

    • NST loads get values from, and allocate space in, the NS caches

    • A line replaced in L1NSDC is copied back to L2NSDC
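The routing of loads and stores between the speculative and non-speculative caches can be sketched as follows. This is a simplification under stated assumptions: the class and method names are hypothetical, a single dictionary stands in for the whole non-speculative hierarchy (L1NSDC, L2NSDC, and memory), and capacity, allocation, and line replacement are not modeled.

```python
class MemorySubsystem:
    def __init__(self, main_memory=None):
        self.l1sdc = {}                     # speculative state, written by ST only
        self.ns = dict(main_memory or {})   # stands in for the NS hierarchy

    def st_store(self, addr, value):
        # ST stores update the speculative cache only.
        self.l1sdc[addr] = value

    def st_load(self, addr):
        # ST loads try the speculative cache first, then the NS caches.
        if addr in self.l1sdc:
            return self.l1sdc[addr]
        return self.ns.get(addr)

    def nst_store(self, addr, value):
        # NST stores update the non-speculative state.
        self.ns[addr] = value

    def nst_load(self, addr):
        # NST loads never see speculative values.
        return self.ns.get(addr)

    def invalidate_speculative(self):
        # Used on a misspeculation: speculative state is discarded.
        self.l1sdc.clear()


mem = MemorySubsystem({0x100: 1})
mem.st_store(0x100, 5)
st_sees = mem.st_load(0x100)
nst_sees = mem.nst_load(0x100)
mem.invalidate_speculative()
st_after_recovery = mem.st_load(0x100)
```

Keeping speculative stores out of the non-speculative caches is what lets a misspeculation be undone by simply invalidating L1SDC.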


Register File

  • Slight modification to permit prompt execution

  • Register map table contains for each entry:

    • Committed Value

    • ROB Tag

    • Counter

  • Counter field is maintained as follows:

    • Each new ST instruction increments the counter of its destination register

    • The counter is decremented when an ST instruction is committed

    • After a trace speculation, counters are no longer incremented

    • They are still decremented until they reach zero
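The counter rules above can be sketched as a small state machine per register. The names `MapEntry`, `RegisterMapTable`, `dispatch`, `commit`, and `trace_speculation` are hypothetical, and the sketch models only the counter field, not the committed value or ROB tag updates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MapEntry:
    committed_value: int = 0
    rob_tag: Optional[int] = None
    counter: int = 0


class RegisterMapTable:
    def __init__(self, n_regs):
        self.entries = [MapEntry() for _ in range(n_regs)]
        self.speculated = False

    def dispatch(self, dest):
        # A new ST instruction increments its destination's counter,
        # unless a trace speculation has already taken place.
        if not self.speculated:
            self.entries[dest].counter += 1

    def commit(self, dest):
        # Committing an ST instruction decrements the counter, never below zero.
        if self.entries[dest].counter > 0:
            self.entries[dest].counter -= 1

    def trace_speculation(self):
        self.speculated = True


rmt = RegisterMapTable(n_regs=4)
rmt.dispatch(1)
rmt.dispatch(1)
before = rmt.entries[1].counter
rmt.trace_speculation()
rmt.dispatch(1)                    # ignored after the speculation
after = rmt.entries[1].counter
for _ in range(3):
    rmt.commit(1)
final = rmt.entries[1].counter
```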


Working Example

[Figure: timeline of ST, NST, and VE activity over instructions numbered 1 to 10. Labels: Live Output Actualization & Trace Speculation; NST Begins Execution; NST Executes Speculated Trace; VE Validates Instructions; ST Begins Execution; VE Begins Verification; VE Finishes Verification; NST Executes Some Additional Instructions; NST Execution. Legend: Instruction Execution; Not Executed; Live Output Validation]


Experimental Framework

  • Simulator

    Alpha version of the SimpleScalar Toolset

  • Benchmarks

    SPEC95

  • Maximum Optimization Level

    DEC C & F77 compilers with -non_shared -O5

  • Statistics Collected for 125 million instructions

    Skipping initializations




Performance Evaluation

  • Main objective:

    • show that trace misspeculations cause only minor penalties

  • Traces are built following a simple rule

    • from backward branch to backward branch

    • minimum and maximum size of 8 and 64 instructions, respectively

  • Simple Trace Predictor is evaluated

    • Stride + Context Value (history of 9)

  • Results provided

    • Percentage of misspeculations

    • Percentage of predicted instructions

    • Speedup
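The trace-building rule stated above can be sketched as a pass over a dynamic instruction stream. The interpretation is an assumption: a trace is closed at a backward branch once it holds at least `min_size` instructions, and is closed unconditionally at `max_size`. The function name and the `(pc, is_backward_branch)` input format are hypothetical.

```python
def build_traces(stream, min_size=8, max_size=64):
    """Split a dynamic instruction stream into traces.

    stream: iterable of (pc, is_backward_branch) pairs.
    A trailing partial trace is dropped.
    """
    traces, current = [], []
    for pc, is_backward_branch in stream:
        current.append(pc)
        at_boundary = is_backward_branch and len(current) >= min_size
        if at_boundary or len(current) == max_size:
            traces.append(current)
            current = []
    return traces


# A 5-instruction loop body executed 4 times: the first backward branch
# arrives too early (5 < 8), so each trace spans two iterations.
loop_stream = [(i % 5, i % 5 == 4) for i in range(20)]
loop_traces = build_traces(loop_stream)

# 100 instructions with no backward branch: cut at the maximum size.
straight_stream = [(i, False) for i in range(100)]
straight_traces = build_traces(straight_stream)
```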


Misspeculations

[Figure: percentage of misspeculations; axis from 0 to 100]

Predicted Instructions

[Figure: percentage of predicted instructions; axis from 0 to 50]


Speedup

[Figure: speedup; axis from 1.00 to 1.35]


Conclusions

  • TSMA

    • designed to exploit trace-level speculation

  • Special emphasis on

    • minimizing misspeculation penalties

  • Results show:

    • architecture is tolerant to misspeculations

    • speedup of 16% with a predictor that misses 70% of the time


Future Work

  • Aggressive trace-level predictors

    • bigger traces

    • better value predictors

  • Generalization to multiple threads

    • cascade execution

  • Mixing prediction & execution

    • speculated traces do not need to be fully speculated

