Serializing Instructions in System Intensive Workloads

(Amdahl’s Law Strikes Again)

Philip Wells and Guri Sohi

{pwells, sohi}@cs.wisc.edu

HPCA Feb, 2008

Serializing instructions overview
  • Serializing instructions (SIs) have complex dependencies
    • Difficult to execute OoO, often serialize the pipeline
    • E.g. writes to control registers
  • SIs frequent in OS code, across ISAs
    • Reduce OS performance by 8-45%
    • Values produced by SIs are often effectively useless (EU)
  • EU prediction allows consumers to proceed
    • May read stale value, but execute correctly
    • Improves OS performance by 6-35%

Philip Wells - HPCA 2008

Talk outline
  • Serializing instructions
    • Description, implementation & performance
  • Characterization
    • Frequency across 3 ISAs
    • Useful consumption
  • Effectively useless prediction
    • Overview & operation
    • Performance results
  • Summary

What are SIs?

[Figure: %pstate register fields: IG, PRIV, MG, CLE, TLE, MM, RED, PEF, AM, IE, AG]

  • Talk focus: Writes to non-renamed control registers
    • e.g. explicit writes, exceptions & returns
  • Not renamed due to complex dependencies
    • Read by control logic at many pipeline stages
  • Difficult to execute OoO
    • Most processors serialize pipeline
    • Discussion of real implementations in paper

[Figure: %pstate shown alongside the Fetch, Decode, Execute and Commit pipeline stages]

Effects of Amdahl’s law

[Figure: Execution of OS code (Ideal SPARC); fetch stall on SI (% of cycles)]
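
The slide title refers to Amdahl's law: speeding up only the OS portion of a workload improves the whole workload in proportion to the time spent there. A minimal Python sketch of that relation follows. The function name and the 40% OS time fraction are illustrative assumptions, not figures from the talk; the 6% and 35% OS gains are the range quoted in the talk.

```python
def overall_speedup(os_fraction: float, os_speedup: float) -> float:
    """Amdahl's law: only the OS portion of execution time is accelerated.

    os_fraction -- share of baseline time spent in OS code (hypothetical input)
    os_speedup  -- speedup of the OS portion alone (e.g. 1.35 for a 35% gain)
    """
    if not 0.0 <= os_fraction <= 1.0:
        raise ValueError("os_fraction must be between 0 and 1")
    if os_speedup <= 0.0:
        raise ValueError("os_speedup must be positive")
    # Time outside the OS is unchanged; time inside the OS shrinks by os_speedup.
    return 1.0 / ((1.0 - os_fraction) + os_fraction / os_speedup)


if __name__ == "__main__":
    # Illustrative only: a workload assumed to spend 40% of its time in the OS.
    for s in (1.06, 1.35):
        print(f"OS speedup {s:.2f} -> overall {overall_speedup(0.4, s):.3f}")
```

The overall gain is always smaller than the OS gain unless the workload runs entirely in the OS, which is why an OS improvement translates into a smaller overall improvement.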

SI discussion
  • SIs have received little research attention
    • Mostly affects OS code
      • Largely absent in SPEC or short traces
    • Viewed as specific to a particular implementation
  • Our characterization shows that
    • SIs are important for system-intensive apps
    • Characterization similar across multiple ISAs
    • Implementations similar across multiple processors

Outline
  • Serializing instructions
  • Characterization
    • Frequency across 3 ISAs
    • Useful consumption
  • Effectively useless prediction
  • Summary

Characterization of SIs
  • Methodology
    • Several commercial workloads
    • SPARC, X86-64 & PowerPC platforms on Simics
      • ‘Normal’ SPARC: with register window and TLB traps
      • ‘Ideal’ SPARC: reg win traps removed & HW-fill TLB
    • Uniprocessor systems
    • Details in paper

SI frequency

  • Frequent across ISAs
  • Similar profile & dominated by register writes
  • Frequent exceptions in ‘Normal’ SPARC

[Figure: SI frequency; panels: Ideal SPARC, X86, PowerPC, ‘Normal’ SPARC]

Effectively useless (EU) writes
  • Many writes to non-renamed registers are EU
    • Produce a new value
    • Consumers read the value
    • But their execution is unaffected

EU characterization

  • Most values are quickly consumed, but not useful to the first consumers
  • 30% of writes are consumed by the next instruction
  • < 20% of writes are useful within 1023 instructions

[Figure: Zeus on Ideal SPARC, implicit consumers only; one label reads “Dyn Dead” [Butts & Sohi ‘02]]

Why effectively useless?
  • Control registers have many fields
    • SIs write entire register, decode stage must serialize
    • But often only update one field
  • Turn off interrupts (from Solaris 9):

rdpr %pstate, %o5        ! read the whole control register
andn %o5, 2, %o4         ! clear one field (bit 1), keep all the others
wrpr %o4, 0, %pstate     ! write the whole register back: Serializing instr!

  • EU subsumes both
    • Dynamically dead [Butts ‘02] & silent writes [Lepak ‘00]
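
The read-modify-write above can be modeled in Python to show that the full-register write changes at most one field. This is a sketch only: the bit positions are an assumed `%pstate` layout (IE at bit 1) and should be checked against the architecture manual, and `field`, `changed_fields` and `disable_interrupts` are hypothetical helper names.

```python
# Assumed %pstate layout as (bit position, width); verify against the manual.
PSTATE_FIELDS = {
    "AG": (0, 1), "IE": (1, 1), "PRIV": (2, 1), "AM": (3, 1),
    "PEF": (4, 1), "RED": (5, 1), "MM": (6, 2), "TLE": (8, 1),
    "CLE": (9, 1), "MG": (10, 1), "IG": (11, 1),
}


def field(value: int, name: str) -> int:
    """Extract one named field from a register value."""
    shift, width = PSTATE_FIELDS[name]
    return (value >> shift) & ((1 << width) - 1)


def changed_fields(old: int, new: int) -> dict:
    """Map each field whose value differs to its (old, new) pair."""
    return {
        name: (field(old, name), field(new, name))
        for name in PSTATE_FIELDS
        if field(old, name) != field(new, name)
    }


def disable_interrupts(pstate: int) -> int:
    """Mirror the three-instruction sequence from the slide."""
    o5 = pstate      # rdpr %pstate, %o5
    o4 = o5 & ~2     # andn %o5, 2, %o4
    return o4 ^ 0    # wrpr %o4, 0, %pstate (wrpr writes rs1 XOR the immediate)


if __name__ == "__main__":
    print(changed_fields(0b0110, disable_interrupts(0b0110)))  # interrupts were on
    print(changed_fields(0b0100, disable_interrupts(0b0100)))  # interrupts already off
```

When interrupts are already off, the write changes nothing at all, which is the silent-write case; otherwise only the IE field differs even though the whole register is rewritten.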

Outline
  • Serializing instructions
  • Characterization
  • Effectively useless prediction
    • Overview & operation
    • Performance results
  • Summary

Effectively useless prediction
  • Goals
    • Allow EU writes and consumers to execute OoO
    • Few changes to pipeline & datapath
    • Easy test to ensure consumers execute correctly
  • Overview
    • Allow consumers to proceed under certain conditions
    • Guarantee non-faulting consumers execute correctly

EU prediction operation

[Figure: pipeline (Fetch, Decode, Issue, Execute, Write Back, Commit) annotated with the steps below, plus the EU Prediction Table (label: Write PC) and the Outstanding Write Table (labels: P, B, C, WritePtr; rows include pstate and fprs)]

  • EU Prediction Table
    • Was this write EU last time?
  • Outstanding Write Table
    • Status of writes to each control reg
  • SIs (at Decode):
    1) Make EU prediction
    2) Update status
  • Consumers:
    1) Check each control reg
    2) Proceed if all writes are EU (may read stale value)
  • Consumer exception:
    1) Squash if proceeded past EU write
  • SIs (at Commit):
    1) Check for useful changes
    2) Squash younger instr if useful cons
    3) Update status & EU prediction table
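
The operation described on this slide can be approximated with a small behavioral model in Python. This is a sketch under stated assumptions, not the hardware design from the paper: indexing the prediction table by the PC of the write, defaulting unseen writes to "useful", and all class and method names (`EUPredictor`, `decode_si`, `issue_consumer`, `commit_si`) are illustrative choices. The table fields labeled P, B, C and WritePtr are not modeled.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so each in-flight write is distinct
class OutstandingWrite:
    write_pc: int
    predicted_eu: bool
    consumers_proceeded: bool = False


@dataclass
class EUPredictor:
    # EU Prediction Table (assumed indexing): write PC -> "was it EU last time?"
    last_eu: dict = field(default_factory=dict)
    # Outstanding Write Table: control register name -> in-flight writes
    outstanding: dict = field(default_factory=dict)

    def decode_si(self, reg: str, write_pc: int) -> OutstandingWrite:
        """SI: make an EU prediction and record the outstanding write."""
        predicted = self.last_eu.get(write_pc, False)  # assumption: unseen = useful
        write = OutstandingWrite(write_pc, predicted)
        self.outstanding.setdefault(reg, []).append(write)
        return write

    def issue_consumer(self) -> tuple:
        """Consumer: check every control reg; proceed only if all writes are EU.

        Returns (may_proceed, speculative). A speculative consumer may have read
        a stale value, so it is squashed if it later raises an exception.
        """
        writes = [w for ws in self.outstanding.values() for w in ws]
        if not all(w.predicted_eu for w in writes):
            return (False, False)
        for w in writes:
            w.consumers_proceeded = True
        return (True, bool(writes))

    def commit_si(self, reg: str, write: OutstandingWrite, was_useful: bool) -> bool:
        """SI at commit: update both tables; return True if a squash is needed."""
        self.outstanding[reg].remove(write)
        self.last_eu[write.write_pc] = not was_useful
        return was_useful and write.consumers_proceeded
```

A short usage example: the first time a write is seen, consumers stall; once it has committed as EU, the next instance lets consumers proceed, and a squash is required only if that instance turns out to be useful after all.

```python
p = EUPredictor()
w1 = p.decode_si("pstate", 0x1000)
print(p.issue_consumer())                            # consumers wait
print(p.commit_si("pstate", w1, was_useful=False))   # no squash, trained as EU
w2 = p.decode_si("pstate", 0x1000)
print(p.issue_consumer())                            # consumers proceed speculatively
print(p.commit_si("pstate", w2, was_useful=True))    # misprediction: squash
```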

What are useful changes?
  • Useful unless:

1) The write is silent (~14%)

2) Change will only affect faulting instructions (~65%)

      • Setting FEF field of %fprs to one
      • Interrupt example earlier
      • Several other common cases
  • Overly conservative
    • But captures most common cases
    • Satisfies goal of simple test
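
The test above can be sketched as a comparison of the old and new register values. In this Python sketch the field masks (IE at bit 1 of `%pstate`, FEF at bit 2 of `%fprs`) and the two-entry `HARMLESS` table are assumptions for illustration, covering only the two examples named on the slide; `is_useful_change` is a hypothetical name.

```python
# (register, field mask) -> the one transition treated as harmless: (old, new).
# Assumed to cover only the two examples from the slide.
HARMLESS = {
    ("fprs", 0b100): (0, 1),    # setting FEF to one
    ("pstate", 0b010): (1, 0),  # clearing IE (turning interrupts off)
}


def is_useful_change(reg: str, old: int, new: int) -> bool:
    """Conservative test: any change not known to be harmless counts as useful."""
    if old == new:
        return False  # silent write
    diff = old ^ new  # bits that actually changed
    for (r, mask), (old_bit, new_bit) in HARMLESS.items():
        if r != reg or not diff & mask:
            continue
        # Ignore this field only if it moved in the harmless direction.
        if bool(old & mask) == bool(old_bit) and bool(new & mask) == bool(new_bit):
            diff &= ~mask
    return diff != 0


if __name__ == "__main__":
    print(is_useful_change("pstate", 0b0110, 0b0100))  # interrupts turned off
    print(is_useful_change("pstate", 0b0100, 0b0110))  # interrupts turned on
```

Treating every unlisted change as useful keeps the check simple at the cost of some missed opportunities, which matches the "overly conservative" trade-off stated on the slide.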

EU prediction methodology
  • OoO processor
    • 128-entry instr. window
    • 15 stage pipe
    • 32kB L1I/D, 1MB L2
    • 265-cycle main mem
  • Simics MAI as a dynamic trace generator
    • Adapts to changes due to timing
    • Faithfully models wrong-path events
    • Ideal SPARC
  • Details in paper

EU prediction results

[Figure: EU prediction speedup, shown for OS code and overall]

Also in the paper
  • More characterization & results
    • Useless TLB writes
    • EU prediction accuracy
    • Large window processor
  • Two other ‘baseline’ implementations
    • Scoreboard
    • LateSquash
  • Discussion of SIs in real implementations:
    • Pentium M, Alpha 21264, PowerPC 750, UltraSPARC IIICu

Summary
  • Present first analysis of serializing instructions
    • Frequent across three ISAs
    • Limit OoO parallelism in OS code
    • Rival impact of L2 misses (8-45% for OS)
    • Many SI writes are effectively useless (EU)
  • Propose EU prediction
    • Predict writers and consumers can execute OoO
      • May read stale value, but execute properly anyway
    • 6-35% OS improvement (2-12% overall)
    • Not a panacea, but simple and works fairly well

Thank you!

Questions, comments:

[email protected]

http://www.cs.wisc.edu/~pwells

Other SI implementations
  • Reminder:
    • Baseline blocks all younger instructions after SI
  • Technique 1: “Scoreboard”
    • Track outstanding SI writes (similar to OWT)
    • Determine which stage to block consumers
    • Identify independent instructions
  • Technique 2: “LateSquash”
    • Instructions following SI enter pipeline, execute OoO
    • Squashed just before SI executes

Why not value prediction?
  • Last value prediction for non-renamed registers
    • Can be modified to accurately predict many values
    • Can avoid serializing all non-renamed regs (not just EU)
    • Requires predicted value to be sent to every stage where it might be used
      • Avoiding this is the reason SIs exist in the first place

Explicit vs. implicit consumers
  • Explicit consumers
    • Name their operands & use them at execute stage
  • Implicit consumers
    • Don’t name them & use values at a variety of pipeline stages
    • Are the reason writes to non-renamed regs serialize

rdpr %pstate, %o5        ! Explicit consumer of %pstate
andn %o5, 2, %o4
wrpr %o4, 0, %pstate     ! SI
brnz %o1, 0x5ca8         ! Implicit consumer of %pstate
sethi %hi(0x140), %o3    ! Implicit consumer of %pstate
