COMPSYS 304

Computer Architecture

Speculation & Branching

Morning visitors - Paradise Bay, Bay of Islands

Speculation
  • High Tech Gambling?
  • Data Prefetch
    • Cache instruction
      • dcbt : data cache block touch
    • Attempts to bring data into cache
      • so that it will be “close” when needed
    • Allows the SIU (System Interface Unit) to use idle bus bandwidth
      • if there’s no spare bandwidth, this read can be given low priority
    • Speculative because
      • a branch may occur before it’s used
      • we speculate that this data may be needed

dcbt is the PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …

Speculation - General
  • Some functional units almost always idle
    • Make them do some (possibly useful) work rather than sit idle
    • If the speculation was incorrect, results are simply abandoned
    • No loss in efficiency; Chance of a gain
  • Researchers are actively looking at software prefetch schemes
    • Fetch data well before it’s needed
    • Reduce latency when it’s actually needed
  • Speculative operations have low priority and use idle resources
Branching
  • Expensive
    • 2-3 cycles lost in pipeline
      • All instructions following branch ‘flushed’
    • Bandwidth wasted fetching unused instructions
    • Stall while branch target is fetched
  • We can speculate about the target of a branch
  • Terminology
    • Branch Target : address to which branch jumps
    • Branch Taken : control transfers to non- sequential address (target)
    • Branch Not Taken : next instruction is executed
Branching - Prediction
  • Branches can be
    • unconditional: branch is always taken, eg call subroutine, return from subroutine
    • conditional: branch depends on state of computation, eg has loop terminated yet?
  • Unconditional branches are simple
    • New instructions are fetched as soon as the branch is recognized
    • As early in the pipeline as possible
  • Branch units often placed with fetch & decode stages
Branching - Branch Unit
  • PowerPC 603 logical layout
Branching - Speculation
  • We have the following code:
  • if ( cond ) s1; else s2;
  • Superscalar machine
    • Multiple functional units
    • Start executing both branches (s1 and s2)
    • Keep idle functional units busy!
  • One is speculative and will be abandoned
    • Processor will eventually calculate the branch condition and select which result should be retained (written back)
  • MIPS R10000 - up to 4 speculative at once
Branching - Speculation
  • MIPS R10000 -
    • Up to 4 speculative at once
    • Instructions are “tagged” with a 4 bit mask
      • Indicates to which branch instruction it belongs
    • As soon as condition is determined,mis-predicted instructions are aborted
Branching - Prediction
  • We have a sequence of instructions:

        L1:  add        ; some mixture of arithmetic,
             lw         ; load, store, etc. instructions
             sub
             brne L1    ; branch on some condition
        L2:  or         ; some more arithmetic,
             st         ; load, store, etc. instructions

  • If you were asked to guess which branch should be preferred, which would you choose:
    • Next sequential instruction (L2)
    • Branch target (L1)

Branching - Prediction
  • Studies show that backward branches are taken most of the time!
  • Because of loops:

        L1:  add        ; any mix of arith,
             lw         ; load, store, etc.
             sub        ; instructions
             brne L1    ; branch back to loop start
        L2:  or         ; some more arith,
             st         ; memory, etc. instructions

Branching - Prediction Rule
  • A simple prediction rule:
    • Take backward branches
  • works amazingly well!
  • For a loop with n iterations, this is wrong in only 1/n of cases!
  • A system working on this rule alone would
    • detect the backward branch and
    • start fetching from the branch target rather than the next instruction
Branching - Improving the prediction
  • Static prediction systems
    • Compiler can mark branches
      • Likely to be taken or not
    • Instruction fetch unit will use the marking as advice on which instruction to fetch
  • Compiler often able to give the right advice
    • Loops are easily detected
    • Other patterns in conditions can be recognized
      • Checking for EOF when reading a file
      • Error checking
Branching - Improving the prediction
  • Dynamic prediction systems
    • Program history determines most likely branch
    • Branch Target Buffers - Another cache!
Branching - Branch Target Buffer
  • Instruction Add[11:3] selects BTB entry
  • Tag determines “hit”
  • Stats select taken/not taken

Pentium 4

>91% prediction

accuracy -

4K entry BHT

(Branch History Table)

G4e – 2K entries

Branching - Branch Target Buffer
  • BTB – just another cache
    • Works on temporal locality principle
      • If this branch is taken (not taken) now, it’s likely to be taken (not taken) next time
      • Replace on conflicts (newest is best)
    • Any cache organization could be used
      • Direct mapped, associative, set-associative
      • No write-back needed
      • Flushed entries are restored
  • Major difference from other caches
    • Status bits …
Branching - Branch Target Buffer
  • Status bits
    • Provide hysteresis in behaviour
    • Without hysteresis, behaviour change would cause the prediction to immediately update
      • Example:
        • if ( cond ) s1; else s2;
      • If the program takes branch s1 a few times, the BTB will predict that s1 is more likely than s2
      • If s2 is then taken, usual cache behaviour suggests that the prediction should be updated to s2
    • But program branching behaviour is a little different ….
Branching - Branch Target Buffer
  • Status bits
    • Common branch behaviour is like this
      • List of taken branches:

s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1 …

      • Usually s1 is executed, occasionally s2; eg
        • s2 handles errors
        • s2 follows a loop
    • ‘Standard’ cache update policies (assume the most recent will be used next) would update the prediction from s1 to s2 immediately
      • This would cause many mis-predictions
Branching - Branch Target Buffer
  • Status bits
    • However, if the BTB waits until it has seen s2 a number of times before changing the prediction, the previous stream is predicted well
    • So the status bits (say 2 bits) are a count of the number of correct predictions
      • A correct prediction increments the count (saturating at 2 – ie it counts to a maximum of 2)
      • A mis-prediction decrements the count
      • A mis-prediction and count=0 updates the prediction
      • This accommodates an occasional break from a pattern (eg s1 is usually taken) without disturbing the best prediction (take s1)
      • It also handles situations where behaviour changes sometimes
Branching - Branch Target Buffer
  • Status bits - Count correct predictions
    • Handles situations where behaviour changes sometimes
      • Programs which move from one ‘region’ to another ..

eg

    • Image processing code - looking for an orange object
        • Processes background (non-orange) pixels,
        • finds the orange thing,
        • counts orange pixels for a while, then
        • reverts back to background

// search for orange object in row of pixels
for ( j = 0; j < width; j++ ) {
    if ( pixel[j].colour != orange )   // s1
        bg_cnt++;
    else {                             // s2
        o_cnt++;
        if ( o_cnt > obj_width ) …     // found it!
    }
}

Branching - Branch Target Buffer
  • Status bits
    • Count correct predictions
    • Handles situations where behaviour changes sometimes
      • Programs which move from one ‘region’ to another ..
    • Example:
      • Image processing program looking for an orange object
        • Processes background (non-orange) pixels,
        • finds the orange thing,
        • counts orange pixels for a while, then
        • reverts back to background
      • List of taken branches:

Taken branches:  s1 s1 s1 s2 s2 s2 …  s2 s1 s1 s1 s1
Region:          BG BG BG OR OR OR …  OR BG BG BG BG
Prediction:      s1 s1 s1 s1 s1 s2 …  s2 s2 s2 s1 s1
Correct:         ✓  ✓  ✓  ✗  ✗  ✓  …  ✓  ✗  ✗  ✓  ✓

Branching - Branch Target Buffer
  • Status bits
    • Count correct predictions
    • Reasonable compromise behaviour for most situations
      • Tolerates an occasional ‘error’ branch well
      • Changes to a new behaviour with a small delay
    • Typically about 90% correct predictions
    • BTB with 2k – 4k entries
Speculation & Branching - Summary
  • Data speculation
    • Try to bring data ‘closer’ to CPU (ie into cache) before needed
      • Reduce memory access latency
    • Techniques
      • Special ‘touch’ instructions
        • Advice to processor – fetch if resources available
      • Software
        • eg Dummy reference
  • Instruction (Branch) speculation ..
Speculation & Branching - Summary
  • Branches are expensive!!
  • Instruction (Branch) speculation
    • Execute both branches of a conditional branch
    • ‘Squash’ (abandon) results from the wrong branch when the branch condition is eventually evaluated
    • Compiler can also mark most probable branch
  • Branch prediction
    • Simplest rule: take backward branches
    • Branch Target Buffer
      • Cache containing most recent branch target
      • ‘Standard’ cache, except for
      • Status bits
        • Introduce hysteresis into behaviour
        • Only update branch target when it’s definitely the right choice
Superscalar - summary
  • Superscalar machines have multiple functional units (FUs)
    • eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store

  • Requires complex IFU
    • Able to issue multiple instructions/cycle (typ 4)
    • Able to detect hazards (unavailability of operands)
    • Able to re-order instruction issue
      • Aim to keep all the FUs busy
  • Typically, 6-way superscalars can achieve instruction-level parallelism of 2-3