
COMPSYS 304


Computer Architecture

Speculation & Branching

Morning visitors - Paradise Bay, Bay of Islands



Speculation

  • High Tech Gambling?

  • Data Prefetch

    • Cache instruction

      • dcbt : data cache block touch

    • Attempts to bring data into cache

      • so that it will be “close” when needed

    • Allows the SIU (system interface unit) to use idle bus bandwidth

      • if there’s no spare bandwidth, this read can be given low priority

    • Speculative because

      • a branch may occur before it’s used

      • we speculate that this data may be needed

dcbt is the PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, … (see the sketch below)
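To make the “touch” idea concrete, here is a minimal C sketch (an illustration, not from the slides) using GCC/Clang’s __builtin_prefetch builtin, which on PowerPC typically compiles down to a dcbt-style touch; the prefetch distance of 16 elements is an arbitrary choice for the example.

    #include <stddef.h>

    /* Sketch of a software prefetch hint: touch data a few iterations
     * ahead so it is (hopefully) in the cache when the loop reaches it.
     * The hint never changes program results - it may simply be ignored. */
    double sum(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&a[i + 16], /* rw = */ 0, /* locality = */ 1);
            s += a[i];
        }
        return s;
    }

If the data is already cached, or the bus is busy, the touch costs essentially nothing, which is exactly the low-priority behaviour described above.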


Speculation - General

  • Some functional units almost always idle

    • Make them do some (possibly useful) work rather than sit idle

    • If the speculation was incorrect, results are simply abandoned

    • No loss in efficiency; Chance of a gain

  • Researchers are actively looking at software prefetch schemes

    • Fetch data well before it’s needed

    • Reduce latency when it’s actually needed

  • Speculative operations have low priority and use idle resources



Branching

  • Branches are expensive

    • 2-3 cycles lost in pipeline

      • All instructions following branch ‘flushed’

    • Bandwidth wasted fetching unused instructions

    • Stall while branch target is fetched

  • We can speculate about the target of a branch

  • Terminology

    • Branch Target : address to which branch jumps

    • Branch Taken : control transfers to non-sequential address (target)

    • Branch Not Taken : next instruction is executed


Branching - Prediction

  • Branches can be

    • unconditional: the branch is always taken, eg call subroutine, return from subroutine

    • conditional: the branch depends on the state of the computation, eg has the loop terminated yet?

  • Unconditional branches are simple

    • New instructions are fetched as soon as the branch is recognized

    • As early in the pipeline as possible

  • Branch units often placed with fetch & decode stages


Branching - Branch Unit

  • PowerPC 603 logical layout (block diagram)


Branching - Speculation

  • We have the following code:

  • if ( cond ) s1; else s2;

  • Superscalar machine

    • Multiple functional units

    • Start executing both branches (s1 and s2)

    • Keep idle functional units busy!

  • One is speculative and will be abandoned

    • Processor will eventually calculate the branch condition and select which result should be retained (written back)

  • MIPS R10000 - up to 4 speculative branches at once


Branching - Speculation

  • MIPS R10000 -

    • Up to 4 speculative branches at once

    • Instructions are “tagged” with a 4-bit mask

      • Indicates to which branch instruction they belong

    • As soon as the condition is determined, mis-predicted instructions are aborted (a toy sketch follows this list)
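A toy software model of the tagging idea (an illustration only, with invented names; it is not how the R10000 hardware is built): each in-flight instruction carries a mask with one bit per unresolved branch it was fetched behind, and resolving a branch either clears that bit or aborts the instruction.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        const char *name;
        uint8_t     branch_mask;   /* bit i set => fetched behind pending branch i */
        bool        aborted;
    } Instr;

    /* Branch 'branch_bit' has now been resolved.  If it was mis-predicted,
     * squash every instruction tagged with it; otherwise just clear the tag. */
    static void resolve_branch(Instr *w, int n, int branch_bit, bool mispredicted)
    {
        for (int i = 0; i < n; i++) {
            if (w[i].branch_mask & (1u << branch_bit)) {
                if (mispredicted)
                    w[i].aborted = true;
                else
                    w[i].branch_mask &= (uint8_t)~(1u << branch_bit);
            }
        }
    }

    int main(void)
    {
        Instr window[] = {
            { "add (s1 path)", 0x1, false },   /* speculative, behind branch 0 */
            { "lw  (s1 path)", 0x1, false },
            { "sub (older)",   0x0, false },   /* not speculative */
        };
        resolve_branch(window, 3, 0, true);    /* branch 0 was mis-predicted */
        for (int i = 0; i < 3; i++)
            printf("%-14s %s\n", window[i].name, window[i].aborted ? "aborted" : "kept");
        return 0;
    }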


Branching - Prediction

  • We have a sequence of instructions:

        add        ; some mixture of arithmetic,
        lw         ; load, store, etc. instructions
        sub
        brne  L1   ; branch on some condition
    L2: or         ; some more arithmetic,
        st         ; load, store, etc. instructions

  • If you were asked to guess which branch should be preferred, which would you choose:

    • Next sequential instruction (L2)

    • Branch target (L1)



Branching - Prediction

  • Studies show that backward branches are taken most of the time!

  • Because of loops:

    L1: add        ; any mix of arith,
        lw         ; load, store, etc.
        sub        ; instructions
        brne  L1   ; branch back to loop start
        or         ; some more arith,
        st         ; memory, etc. instructions




Branching - Prediction Rule

  • A simple prediction rule:

    • Take backward branches

  • works amazingly well!

  • For a loop with n iterations, this is wrong in only 1/n cases (eg a 100-iteration loop mis-predicts only the final exit, ie 1% of the time); a small simulation is sketched after this list

  • A system working on this rule alone would

    • detect the backward branch and

    • start fetching from the branch target rather than the next instruction
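A small C simulation of the rule (hypothetical, not from the slides): predict every backward branch taken, run one loop of n iterations, and count mis-predictions; the rate comes out at exactly 1/n.

    #include <stdio.h>

    /* Simulate "predict backward branches taken" for one loop of n
     * iterations.  The backward branch is taken on every iteration
     * except the final exit, so the rule is wrong exactly once. */
    static void simulate_loop(int n)
    {
        int mispredictions = 0;
        for (int i = 1; i <= n; i++) {
            int taken     = (i < n);  /* branch back except on the last iteration */
            int predicted = 1;        /* rule: backward branch => predict taken */
            if (taken != predicted)
                mispredictions++;
        }
        printf("n = %d: %d mis-prediction(s), rate = %.2f%%\n",
               n, mispredictions, 100.0 * mispredictions / n);
    }

    int main(void)
    {
        simulate_loop(10);    /* 10.00% */
        simulate_loop(100);   /*  1.00% */
        return 0;
    }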


Branching - Improving the prediction

  • Static prediction systems

    • Compiler can mark branches

      • Likely to be taken or not

    • Instruction fetch unit will use the marking as advice on which instruction to fetch

  • Compiler is often able to give the right advice (a C-level sketch of such a hint follows this list)

    • Loops are easily detected

    • Other patterns in conditions can be recognized

      • Checking for EOF when reading a file

      • Error checking
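As a concrete (assumed) example of static hints at the source level: with GCC or Clang, C code can pass this kind of advice via the __builtin_expect builtin, wrapped in the usual likely/unlikely macros; the compiler uses it to lay out, and on some targets mark, the predicted path.

    #include <stdio.h>
    #include <stdlib.h>

    /* Branch-likelihood hints in the style used by many C code bases. */
    #define likely(x)   __builtin_expect(!!(x), 1)
    #define unlikely(x) __builtin_expect(!!(x), 0)

    int main(void)
    {
        int c;
        long count = 0;

        /* EOF is reached only once, and errors are rare, so both checks
         * are marked unlikely - the straight-line path is the common case. */
        while (likely((c = getchar()) != EOF)) {
            if (unlikely(c == '\0')) {
                fprintf(stderr, "unexpected NUL byte\n");
                exit(EXIT_FAILURE);
            }
            count++;
        }
        printf("%ld bytes read\n", count);
        return 0;
    }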


Branching - Improving the prediction

  • Dynamic prediction systems

    • Program history determines most likely branch

    • Branch Target Buffers - Another cache!


Branching - Branch Target Buffer

  • Instruction address bits [11:3] select the BTB entry

  • Tag determines a “hit”

  • Status bits select taken / not taken (a C sketch of the lookup follows this slide)

  • Pentium 4: >91% prediction accuracy with a 4K-entry BHT (Branch History Table)

  • G4e: 2K entries
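A minimal C sketch of the lookup just described (a direct-mapped BTB; the field widths and names are illustrative assumptions, not any particular processor's layout):

    #include <stdbool.h>
    #include <stdint.h>

    #define BTB_ENTRIES 512                  /* indexed by address bits [11:3] */

    typedef struct {
        uint32_t tag;                        /* remaining address bits: confirms the hit */
        uint32_t target;                     /* predicted branch target address */
        bool     predict_taken;              /* current prediction */
        uint8_t  confidence;                 /* 2-bit status counter (0..2) */
        bool     valid;
    } BTBEntry;

    static BTBEntry btb[BTB_ENTRIES];

    /* Look up a branch by its instruction address.  On a hit, return the
     * taken/not-taken prediction and the predicted target address. */
    static bool btb_lookup(uint32_t pc, uint32_t *target)
    {
        uint32_t index = (pc >> 3) & (BTB_ENTRIES - 1);   /* bits [11:3] */
        uint32_t tag   = pc >> 12;
        const BTBEntry *e = &btb[index];

        if (e->valid && e->tag == tag) {     /* tag determines the hit */
            *target = e->target;
            return e->predict_taken;         /* status selects taken / not taken */
        }
        return false;    /* miss: predict not taken, fetch the next sequential instruction */
    }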


Branching - Branch Target Buffer

  • BTB – just another cache

    • Works on temporal locality principle

      • If this branch is taken (not taken) now, it’s likely to be taken (not taken) next time

      • Replace on conflicts (newest is best)

    • Any cache organization could be used

      • Direct mapped, associative, set-associative

      • No write-back needed

      • Flushed entries are restored

  • Major difference from other caches

    • Status bits …


Branching - Branch Target Buffer

  • Status bits

    • Provide hysteresis in behaviour

    • Without hysteresis, a single change in behaviour would cause the prediction to update immediately

      • Example:

        • if ( cond ) s1; else s2;

      • If the program takes branch s1 a few times,the BTB will predict that s1 is more likely than s2

      • If s2 is then taken, usual cache behaviour suggests that the prediction should be updated to s2


    • Program branching behaviour is a little different ….


Branching - Branch Target Buffer

  • Status bits

    • Common branch behaviour is like this

      • List of taken branches:

        s1 s1 s1 s1 s1 s2 s1 s1 s1 s2 s1 …

      • Usually s1 is executed, occasionally s2


      • s2 handles errors

      • s2 follows a loop

    • ‘Standard’ cache update policies (assume the most recent will be used next) would update the prediction from s1 to s2 immediately

      • This would cause many mis-predictions


Branching - Branch Target Buffer

  • Status bits

    • However, if the BTB waits until it has seen s2 a number of times before changing the prediction, the previous stream is predicted well

    • So the status bits (say 2 bits) are a count of the number of correct predictions (a C sketch of this update rule follows this list)

      • A correct prediction increments the count (saturating at 2 – ie it counts to a maximum of 2)

      • A mis-prediction decrements the count

      • A mis-prediction with the count at 0 updates (flips) the prediction

      • This accommodates an occasional break from a pattern (eg s1 is usually taken) without disturbing the best prediction (take s1)

      • It also handles situations where behaviour changes sometimes
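A minimal, self-contained C sketch of this update rule (field names as in the BTB sketch earlier; an illustration, not a particular processor). The wording above is slightly ambiguous about whether the count is checked before or after it is decremented; the sketch checks after, which is the reading that reproduces the trace table on the later slide.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        bool    predict_taken;   /* current prediction for this branch */
        uint8_t confidence;      /* count of correct predictions, saturating at 2 */
    } Predictor;

    /* Update the status bits once the real branch outcome is known. */
    static void predictor_update(Predictor *p, bool actually_taken)
    {
        if (p->predict_taken == actually_taken) {
            if (p->confidence < 2)           /* correct: count up, saturate at 2 */
                p->confidence++;
        } else {
            if (p->confidence > 0)           /* wrong: lose one unit of confidence */
                p->confidence--;
            if (p->confidence == 0)          /* out of confidence: switch prediction */
                p->predict_taken = actually_taken;
        }
    }

So a single s2 in a long run of s1s only dents the count, while a sustained change of behaviour flips the prediction after a couple of mis-predictions.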


Branching - Branch Target Buffer

  • Status bits - Count correct predictions

    • Handles situations where behaviour changes sometimes

      • Programs which move from one ‘region’ to another ..


    • Image processing code - looking for an orange object

      • Process background (non-orange) pixels,

      • finds the orange thing,

      • counts orange pixels for a while, then

      • reverts back to background

// search for orange object in row of pixels
for (j = 0; j < width; j++) {
    if ( pixel[j].colour != orange ) {   // s1
        ...
    } else {                             // s2
        o_cnt++;
        if ( o_cnt > obj_width ) ...     // found it!
    }
}




Branching - Branch Target Buffer

  • Status bits

    • Count correct predictions

    • Handles situations where behaviour changes sometimes

      • Programs which move from one ‘region’ to another ..

    • Example:

      • Image processing program looking for an orange object

        • Process background (non-orange) pixels,

        • finds the orange thing,

        • counts orange pixels for a while, then

        • reverts back to background

      • List of taken branches:

        Taken branches:  s1  s1  s1  s2  s2  s2  …   s2  s1  s1  s1  s1
        Region:          BG  BG  BG  OR  OR  OR  …   OR  BG  BG  BG  BG
        Prediction:      s1  s1  s1  s1  s1  s2  …   s2  s2  s2  s1  s1
        Correct:         ✓   ✓   ✓   ✗   ✗   ✓   …   ✓   ✗   ✗   ✓   ✓
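To check the table, here is a small self-contained C program (an illustration: it re-implements the counter rule from the earlier sketch, and the table's “…” is expanded to a short run of s2) that replays the trace of taken branches and prints the prediction made before each one; its output matches the Prediction and Correct rows above.

    #include <stdio.h>

    int main(void)
    {
        /* Trace of taken branches from the table: 1 = s1 (BG), 2 = s2 (OR). */
        int trace[] = { 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1 };
        int n = (int)(sizeof trace / sizeof trace[0]);

        int prediction = 1;   /* start by predicting s1 */
        int confidence = 2;   /* counter assumed saturated by earlier s1s */

        for (int i = 0; i < n; i++) {
            int taken   = trace[i];
            int correct = (taken == prediction);
            printf("taken s%d  predicted s%d  %s\n",
                   taken, prediction, correct ? "correct" : "WRONG");

            if (correct) {
                if (confidence < 2) confidence++;         /* count up, saturate at 2 */
            } else {
                if (confidence > 0) confidence--;         /* count down ...          */
                if (confidence == 0) prediction = taken;  /* ... switch at zero      */
            }
        }
        return 0;
    }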


Branching - Branch Target Buffer

  • Status bits

    • Count correct predictions

    • Reasonable compromise behaviour for most situations

      • Tolerates an occasional ‘error’ branch well

      • Changes to a new behaviour with a small delay

    • Typically about 90% correct predictions

    • BTB with 2k – 4k entries


Speculation & Branching - Summary

  • Data speculation

    • Try to bring data ‘closer’ to CPU (ie into cache) before needed

      • Reduce memory access latency

    • Techniques

      • Special ‘touch’ instructions

        • Advice to processor – fetch if resources available

      • Software

        • eg a dummy reference (sketched below)

  • Instruction (Branch) speculation ..
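A minimal sketch of the “dummy reference” idea in plain C (illustrative only, with invented names; in practice an explicit prefetch builtin is usually preferred, because the compiler is free to delete an ordinary unused read):

    #include <stddef.h>

    /* Touch one byte of a buffer we expect to need soon.  volatile keeps the
     * compiler from deleting the otherwise-unused read, so the load really
     * happens and (hopefully) brings the cache line "closer". */
    static inline void dummy_touch(const void *p)
    {
        (void)*(const volatile char *)p;
    }

    /* Example use: touch the next block while summing the current one. */
    long sum_block(const char *cur, const char *next, size_t n)
    {
        long s = 0;
        dummy_touch(next);              /* speculative: next may never be used */
        for (size_t i = 0; i < n; i++)
            s += cur[i];
        return s;
    }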


Speculation & Branching - Summary

  • Branches are expensive!!

  • Instruction (Branch) speculation

    • Execute both branches of a conditional branch

    • ‘Squash’ (abandon) results from the wrong branch when the branch condition is eventually evaluated

    • Compiler can also mark most probable branch

  • Branch prediction

    • Simplest rule: take backward branches

    • Branch Target Buffer

      • Cache containing most recent branch target

      • ‘Standard’ cache, except for

      • Status bits

        • Introduce hysteresis into behaviour

        • Only update branch target when it’s definitely the right choice


Superscalar - Summary

  • Superscalar machines have multiple functional units (FUs)

    eg 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store

  • Requires a complex IFU (instruction fetch unit)

    • Able to issue multiple instructions/cycle (typ 4)

    • Able to detect hazards (unavailability of operands)

    • Able to re-order instruction issue

      • Aim to keep all the FUs busy

  • Typically, 6-way superscalars can achieve instruction-level parallelism of 2-3
