
Instruction Set Issues - PowerPoint PPT Presentation





PowerPoint Slideshow about ' Instruction Set Issues' - trella


Presentation Transcript
Instruction Set Issues

  • MIPS easy

    • Instructions are only committed at the MEM → WB transition

  • Other architectures are more difficult

    • Instructions may update state early

    • FP more difficult

    • Memory updating ops (e.g. string moves)


Instruction Set Issues (cont.)

  • Difficult architectural features

    • “Odd” bits of state (e.g. condition codes)

      • May need saving/restoring on exceptions

    • Implicitly set condition codes

      • Complicate branch resolution

      • Explicit setting helps here (still a RAW hazard)

    • Multicycle operations

      • Widely differing execution times, lots of potential data hazards, etc.


Instruction Set Issues

  • VAX suffers from many of these problems

  • Solution: pipeline the microcode

  • Intel 32-bit 80x86 processors since 1995 use a similar approach


A.5. Handling Multicycle Operations

  • MIPS: FP operations

    • Long latency (EX repeated)

    • Several functional units

    • Structural hazards

    • Data hazards


DLX: FP Design

  • Four functional units:

    • Integer ALU

      • as before

    • FP multiplier

      • also used for integer multiplication

    • FP adder

      • addition, subtraction and conversion

    • FP divider

      • also used for integer division




Hazards

  • Divides

    • Structural hazard

  • Multiple register writes possible in a cycle

  • Out-of-order completion

    • WAW hazards

    • Exception-handling complications

  • RAW hazards increase


Potential RAW Hazards

  • Example (SPARC syntax):

ldd [%fp-8], %f4

fmuld %f4, %f6, %f0

faddd %f0, %f8, %f2

std %f2, [%fp-16]
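
One rough way to see how the RAW hazards in this sequence add up is a toy in-order issue model that counts stall cycles. This is only a sketch: the latency values are illustrative assumptions (not taken from the slides), "latency" here means the number of intervening cycles needed between a producer and a dependent consumer, and structural hazards are ignored.

```python
# Toy in-order issue model: count RAW stall cycles for the sequence above.
# ASSUMPTION: latency = intervening cycles required between a producer and a
# dependent consumer. The values are illustrative, not from the slides.
LATENCY = {"ldd": 1, "fmuld": 6, "faddd": 3, "std": 0}

# (opcode, source registers, destination register or None)
PROGRAM = [
    ("ldd",   [],           "f4"),
    ("fmuld", ["f4", "f6"], "f0"),
    ("faddd", ["f0", "f8"], "f2"),
    ("std",   ["f2"],       None),
]

def raw_stalls(program, latency):
    ready = {}      # register -> earliest cycle at which a consumer may issue
    next_slot = 0   # cycle in which the next instruction issues if not stalled
    stalls = []
    for op, srcs, dst in program:
        issue = max([next_slot] + [ready.get(r, 0) for r in srcs])
        stalls.append(issue - next_slot)
        if dst is not None:
            ready[dst] = issue + 1 + latency[op]
        next_slot = issue + 1
    return stalls

print(raw_stalls(PROGRAM, LATENCY))  # [0, 1, 6, 3]
```

With these assumed latencies every instruction after the load waits on its predecessor, so the four-instruction chain picks up ten stall cycles.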


Multiple Writes

  • Up to four instructions may need to write in the same cycle

  • Solution

    • Track writes in ID

    • Stall at instruction issue (simpler: all stalls occur at one point)

  • Alternatively:

    • Stall at MEM or WB

      • Stall the instruction with the shorter latency (letting the longer one finish may clear RAW hazards sooner)
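
Tracking writes in ID can be pictured as a small reservation list for the register write port. The sketch below is a hypothetical model, not the actual hardware design: `WritePortTracker`, `issue_cycles`, and the cycles-to-write-back values are all assumptions made for illustration.

```python
# Sketch of "track writes in ID": a reservation list for one register write
# port. Slot i is True if an already-issued instruction will write back
# i cycles from now. ASSUMPTION: all names and cycle counts are hypothetical.
class WritePortTracker:
    def __init__(self, depth=32):
        self.slots = [False] * depth

    def tick(self):
        # One cycle passes: every reservation moves one slot closer.
        self.slots = self.slots[1:] + [False]

    def try_issue(self, cycles_to_wb):
        # Refuse to issue (stall in ID) if the write port is already taken.
        if self.slots[cycles_to_wb]:
            return False
        self.slots[cycles_to_wb] = True
        return True

def issue_cycles(cycles_to_wb_list):
    """Return the cycle at which each instruction issues, in program order."""
    tracker, cycle, issued = WritePortTracker(), 0, []
    for cycles_to_wb in cycles_to_wb_list:
        while not tracker.try_issue(cycles_to_wb):
            tracker.tick()
            cycle += 1
        issued.append(cycle)
        tracker.tick()
        cycle += 1
    return issued

print(issue_cycles([10, 4, 3]))  # [0, 1, 3]
```

In the example the third instruction would otherwise write back in the same cycle as the second, so it is held in ID for one cycle.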


WAW Hazards

  • Example:

faddd %f4, %f6, %f2

… ! Integer op

ldd [%fp-8], %f2


WAW Hazards (cont.)

  • Rare

    • Compiler scheduling can still produce such unlikely instruction sequences, so the hardware must catch them

  • Solutions:

    • Stall issue of ldd

    • Prevent write by faddd
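
The first solution (stalling the later instruction) can be sketched as a check made at issue time. This is a minimal model under stated assumptions: the function name, the placeholder integer op `add` with destination `g1`, and the write latencies are hypothetical and chosen only so that the hazard shows up.

```python
# Sketch: detect a WAW hazard at issue and stall the later writer.
# pending maps a destination register to the cycle its in-flight write lands.
# ASSUMPTION: latencies (cycles from issue to register write) are illustrative.
def issue_with_waw_check(program, write_latency):
    pending = {}
    cycle = 0
    schedule = []
    for op, dst in program:
        if dst is not None:
            # Delay issue until our write is guaranteed to land after any
            # earlier in-flight write to the same register.
            earliest = pending.get(dst, -1) - write_latency[op] + 1
            cycle = max(cycle, earliest)
            pending[dst] = cycle + write_latency[op]
        schedule.append((op, cycle))
        cycle += 1
    return schedule

WRITE_LATENCY = {"faddd": 6, "add": 3, "ldd": 3}
PROGRAM = [("faddd", "f2"), ("add", "g1"), ("ldd", "f2")]
print(issue_with_waw_check(PROGRAM, WRITE_LATENCY))
# [('faddd', 0), ('add', 1), ('ldd', 4)]
```

Without the check, `ldd` would issue in cycle 2 and write `%f2` in cycle 5, one cycle before `faddd` overwrites it with a stale result.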


Maintaining Precise Exceptions

  • Out-of-order completion:

fdivd %f2, %f4, %f0

faddd %f10, %f8, %f10

fsubd %f12, %f14, %f12

  • faddd and fsubd complete long before fdivd

  • fsubd may raise an exception after faddd has completed but while fdivd is still executing

    • No longer precise
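
A short sketch makes the completion order concrete. The execution times below are assumptions picked for illustration (the slides give no figures for this example); the model simply issues one instruction per cycle and sorts by finish time.

```python
# Sketch: completion order for the sequence above under in-order issue.
# ASSUMPTION: the cycle counts are illustrative, not taken from the slides.
EXEC_CYCLES = {"fdivd": 25, "faddd": 4, "fsubd": 4}
SEQUENCE = ["fdivd", "faddd", "fsubd"]

def completion_order(sequence, exec_cycles):
    # Instruction i issues in cycle i and completes exec_cycles[op] later.
    done = [(issue + exec_cycles[op], op) for issue, op in enumerate(sequence)]
    return [op for _, op in sorted(done)]

print(completion_order(SEQUENCE, EXEC_CYCLES))  # ['faddd', 'fsubd', 'fdivd']
```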


Maintaining Precise Exceptions

  • It may be very difficult to handle exceptions precisely

    • E.g. faddd has already overwritten one of its own operands (%f10)

  • Four solutions:

    • Accept imprecise exceptions

      • Problematic: precise exceptions are needed for VM & IEEE FP

      • Allow switching between precise and imprecise modes


Maintaining Precise Exceptions

  • Solutions (cont.)

    • Buffer results until earlier instructions complete

      • Buffers may grow very large, and extensive forwarding required

      • History files: restore original register values

      • Future files: store new register values

    • Software executes intervening instructions to get “up to date” before returning from exception
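
The history-file idea can be sketched in a few lines: log each register's old value before overwriting it, then undo the writes of younger instructions when an older one faults. The class, its methods, and the sequence-number scheme below are hypothetical, and the sketch assumes WAW hazards have already been handled so that writes to one register land in program order.

```python
# Sketch of a history file: record the old value of each register before it is
# overwritten so state can be rolled back to a precise point.
# ASSUMPTION: names and the sequence-number scheme are hypothetical.
class HistoryFile:
    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []  # (program-order sequence number, register, old value)

    def write(self, seq, reg, value):
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def rollback(self, faulting_seq):
        # Undo, newest first, every write made by an instruction that comes
        # after the faulting instruction in program order.
        for seq, reg, old in reversed(self.log):
            if seq > faulting_seq:
                self.regs[reg] = old
        self.log = [e for e in self.log if e[0] <= faulting_seq]

hf = HistoryFile({"f0": 0.0, "f8": 2.0, "f10": 1.0})
hf.write(seq=1, reg="f10", value=3.0)  # a later faddd completes early
hf.rollback(faulting_seq=0)            # an earlier fdivd (seq 0) then faults
print(hf.regs["f10"])                  # 1.0
```

A future file works the other way round: new values are held aside and only copied into the architectural registers once all earlier instructions have completed.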


Maintaining Precise Exceptions

  • Solutions (cont.)

    • Hybrid scheme

      • Instructions are only issued when it is certain that preceding instructions will not cause an exception

      • May require stalling the pipeline


Performance of the MIPS FP Pipeline

  • Structural Hazards (divide unit)

    • Very low: 0-2 cycles per FP operation

  • RAW hazards

    • Divide: 12-24 cycles, average 14.2

    • Add: 0.7-2.3 cycles, average 1.7

    • In general, about 0.5 × latency


Overall MIPS FP Performance

  • Stalls per instruction

    • 0.65-1.21 cycles

    • Average: 0.87

    • 82% from FP RAW hazards
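
As a quick check on what these figures imply, the arithmetic below splits the average stall count into its FP RAW share and the remainder. The resulting CPI assumes an ideal CPI of 1 with no other penalties, which is an assumption of this sketch rather than a claim from the slides.

```python
# Figures from the slide above.
avg_stalls = 0.87   # average stall cycles per instruction
raw_share = 0.82    # fraction of stalls caused by FP RAW hazards

fp_raw_stalls = avg_stalls * raw_share
other_stalls = avg_stalls - fp_raw_stalls

# ASSUMPTION: ideal CPI of 1 and no other sources of delay.
cpi = 1.0 + avg_stalls

print(round(fp_raw_stalls, 2), round(other_stalls, 2), round(cpi, 2))
# 0.71 0.16 1.87
```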


A.6. Putting It All Together: MIPS R4000 Pipeline

  • 64-bit instruction set

  • Eight stage pipeline

    • superpipelining

    • IF + IS: instruction fetch

    • RF: decode/register fetch

    • EX: execution

    • DF + DS + TC: data cache access

    • WB: write back


MIPS R4000 Pipeline

  • Performance

    • Load delay: two cycles

    • Branch delay: three cycles

      • Delayed branch (one cycle)

      • Predict-not-taken strategy, with annulling

  • Increased forwarding requirements

    • Three stages between EX and WB now
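
The two delay figures follow from where results appear in the eight-stage pipeline. The sketch below assumes that load data is available at the end of DS and forwarded to EX, and that the branch outcome is known at the end of EX; both placements are assumptions of this example, as is the helper name `delay_cycles`.

```python
# Stage order of the eight-stage pipeline described earlier.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]

def delay_cycles(produced_at_end_of, needed_at_start_of):
    """Extra cycles the immediately following instruction must wait.

    A producer issued in cycle 0 is in stage index p during cycle p, and its
    result is usable from cycle p + 1. A consumer issued k cycles later is in
    stage index c during cycle k + c, so it needs k >= p - c + 1. With no
    stall k would be 1, so the delay is p - c.
    """
    return STAGES.index(produced_at_end_of) - STAGES.index(needed_at_start_of)

# ASSUMPTION: load data ready at end of DS, consumed in EX.
print(delay_cycles("DS", "EX"))  # 2
# ASSUMPTION: branch resolved at end of EX, target fetched in IF.
print(delay_cycles("EX", "IF"))  # 3
```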


MIPS R4000 Pipeline

  • Floating Point

    • Three functional units

      • Divider, multiplier, adder

      • Shared components (8 sub-units)

    • Latency: 2–112 cycles

    • Initiation interval: 1–111 cycles

    • Complicated stall handling
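
Latency and initiation interval affect throughput differently, which a one-line model can show. The sketch assumes independent operations issued back to back on a single unit, and pairing the smallest and largest figures from the slide (2 with 1, 112 with 111) is an assumption made for illustration.

```python
def cycles_for_back_to_back(n, latency, initiation_interval):
    # The k-th operation can start k * initiation_interval cycles after the
    # first; the last one then needs `latency` cycles to deliver its result.
    return (n - 1) * initiation_interval + latency

# ASSUMPTION: these latency/interval pairings are for illustration only.
print(cycles_for_back_to_back(10, latency=2, initiation_interval=1))      # 11
print(cycles_for_back_to_back(10, latency=112, initiation_interval=111))  # 1111
```

When the interval is close to the latency, the unit is effectively not pipelined, so ten operations cost nearly ten full latencies.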


MIPS R4000 Pipeline

  • Performance:

    • CPI between 1.2 and 2.8 for SPEC92 benchmarks

    • Average: 2.0

      • Integer: 1.54

      • FP: 2.48

    • Integer apps: mainly branch delays

    • FP apps: mainly FP data hazard stalls (RAW)

