Accelerating Performance

The RISC Revolution: Accelerating Performance. CISC (Complex Instruction Set Computer): complex decoders, lots of circuitry, and some complex instructions that may never be used. RISC (Reduced Instruction Set Computer): better use of silicon real estate.

Presentation Transcript


  1. The RISC Revolution: Accelerating Performance (CMPUT 229)

  2. CISC → RISC • CISC: Complex Instruction Set Computer • Complex decoders • Lots of circuitry • Some complex instructions may never be used • RISC: Reduced Instruction Set Computer • Better use of silicon real estate

  3. Instruction Usage • Fairclough* divided instructions into eight groups: • Data movement • Program modification (branch, call, return) • Arithmetic • Compare • Logical • Shift • Bit manipulation • Input/output and miscellaneous • * Fairclough, D. A., “A Unique Microprocessor Instruction Set,” IEEE Micro, May 1982, pp. 8-18. (Clements, p. 328)

  4. Constants, parameters, and local storage • Tanenbaum* reported that: • 56% of all constant values are in the -15 to +15 range • 98% of all constant values are in the -511 to +511 range • Thus a 5-bit immediate field covers more than half of the literals • Other researchers showed that: • 95% of subroutines require 12 words or fewer for parameter passing and local storage • Thus providing this space in the processor reduces processor-memory bus traffic • * Tanenbaum, Andrew S., “Implications of Structured Programming for Machine Architecture,” Communications of the ACM, Vol. 21, No. 3, March 1978, pp. 237-246. (Clements, p. 329)
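
The link between the quoted ranges and field widths can be checked with a short sketch. It assumes the immediate field is a two's-complement signed value, which the slide does not state; the function names are illustrative only.

```python
def signed_range(bits):
    """Return (lowest, highest) value of a two's-complement field of this width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_immediate(value, bits):
    """True if value can be encoded in a signed immediate field of this width."""
    low, high = signed_range(bits)
    return low <= value <= high


if __name__ == "__main__":
    # Assumption: two's-complement encoding of the immediate field.
    for bits in (5, 10):
        print(bits, "bits cover", signed_range(bits))
```

Under that assumption, the -15 to +15 range fits in 5 bits and the -511 to +511 range fits in 10 bits.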

  5. RISC Characteristics • Enough registers to reduce memory traffic • Instructions operate on three registers • Efficient parameter passing and branching • Don’t implement infrequent (complex) instructions • Aim to execute one instruction per cycle • Fixed instruction length (Clements, p. 329)

  6. Register Windows • A window is a set of registers visible to the current subroutine • A Window Pointer (WP) register indicates the currently active window • In the Berkeley RISC each window has 32 registers • A call to a subroutine in the Berkeley RISC uses the instruction: CALLR Rd, address • The current value of the PC is written into register Rd of the new window (Clements, p. 330)
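
One way to picture overlapping windows is a mapping from (window pointer, register number) to a physical register. The sketch below is an illustration, not the Berkeley design itself. The split of the 32 visible registers into 10 globals, 6 outgoing, 10 local, and 6 incoming registers, the count of 8 windows, and the convention that a call decrements WP are all assumptions not given on the slide.

```python
GLOBALS = 10       # R0-R9: shared by every window (assumed split)
WINDOW_STEP = 16   # 6 outgoing + 10 local registers separate successive windows
NUM_WINDOWS = 8    # hypothetical number of windows


def physical_register(wp, r):
    """Map register r (0-31) of window wp to a physical register index."""
    if not 0 <= r < 32:
        raise ValueError("register number must be in 0..31")
    if r < GLOBALS:
        return r  # globals ignore the window pointer
    offset = (wp * WINDOW_STEP + (r - GLOBALS)) % (NUM_WINDOWS * WINDOW_STEP)
    return GLOBALS + offset


def call(wp):
    """Window pointer seen by the callee (assumed to move down by one)."""
    return (wp - 1) % NUM_WINDOWS


def ret(wp):
    """Window pointer restored on return."""
    return (wp + 1) % NUM_WINDOWS
```

With this layout the caller's R10-R15 and the callee's R26-R31 name the same physical registers, so parameters are passed without copying them through memory.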

  7. Berkeley RISC Register Window (Clements, p. 332)

  8. Berkeley RISC Register Window (Clements, p. 333)

  9. RISC Pipeline (Clements, p. 335)

  10. Instruction Overlapping in a RISC Pipeline (Clements, p. 336)

  11. Instruction Overlapping in a RISC Pipeline (Clements, p. 336)

  12. Pipeline Hazards • Cause a stall in the pipeline • Branch instructions • We don’t yet know which instruction to execute next • Data dependences • We don’t yet know the value of an operand

  13. A Bubble in the Pipeline (Clements, p. 337)

  14. Delayed Branch (Clements, p. 338)

  15. Data Dependency • ADD R1, R2, R3: [R1] ← [R2] + [R3] • ADD R5, R2, R4: [R5] ← [R2] + [R4] • ADD R6, R7, R5: [R6] ← [R7] + [R5] • ADD R2, R2, R4: [R2] ← [R2] + [R4] (Clements, p. 338)

  16. Data Dependency • ADD R1, R2, R3: [R1] ← [R2] + [R3] • ADD R5, R2, R4: [R5] ← [R2] + [R4] • ADD R6, R7, R5: [R6] ← [R7] + [R5] • ADD R2, R2, R4: [R2] ← [R2] + [R4] (Clements, p. 338)
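
The dependency in this sequence can be found mechanically: the third instruction reads R5, which the second instruction has only just written. The sketch below is a minimal read-after-write check, assuming three-operand instructions written as "OP Rd, Rs1, Rs2". The `window` parameter (how many following instructions can still see a stale value) is a hypothetical knob that depends on pipeline depth.

```python
PROGRAM = [
    "ADD R1, R2, R3",
    "ADD R5, R2, R4",
    "ADD R6, R7, R5",
    "ADD R2, R2, R4",
]


def parse(instr):
    """Split 'ADD R1, R2, R3' into (destination, [sources])."""
    _opcode, operands = instr.split(None, 1)
    dest, *sources = [reg.strip() for reg in operands.split(",")]
    return dest, sources


def find_raw_hazards(program, window=1):
    """Return (producer index, consumer index, register) for each RAW hazard."""
    decoded = [parse(instr) for instr in program]
    hazards = []
    for i, (dest, _sources) in enumerate(decoded):
        # Only the next `window` instructions are close enough to stall.
        for j in range(i + 1, min(i + 1 + window, len(decoded))):
            if dest in decoded[j][1]:
                hazards.append((i, j, dest))
    return hazards


if __name__ == "__main__":
    print(find_raw_hazards(PROGRAM))
```

The last instruction writes R2 after earlier instructions have read it; that is a write-after-read ordering, which this check deliberately ignores.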

  17. Bubble Because of Data Dependency (Clements, p. 338)

  18. Internal Forwarding (Clements, p. 339)

  19. A Probabilistic Model for Branch Penalty • Assumptions: • Non-branch instructions execute in one cycle • $p_b$: probability that an instruction is a branch • $p_t$: probability that a branch instruction is taken • $b$: additional cycles required if the branch is taken • There is no penalty if a branch is not taken • $T_{ave}$: average time to execute an instruction • $T_{ave} = (1 - p_b)\,\text{NonBranchTime} + p_b\,\text{BranchTime}$ • $\text{BranchTime} = p_t\,\text{TimeTaken} + (1 - p_t)\,\text{TimeNotTaken} = p_t(1 + b) + (1 - p_t)\cdot 1 = p_t + p_t b + 1 - p_t = p_t b + 1$ • $T_{ave} = (1 - p_b)\cdot 1 + p_b(p_t b + 1) = 1 - p_b + p_b p_t b + p_b = 1 + p_b p_t b$ (Clements, p. 339)
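
The model can be evaluated numerically to confirm that the weighted form and the simplified form agree. This is a small sketch; the function names and the sample values (20% branches, 80% taken, 3 extra cycles) are hypothetical and chosen only for illustration.

```python
def branch_time(pt, b):
    """Average cycles for a branch: 1 + b if taken, 1 if not taken."""
    return pt * (1 + b) + (1 - pt) * 1


def average_cycles(pb, pt, b):
    """Average cycles per instruction as a weighted sum over instruction types."""
    return (1 - pb) * 1 + pb * branch_time(pt, b)


def average_cycles_closed_form(pb, pt, b):
    """Simplified result of the derivation: 1 + pb * pt * b."""
    return 1 + pb * pt * b


if __name__ == "__main__":
    # Hypothetical workload, not taken from the slides.
    print(average_cycles(0.2, 0.8, 3))
    print(average_cycles_closed_form(0.2, 0.8, 3))
```

For these sample values both forms give about 1.48 cycles per instruction, so branches cost roughly half a cycle per instruction on average.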

  20. Branch Prediction • Idea: Guess which way a branch will go and start fetching instructions from the predicted path • $p_b$: probability that an instruction is a branch • $p_t$: probability that a branch is taken • $p_c$: probability that the prediction is correct • $a$, $b$, $c$, $d$: penalties in each case (taken or not taken, predicted correctly or incorrectly)

  21. Average Branch Penalty • The average branch penalty is given by $C_{ave} = a\,(p_t\,p_c) + \dots$

  22. Average Branch Penalty • The average branch penalty is given by $C_{ave} = a\,(p_t\,p_c) + b\,(1 - p_t)(1 - p_c) + \dots$

  23. Average Branch Penalty • The average branch penalty is given by $C_{ave} = a\,(p_t\,p_c) + b\,(1 - p_t)(1 - p_c) + c\,p_t(1 - p_c) + \dots$

  24. Average Branch Penalty • The average branch penalty is given by $C_{ave} = a\,(p_t\,p_c) + b\,(1 - p_t)(1 - p_c) + c\,p_t(1 - p_c) + d\,(1 - p_t)\,p_c$
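
The four terms correspond to the four combinations of taken or not taken and predicted correctly or incorrectly, so their probabilities must sum to 1. The sketch below computes the average on that basis, weighting penalty c by pt(1 - pc) as in the build on slide 23. It assumes, as the product form implies, that pt and pc are independent; the sample penalties are hypothetical.

```python
def average_branch_penalty(pt, pc, a, b, c, d):
    """Average penalty over the four (taken, prediction-correct) cases."""
    cases = [
        (pt * pc, a),               # taken, predicted correctly
        ((1 - pt) * (1 - pc), b),   # not taken, predicted incorrectly
        (pt * (1 - pc), c),         # taken, predicted incorrectly
        ((1 - pt) * pc, d),         # not taken, predicted correctly
    ]
    # Sanity check: the four cases are exhaustive and mutually exclusive.
    assert abs(sum(prob for prob, _ in cases) - 1.0) < 1e-12
    return sum(prob * penalty for prob, penalty in cases)


if __name__ == "__main__":
    # Hypothetical values: 60% taken, 90% predicted correctly.
    print(average_branch_penalty(0.6, 0.9, a=1, b=2, c=2, d=0))
```

The internal check on the case probabilities is a cheap guard against pairing a penalty with the wrong case.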

  25. Approaches to Branch Prediction • Static branch prediction: • A given branch is predicted to be either always taken or never taken • Dynamic branch prediction: • Use the past behavior of the program to predict a branch • The processor maintains a branch prediction table • Single-bit predictors ==> accuracy of 80% • Five-bit predictors ==> accuracy of 98%
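
The slides do not say how the n-bit predictors work internally. A common scheme, assumed here, is a saturating counter per branch: count up when the branch is taken, down when it is not, and predict "taken" when the counter is in its upper half. The sketch below models a single table entry on a hypothetical loop branch (taken nine times, then not taken, repeated ten times). It is not meant to reproduce the accuracy figures quoted above.

```python
class SaturatingPredictor:
    """One branch-prediction table entry: an n-bit saturating counter (assumed scheme)."""

    def __init__(self, bits):
        self.max_count = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.count = 0  # arbitrary initial state: strongly "not taken"

    def predict(self):
        """Predict taken when the counter is in its upper half."""
        return self.count >= self.threshold

    def update(self, taken):
        """Move the counter toward the actual outcome, saturating at both ends."""
        if taken:
            self.count = min(self.count + 1, self.max_count)
        else:
            self.count = max(self.count - 1, 0)


def count_correct(bits, outcomes):
    """Number of correct predictions over a sequence of branch outcomes."""
    predictor = SaturatingPredictor(bits)
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# Hypothetical loop-closing branch: taken 9 times, then falls through, 10 times over.
LOOP_BRANCH = ([True] * 9 + [False]) * 10

if __name__ == "__main__":
    for bits in (1, 2):
        print(bits, "bit(s):", count_correct(bits, LOOP_BRANCH), "of", len(LOOP_BRANCH))
```

On this pattern the 1-bit counter mispredicts twice per loop (at the exit and again on re-entry), while the 2-bit counter mispredicts only the exit once it has warmed up.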
