
CS 230: Computer Organization and Assembly Language


Presentation Transcript


  1. CS 230: Computer Organization and Assembly Language Aviral Shrivastava Department of Computer Science and Engineering School of Computing and Informatics Arizona State University Slides courtesy: Prof. Yann Hang Lee, ASU, Prof. Mary Jane Irwin, PSU, Ande Carle, UCB

  2. Announcements
     • Project 3: MIPS Assembler
     • Project 4: MIPS Simulator
       • Due Nov 10, 2009
     • Quiz 4: Nov 5, 2009
       • Single-cycle implementation
     • Finals: Tuesday, Dec 08, 2009
       • Please come on time (you’ll need all the time)
       • Open book, notes, and internet
       • No communication with any other human

  3. Single Cycle - Abstract View
     • Abstract view
       • Elements that operate on data values (combinational)
       • Elements that contain state (sequential)
     • Implementation
       • Design the datapath
       • Design the control
     [Figure: abstract single-cycle datapath showing the PC, instruction memory, register file, ALU, and data memory with their address, read-data, and write-data ports]

  4. Single cycle Datapath
     [Figure: full single-cycle datapath with PC, instruction memory, register file, sign-extend unit, ALU, data memory, and the PC+4/branch/jump address logic; control signals shown include RegWrite, ALUSrc, ALU control, MemWrite, MemRead, MemtoReg, and PCSrc]

  5. Single cycle Datapath + Control
     [Figure: single-cycle datapath with the control unit added; the opcode field Instr[31-26] drives the Control Unit, which generates Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, and RegDst, while Instr[5-0] feeds the ALU control]

  6. Single cycle Control Unit
     • Completely determined by the instruction opcode field
     • Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation
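
     To make the "determined by the opcode field" point concrete, here is a minimal sketch (not the course's project code) of the main control as a lookup from the 6-bit opcode to a control-signal bundle; the opcode values are the standard MIPS encodings for R-type, lw, sw, and beq, and the don't-care signals are simply set to 0.

```python
# Minimal sketch: single-cycle main control as a pure function of the opcode.
# Signal values follow the usual textbook-style table for R-type, lw, sw, beq.
CONTROL_TABLE = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0b101011: dict(RegDst=0, ALUSrc=1, MemtoReg=0, RegWrite=0,    # RegDst/MemtoReg
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # are don't-cares (sw)
    0b000100: dict(RegDst=0, ALUSrc=0, MemtoReg=0, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}

def main_control(opcode: int) -> dict:
    """Return the control-signal bundle for a 6-bit opcode."""
    return CONTROL_TABLE[opcode]

print(main_control(0b100011))  # lw asserts ALUSrc, MemRead, MemtoReg, RegWrite
```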

  7. Disadvantages of Single Cycle Implementation
     • Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
       • Especially problematic for more complex instructions like floating-point multiply
     • Wasteful of area, since some functional units must be duplicated because they cannot be "shared" during an instruction's execution
       • e.g., separate adders are needed for the PC update and the branch target address calculation, in addition to the ALU that does R-type arithmetic/logic operations and data memory address calculations
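
     A small numeric sketch of why this hurts, using made-up element latencies rather than real timing data: the single-cycle clock period must cover the longest instruction path, so every instruction pays the lw worst case.

```python
# Illustrative (made-up) latencies in picoseconds for each datapath element.
LATENCY = {"IMem": 200, "RegRead": 100, "ALU": 200, "DMem": 200, "RegWrite": 100}

# Which elements each instruction class actually uses, end to end.
PATHS = {
    "R-type": ["IMem", "RegRead", "ALU", "RegWrite"],
    "lw":     ["IMem", "RegRead", "ALU", "DMem", "RegWrite"],
    "sw":     ["IMem", "RegRead", "ALU", "DMem"],
    "beq":    ["IMem", "RegRead", "ALU"],
}

path_delay = {instr: sum(LATENCY[e] for e in elems) for instr, elems in PATHS.items()}
clock_period = max(path_delay.values())  # single-cycle clock fits the slowest path

for instr, d in path_delay.items():
    print(f"{instr:7s} needs {d} ps but gets a {clock_period} ps cycle")
```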

  8. How to make it fast?
     • Parallelism
     • Short-cuts: caching or bypassing
     • Prediction
     • Skip some work
     • The first form of parallelism is pipelining

  9. Pipelining: It's Natural!
     • Laundry example: Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
     • Washer takes 30 minutes
     • Dryer takes 40 minutes
     • "Folder" takes 20 minutes

  10. Sequential Laundry
     • Sequential laundry takes 6 hours for 4 loads
     [Figure: timeline from 6 PM to midnight, tasks in order; each load occupies its own 30 + 40 + 20 minute slot before the next load starts]

  11. Pipelined Laundry
     • Pipelined laundry takes 3.5 hours for 4 loads
     • Note: more time to do Project 4
     [Figure: timeline starting at 6 PM, tasks in order; the stages overlap as 30 + 40 + 40 + 40 + 40 + 20 minutes]
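
     A quick arithmetic check of the two slide claims, purely illustrative and using the laundry numbers from the slides:

```python
WASH, DRY, FOLD = 30, 40, 20   # minutes per stage, from the slides
LOADS = 4

# Sequential: every load runs all three stages before the next one starts.
sequential = LOADS * (WASH + DRY + FOLD)        # 4 * 90 = 360 min = 6 hours

# Pipelined: the 40-minute dryer is the bottleneck, so after the first wash a
# load finishes drying every 40 minutes, then the last fold drains the pipe.
pipelined = WASH + LOADS * DRY + FOLD           # 30 + 160 + 20 = 210 min = 3.5 hours

print(sequential / 60, "hours sequential")      # 6.0
print(pipelined / 60, "hours pipelined")        # 3.5
```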

  12. Pipelining Lessons
     • Multiple tasks operate simultaneously
     • Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
     • Pipeline rate is limited by the slowest pipeline stage
     • Potential speedup = number of pipe stages
     • Unbalanced lengths of pipe stages reduce speedup
     • Also, need time to "fill" and "drain" the pipeline
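
     A sketch of the "limited by the slowest stage" and "unbalanced stages reduce speedup" points, again with the illustrative laundry stage times:

```python
stages = [30, 40, 20]          # unbalanced stage times (wash, dry, fold), in minutes
tasks = 1000                   # large workload, so fill/drain time is negligible

unpipelined = tasks * sum(stages)
pipelined = sum(stages) + (tasks - 1) * max(stages)  # fill once, then the bottleneck paces output

print("actual speedup:", round(unpipelined / pipelined, 2))   # ~2.25, not 3
print("ideal speedup with balanced stages:", len(stages))     # 3
```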

  13. Pipelining: Some terms
     • Whether you're doing laundry or implementing a µP, each stage where something is done is called a pipe stage
     • In the laundry example, the washer, dryer, and folding table are pipe stages; clothes enter at one end and exit at the other
     • In a µP, instructions enter at one end and have been executed by the time they leave
     • Another example: an auto assembly line
     • Throughput is how often stuff comes out of a pipeline

  14. Technical details
     • If the times for all S stages are equal to T:
       • Time for one initiation to complete is still S·T
       • Time between two initiations = T, not S·T
       • Initiations per second = 1/T
     • Pipelining: overlap multiple executions of the same sequence
       • Improves THROUGHPUT, not the time to perform a single operation
     • Other examples: automobile assembly plant, chemical factory, garden hose, cooking
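
     A tiny numeric illustration of these relationships, using made-up numbers (a 5-stage pipeline with T = 200 ps per stage):

```python
S = 5            # pipeline stages (illustrative)
T = 200e-12      # seconds per stage (illustrative: 200 ps)

latency_per_initiation = S * T   # one item still takes S*T to get all the way through
initiation_interval = T          # but a new item can start every T
throughput = 1 / T               # initiations per second

print(f"latency          = {latency_per_initiation * 1e12:.0f} ps")   # 1000 ps
print(f"new start every  = {initiation_interval * 1e12:.0f} ps")      # 200 ps
print(f"throughput       = {throughput / 1e9:.1f} billion/sec")       # 5.0
```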

  15. More technical details
     • The book's approach to drawing pipeline timing diagrams:
       • Time runs left-to-right, in units of stage time
       • Each "row" corresponds to a distinct initiation
       • The boundary between two column entries is a pipeline register (i.e., the hamper)
       • You must look at the column contents to see what each stage is doing
     • Time for N initiations to complete: NT + (S-1)T
     • Throughput: time per initiation = T + (S-1)T/N → T as N grows
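
     Written out as equations, this is just a restatement of the two slide formulas with the limit made explicit:

```latex
% Total time for N initiations through an S-stage pipeline with stage time T:
% the first result appears after ST, then one more completes every T.
T_{\text{total}} = ST + (N-1)T = NT + (S-1)T

% Average time per initiation approaches the stage time for large N:
\frac{T_{\text{total}}}{N} = T + \frac{(S-1)T}{N} \to T \quad \text{as } N \to \infty
```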

  16. Ideal pipeline speedup
     • Unpipelined: four blocks of combinational logic, each with delay t, between two latches
       • Delay for 1 piece of data = 4t + latch setup (assumed small)
       • Approximate delay for 1000 pieces of data = 4000t
     • Pipelined: a latch after each of the four logic blocks
       • Delay for 1 piece of data = 4(t + latch setup)
       • Approximate delay for 1000 pieces of data = 3t + 1000t
       • Speedup for 1000 pieces of data = 4000 / 1003 ≈ 4
     • Ideal speedup = number of pipeline stages
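
     The same speedup calculation written as a single expression, restating the slide's numbers:

```latex
\text{speedup}_{1000}
  = \frac{\text{unpipelined delay}}{\text{pipelined delay}}
  = \frac{1000 \cdot 4t}{3t + 1000t}
  = \frac{4000t}{1003t}
  \approx 4
```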

  17. The “new look” dataflow
     • Data must be stored from one stage to the next in pipeline registers/latches (IF/ID, ID/EX, EX/MEM, MEM/WB)
     • They hold temporary values between clocks, along with the information needed for execution
     [Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB registers separating the instruction memory, register file, ALU, and data memory; PC+4 and branch-target adders, comparator, sign extend, and muxes shown]
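
     As a sketch of the idea (not the Project 4 simulator, and with hypothetical field names), pipeline registers can be modeled as records written by one stage and read by the next on the following clock:

```python
from dataclasses import dataclass

# Hypothetical contents of two of the pipeline registers; a real design carries
# more fields (control bits, destination register, PC values, and so on).
@dataclass
class IF_ID:
    instruction: int = 0
    pc_plus_4: int = 0

@dataclass
class ID_EX:
    read_data_1: int = 0
    read_data_2: int = 0
    sign_ext_imm: int = 0
    pc_plus_4: int = 0

if_id, id_ex = IF_ID(), ID_EX()

# On each clock edge, a stage latches its results for the next stage to use:
if_id.instruction, if_id.pc_plus_4 = 0x8C080004, 0x00400024   # example lw encoding
id_ex.sign_ext_imm = if_id.instruction & 0xFFFF               # immediate field (sign extension omitted)
id_ex.pc_plus_4 = if_id.pc_plus_4                             # carried forward for branch targets
```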

  18. Another way to look at it…
     [Figure: graphical pipeline diagram showing four instructions, in program execution order, each passing through IM, Reg, ALU, DM, and Reg in successive clock cycles]

  19. Questions about control signals
     • The following discussion is relevant to a single instruction
     • Q: Are all control signals active at the same time?
     • Q: Can we generate all these signals at the same time?

  20. Passing control w/pipe registers
     • Control generation produces three bundles of signals: EX, M (memory), and WB (write-back)
     • Each pipeline register strips off the signals its stage needs: execution-phase signals at ID/EX, memory-phase signals at EX/MEM, write-back signals at MEM/WB
     • Signals involved: RegDst, ALUSrc, ALUOp (EX); Branch, MemRead, MemWrite (M); MemtoReg, RegWrite (WB)
     • Analogy: send the instruction along with the car on an assembly line
       • "Install Corinthian leather interior on car 6 @ stage 3"
     [Figure: control unit output flowing with the instruction through the IF/ID, ID/EX, EX/MEM, and MEM/WB registers, each stage consuming its portion of the signals]
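
     A sketch of the same idea in code (hypothetical structure, not the actual project code): decode generates all the control bits at once, and each pipeline register forwards only the bundles still needed downstream, mirroring the assembly-line analogy.

```python
# Hypothetical sketch: control bits generated once in decode, then carried
# forward and consumed bundle by bundle.

def generate_control(opcode):
    """Produce the EX, M, and WB control bundles for one instruction (only lw shown)."""
    if opcode == 0b100011:  # lw
        ex = {"RegDst": 0, "ALUSrc": 1, "ALUOp": 0b00}
        m  = {"Branch": 0, "MemRead": 1, "MemWrite": 0}
        wb = {"MemtoReg": 1, "RegWrite": 1}
        return ex, m, wb
    raise NotImplementedError("only lw is shown in this sketch")

ex, m, wb = generate_control(0b100011)

id_ex  = {"EX": ex, "M": m, "WB": wb}           # ID/EX carries all three bundles
ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]}   # EX stage used and dropped the EX bits
mem_wb = {"WB": ex_mem["WB"]}                   # MEM stage used and dropped the M bits
print(mem_wb)                                   # only the write-back signals reach the last stage
```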

  21. Pipelined datapath w/control signals
     [Figure: the full pipelined datapath with the control unit; the WB, M, and EX control bundles travel through the ID/EX, EX/MEM, and MEM/WB registers alongside the data, with PCSrc, ALUOp, ALUSrc, RegDst, and the ALU control shown at the stages that use them]

  22. A Pipelined Processor
     • Pipeline latches pass the status and result of the current instruction to the next stage
     • Comparison: single-cycle vs. pipelined execution over cycles 1 through 10
     [Figure: pipelined execution with a new instruction entering Ifetch each cycle and flowing through Dec/Reg, Exec, Mem, and Wr, contrasted with single-cycle execution of lw and sw, one full instruction per long clock cycle]
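
     As a small illustrative sketch (not the Project 4 simulator), here is one way to print the kind of cycle-by-cycle diagram the slide draws, using the five standard MIPS stage names (IF, ID, EX, MEM, WB) and ignoring hazards:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(instructions):
    """Print which stage each instruction occupies in each cycle (no hazards modeled)."""
    n_cycles = len(instructions) + len(STAGES) - 1
    header = "instr".ljust(20) + " ".join(f"C{c + 1:<3}" for c in range(n_cycles))
    print(header)
    for i, instr in enumerate(instructions):
        # Instruction i enters IF in cycle i and advances one stage per cycle.
        cells = [f"{STAGES[c - i] if 0 <= c - i < len(STAGES) else '':<4}"
                 for c in range(n_cycles)]
        print(instr.ljust(20) + " ".join(cells))

pipeline_diagram(["lw $t0, 0($s0)", "sw $t1, 4($s0)", "add $t2, $t3, $t4"])
```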

  23. Yoda says… Ohhh. Great warrior. Wars not make one great
