Review

tierra

Presentation Transcript


  1. Review

  2. Aspects of Performance
     • Clock (clk) cycle
        • Time of one clock period
        • Generally constant for a processor
     • Instruction count (IC)
        • Number of instructions to be executed for a program
        • Different instructions may consume different numbers of cycles
        • Varies with the program, the compiler and the compilation options
     • Cycles per instruction (CPI)
        • Average number of clock cycles per instruction for a program
     • Execution time
        • $\text{Execution time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time}$
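A minimal sketch of the execution-time relation (IC × CPI × clock cycle time), assuming times in seconds. The function names and the example numbers (10^9 instructions, CPI 2.0, 2 GHz clock) are hypothetical and only illustrate the arithmetic.

```python
def execution_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """Execution time = IC x CPI x clock cycle time (seconds)."""
    return instruction_count * cpi * cycle_time_s


def execution_time_from_rate(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Equivalent form, since clock cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # Hypothetical program: 1e9 instructions, average CPI of 2.0, 2 GHz clock (0.5 ns cycle)
    print(execution_time(1e9, 2.0, 0.5e-9))         # about 1.0 s
    print(execution_time_from_rate(1e9, 2.0, 2e9))  # 1.0 s
```

Halving CPI or the cycle time halves the execution time only if IC stays the same, which is why all three factors have to be considered together.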

  3. CPI
     • CPI = cycles per instruction
     • CPI provides one way of comparing two different implementations of the same instruction set architecture
     • CPI is tricky!
        • Different instructions require different numbers of cycles, depending on what they do
        • CPI depends on the program (its mix of instructions)
        • Memory behavior (e.g., cache misses) affects CPI

  4. CPI for a program may not be available
     • Given:
        • CPI for each individual instruction (class)
        • IC for each individual instruction (class)
     • These can be obtained by:
        • Profiling the program
        • Simulating the architecture
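With a per-class CPI_i and IC_i for n instruction classes, the program's total cycle count and its average CPI follow from a weighted sum. These are the standard relations; the index notation is chosen here for illustration:

```latex
\text{CPU clock cycles} = \sum_{i=1}^{n} \mathrm{CPI}_i \times \mathrm{IC}_i ,
\qquad
\mathrm{IC} = \sum_{i=1}^{n} \mathrm{IC}_i ,
\qquad
\mathrm{CPI} = \frac{\text{CPU clock cycles}}{\mathrm{IC}}
             = \sum_{i=1}^{n} \mathrm{CPI}_i \times \frac{\mathrm{IC}_i}{\mathrm{IC}}
```

The last form shows that the program's CPI is each class's CPI weighted by how often that class occurs.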

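  5. Example
     • Which one is faster?
     • Instruction classes with CPI 1, 2 and 3; C1 and C2 execute different numbers of instructions from each class
     • $\text{Cycles}(C_1) = 1 \times 2 + 2 \times 1 + 3 \times 2 = 10$
     • $\text{Cycles}(C_2) = 1 \times 4 + 2 \times 1 + 3 \times 1 = 9$
     • C2 needs fewer clock cycles (9 < 10), so, assuming the same clock cycle time, C2 is faster, even though it executes more instructions (6 vs. 5)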
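The example's arithmetic can be reproduced in a few lines. The CPI values (1, 2, 3) and per-sequence counts are read off the slide's sums; the class labels "A", "B", "C" and the helper names are hypothetical. The sketch also computes the average CPI of each sequence.

```python
# CPI per instruction class and instruction counts per sequence, taken from the
# slide's arithmetic. The class labels "A", "B", "C" are hypothetical.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}
COUNTS = {
    "C1": {"A": 2, "B": 1, "C": 2},
    "C2": {"A": 4, "B": 1, "C": 1},
}


def total_cycles(counts: dict, cpi_by_class: dict) -> int:
    """CPU clock cycles = sum over classes of CPI_i * IC_i."""
    return sum(cpi_by_class[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict, cpi_by_class: dict) -> float:
    """Average CPI = total cycles / total instruction count."""
    return total_cycles(counts, cpi_by_class) / sum(counts.values())


if __name__ == "__main__":
    for name, counts in COUNTS.items():
        print(name, total_cycles(counts, CPI_BY_CLASS), average_cpi(counts, CPI_BY_CLASS))
    # C1 10 2.0
    # C2 9 1.5
```

C2 executes more instructions but has the lower average CPI, which is why it finishes in fewer cycles.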

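  6. Amdahl’s Law
     • Basic idea: improve the common case
     • The improvement from a faster mode is limited by the fraction of time the faster mode can be used
     • $\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - F) + F / S}$, where $F$ is the fraction of execution time that can use the faster mode and $S$ is the speedup of that mode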
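A minimal sketch of Amdahl's Law, speedup = 1 / ((1 − F) + F / S), where F is the fraction of the original execution time that benefits and S is the local speedup. The function name and the example values are hypothetical.

```python
def amdahl_speedup(fraction_enhanced: float, local_speedup: float) -> float:
    """Overall speedup when a fraction F of the original execution time
    is sped up by a factor S: 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / local_speedup)


if __name__ == "__main__":
    # Hypothetical: half of the run time becomes 10x faster
    print(round(amdahl_speedup(0.5, 10), 3))  # 1.818
```

Even with an arbitrarily large S, the result never exceeds 1 / (1 − F). With F = 0.5, that ceiling is 2x, no matter how fast the enhanced part becomes.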

  7. If we make division run 3 times faster and multiplication run 8 times faster, what is the overall speedup?
     • We want to make the machine run 4 times faster. Can we achieve this goal by making just one change, speeding up either multiplication or division?
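The fractions of time spent in division and multiplication are not recoverable from this transcript, so the sketch below uses clearly labeled placeholder fractions. It extends Amdahl's Law to several non-overlapping enhancements. It also checks the 4x question with the bound 1 / (1 − F): a single change can reach 4x only if the operation it speeds up takes more than 75% of the original time. Helper names are hypothetical.

```python
def overall_speedup(enhancements):
    """Amdahl's Law for several non-overlapping enhancements.

    enhancements: list of (fraction_of_original_time, local_speedup) pairs.
    """
    total_fraction = sum(f for f, _ in enhancements)
    if total_fraction > 1.0:
        raise ValueError("fractions of execution time cannot exceed 1")
    # Unaffected time stays as-is; each enhanced fraction shrinks by its own speedup.
    new_time = (1.0 - total_fraction) + sum(f / s for f, s in enhancements)
    return 1.0 / new_time


def speedup_limit(fraction_enhanced):
    """Upper bound as the enhanced part's time goes to zero: 1 / (1 - F)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # PLACEHOLDER fractions, not taken from the slide: division 20%, multiplication 30%.
    div_fraction, mul_fraction = 0.20, 0.30
    both = overall_speedup([(div_fraction, 3), (mul_fraction, 8)])
    print(f"division 3x and multiplication 8x: {both:.3f}x overall")
    # A single change can only reach 4x if its bound 1 / (1 - F) exceeds 4, i.e. F > 0.75.
    for name, f in (("division", div_fraction), ("multiplication", mul_fraction)):
        print(name, "alone could reach 4x:", speedup_limit(f) > 4.0)
```

Substitute the slide's actual time fractions for the placeholders to answer the exercise.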

  8. Instruction Set
     • The repertoire of instructions of a computer
     • Different computers have different instruction sets
        • But with many aspects in common
     • Early computers had very simple instruction sets
        • Simplified implementation
     • Many modern computers also have simple instruction sets

  9. R-Type Instruction
     • Register type: operates on 3 registers
        • op: operation (opcode)
        • rs: first source operand
        • rt: second source operand
        • rd: destination operand
        • shamt: shift amount, used only in shift operations
        • funct: selects the specific variant of the operation given by the opcode
     • Syntax: <op> $rd, $rs, $rt
        • add $t0, $s1, $s2
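To make the field layout concrete, here is a sketch that packs an R-type instruction into a 32-bit word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). It assumes the standard MIPS register numbering (only a small subset is listed) and the funct codes for add and sub. The helper name encode_r is hypothetical.

```python
# Register numbers under the standard MIPS convention (subset used here).
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$s0": 16, "$s1": 17, "$s2": 18}

# funct codes for two R-type operations; all R-type instructions use opcode 0.
FUNCT = {"add": 0x20, "sub": 0x22}


def encode_r(op_name: str, rd: str, rs: str, rt: str, shamt: int = 0) -> int:
    """Pack an R-type instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    opcode = 0
    return (
        (opcode << 26)
        | (REG[rs] << 21)
        | (REG[rt] << 16)
        | (REG[rd] << 11)
        | ((shamt & 0x1F) << 6)
        | FUNCT[op_name]
    )


if __name__ == "__main__":
    # add $t0, $s1, $s2  ->  rd = $t0, rs = $s1, rt = $s2
    print(hex(encode_r("add", "$t0", "$s1", "$s2")))  # 0x2324020
```

The assembly order ($rd, $rs, $rt) differs from the bit order (rs, rt, rd), which is why the helper takes the operands by name rather than by position.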

  10. I-Type Instruction
     • Immediate type: operates on 2 registers and a 16-bit immediate
        • rs and imm are always sources
        • rt is flexible: destination for addi and lw, source for sw
     • Syntax
        • <op> $rt, $rs, imm
        • <op> $rt, offset($rs)
        • addi $s0, $s1, 5
        • sw $s1, 4($t1)
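A matching sketch for I-type instructions, op(6) rs(5) rt(5) imm(16), with the immediate stored as a 16-bit two's-complement value. It assumes the standard MIPS opcodes for addi (8), lw (35) and sw (43) and standard register numbers. The helper name encode_i is hypothetical.

```python
# Register numbers (standard MIPS convention, subset) and opcodes used below.
REG = {"$t0": 8, "$t1": 9, "$s0": 16, "$s1": 17}
OPCODE = {"addi": 8, "lw": 35, "sw": 43}


def encode_i(op_name: str, rt: str, rs: str, imm: int) -> int:
    """Pack an I-type instruction: op(6) rs(5) rt(5) imm(16).
    The immediate is stored as a 16-bit two's-complement value."""
    if not -32768 <= imm <= 32767:
        raise ValueError("immediate does not fit in 16 signed bits")
    return (OPCODE[op_name] << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


if __name__ == "__main__":
    print(hex(encode_i("addi", "$s0", "$s1", 5)))  # addi $s0, $s1, 5 -> 0x22300005
    print(hex(encode_i("sw", "$s1", "$t1", 4)))    # sw $s1, 4($t1)   -> 0xad310004
```

For sw, rt ($s1) is the value being stored and rs ($t1) is the base address, which is the "rt is flexible" point on the slide.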

  11. Exercise
     • Reverse-engineer the instruction 0xAD310004: what is the corresponding assembly code?

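  12. Solution
     • 0xAD310004 = 101011 | 01001 | 10001 | 0000000000000100 (binary)
        • op = 43 (sw), rs = 9 ($t1), rt = 17 ($s1), imm = 4
     • sw $s1, 4($t1)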
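The same decoding can be done mechanically by shifting and masking the fields, then sign-extending the 16-bit offset. This sketch handles only lw/sw and a few registers (standard MIPS numbering). The helper name decode_load_store is hypothetical.

```python
# Reverse maps (subset): register number -> name, opcode -> mnemonic.
REG_NAME = {8: "$t0", 9: "$t1", 16: "$s0", 17: "$s1", 18: "$s2"}
MNEMONIC = {35: "lw", 43: "sw"}


def decode_load_store(word: int) -> str:
    """Split a 32-bit I-type word into op/rs/rt/imm and render a lw/sw instruction."""
    op = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:  # sign-extend the 16-bit offset
        imm -= 1 << 16
    return f"{MNEMONIC[op]} {REG_NAME[rt]}, {imm}({REG_NAME[rs]})"


if __name__ == "__main__":
    print(decode_load_store(0xAD310004))  # sw $s1, 4($t1)
```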

  13. Computer Arithmetic

  14. IEEE 754 FP Format
     • Fields: S | Exponent | Fraction
        • Exponent: single 8 bits, double 11 bits
        • Fraction: single 23 bits, double 52 bits
     • $x = (-1)^S \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})}$
     • S: sign bit (0 ⇒ non-negative, 1 ⇒ negative)
     • Normalized significand: 1.0 ≤ |significand| < 2.0
        • Always has a leading pre-binary-point 1 bit, so there is no need to represent it explicitly (hidden bit)
     • Exponent: actual exponent + Bias
        • Ensures the stored exponent is unsigned
        • Single: Bias = 127; Double: Bias = 1023
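A minimal sketch of turning a single-precision bit pattern back into a value with x = (−1)^S × (1 + Fraction) × 2^(Exponent − 127). It assumes a normalized number, meaning an exponent field that is neither all zeros nor all ones. Zero, denormals, infinities and NaN are deliberately rejected. decode_single is a hypothetical helper name.

```python
def decode_single(bits: int) -> float:
    """Decode a normalized IEEE 754 single-precision bit pattern using
    x = (-1)^S * (1 + Fraction) * 2^(Exponent - 127)."""
    s = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF      # 8-bit biased exponent
    fraction = bits & 0x7FFFFF          # 23-bit fraction field
    if exponent in (0, 0xFF):
        raise ValueError("only normalized numbers are handled in this sketch")
    significand = 1 + fraction / 2**23  # restore the hidden leading 1
    return (-1) ** s * significand * 2.0 ** (exponent - 127)


if __name__ == "__main__":
    print(decode_single(0x3F800000))  # 1.0
    print(decode_single(0xBF400000))  # -0.75
```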

  15. Decimal to FP Conversion
     • Represent –0.75
        • $-0.75 = (-1)^1 \times 1.1_2 \times 2^{-1}$
        • S = 1
        • Fraction = $1000\ldots00_2$
        • Exponent = –1 + Bias
           • Single: $-1 + 127 = 126 = 01111110_2$
           • Double: $-1 + 1023 = 1022 = 01111111110_2$
     • Single: 1 01111110 1000…00
     • Double: 1 01111111110 1000…00
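To double-check the hand conversion, Python's standard struct module can reinterpret –0.75 as its raw IEEE 754 bits. The fields can then be split out with shifts and masks. The helper names fields_single and fields_double are hypothetical.

```python
import struct


def fields_single(x: float):
    """Return (S, biased exponent, fraction, raw bits) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF, bits


def fields_double(x: float):
    """Return (S, biased exponent, fraction, raw bits) of x as an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1), bits


if __name__ == "__main__":
    s, e, f, bits = fields_single(-0.75)
    print(s, e, hex(f), hex(bits))  # 1 126 0x400000 0xbf400000
    s, e, f, bits = fields_double(-0.75)
    print(s, e, hex(bits))          # 1 1022 0xbfe8000000000000
```

The fraction 0x400000 is a single 1 in the top fraction bit, matching the 1000…00 pattern above.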

  16. Datapath- processor design

  17. Datapath with Control

  18. Control Signals
     • RegWrite: whether a register is written
     • ALUSrc: selects the second ALU operand (register or sign-extended immediate)
     • ALUOp: which operation the ALU performs
     • PCSrc: selects the next PC value (PC + 4 or the branch target)
     • MemWrite: whether data memory is written
     • MemRead: whether data memory is read
     • MemToReg: whether the register write data comes from memory or from the ALU

  19. Instruction Formats and the Control Unit
     • Control signals are derived from the instruction
     Format       31:26      25:21   20:16   15:11   10:6    5:0
     R-type       0          rs      rt      rd      shamt   funct
     Load/Store   35 or 43   rs      rt      address (15:0)
     Branch       4          rs      rt      address (15:0)
     • opcode (31:26): input to the control unit
     • rs: always read
     • rt: read, except for load
     • Destination register: written for R-type and load
     • address: sign-extended and added
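A sketch of the main control unit as a lookup from opcode to control signals. The signal values follow the widely used textbook single-cycle MIPS datapath; treat them as an assumption if your course's datapath differs. None marks a "don't care". PCSrc is formed as Branch AND the ALU's Zero output. The names CONTROL, main_control and pc_src are hypothetical.

```python
from typing import Dict, Optional

# None marks a "don't care" value.
Signals = Dict[str, Optional[int]]

CONTROL: Dict[int, Signals] = {
    0: dict(RegDst=1, ALUSrc=0, MemToReg=0, RegWrite=1,          # R-type
            MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    35: dict(RegDst=0, ALUSrc=1, MemToReg=1, RegWrite=1,         # lw
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    43: dict(RegDst=None, ALUSrc=1, MemToReg=None, RegWrite=0,   # sw
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    4: dict(RegDst=None, ALUSrc=0, MemToReg=None, RegWrite=0,    # beq
            MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(instruction: int) -> Signals:
    """Decode the opcode field (bits 31:26) into main control signals."""
    opcode = (instruction >> 26) & 0x3F
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode} not handled in this sketch")
    return dict(CONTROL[opcode])  # copy so callers cannot modify the table


def pc_src(signals: Signals, alu_zero: bool) -> int:
    """PCSrc selects the branch target only for a taken branch."""
    return int(bool(signals["Branch"]) and alu_zero)


if __name__ == "__main__":
    print(main_control(0xAD310004))  # sw $s1, 4($t1): MemWrite=1, RegWrite=0
```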

  20. Control Signals

  21. Pipelined Datapath

  22. Pipeline: MIPS Instructions
     • Steps
        • IF: Fetch instruction from memory
        • ID: Decode the instruction / read the registers
        • EX: Execute the instruction or calculate an address
        • MEM: Access an operand in data memory
        • WB: Write the result into a register
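The overlap of the five steps can be visualized with a small pipeline diagram. Instruction i enters IF in cycle i and advances one stage per cycle. This assumes an ideal pipeline with no hazards or stalls. The function name pipeline_diagram is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction listing the stage it occupies in each cycle
    (empty string when idle), assuming no hazards or stalls."""
    n_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            stage = cycle - i  # instruction i starts IF in cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(3)):
        print(f"instr {i}: " + " ".join(f"{s:>3}" for s in row))
```

In any given column (cycle), each instruction occupies a different stage. This is what lets the stages work on different instructions at the same time.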

  23. Pipeline Speedup
     • Pipelining does not reduce the execution time (latency) of an individual instruction
     • It increases throughput by overlapping instructions: different instructions use different resources at the same time, so more instructions complete per unit of time
     • Ideal case: all stages take equal time
        • Speedup = number of stages in the pipeline
     • Not always achieved: some stages may be longer than others, and the clock cycle must fit the longest stage
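A sketch comparing total time with and without pipelining under simple assumptions. Without pipelining, each instruction passes through every stage before the next one starts. With pipelining, the clock cycle equals the slowest stage, and k − 1 extra cycles are needed to fill the pipeline. There are no hazards or stalls. The stage latencies and the function name are hypothetical.

```python
def pipeline_times(stage_latencies_ps, n_instructions):
    """Return (unpipelined_time, pipelined_time) for n instructions, no hazards.

    Unpipelined: every instruction takes the sum of all stage latencies.
    Pipelined: the clock cycle must fit the slowest stage; the first instruction
    needs k cycles, then one instruction completes per cycle.
    """
    k = len(stage_latencies_ps)
    unpipelined = n_instructions * sum(stage_latencies_ps)
    cycle = max(stage_latencies_ps)
    pipelined = (k + n_instructions - 1) * cycle
    return unpipelined, pipelined


if __name__ == "__main__":
    # Hypothetical latencies in ps: balanced stages vs. unbalanced stages
    for stages in ([100] * 5, [200, 100, 200, 200, 100]):
        t_plain, t_pipe = pipeline_times(stages, 1000)
        print(stages, round(t_plain / t_pipe, 2))
```

With 1000 instructions, the balanced 5-stage case gives a speedup of about 4.98, close to the ideal of 5. The unbalanced case gives only about 3.98, because every cycle is stretched to the 200 ps stage.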

  24. Pipelined Control

  25. Pipelined Control Stages
     • Need to set the control lines in each stage
        • IF: control signals to read instruction memory and to write the PC are always asserted
        • ID: the same thing happens every clock cycle, so there are no optional control lines to set
        • EX: RegDst, ALUOp, ALUSrc
        • MEM: Branch, MemRead, MemWrite
        • WB: MemToReg, RegWrite

  26. Pipeline Hazards
     • Hazard: the next instruction cannot execute in the following clock cycle
     • Classification:
        • Structural: two instructions need the same resource
           • Resolved with separate instruction and data memories
        • Data: the destination register of the current instruction is used as a source by the next
           • Stall, forwarding, instruction reordering
        • Control: caused by branch instructions
           • Stall, branch prediction, branch delay slot
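A simplified sketch of spotting the data hazard described above. It flags adjacent instruction pairs where the first instruction's destination register is a source of the next one. It only checks the adjacent case from the slide. In a real 5-stage pipeline, dependences a couple of instructions apart can also matter. The Instr record and the function name are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str                  # assembly text, for reporting only
    dest: Optional[str]        # register written (None for sw, beq)
    sources: Tuple[str, ...]   # registers read


def adjacent_data_hazards(program: List[Instr]) -> List[Tuple[str, str, str]]:
    """Flag pairs where an instruction's destination is a source of the next one."""
    hazards = []
    for prev, nxt in zip(program, program[1:]):
        # Writes to $zero are discarded, so they never create a dependence.
        if prev.dest is not None and prev.dest != "$zero" and prev.dest in nxt.sources:
            hazards.append((prev.text, nxt.text, prev.dest))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("add $s0, $t0, $t1", "$s0", ("$t0", "$t1")),
        Instr("sub $t2, $s0, $t3", "$t2", ("$s0", "$t3")),
        Instr("sw $t2, 0($t4)", None, ("$t2", "$t4")),
    ]
    for hazard in adjacent_data_hazards(program):
        print(hazard)
```

Each flagged pair is a candidate for one of the fixes listed on the slide: stalling, forwarding the result, or reordering independent instructions between the two.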
