Download
slide1 n.
Skip this Video
Loading SlideShow in 5 Seconds..
2014-3-13 John Lazzaro (not a prof - “John” is always OK) PowerPoint Presentation
Download Presentation
2014-3-13 John Lazzaro (not a prof - “John” is always OK)

2014-3-13 John Lazzaro (not a prof - “John” is always OK)

111 Views Download Presentation
Download Presentation

2014-3-13 John Lazzaro (not a prof - “John” is always OK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 16 -- Midterm I Review Session 2014-3-13 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love Play:

  2. Today - Midterm I Review Session Study tips, test ground rules All questions answered (almost ...) Short break HW 1, problem by problem ... Recall: HW 1 was Fall 05 Mid-term I.

  3. On Tuesday Mid-term I ... Ground rules ...

  4. When is it? Where is it? Ground rules. 9:30 AM sharp, Tuesday March 18th, 306 Soda. Every-other-seat seating, except for the front row, where every-seat is permitted. No blue-books needed. We will be handing out a paper test. Pencil is preferred. Pencils down @ 10:55 AM, so we can collect papers before next class comes in.

  5. When is it? Where is it? Ground rules. No use of calculators, smartphones, laptops, etc ... during the exam. Closed-book, closed-notes. Just pencils, erasers. No consulting with students. Restroom breaks are OK, but you’ll still need to hand in your exam @ 10:55. Questions are reserved for serious concerns about a bug in the question.

  6. What does it cover? First 8 lectures. Not to be taken legalistically. For example, WAW hazards were covered in the pipelining lecture, and so it’s fair for me to show a pipeline and say “does this have a WAW hazard”, even if that example was also seen in a later lecture.

  7. Mid-term: How to do well ... Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind. Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you. There will not be “you can onlygetit if do the reading” problems ... but the reading helps you understand how to think through the problem.

  8. Mid-term: There may be math ... No memorization: If we ask about Amdahl’s Law, we will show its definition lecture slide. Understanding is needed: A problem may require you to apply equation to a design, etc. Cannot use electronic devices ... more administrative info after we do some content. You may need to do: simple algebra and calculus, add a few numbers by hand, etc.

  9. Example starting slides ... These are meant to be examples, not a complete list! A problem may start with a slide that is not in this part of the presentation!

  10. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 2 – Single Cycle Wrap-up 2014-1-23 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  11. “two read ports” Merging data paths ... Add muxes R-format N N N 2 How many ? I-format (ignore ALU control) Where ?

  12. RegFile 5 rs1 32 5 rd1 rs2 5 ws 32 rd2 wd WE Equal (wire into control) 32 MemToReg Adding branch testing to the data path ALUctr Ext RegDest ALUsrc ExtOp RegWr MemWr Syntax: BEQ $1, $2, 12 Action: If ($1 != $2), PC = PC + 4 Action: If ($1 == $2), PC = PC + 4 + 48

  13. Josh Fisher: idea grew out of his Ph.D (1979) in compilers VLIW Led to a startup (MultiFlow) whose computers worked, but which went out of business ... the ideas remain influential. Very Long Instruction Words

  14. 32-bit & 64-bit semantics different? Yes! Assume: $7 = 7, $8 = 8, $9 = 9, $10 = 10 (decimal) 32-bit MIPS: ADD $8 $9 $10; Result: $8 = 19 ADD $7 $8 $9; Result: $7 = 28 VLIW: Instr: ADD $8 $9 $10 ; result $8 = 19 ADD $7 $8 $9 ; result $7 = 17 (not 28)

  15. opcode opcode Solution: New “predication” operator. rs rs rt rt rd rd shamt shamt funct funct Syntax: SELECT $7 $8 $9 $10 Semantics: If $8 == 0, $7 = $10, else $7 = $9 Permits simple branches to be converted to inline code. Branch policy: All instr operators execute BNE $8 $9 Label ADD $7 $8 $9 ADD executes if branch is taken or not taken. Problem: Large N machines find it hard to fill all operators with useful work.

  16. Branch nesting in a single instruction ... BEQ $8 $9 LabelOne opcode rs rt rd shamt funct opcode rs rt rd shamt funct BEQ $11 $12 LabelTwo Conundrum: How to define the semantics of multiple branches in one instruction? Solution: Nested branch semantics If $8 == $9, branch to LabelOne Else $11 == $12, branch to LabelTwo

  17. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 3 – Metrics 2014-1-28 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  18. CPU time: Proportional to Instruction Count Q. Once ISA is set, who can influence instruction count? Q. Static count? (lines of program printout) Or dynamic count? (trace of execution) A. Compiler writer, application developer. A. Dynamic. CPU time Machine Instructions ∝ Program Program Q. How does a architect influence the number of machine instructions needed to run an algorithm? Rationale: Every additional instruction you execute takes time. A. Create new instructions: instruction set architect.

  19. opcode opcode N x 32-bit VLIW yields factor of N speedup! rs rs rt rt rd rd shamt shamt funct funct Multiflow: N = 7, 14, or 28 (3 CPUs in product family) Recall Lecture 2: Multi-flow VLIW CPU Q. Which right-hand-side term decreases with “N” ? Seconds Instructions Cycles Seconds = Program Program Cycle Instruction A. We hope this one doesn’t grow. A. This one gets smaller. Semantics:$8 = $9 + $10 Syntax: ADD $8 $9 $10 Semantics:$7 = $8 + $9 Syntax: ADD $7 $8 $9

  20. Linear model works for reasonable fan-out Delay time of an inverter driving 4 inverters. FO4: Fanout of four delay. Driving more gates adds delay. A closer look at fan-out ...

  21. CLKd Clock skew also eats into “time budget” CLKd CLKd As T →0, which circuit fails first?

  22. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 4 – Pipelining 2014-1-30 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  23. Hazards: An instruction is not a car ... Stage #3 Stage #1 Stage #2 Instr Fetch Decode & Reg Fetch ADD R4,R3,R2 OR R5,R4,R2 IR IR IR ... wrong value of R4 fetched from RegFile, contract with programmer broken! Oops! R4 not written yet ... New sample program ADD R4,R3,R2 A OR R5,R4,R2 An example of a “hazard” -- we must (1) detect and (2) resolve all hazards to make a CPU that matches ISA M B

  24. Performance Equation and Pipelining Seconds Instructions Cycles Seconds = Program Program Cycle Instruction Instr Fetch Decode & Reg Fetch Stage #3 IR IR IR CPI == 1 Once pipe is fill, one instruction completes per cycle Clock period is shorter Less work to do in each cycle A To get shortest clock period, balance the work to do in each pipeline stage. M B

  25. Data Hazards: 3 Types (RAW, WAR, WAW) Write After Read (WAR) hazards. Instruction I2 expects to write over a data value after an earlier instruction I1 reads it. But instead, I2 writes tooearly, and I1 sees the new value. Write After Write (WAW) hazards. Instruction I2 writes over data an earlier instruction I1 also writes. But instead, I1 writes after I2, and the final data value is incorrect. WAR and WAW not possible in our 5-stage pipeline. But are possible in other pipeline designs.

  26. Resolving a RAW hazard by forwarding Just forward it back! ALU computes R4 in the EX stage, so ... 1 2 3 “IF” Stage “ID/RF” Stage “EX” Stage Execution Instr Fetch Decode & Reg Fetch Sample program OR R5,R4,R2 ADD R4,R3,R2 IR IR IR ADD R4,R3,R2 OR R5,R4,R2 A Y M M Unlike stalling, does not change CPI. May hurt cycle time. B

  27. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 5 – ISA Design + Microcode + Cost 2014-2-4 John Lazzaro (not a prof - “John” is always OK) Born on this day in 2004. Facebook TA: Eric Love

  28. Register-register 1990s technology was ready for RISC So, larger code size would not monopolize bandwidth to off-chip DRAM memory. Fixed-length instructions made fast pipelining practical. For the right target ISA, compiled code quality could match hand-coded assembly. Not really quantitative ... -----------> Transistors were available for on-chip instruction cache. Machine code for c = a + b;

  29. Instruction modes: “C is a high-level assembler” origin w += i w += 3 w += a[100 + i] w += a[i] w += a[i + j] w += a[1001] w += a[*p] a[i++] a[i--] w += a[100 + i + d*j]

  30. How compiler technology can inform an ISA decision There are good heuristic solutions, but they require 16 free registers (preferably more) to work well. During code generation, a compiler allocatesregistersto temps when available, because registers are faster than memory. This line of reasoning quantifies one advantage of 32 general purpose registers Example: f = b + c + d - 1 becomes: Temporaries: a, e In the general case, register allocation task is NP-complete ...

  31. An example of a complex instruction 8 byte instruction fetch amortized by 28 byte data move MOVEM.L D0/D4-D7/A4/A5,40(A6) Move the 32-bit data stored in 7 registers (D0, D4, D5, D6, D7, A4, A5) to the region of memory pointed to by A^, displaced by 28H bytes. Takes 58 clock cycles to execute. Requires non-architected state to keep track of memory and register indices.

  32. Assembler format OE1, WEP; a list of “1” columns. But ... what exactly is microcode? Each clock cycle, we can think of this data path as executing an 11-bit microcode instruction word: One microcode instruction - binary format P R1 R4 D Q D Q D Q ... 32 32 32 32 32 32 OE OE OE WE WE WE DN WEP OEP WE1 OE1 WE4 OE4 32

  33. 353 Haswell CPU dies on this wafer ==> $4.80 per die! But if die were twice as big ... $9.60 per die! This is one reason why die size matters. 11 11 21 21 27 27 31 31 Die counts per column 33 33 35 35 This analysis is optimistic ... 37

  34. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 6 – Superpipelining + Branch Prediction 2014-2-6 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  35. Pipelining a 256 byte instruction memory. Fully combinational (and slow). Only read behavior shown. Can we add two pipeline stages? A7-A0: 8-bit read address A4 A3 A2 A7 A6 A5 { { 3 3 OE --> Tri-state Q outputs! 256 OE Q Byte 0-31 Data output is 32 bits DEMUX 3 256 1 256 OE MUX Byte 32-63 Q ... D0-D31 ... 32 i.e. 4 bytes ... 256 OE Byte 224-255 Q Each register holds 32 bytes (256 bits)

  36. Spatial Predictors Idea: Devote hardware to four 2-bit predictors for BEQZ branch. C code snippet: P1: Use if b1 and b2 nottaken. b1 P2:Use if b1 taken, b2 not taken. P3:Use if b1 nottaken, b2 taken. P4: Use if b1 and b2 taken. b2 Track the current taken/not-taken status of b1 and b2, and use it to choose from P1 ... P4 for BEQZ ... b3 How? After compilation: b1 b1 b2 b2 We want to predict this branch. b3 Can b1 and b2 help us predict it?

  37. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 7 -- Power and Energy 2014-2-11 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  38. The Joule: Unit of energy. A 1 Gallon gas container holds 130 MJ of energy. The Watt: Unit of power. A rate of energy (J/s). A gas pump hose delivers 6 MW. 120 KW: The power delivered by a Tesla Supercharger. Tesla Model S has a 306 MJ battery (good for 265 miles). 1 J = 1 W s. 1 W = 1 J/s.

  39. And so, we can transform this: Gate delay roughly linear with Vdd 2 P ~ F ⨯ Vdd 2 P ~ 1 ⨯ 1 Block processes stereo audio. 1/2 of clocks for “left”, 1/2 for “right”. Top block processes “left”, bottom “right”. Into this: 2 P~ #blks ⨯ F ⨯ Vdd P~ 2 ⨯ 1/2 ⨯ 1/4 = 1/4 CV2 power only This magic trick brought to you by Cory Hall ...

  40. www-inst.eecs.berkeley.edu/~cs152/ CS 152 Computer Architecture and Engineering Lecture 8 -- CPU Verification 2014-2-13 John Lazzaro (not a prof - “John” is always OK) TA: Eric Love

  41. “CPU program” diagnosis is tricky ... Observation: On a buggy CPU model, the correctness of every executed instruction is suspect. Consequence: One needs to verify the correctness of instructions that surround the suspected buggy instruction. Depends on: (1) number of “instructions in flight” in the machine, and (2) lifetime of non-architected state (may be “indefinite”).

  42. 65 Just test them all? Cin 2 = Exhaustive testing does not “scale”. A 32 + Sum 32 “Combinatorial explosion!” B 32 Cout Combinational Unit Testing: 32-bit Adder Number of input bits ? 65 Total number of possible input values? 3.689e+19

  43. On Tuesday Mid-term I ... Ground rules ...

  44. When is it? Where is it? Ground rules. 9:30 AM sharp, Tuesday March 18th, 306 Soda. Every-other-seat seating, except for the front row, where every-seat is permitted. No blue-books needed. We will be handing out a paper test. Pencil is preferred. Pencils down @ 10:55 AM, so we can collect papers before next class comes in.

  45. When is it? Where is it? Ground rules. No use of calculators, smartphones, laptops, etc ... during the exam. Closed-book, closed-notes. Just pencils, erasers. No consulting with students. Restroom breaks are OK, but you’ll still need to hand in your exam @ 10:55. Questions are reserved for serious concerns about a bug in the question.

  46. Today - Midterm I Review Session All questions answered (almost ...)

  47. Break Play:

  48. # Points CS152 Midterm I October 4th 2005 Name: SSID: “All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet.” Signature: Please write clearly, and put your name on each page. Please abide by word limits. Good luck! Now at Splunk (log files in the cloud) DavidMarquardt UdamSaini JohnLazzaro Now at Redux (10-second online videos) (still @ berkeley)

  49. Q1. Register File Design (10 points) clk ws 5 R0 - The constant 0 Q En Q R1 D DEMUX WE En Q R2 ... D ... En R31 Q D 32 wd

  50. Q1: The actual question ...