
Test 1 Postmortem

Test 1 Postmortem. CSCE 513 Computer Architecture. Readings: Chapter 1, Appendix C, Appendix B, Chapter 2. September 30, 2013. Test 1 Fall 2012: Short Answer; Performance (Amdahl’s); Classic 5-stage pipeline; AMAT; Forwarding; Unroll loop (TakeHome); Tomasulo’s. What might be covered!

ursala

Presentation Transcript


  1. Test 1 Postmortem
     CSCE 513 Computer Architecture
     • Readings:
       • Chapter 1
       • Appendix C
       • Appendix B
       • Chapter 2
     September 30, 2013

  2. Test 1 Fall 2012
     • Short Answer
     • Performance (Amdahl’s)
     • Classic 5-stage pipeline
     • AMAT
     • Forwarding
     • Unroll loop (TakeHome)
     • Tomasulo’s
     • What might be covered!
     • IEEE 754
     • Branch handling
     • Moving adder to ID stage
     • Energy/Power
     • Power Wall
     • RAW, WAR, WAW
     • …

  3. 1. (a) What does CPI mean, and what is the best possible CPI for a simple scalar processor?
     (b) When cache blocks are made larger, how does that affect cache performance? Be specific about what improves and what might be negatively affected.
     (c) How does making L1 caches small and simple affect cache performance?

  4. (d) What is meant by write merging?
     (e) What is the misquote of Moore's law that held during the 1980s and '90s?
     (f) How does a tournament branch predictor work?
     (g) Explain critical word first and early restart.
     (h) What is stored in an entry of the TLB?

  5. 2. Performance
     (a) Suppose an enhancement can be applied 25% of the time and improves performance by a factor of 2 when it is in use. What is the overall speedup?
     (b) Suppose there are two improvements, A and B, that are applicable 10% and 40% of the time, with speedups of 10 and 4 respectively. Assume that A and B do not overlap. What is ExecTime_new expressed in terms of ExecTime_orig?

  6. (c) During what percentage of the new, improved execution time is neither improvement in use?
     (d) Assuming only one of A and B can be implemented, which is better?
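
The arithmetic behind question 2 can be sketched with Amdahl's law. This is a minimal illustration rather than the official answer key; the function names `exec_time_fraction` and `speedup` are hypothetical, and the fractions are assumed to be fractions of the original execution time.

```python
def exec_time_fraction(enhancements):
    """Return ExecTime_new / ExecTime_orig for non-overlapping enhancements.

    enhancements: list of (fraction_of_original_time, speedup) pairs.
    """
    used = sum(fraction for fraction, _ in enhancements)
    if not 0.0 <= used <= 1.0:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    # Unenhanced part runs at the old speed; each enhanced part shrinks by its speedup.
    return (1.0 - used) + sum(fraction / s for fraction, s in enhancements)


def speedup(enhancements):
    """Overall speedup = ExecTime_orig / ExecTime_new."""
    return 1.0 / exec_time_fraction(enhancements)


# (a) one enhancement, usable 25% of the time, factor of 2
part_a = speedup([(0.25, 2)])

# (b) A: 10% at 10x, B: 40% at 4x, non-overlapping
ratio_b = exec_time_fraction([(0.10, 10), (0.40, 4)])

# (c) share of the NEW execution time spent with neither A nor B in use
unenhanced_share = (1.0 - 0.10 - 0.40) / ratio_b

# (d) compare A alone against B alone
only_a = speedup([(0.10, 10)])
only_b = speedup([(0.40, 4)])

print(f"(a) speedup = {part_a:.4f}")
print(f"(b) ExecTime_new = {ratio_b:.2f} * ExecTime_orig")
print(f"(c) unenhanced share of new time = {unenhanced_share:.1%}")
print(f"(d) A alone: {only_a:.4f}, B alone: {only_b:.4f}")
```

Under these assumptions part (b) evaluates to 0.5 + 0.1/10 + 0.4/4 = 0.61 of the original time, and B alone gives the larger speedup because it covers a much larger fraction of the run.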

  7. 3. Classical 5-stage pipeline: Assume the classical 5-stage pipeline with no forwarding at all, not even through the register file.
     • Assume all of these instructions execute in 1 cycle. Given the code below:
         loop: DADDIU R2, R2, +8
               LD.D   F4, 0(R2)
               MULT   F8, F4, F4
               ADD.D  F5, F5, F8
               SUB    R1, R3, R2
               BNEZ   R1, loop
     (a) Show how the first iteration of the loop would proceed through the pipeline. Stop with the fetch of the first instruction of the loop on the second iteration or when you fill the table. Assume you predict branch taken.

  8. (b) If the loop executes 1000 times, how many cycles does it take?
     (c) If you predict branch not taken, how many cycles do 1000 iterations take?

  9. (d) If you do full forwarding and predict branch taken, how many cycles do 1000 iterations take? Figure out how many stalls you eliminate in one iteration and then proceed.
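
One way to check the data-hazard stalls in question 3 is a small timing model. The sketch below is a simplification under stated assumptions, not the graded solution: it models only RAW hazards in an in-order IF/ID/EX/MEM/WB pipeline, ignores branch handling and structural hazards, and reads the garbled operands of the ADD.D as F5 and F8. The names `issue_cycles`, `data_stalls`, and `regfile_bypass` are hypothetical.

```python
def issue_cycles(program, regfile_bypass=False):
    """Return the cycle in which each instruction completes ID (register read).

    program: list of (dest, sources) tuples; dest may be None.
    regfile_bypass=False models "no forwarding, not even through the registers":
    a consumer can read a register only in the cycle AFTER the producer's WB.
    """
    ready = {}       # register -> first cycle in which ID may read the new value
    id_cycles = []
    prev_id = 1      # first instruction is fetched in cycle 1, so its ID is cycle 2
    for dest, sources in program:
        earliest = prev_id + 1                      # in-order: one ID per cycle at best
        for reg in sources:
            earliest = max(earliest, ready.get(reg, 0))
        id_cycles.append(earliest)
        if dest is not None:
            wb = earliest + 3                       # ID -> EX -> MEM -> WB
            ready[dest] = wb if regfile_bypass else wb + 1
        prev_id = earliest
    return id_cycles


def data_stalls(program, regfile_bypass=False):
    """Stall cycles caused by RAW hazards in one pass through the program."""
    last_id = issue_cycles(program, regfile_bypass)[-1]
    ideal_last_id = len(program) + 1                # no stalls: k-th instr has ID at k + 1
    return last_id - ideal_last_id


# One loop iteration as (destination, sources); operand reading is an assumption.
loop_body = [
    ("R2", ["R2"]),          # DADDIU R2, R2, +8
    ("F4", ["R2"]),          # LD.D   F4, 0(R2)
    ("F8", ["F4"]),          # MULT   F8, F4, F4
    ("F5", ["F5", "F8"]),    # ADD.D  F5, F5, F8
    ("R1", ["R3", "R2"]),    # SUB    R1, R3, R2
    (None, ["R1"]),          # BNEZ   R1, loop
]

stalls_no_forwarding = data_stalls(loop_body)
stalls_regfile_bypass = data_stalls(loop_body, regfile_bypass=True)
print(stalls_no_forwarding, stalls_regfile_bypass)
```

In this model each of the four back-to-back dependences costs three stall cycles without any forwarding, and two if the register file is written and read in the same cycle. Branch penalties would have to be added separately to get a full per-iteration cycle count.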

  10. 4. (Average memory access time)
     (a) Assume
     • the HitTime of the L1 cache is 1 ns,
     • the MissRate of the L1 is 10%,
     • the HitTime of the L2 cache is 10 ns,
     • the MissRate of the L2 is 25%,
     • the MissPenalty for the L2 cache is 100 ns.
     Then what is the average memory access time (AMAT)?
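
The two-level AMAT formula can be evaluated directly. The sketch below assumes the 25% L2 miss rate is a local miss rate (misses per L2 access), which the slide does not state explicitly; the function name `amat_two_level` is hypothetical.

```python
def amat_two_level(l1_hit_ns, l1_miss_rate, l2_hit_ns, l2_local_miss_rate, l2_miss_penalty_ns):
    """AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2)."""
    # The L1 miss penalty is the average time to service the access from L2 or below.
    l1_miss_penalty_ns = l2_hit_ns + l2_local_miss_rate * l2_miss_penalty_ns
    return l1_hit_ns + l1_miss_rate * l1_miss_penalty_ns


amat_ns = amat_two_level(
    l1_hit_ns=1.0,
    l1_miss_rate=0.10,
    l2_hit_ns=10.0,
    l2_local_miss_rate=0.25,
    l2_miss_penalty_ns=100.0,
)
print(f"AMAT = {amat_ns:.2f} ns")
```

With these inputs the expression is 1 + 0.10 × (10 + 0.25 × 100) = 4.5 ns. If the 25% were instead a global miss rate, the L2 term would be computed differently.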

  11. (b) Given the table below, which shows only a portion of a cache satisfying
     • memory is byte addressable
     • memory addresses refer to 1-byte words
     • 4-way associativity
     • block size 2 B (ridiculously small)
     • total cache size 1024 B = 1 KB
     • physical addresses 14 bits wide

  12. i. How many lines are there?
     ii. How many sets?
     iii. How big are the block offset, set index and tag fields?
     iv. Is 0x3A0F a hit or miss?
     v. If it is a hit, what data is returned?
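
Parts i to iii follow from the cache parameters alone, and the address in part iv can be split into its fields the same way. The sketch below assumes power-of-two sizes; the helper names `cache_geometry` and `split_address` are hypothetical. Whether 0x3A0F hits depends on the cache table, which is not reproduced in this transcript, so the code only reports which set and tag to look for.

```python
def cache_geometry(total_bytes, block_bytes, ways, address_bits):
    """Return (lines, sets, offset_bits, index_bits, tag_bits) for power-of-two sizes."""
    lines = total_bytes // block_bytes
    sets = lines // ways
    offset_bits = (block_bytes - 1).bit_length()   # log2(block size)
    index_bits = (sets - 1).bit_length()           # log2(number of sets)
    tag_bits = address_bits - index_bits - offset_bits
    return lines, sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split a physical address into (tag, set_index, block_offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


lines, sets, offset_bits, index_bits, tag_bits = cache_geometry(
    total_bytes=1024, block_bytes=2, ways=4, address_bits=14
)
tag, index, offset = split_address(0x3A0F, offset_bits, index_bits)

print(f"lines={lines}, sets={sets}")
print(f"offset={offset_bits} bit(s), index={index_bits} bits, tag={tag_bits} bits")
print(f"0x3A0F -> tag=0x{tag:02X}, set={index}, offset={offset}")
```

The access is a hit only if one of the four ways in the selected set is valid and holds the computed tag; the block offset then selects the byte within the 2-byte block.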

  13. 5. For the forwarding shown in the diagram:
     (a) Explain how the hardware would detect that it should be done.
     (b) Give an example of code that would make this type of forwarding occur.

  14. Assume the following latencies: integer operations and branches require 1 cycle for execution.
         loop: LD     F0, 0(R1)
               MUL.D  F4, F0, F0
               ADD.D  F8, F8, F4
               S.D    F8, 0(R1)
               ADD.D  F4, F0, F0
               ADD.D  F0, F4, F0
               S.D    F0, 1024(R1)
               DADDIU R1, R1, +8
               BNE    R1, R2, loop
     (b) Unroll this loop once and schedule the code to eliminate as many stalls as possible.
     (c) What do you need to do (or more accurately, what does the compiler need to do) to allow your unrolled loop to work if the original loop executes an odd number of times?
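
The idea behind part (c) can be illustrated at a high level: when the body is unrolled once (two copies per trip), a leftover iteration has to be peeled off and run separately whenever the trip count is odd. The Python sketch below is only an analogy for what the compiler-generated code does; it assumes the garbled registers read as F0, F4, and F8, returns results in lists instead of storing to memory, and uses the hypothetical names `run_original` and `run_unrolled`.

```python
def run_original(xs):
    """One element per trip: running sum of squares, plus (x + x) + x."""
    acc = 0.0                          # plays the role of F8
    sums, triples = [], []
    for x in xs:                       # x plays the role of F0 after the load
        acc += x * x                   # MUL.D then ADD.D into the accumulator
        sums.append(acc)               # first S.D
        triples.append((x + x) + x)    # two ADD.Ds, then the second S.D
    return sums, triples


def run_unrolled(xs):
    """Two elements per trip, with one peeled iteration when len(xs) is odd."""
    acc = 0.0
    sums, triples = [], []
    n = len(xs)
    i = 0
    if n % 2 == 1:
        # Peeled iteration: handles the element the unrolled body cannot pair up.
        x = xs[0]
        acc += x * x
        sums.append(acc)
        triples.append((x + x) + x)
        i = 1
    while i < n:
        # Unrolled body: two copies of the work, one loop-overhead update.
        x0, x1 = xs[i], xs[i + 1]
        acc += x0 * x0
        sums.append(acc)
        triples.append((x0 + x0) + x0)
        acc += x1 * x1
        sums.append(acc)
        triples.append((x1 + x1) + x1)
        i += 2
    return sums, triples


data = [1.0, 2.0, 3.0, 4.0, 5.0]       # odd length exercises the peeled iteration
print(run_unrolled(data) == run_original(data))
```

Peeling the extra iteration before the unrolled loop is one choice; running it after the loop works equally well here, as long as the running accumulator is carried across.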

  15. Tomasulo’s Algorithm
