
Relaxing Constraints: Thoughts on the Evolution of Computer Architecture

Joel Emer, Alpha Development Group, Compaq Computer Corporation


Presentation Transcript


  1. Relaxing Constraints: Thoughts on the Evolution of Computer Architecture Joel Emer Alpha Development Group Compaq Computer Corporation

  2. Moore’s Law Alpha-style

  3. Iron Law of Performance Performance = Frequency / (Instructions * CPI) • Frequency – largely circuit design/technology • CPI – largely organization • Instructions – largely architecture/compiler
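As a worked illustration of the Iron Law, the sketch below treats performance as program runs per second, i.e. clock frequency divided by the product of instruction count and CPI. The function names and all numbers are hypothetical, chosen only to show how each of the three factors moves the result; they are not taken from the talk.

```python
def execution_time(frequency_hz, instructions, cpi):
    """Seconds to run a program: total cycles divided by cycles per second."""
    return instructions * cpi / frequency_hz


def performance(frequency_hz, instructions, cpi):
    """Program runs per second: frequency / (instructions * CPI)."""
    return frequency_hz / (instructions * cpi)


# Hypothetical baseline: 500 MHz clock, 1e9 instructions, CPI of 2.0.
baseline = performance(500e6, 1e9, 2.0)

# Doubling frequency (circuit design/technology) doubles performance.
faster_clock = performance(1000e6, 1e9, 2.0)

# Halving CPI (organization) has the same effect.
better_cpi = performance(500e6, 1e9, 1.0)

# Halving the instruction count (architecture/compiler) does too.
fewer_instructions = performance(500e6, 0.5e9, 2.0)
```

Each of the three levers gives the same 2x gain in this toy case, which is why the slide assigns them to different parts of the design stack rather than ranking them.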

  4. Outline • Review of technology factors • Retrospective on the quantitative method • Augmenting the quantitative method • Recommendation

  5. Power Dissipation Trends • Power consumption is increasing • Supply current is increasing faster!

  6. Coping With Power Growth • Technology techniques • Better cooling technology needed • Accelerate Vdd scaling • SOI • Clock distribution • Architectural possibilities • Use less power-hungry structures • Reduce useless speculation

  7. Clock Distribution Trends [Chart: 21264 Power (Peak)] • Frequencies will continue to scale • Clock edge rates are not scaling

  8. Coping With Clock Distribution • Technology solution • Low swing differential clocks • Adiabatic clocking • Architectural possibilities • Multiple clock zones • Asynchronous design

  9. Communication Delay [Diagram: microprocessor chip, not drawn to scale] • 21064 ~ 1 cycle • 21164 ~ 1.5 cycles • 21264 ~ 3 cycles • 21464 ~ 6 cycles

  10. Coping With Communication Delay • Technology solutions • Low K dielectrics • Thinner (Cu) interconnect • Architectural possibilities • Deeper pipelining • Replication/clustering of structures • More autonomous computation

  11. SIA Roadmap

  12. Outline • Review of technology factors • Retrospective on the quantitative method • Augmenting the quantitative method • Recommendation

  13. Disclaimer The names used and events depicted in this talk are meant to be real. The events are, however, not an exhaustive enumeration of significant milestones. The misrepresentations of fact and omission of contributors are unintentional and solely the responsibility of the presenter. Finally, the interpretations are just that and are mine as well.

  14. Early quantitative method - 1981

  15. uPC Histogram Chart – 1981-5

  16. Paper counts

  17. Scientific Method • Make hypothesis about behavior • Design experiment • Run experiment and quantify • Interpret results • New hypothesis

  18. Scientific Method • Make hypothesis about behavior • Pick baseline design and workload • Run experiment and quantify • Interpret results • New hypothesis

  19. Scientific Method • Make hypothesis about behavior • Pick baseline design and workload • Run simulation model or measure hardware • Interpret results • New hypothesis

  20. Scientific Method • Make hypothesis about behavior • Pick baseline design and workload • Run simulation model or measure hardware • Interpret results • Propose new design

  21. Making and Testing Hypothesis • Cache experiment (Schlansker) • 64K word cache • 32-way set associative cache/LRU replacement • 200x200 matrix subblock of an N x N matrix • Read twice • Sizes • N=2727: 0 misses • N=2729: 24160 misses • N=2731: 36382 misses
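The cache experiment above can be approximated with a small simulator. This is a minimal sketch, not the original experiment: the slide does not give the line size, the storage order, the set-index function, or which misses were counted, so the code assumes one-word lines, row-major storage, a set index of address modulo the number of sets, a subblock starting at the matrix origin, and that only second-pass misses are counted. The function name `second_pass_misses` is hypothetical, and the sketch makes no claim to reproduce the miss counts on the slide.

```python
from collections import OrderedDict


def second_pass_misses(n, block=200, cache_words=64 * 1024, ways=32):
    """Count misses on the second read of a block x block subblock of an n x n matrix.

    Assumptions (not stated on the slide): one-word cache lines, row-major
    storage, set index = address mod number of sets, subblock at the origin.
    """
    num_sets = cache_words // ways
    # One OrderedDict per set; insertion order doubles as LRU order.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for pass_number in range(2):
        for i in range(block):
            for j in range(block):
                addr = i * n + j
                lines = sets[addr % num_sets]
                if addr in lines:
                    lines.move_to_end(addr)  # hit: mark as most recently used
                else:
                    if pass_number == 1:
                        misses += 1  # only second-pass misses are counted
                    if len(lines) >= ways:
                        lines.popitem(last=False)  # evict least recently used
                    lines[addr] = True
    return misses


if __name__ == "__main__":
    for n in (2727, 2729, 2731):
        print(n, second_pass_misses(n))
```

The point of the experiment survives the simplifications: the 40,000-word subblock is smaller than the cache, so any second-pass misses come from rows colliding in the same sets, and that depends on the matrix dimension N rather than on the amount of data.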

  22. Propose new design • Skewed associative (Seznec) [Diagram panels: direct mapped, 4-way associative, 4-way skewed]
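The idea behind skewed associativity is that each way of the cache indexes with a different function of the address, so lines that collide in one way tend to land in different sets in another. The sketch below illustrates this with a simple XOR-based hash. That hash is an assumption made for illustration, not the skewing function from Seznec's design, and the function names are hypothetical.

```python
def conventional_index(addr, way, num_sets):
    """Conventional set-associative cache: every way uses the same low-order bits."""
    return addr % num_sets


def skewed_index(addr, way, num_sets):
    """Skewed cache: way 0 uses the low bits, other ways XOR in higher bits.

    Illustrative hash only (an assumption); num_sets must be a power of two
    so that the XOR result stays within range.
    """
    low = addr % num_sets
    high = (addr // num_sets) % num_sets
    return low ^ ((high * way) % num_sets)


# Addresses one cache-way apart all collide in a conventional cache.
num_sets = 16
addrs = [k * num_sets for k in range(4)]
conventional = [conventional_index(a, 1, num_sets) for a in addrs]
skewed = [skewed_index(a, 1, num_sets) for a in addrs]
```

With these strided addresses, `conventional` maps every address to set 0 in way 1, while `skewed` spreads them over sets 0 through 3, which is the kind of conflict pattern the matrix experiment exposes.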

  23. Quantitative Approach Problems • Too much abstraction • Intra-chip latencies • Memory subsystem • Poor workloads • Too incremental…

  24. Quantitative -> Incremental

  25. Outline • Review of technology factors • Retrospective on the quantitative method • Augmenting the quantitative method • Recommendation

  26. Relaxing Constraints • Select a constraint to relax • Generate design • Employ quantitative method • Evaluate results

  27. Important Steps… • Before • Carefully pick a constraint to relax • After • Find contributions without the constraint • Preserve results after reinstating the constraint

  28. Extrapolate From Current Trends • Personal Workstation – Xerox PARC – late 70’s • Results • Accelerate innovation

  29. Throw Out Standards • Distributed file system - 1985

  30. Use a Simpler Starting Point • RISC out-of-order (Johnson, Torng) [Pipeline diagram: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire; structures: PC, Icache, Register Map, Regs, Dcache]

  31. CISC-based O-O-O • K6 (Johnson) • Pentium Pro (Colwell, Papworth…) [Diagram: PC, Icache, Convert CISC to RISC, RISC O-O-O Core]

  32. Abandon conventions • VLIW (Fisher) • Relieve hardware of all dependency responsibility • Give that responsibility to compiler • Expected consequences • Much simpler implementation • Faster cycle time

  33. Sometimes not what you expect • Compiler scheduling for hardware is a great idea • For 21064 - narrow in-order • For 21164 - wider in-order • For 21264 – wider out-of-order

  34. Issue Logic Critical Loop [Diagram: Instruction Slot, Instruction Issue, Issue Conflict Checker, stages S2 and S3; outputs to integer pipeline 0, integer pipeline 1, floating point add pipeline, floating point multiply pipeline]

  35. Make a Radical Departure • Multiscalar research (Sohi, Smith…)

  36. New Mechanism Required • Dependence prediction (Moshovos) [Diagram: loads and stores shown in program order and in execution order, with a "Trap!" marker]

  37. What Was Really Important • Full hardware management (Sohi) • Sequencing • Register dependencies • Memory dependencies • Refinement (Mowry and Olukotun) • Compiler managed – registers, sequencing • Hardware managed memory dependence only

  38. Ignoring Implementation Realities • SMT - in-order (Tullsen, Eggers, Levy) [Pipeline diagram: Fetch, Issue, Reg Read, Execute, Dcache/Store Buffer, Reg Write; structures: PC, Icache, Regs, Dcache]

  39. Solution Already Available • SMT out-of-order [Pipeline diagram: Fetch, Decode/Map, Queue, Reg Read, Execute, Dcache/Store Buffer, Reg Write, Retire; structures: PC, Icache, Register Map, Regs, Dcache]

  40. Outline • Review of technology factors • Retrospective on the quantitative method • Augmenting the quantitative method • Recommendation

  41. Pay Attention to Reality • Look at technology trends • Power • Latency • Use more realistic models • More organizational details • Better workloads

  42. Ignore Reality • Look for revolutionary contributions • Decide on a constraint to relax • Apply the scientific method • Revolutionary contributions may arise because • Constraint will be relaxed in time • Constraint wasn’t fundamental • New avenues of exploration will be opened

  43. Acknowledgments • Bill Bowhill • Paul Gronowski • Bill Herrick • Toni Juan • Geoff Lowney • Ellen Piccioli • Andre Seznec
