
Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah http://www.cs.uta


Presentation Transcript


  1. Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah http://www.cs.utah.edu/~rajeev

  2. What is Computer Architecture?

  3. What is Computer Architecture? • If the Intel Pentium 4 has a faster clock speed than the IBM Power4, does it execute your programs faster?

  4. What is Computer Architecture? • If the Intel Pentium 4 has a faster clock speed than the IBM Power4, does it execute your programs faster? • [Figure: two timelines, Case 1 and Case 2, marking clock ticks and completing instructions along a time axis]

  5. What is Computer Architecture? • To a large extent, computer architecture determines: • the number of instructions used to execute a program • the time each instruction takes to execute • the idle cycles when no work gets done • the number of instructions that can execute in parallel
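The factors on slide 5 combine in the standard relation: execution time = instruction count × average cycles per instruction (CPI) / clock rate. Idle cycles raise the CPI and parallel execution lowers it. The sketch below works this through for two made-up processors; every number in it is a hypothetical chosen for illustration, not a measurement of the Pentium 4 or the Power4.

```python
def execution_time_s(instructions, cycles_per_instruction, clock_hz):
    """Execution time = instruction count x average CPI / clock rate."""
    return instructions * cycles_per_instruction / clock_hz


# Hypothetical numbers, chosen only for illustration (not measured data).
# Same program (1 billion instructions) on two imaginary processors.
fast_clock = execution_time_s(1_000_000_000, 2.0, 3.0e9)  # 3.0 GHz, CPI 2.0
slow_clock = execution_time_s(1_000_000_000, 0.8, 1.5e9)  # 1.5 GHz, CPI 0.8

print(f"faster clock: {fast_clock:.3f} s")
print(f"slower clock: {slow_clock:.3f} s")
```

With these assumed values, the processor with the slower clock finishes first because it completes more instructions per cycle, which is the point of the Pentium 4 versus Power4 question.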

  6. A Typical Microprocessor Branch Predictor L1 Instr Cache Decode & Rename Issue Logic L2 Cache L1 Data Cache ALU ALU ALU ALU Register File

  7. Architecture Trends in the 90s • Performance was the ultimate metric • Transistors were a limiting factor • As on-chip transistors became available in the 90s, more functionality and complex circuitry was added to boost performance; most of the low-hanging fruit has now been picked

  8. Hitting the Wall • We have now hit the following walls: • Single core performance • Memory • Complexity • Power, temperature

  9. Hitting the Power Wall • [Figure from Shekhar Borkar, MICRO’99] • Power is as important a metric today as performance

  10. The Advent of Multi-Core Chips • [Figure: multi-core chip layout with cores and cache banks] • In the past, performance magically increased by 50% every year • In the future, this improvement will be only ~20% every year • … unless … the application is multi-threaded!
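To see how far apart the 50% and ~20% annual rates on slide 10 drift over time, the sketch below compounds each rate over a few horizons. It assumes the rates compound yearly; the chosen horizons (1, 5, and 10 years) are arbitrary.

```python
def cumulative_speedup(annual_rate, years):
    """Speedup after compounding a fixed annual improvement for `years` years."""
    return (1.0 + annual_rate) ** years


# Compare the two rates from the slide over a few arbitrary horizons.
for years in (1, 5, 10):
    past = cumulative_speedup(0.50, years)
    future = cumulative_speedup(0.20, years)
    print(f"{years:2d} years: 50%/yr -> {past:6.2f}x, 20%/yr -> {future:5.2f}x")
```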

  11. Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects • Power and temperature-efficient designs • Designs tolerant of errors • For publications, see http://www.cs.utah.edu/~rajeev/research.html

  12. Interconnects as a Bottleneck • In the past, on-chip data transmission on wires cost almost nothing • Interconnect speed and power have been improving, but not at the same rate as transistor speeds • Hence, relative to computation, communication is much more expensive • In the near future, it will take 100 cycles to travel across the chip • 50% of chip power can be attributed to interconnects

  13. Interconnects in Multi-Core Chips • [Figure: CPU 1, CPU 2, and CPU 3 with L1 caches, connected to an L2 cache and its L2 controllers]

  14. Not all Wires are Created Equal
                            B-Wires   L-Wires   W-Wires   PW-Wires
      Relative latency      1x        0.5x      1.6x      3.2x
      Relative area         1x        4x        0.5x      0.5x
      Dynamic power (W/m)   2.65α     1.46α     2.9α      0.87α
      Static power (W/m)    1.02      0.57      1.16      0.31

  15. Data Transfers have Varying Needs • Example of a cache coherence transaction: • Read exclusive request for a shared block
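One way to read slides 14 and 15 together is that each message can be routed on the wire type that best matches its needs. The sketch below encodes the wire table from slide 14 and applies a deliberately simple, hypothetical policy: latency-critical messages take the lowest-latency wires, and everything else takes the wires with the lowest dynamic power. The `Wire` class, the `pick_wire` function, and the policy itself are assumptions for illustration, not the mapping used in the talk. The dynamic power entries are stored as bare coefficients, dropping the trailing scale factor shown in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wire:
    name: str
    rel_latency: float      # relative to B-Wires
    rel_area: float         # relative to B-Wires
    dyn_power_coeff: float  # coefficient of the dynamic power entry (W/m)
    static_power: float     # W/m


# Values copied from the table on slide 14.
WIRES = [
    Wire("B-Wires", 1.0, 1.0, 2.65, 1.02),
    Wire("L-Wires", 0.5, 4.0, 1.46, 0.57),
    Wire("W-Wires", 1.6, 0.5, 2.9, 1.16),
    Wire("PW-Wires", 3.2, 0.5, 0.87, 0.31),
]


def pick_wire(latency_critical):
    """Hypothetical policy: minimize latency if critical, else dynamic power."""
    if latency_critical:
        return min(WIRES, key=lambda w: w.rel_latency)
    return min(WIRES, key=lambda w: w.dyn_power_coeff)


print("latency-critical message ->", pick_wire(True).name)
print("non-critical message     ->", pick_wire(False).name)
```

A real design would also have to weigh area, since the table shows the fastest wires costing four times the area of the baseline.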

  16. Other Interconnect Choices • Optical interconnects: speed of light, cost in converting between optical and electrical domains • 3D chips: reduces communication distances, low cost for vertical signal transmission, increase in power density

  17. 3D Layouts • [Figure: two-die layouts (Die 0 and Die 1) showing clusters, cache banks, intra-die horizontal wires, and inter-die vertical wires: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered)]

  18. Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects • Power and temperature-efficient designs • Designs tolerant of errors • Callout: clustered architectures (relatively low complexity; scalable solution; easily handles multiple threads)

  19. Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects • Power and temperature-efficient designs • Designs tolerant of errors • Callout: heterogeneous perf/power; cores that execute the OS; cores that verify results

  20. Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects • Power and temperature-efficient designs • Designs tolerant of errors • Callout: hardware to support transactional memory

  21. Upcoming Architecture Challenges • Improving single core performance • Functionalities in multi-core chips • Simplifying the programmer’s task • Efficient interconnects • Power and temperature-efficient designs • Designs tolerant of errors • Callout: faults are caused by high-energy particles that deposit enough charge to toggle bits; variations in conditions may cause a circuit to not produce its result in time

  22. Research Methodologies • It’s all about the simulators! • SimpleScalar, Wattch, and HotSpot: about 10,000 lines of C code that models the flow of instructions through a modern processor • Inputs: a configuration file that specifies processor parameters, and a benchmark program (say, gzip) • Outputs: how long the program runs on the simulated processor (SimpleScalar), how much power is consumed (Wattch), and what the peak temperature is (HotSpot)
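The input/output shape described on slide 22 (processor parameters plus a program in, run time out) can be sketched with a toy model. The code below is not SimpleScalar and does not use its configuration format or interfaces; the `simulate` function, the parameter names, and the five-entry trace are all hypothetical. It charges one cycle per instruction plus a fixed penalty for cache misses and branch mispredictions.

```python
def simulate(config, trace):
    """Return the cycle count for a trace on a toy in-order processor model."""
    cycles = 0
    for op in trace:
        cycles += 1  # every instruction takes one cycle to issue
        if op == "load_miss":
            cycles += config["miss_penalty"]  # stall while memory responds
        elif op == "branch_mispredict":
            cycles += config["mispredict_penalty"]  # refill the pipeline
    return cycles


# Hypothetical processor parameters and a hand-written instruction trace.
config = {"miss_penalty": 100, "mispredict_penalty": 10}
trace = ["alu", "load_hit", "load_miss", "alu", "branch_mispredict"]

cycles = simulate(config, trace)
print(cycles, "cycles,", len(trace) / cycles, "instructions per cycle")
```

Evaluating a new idea in this style amounts to changing the model or its parameters and comparing cycle counts on the same trace.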

  23. Evaluating a New Idea • Lots of reading (it’s better than waiting for divine inspiration) • Identify bottlenecks, identify problems, develop an idea, repeatedly question that idea • Understand the simulator • Engineer a solution, modify simulator code (perhaps write fewer than 1000 lines of C code) • Analyze data (things never work the first time), engineer/optimize/debug your solution • Write papers • Implement in silicon?

  24. To Learn More… • CS/EE 3810: Computer Organization • CS/EE 6810: Computer Architecture • CS/EE 7810: Advanced Computer Architecture • CS/EE 7820: Parallel Computer Architecture • CS 7937 / 7940: Architecture Reading Seminar
