slide1

Computer Architecture Research Overview

Rajeev Balasubramonian

School of Computing, University of Utah

http://www.cs.utah.edu/~rajeev

slide3

What is Computer Architecture?

  • If the Intel Pentium4 has a faster clock speed than the IBM Power4, does it execute your programs faster?
slide4

What is Computer Architecture?


[Figure: two timelines, Case 1 and Case 2, marking clock ticks and completing instructions along a time axis]

slide5

What is Computer Architecture?

  • To a large extent, computer architecture determines:
  • the number of instructions used to execute a program
  • the time each instruction takes to execute
  • the idle cycles when no work gets done
  • the number of instructions that can execute in parallel
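The factors listed above combine into the standard processor performance equation: execution time = instruction count × cycles per instruction (CPI) / clock rate. Idle cycles raise the average CPI, and executing several instructions in parallel lowers it. The following is a minimal sketch with hypothetical numbers (not measurements of the Pentium4 or Power4) showing how the chip with the faster clock can still take longer:

```python
def execution_time(instructions, cpi, clock_hz):
    """Execution time = instruction count x cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


# Hypothetical processors (illustrative numbers only):
# A has the faster clock, B completes more work per cycle.
prog_instructions = 1_000_000_000
time_a = execution_time(prog_instructions, cpi=2.0, clock_hz=3.0e9)
time_b = execution_time(prog_instructions, cpi=1.0, clock_hz=2.0e9)
print(f"A: {time_a:.3f} s, B: {time_b:.3f} s")  # A: 0.667 s, B: 0.500 s
```

Despite a 1.5x clock advantage, A is slower because it needs twice as many cycles per instruction.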
slide6

A Typical Microprocessor

[Figure: block diagram of a typical microprocessor: branch predictor, L1 instruction cache, decode & rename, issue logic, register file, four ALUs, L1 data cache, and L2 cache]
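The diagram names a branch predictor without saying how it works. A common textbook design is a table of 2-bit saturating counters indexed by the branch address (a bimodal predictor). The sketch below assumes that design; the table size, the initial counter state, and the class name are hypothetical choices, not a description of any particular processor:

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits.

    Counter values 0-1 predict not-taken, 2-3 predict taken.
    """

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not-taken

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


# A loop branch that is taken 9 times and then falls through, repeated 3 times.
bp = BimodalPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for taken in outcomes:
    correct += bp.predict(0x400) == taken
    bp.update(0x400, taken)
print(f"{correct}/{len(outcomes)} correct")  # 26/30 correct
```

The 2-bit hysteresis means a single loop exit does not flip the prediction, so after warm-up only the exit itself is mispredicted.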

slide7

Architecture Trends in the 90s

  • Performance was the ultimate metric
  • Transistors were a limiting factor
  • As on-chip transistors became available in the 90s, more functionality and more complex circuitry were added to boost performance; most of the low-hanging fruit has now been picked
slide8

Hitting the Wall

  • We have now hit the following walls:
  • Single core performance
  • Memory
  • Complexity
  • Power, temperature
slide9

Hitting the Power Wall

From Shekhar Borkar, MICRO’99

Power is as important a metric today as performance

slide10

The Advent of Multi-Core Chips

[Figure: multi-core chip layout of cores and cache banks]

  • In the past, performance magically increased by 50% every year
  • In the future, this improvement will be only ~20% every year
  • … unless … the application is multi-threaded!
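The gap between the two growth rates compounds quickly. The sketch below uses the slide's 50% and 20% figures over a five-year horizon (the horizon is an arbitrary choice for illustration). The multi-threaded case is modeled with Amdahl's law, a standard model that is not taken from the slide; the 90% parallel fraction and 8 cores are hypothetical:

```python
def compounded(rate, years):
    """Total performance gain after `years` of improvement at `rate` per year."""
    return (1 + rate) ** years


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: only the parallel fraction of the work speeds up with more cores."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / cores)


print(f"5 years at 50%/yr: {compounded(0.50, 5):.2f}x")        # 7.59x
print(f"5 years at 20%/yr: {compounded(0.20, 5):.2f}x")        # 2.49x
print(f"8 cores, 90% parallel: {amdahl_speedup(0.90, 8):.2f}x")  # 4.71x
```

Even a well-parallelized program falls short of the core count, because its serial 10% does not speed up.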
slide11

Upcoming Architecture Challenges

  • Improving single core performance
  • Functionalities in multi-core chips
  • Simplifying the programmer’s task
  • Efficient interconnects
  • Power and temperature-efficient designs
  • Designs tolerant of errors

For publications, see http://www.cs.utah.edu/~rajeev/research.html

slide12

Interconnects as a Bottleneck

  • In the past, on-chip data transmission on wires cost almost nothing
  • Interconnect speed and power have been improving, but not at the same rate as transistor speeds
  • Hence, relative to computation, communication is much more expensive
  • In the near future, a signal will take 100 cycles to travel across the chip
  • 50% of chip power can be attributed to interconnects
slide13

Interconnects in Multi-Core Chips

[Figure: multi-core chip showing CPU 1, CPU 2, and CPU 3, an L1 cache, the L2 cache, and L2 control blocks connected by on-chip interconnects]

slide14

Not all Wires are Created Equal

Wire type             B-Wires   L-Wires   W-Wires   PW-Wires
Relative latency      1x        0.5x      1.6x      3.2x
Relative area         1x        4x        0.5x      0.5x
Dynamic power (W/m)   2.65α     1.46α     2.9α      0.87α
Static power (W/m)    1.02      0.57      1.16      0.31

(α = switching factor)
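One way to read the table is to fix the metal area available for a link and ask how many wires of each class fit in it. The sketch below copies the latency and area columns from the table; the 512-wire budget and the function name are hypothetical:

```python
# Per-wire metrics copied from the table above (B-wire = 1x).
# name: (relative latency, relative area)
WIRES = {
    "B": (1.0, 1.0),
    "L": (0.5, 4.0),
    "W": (1.6, 0.5),
    "PW": (3.2, 0.5),
}


def wires_in_budget(area_budget, wire):
    """Number of wires of one class that fit in a metal area given in B-wire units."""
    return int(area_budget // WIRES[wire][1])


budget = 512  # hypothetical: the area of a 512-wire B-wire link
for name, (latency, _area) in WIRES.items():
    print(f"{name:>2}: {wires_in_budget(budget, name):4d} wires at {latency}x latency")
```

In the same area, L-wires give half the latency but only a quarter of the wires (128), while W- and PW-wires give twice as many wires (1,024) at higher latency.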

slide15

Data Transfers have Varying Needs

  • Example of a cache coherence transaction:
  • Read exclusive request for a shared block
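Why the needs vary can be made concrete with a small timing model. In a read-exclusive request for a shared block, the requester must wait both for the data and for acknowledgments from every sharer that is invalidated; when the acknowledgments arrive last, the data reply has slack and could travel on slower, lower-power wires. The sketch assumes a directory-based protocol, L-wires for all control messages, and abstract distances. The function name, distances, and selection rule are hypothetical illustrations, not the protocol proposed in this research:

```python
# Relative latency per unit of distance, from the wire table above (B-wire = 1.0).
LATENCY = {"B": 1.0, "L": 0.5, "W": 1.6, "PW": 3.2}
# Relative dynamic power (multiples of alpha), from the same table.
DYN_POWER = {"B": 2.65, "L": 1.46, "W": 2.9, "PW": 0.87}
# Assumption: a full data block is too wide for L-wires (4x area), so only these carry data.
DATA_WIRES = ("B", "W", "PW")


def choose_data_wire(req_to_dir, dir_to_sharers, sharers_to_req, dir_to_req):
    """Hypothetical policy for a read-exclusive request to a shared block.

    Control messages (request, invalidations, acks) are assumed to travel on L-wires,
    and the directory is assumed to send the data reply as soon as the request arrives.
    Distances are in arbitrary units. Returns the lowest-power wire class whose data
    reply arrives no later than the last invalidation acknowledgment.
    """
    t_request = req_to_dir * LATENCY["L"]
    t_last_ack = t_request + (dir_to_sharers + sharers_to_req) * LATENCY["L"]
    for wire in sorted(DATA_WIRES, key=DYN_POWER.get):
        if t_request + dir_to_req * LATENCY[wire] <= t_last_ack:
            return wire  # data is off the critical path on this wire class
    return min(DATA_WIRES, key=LATENCY.get)  # data is critical: use the fastest class


print(choose_data_wire(1, 4, 4, 1))  # distant sharers leave slack: PW
print(choose_data_wire(1, 1, 1, 1))  # little slack: B
```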
slide16

Other Interconnect Choices

  • Optical interconnects: signals travel at the speed of light, but converting between the optical and electrical domains is costly
  • 3D chips: shorter communication distances and low-cost vertical signal transmission, but an increase in power density
slide17

3D Layouts

[Figure: three 3D layouts of clusters and cache banks on two stacked dies (Die 0 and Die 1), connected by intra-die horizontal wires and inter-die vertical wires: (a) Arch-1 (cache-on-cluster), (b) Arch-2 (cluster-on-cluster), (c) Arch-3 (staggered)]

slide18

Upcoming Architecture Challenges

  • Improving single core performance
  • Functionalities in multi-core chips
  • Simplifying the programmer’s task
  • Efficient interconnects
  • Power and temperature-efficient designs
  • Designs tolerant of errors

Clustered architectures:
  • relatively low complexity
  • a scalable solution
  • easily handle multiple threads

slide19

Upcoming Architecture Challenges

  • Improving single core performance
  • Functionalities in multi-core chips
  • Simplifying the programmer’s task
  • Efficient interconnects
  • Power and temperature-efficient designs
  • Designs tolerant of errors

  • Cores with heterogeneous performance/power
  • Cores that execute the OS
  • Cores that verify results

slide20

Upcoming Architecture Challenges

  • Improving single core performance
  • Functionalities in multi-core chips
  • Simplifying the programmer’s task
  • Efficient interconnects
  • Power and temperature-efficient designs
  • Designs tolerant of errors

Hardware to support transactional memory
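Transactional memory hardware typically tracks the addresses each transaction reads and writes (its read and write sets) and flags a conflict when one transaction writes data that another has accessed. The sketch below shows only that conflict rule in software; it does not model any specific hardware design, and the class, function, and addresses are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Addresses a transaction has read and written so far (its read/write sets)."""
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

    def load(self, addr):
        self.read_set.add(addr)

    def store(self, addr):
        self.write_set.add(addr)


def conflicts(t1, t2):
    """Conflict if either transaction writes an address the other reads or writes."""
    return bool(t1.write_set & (t2.read_set | t2.write_set)
                or t2.write_set & (t1.read_set | t1.write_set))


a, b, c = Transaction(), Transaction(), Transaction()
a.load(0x100)
a.store(0x200)
b.load(0x100)   # shares only a read with a: no conflict
b.load(0x300)
c.load(0x200)   # reads an address a wrote: conflict
print(conflicts(a, b), conflicts(a, c))  # False True
```

Read-read sharing is allowed, so transactions that only read common data can commit in parallel; on a conflict, one of them would be aborted and retried.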

slide21

Upcoming Architecture Challenges

  • Improving single core performance
  • Functionalities in multi-core chips
  • Simplifying the programmer’s task
  • Efficient interconnects
  • Power and temperature-efficient designs
  • Designs tolerant of errors

Some faults are caused by high-energy particles that deposit enough charge to toggle bits.

Variations in conditions may cause a circuit to fail to produce its result in time.
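A simple way to detect a bit toggled by a particle strike is to store a parity bit with each word and recheck it on every read. The sketch below is that textbook scheme, not a design from this research; parity detects any single-bit flip but cannot correct it and misses double flips, which is why memories often use stronger error-correcting codes instead:

```python
def parity(word):
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") & 1


def check(word, stored_parity):
    """True if the word still matches the parity bit computed when it was written."""
    return parity(word) == stored_parity


data = 0b1011_0010
p = parity(data)            # computed when the word is written
flipped = data ^ (1 << 5)   # a particle strike toggles bit 5
print(check(data, p), check(flipped, p))  # True False
```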

slide22

Research Methodologies

  • It’s all about the simulators!
  • SimpleScalar, Wattch, and HotSpot: about 10,000 lines of C code that model the flow of instructions through a modern processor
  • Inputs: a configuration file that specifies processor parameters, and a benchmark program (say, gzip)
  • Outputs: how long the program runs on the simulated processor (SimpleScalar), how much power is consumed (Wattch), and the peak temperature (HotSpot)
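A full simulator models the whole pipeline, but the basic shape (parameters in, a program's behavior replayed, statistics out) also appears in much smaller trace-driven models. The sketch below is a hypothetical direct-mapped cache simulator, not code from SimpleScalar, Wattch, or HotSpot; the dictionary stands in for a configuration file, and the address trace stands in for a benchmark such as gzip:

```python
def simulate_direct_mapped(trace, num_sets=64, block_bytes=64):
    """Count hits and misses of a direct-mapped cache over a list of byte addresses."""
    tags = [None] * num_sets
    hits = misses = 0
    for addr in trace:
        block = addr // block_bytes
        index = block % num_sets   # which cache line the block maps to
        tag = block // num_sets    # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag      # fill the line on a miss
    return hits, misses


# Stand-in for a configuration file of processor parameters (hypothetical values).
config = {"num_sets": 64, "block_bytes": 64}

# Hypothetical benchmark trace: sweep a 1 KB array of 4-byte words twice.
trace = list(range(0, 1024, 4)) * 2
hits, misses = simulate_direct_mapped(trace, **config)
print(f"hits={hits} misses={misses} hit rate={hits / len(trace):.1%}")
```

Only the first touch of each 64-byte block misses, and the 1 KB array fits in the 4 KB cache, so the second sweep hits entirely. Changing `config` and re-running is the small-scale analogue of exploring a design space with a simulator.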
slide23

Evaluating a New Idea

  • Lots of reading (it’s better than waiting for divine inspiration)
  • Identify bottlenecks, identify problems, develop an idea, and repeatedly question that idea
  • Understand the simulator
  • Engineer a solution and modify the simulator code (perhaps writing fewer than 1,000 lines of C code)
  • Analyze data (things never work the first time); engineer, optimize, and debug your solution
  • Write papers
  • Implement in silicon?
slide24

To Learn More…

  • CS/EE 3810: Computer Organization
  • CS/EE 6810: Computer Architecture
  • CS/EE 7810: Advanced Computer Architecture
  • CS/EE 7820: Parallel Computer Architecture
  • CS 7937 / 7940: Architecture Reading Seminar