
Lecture 1: Introduction

CprE 581 Computer Systems Architecture, Fall 2005

Zhao Zhang


Traditional “Computer Architecture”

The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logic design, and the physical implementation.

  • Gene Amdahl, IBM Journal R&D, April 1964


Contemporary “Computer Architecture”

  • Instruction set architecture

  • Microarchitecture:

    • Pipeline structures

    • Cache memories

  • Implementations

    • Logic design and synthesis


Fundamentals

  • Technology trends

  • Performance evaluation methodologies

  • Instruction Set Architecture


Technology Drivers for High Performance

VLSI technology: faster transistors and larger transistor budget


CPU Performance

For a sequential program:

CPU time = #Inst × CPI × Clock cycle time

To improve performance

  • Faster clock rate (shorter clock cycle time)

  • Reduce #inst

  • Reduce CPI or increase IPC
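To make the arithmetic concrete, here is a minimal C sketch of the CPU time equation above; the instruction count, CPI, and clock period are made-up illustrative values, not figures from the course.

#include <stdio.h>

/* Sketch of the CPU performance equation with hypothetical numbers,
   chosen only to show how the three factors combine. */
int main(void) {
    double inst_count = 1e9;     /* dynamic instruction count (#Inst)         */
    double cpi        = 1.5;     /* average clock cycles per instruction      */
    double cycle_time = 0.5e-9;  /* clock cycle time in seconds (2 GHz clock) */

    double cpu_time = inst_count * cpi * cycle_time;  /* #Inst * CPI * cycle time */
    printf("CPU time = %.2f s\n", cpu_time);          /* prints 0.75 s */
    return 0;
}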


How to use one billion transistors?

  • Bit-level parallelism

    • Move from 32-bit to 64-bit

  • Instruction-level parallelism

    • Deep pipeline

    • Execute multiple instructions per cycle

  • Program locality

    • Large caches, more branch prediction resources

  • Thread-level parallelism


Instruction-Level Parallelism

Pipeline + Multi-issue

[Pipeline diagram: multiple in-flight instructions overlapped across the IF, ID, EX, MEM, and WB stages, with several instructions issued per cycle.]


for (i = 0; i < N; i++)
    X[i] = a * X[i];

// let R3 = &X[0], R4 = &X[N], and F0 = a
LOOP:  L.D    F2, 0(R3)
       MUL.D  F2, F2, F0
       S.D    F2, 0(R3)
       DADDI  R3, R3, #8
       BNE    R3, R4, LOOP

What instructions are parallel?

How to schedule those instructions?

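To answer the first question informally (a sketch, not from the slides): within one iteration the load, multiply, and store form a dependent chain, but different iterations touch different elements of X and are mutually independent. Unrolling the C loop makes two independent chains visible that a multi-issue pipeline or a scheduling compiler can overlap; the function below assumes N is even.

/* Hypothetical unrolled version of the loop above (assumes N is even).
   The two load/multiply/store chains in each iteration are independent,
   so they can be scheduled to execute in parallel. */
void scale(double *X, double a, int N) {
    for (int i = 0; i < N; i += 2) {
        double t0 = a * X[i];      /* chain 0: load X[i],   multiply */
        double t1 = a * X[i + 1];  /* chain 1: load X[i+1], multiply */
        X[i]     = t0;             /* store chain 0 */
        X[i + 1] = t1;             /* store chain 1 */
    }
}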


Instruction-Level Parallelism

Find independent instructions through dependence analysis

  • Hardware approaches => Dynamically scheduled superscalar

    • Most commonly used today: Intel Pentium, AMD, Sun UltraSparc, and MIPS families

  • Software approaches => (1) Statically scheduled superscalar, or (2) VLIW


Modern Superscalar Processors

Example: Intel Pentium, IBM Power/PowerPC, Sun UltraSparc, SGI MIPS …

  • Multi-issue and Deep pipelining

  • Dynamic scheduling and speculative execution

  • High bandwidth L1 caches and large L2/L3 caches


Modern Superscalar Processor

Challenges: Complexity!!!

  • How to understand how it brings high performance

    • Will see weird designs

    • Will use Verilog and simulation to help understanding

  • Have the big picture in mind


Modern Superscalar Processor

Maintain register data flow

  • Register renaming

  • Instruction scheduling

Maintain control flow

  • Branch prediction

  • Speculative execution and recovery

Maintain memory data flow

  • Load and store queues

  • Memory dependence speculation


Memory System Performance

Memory stall CPI

  = Misses per instruction × Miss penalty

  = % Mem Inst × Miss rate × Miss penalty

Assume 20% memory instructions, a 2% miss rate, and a 400-cycle miss penalty. How much is the memory stall CPI?
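Working this out: memory stall CPI = 0.20 × 0.02 × 400 = 1.6 stall cycles per instruction, which already exceeds the ideal CPI of 1 for a simple pipeline, so memory stalls would dominate execution time.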


Memory System Performance

  • A typical memory hierarchy today:

  • Here we focus on L1/L2/L3 caches, virtual memory, and main memory

[Memory hierarchy diagram: Proc/Regs → L1-Cache → L2-Cache → L3-Cache (optional) → Memory → Disk/Tape; capacity grows and access gets slower at each level down.]


Cache Design

Many applications are memory-bound

  • CPU speed increases fast; memory speed cannot keep up

Cache hierarchy: exploits program locality

  • Basic principles of cache design

  • Hardware cache optimizations

  • Application cache optimizations (see the loop-ordering sketch below)

  • Prefetching techniques

Will also talk about virtual memory
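As one concrete illustration of an application-level cache optimization (a sketch with assumed array sizes, not taken from the slides): C stores 2-D arrays in row-major order, so a row-order traversal enjoys spatial locality while a column-order traversal strides through memory and misses far more often.

#define N 1024

double A[N][N];

/* Cache-friendly: the inner loop walks consecutive addresses, so most
   accesses hit in the cache line fetched by the previous access. */
double sum_by_rows(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += A[i][j];
    return s;
}

/* Cache-unfriendly: the inner loop strides N * sizeof(double) bytes per
   access, defeating spatial locality and causing many more misses. */
double sum_by_columns(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += A[i][j];
    return s;
}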


High-Performance Storage Systems

What limits the performance of web servers? Storage!

  • Storage technology trends

  • RAID: Redundant Array of Inexpensive Disks


Multiprocessor Systems

Must exploit thread-level parallelism for further performance improvement

Shared-memory multiprocessors: cooperating threads see the same memory address space (see the sketch after this list)

How to build them?

  • Cache coherence

  • Memory consistency
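A minimal sketch of thread-level parallelism on a shared-memory machine (an assumed example using POSIX threads, not code from the course): two threads scale disjoint halves of one shared array, so no locking is needed, and the cache coherence hardware keeps both threads' views of memory consistent.

#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double X[N];
static double a = 3.0;

struct range { int lo, hi; };

/* Each thread scales its own half of the shared array X. */
static void *scale_range(void *arg) {
    struct range *r = arg;
    for (int i = r->lo; i < r->hi; i++)
        X[i] = a * X[i];
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    struct range r0 = { 0, N / 2 }, r1 = { N / 2, N };

    for (int i = 0; i < N; i++) X[i] = 1.0;   /* initialize shared data */

    pthread_create(&t0, NULL, scale_range, &r0);
    pthread_create(&t1, NULL, scale_range, &r1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    printf("X[0] = %.1f, X[N-1] = %.1f\n", X[0], X[N - 1]);  /* both print 3.0 */
    return 0;
}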


Emerging Techniques

  • Low-power design

  • Multicore and multithreaded processors

  • Secure processors

  • Reliable design


Why Study Computer Architecture?

As a hardware designer/researcher – know how to design processors, caches, storage, graphics, interconnects, and so on

As a system designer – know how to build a computer system using the best components available

As a software designer – know how to get the best performance from the hardware


Class Web Site

www.ece.iastate.edu/~zzhang/cpre585/

  • Syllabus

  • Schedule

  • Homework assignments

  • Readings

WebCT: Grades, Assignments, and Discussions

