Computer Architecture

“The architecture of a computer is the interface between the machine and the software”

- Andris Padegs

IBM 360/370 Architect


Course Outline

  • Course: Computer Architecture
  • Quarter: Winter 2006-7
  • Instructor Muhammad Jahangir Ikram
  • Office: Room 424
  • e-mail:
  • Office Hours: Tuesday and Thursday, 10:00 – 11:30 am
Course Outline (Contd..)


This course focuses on the principles, practices and issues in Computer Architecture, while examining computer design tradeoffs both qualitatively and quantitatively.

The course starts with a quick overview of computer design fundamentals and instruction set principles, material that students have already covered in the prerequisite course.

The following topics are covered in greater detail:

  • Advanced Pipelining
  • Instruction-level parallelism and Compiler Support
  • Memory-hierarchy design
  • SIMD, VLIW, Superscalar Architectures
  • Code Optimization and Compiler Issues
Course Outline (Contd..)

Text Book

Hennessy, J. L., and Patterson, D. A., Computer Architecture: A Quantitative Approach, 2nd Edition. Morgan Kaufmann, 1996.


Course Outline (Contd..)


There will be two 75-minute lectures per week, plus a weekly session of either a 50-minute lecture or a 100-minute lab.


There will be four labs, held in weeks 2, 3, 4 and 5.

Course Outline (Contd..)


  • Quizzes & assignments: 17% + 3%
  • Laboratory: 10% (Attendance 3 + Lab tasks 3 + HW 4)
  • Midterm exam 30%
  • Final exam 40%

Fundamentals of Computer Design (Sessions 1, 2; Sections 1.1 – 1.10)

  • Measuring and Reporting Performance
  • Quantitative Principles of Computer Design

Instruction Set Principles and Examples (Sessions 3–5; Sections 2.1 – 2.8)

  • Classifying Instruction Set Architectures
  • Memory Addressing
  • Operations in the Instruction Set
  • Encoding an Instruction Set
    • LAB 1: MIPS Instruction Format and Instruction Study (Session 6)

Pipelining Overview (Sessions 7–14; Appendix A.1 – A.10)

  • What Is Pipelining?
    • Single Cycle Computer Study (Session 9)
  • The Major Hurdle of Pipelining – Pipeline Hazards
  • Data Hazards
    • LAB 2: Study of Pipelining (Session 12)
  • Control Hazards and Static Branch Prediction
    • LAB 3: Pipeline Studies and Control Hazards (Session 15)
  • Scoreboarding


ILP and Dynamic Exploitation (Sessions 17–19; Sections 3.1 – 3.5)

  • Static Branch Prediction
  • Tomasulo’s Dynamic Scheduling
  • Dynamic Branch Prediction
  • Superscalar and VLIW architectures

Advanced Pipelining and ILP, Cont’d. (Sessions 20–22; Sections 3.6 – 3.10)

  • Taking Advantage of More ILP with Multiple Issue
  • P6 Architecture

Advanced Pipelining and ILP, Cont’d. (Sessions 23–25; Sections 4.1, 4.7)

  • Compiler Support for Exploiting ILP
  • Hardware Support for Extracting More Parallelism
  • Putting It All Together: The PowerPC 620 and Itanium

Memory-Hierarchy Design (Sessions 26–29; Sections 5.1 – 5.7)

  • The ABCs of Caches
  • Reducing Cache Misses
  • Reducing Cache Miss Penalty
  • Virtual Memory System

Computer I/O (Session 30; Sections 6.1 – ?)


Emergence of the microprocessor in the late 1970s

Performance grew roughly 35% per year

Important changes in the marketplace:

Virtual elimination of assembly language programming reduced the need for object-code compatibility

Creation of standardized, vendor-independent operating systems, such as UNIX and its clone Linux, lowered the risk of bringing out a new architecture
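To make "roughly 35% growth per year" concrete, the sketch below compounds a constant annual growth rate and computes the implied doubling time. The function names are illustrative, not taken from the course material.

```python
import math


def cumulative_growth(rate: float, years: int) -> float:
    """Performance multiple after `years` of compounding at `rate` (0.35 = 35%)."""
    return (1.0 + rate) ** years


def doubling_time(rate: float) -> float:
    """Years needed for performance to double at a constant annual `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


if __name__ == "__main__":
    print(f"10 years at 35%/year: {cumulative_growth(0.35, 10):.1f}x")
    print(f"Doubling time at 35%/year: {doubling_time(0.35):.2f} years")
```

At 35% per year, performance doubles roughly every 2.3 years and grows about 20x per decade.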

Development of RISC
  • These changes led to the development of a new set of architectures, called RISC (Reduced Instruction Set Computer) architectures.
  • RISC architectures focused on two key performance techniques:
    • Instruction-level parallelism (initially through pipelining)
    • Use of caches
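The payoff from pipelining can be shown with a simple cycle count. Under idealized assumptions (equal-length stages, one instruction entering per cycle, no hazards or stalls), a k-stage pipeline finishes n instructions in k + n - 1 cycles instead of n × k. The 5-stage depth and the function names below are illustrative assumptions.

```python
def unpipelined_cycles(n_instructions: int, stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * stages


def pipelined_cycles(n_instructions: int, stages: int) -> int:
    # Ideal pipeline: `stages` cycles to fill, then one instruction completes per cycle.
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


def ideal_speedup(n_instructions: int, stages: int) -> float:
    return unpipelined_cycles(n_instructions, stages) / pipelined_cycles(n_instructions, stages)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(f"{n:5d} instructions, 5 stages: speedup {ideal_speedup(n, 5):.2f}")
```

With 5 stages the speedup is 50/14 ≈ 3.6 for 10 instructions and approaches 5 as the instruction count grows; hazards are what pull real pipelines below this bound.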
Scaling of Transistors
  • Feature size has shrunk from 3 microns in 1985 to 0.09 micron.
  • Reducing feature size gives a quadratic increase in transistor count per unit area, and better performance.
  • But wires scale poorly: routing delays rise and long wires perform badly.
  • Total power consumption also rises: each transistor has less load capacitance, but there are many more transistors switching at higher frequencies.
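A rough way to see the quadratic effect: transistor area scales with the square of the feature size, so the density gain is (old / new)^2. The sketch below ignores wiring, layout and design-rule effects, so treat its result as an idealized estimate; the function name is illustrative.

```python
def density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Approximate increase in transistors per unit area when feature size shrinks.

    Transistor area scales with the square of the feature size, so density
    scales with (old / new) ** 2. Layout and design-rule effects are ignored.
    """
    return (old_feature_um / new_feature_um) ** 2


if __name__ == "__main__":
    linear = 3.0 / 0.09
    print(f"Linear shrink: {linear:.1f}x, density gain: {density_gain(3.0, 0.09):.0f}x")
```

For the 3 micron to 0.09 micron shrink this gives about 33x linearly and roughly 1,100x in density.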
Measuring performance
  • Definition of time:
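    • Response time (elapsed time): the latency to complete a task, including disk accesses, input/output activities, operating system overhead, etc.
    • CPU time: the time the CPU spends computing, not including time spent waiting for I/O or running other programs
      • User CPU time
        • Time spent in the program itself
      • System CPU time
        • Time spent in the operating system on behalf of the program
  • Unix time command output: 90.7u 12.9s 2:39 65%
    • User CPU time = 90.7 s, system CPU time = 12.9 s, elapsed time = 2:39 (159 s)
    • CPU utilization = (90.7 + 12.9) / 159 ≈ 65%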
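The utilization arithmetic from the time output can be written as a small helper. The names `cpu_utilization` and `parse_mm_ss` are illustrative, and the elapsed field is assumed to be in minutes:seconds form as in the example.

```python
def cpu_utilization(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (wall-clock) time spent on the CPU (user + system)."""
    return (user_s + system_s) / elapsed_s


def parse_mm_ss(text: str) -> int:
    """Convert an elapsed time like '2:39' (minutes:seconds) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


if __name__ == "__main__":
    elapsed = parse_mm_ss("2:39")  # 159 seconds
    print(f"CPU utilization: {cpu_utilization(90.7, 12.9, elapsed):.0%}")
```

The remaining third or so of the elapsed time is spent waiting, for example on I/O or on other programs.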

What is a Benchmark?
  • A benchmark is "a standard of measurement or evaluation" (Webster’s II Dictionary).
  • A computer benchmark is typically a computer program that performs a strictly defined set of operations (a workload) and returns some form of result (a metric) describing how the tested computer performed.
  • Computer benchmark metrics usually measure speed (how fast the workload was completed) or throughput (how many workload units were completed per unit time).
  • Running the same computer benchmark on multiple computers allows a comparison to be made.

Source: Standard Performance Evaluation Corporation
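The workload/metric idea can be illustrated with a tiny timing harness that reports both kinds of metric: speed (seconds per run) and throughput (workload units per second). The workload `sum_of_squares` and the harness `run_benchmark` are hypothetical examples, not part of any SPEC suite; reporting the best of several runs is one common way to reduce timing noise, chosen here for simplicity.

```python
import time
from typing import Callable, Tuple


def run_benchmark(workload: Callable[[], None], units_per_run: int,
                  repeats: int = 5) -> Tuple[float, float]:
    """Run `workload` `repeats` times; return (best elapsed seconds, units per second)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    throughput = units_per_run / best if best > 0 else float("inf")
    return best, throughput


def sum_of_squares() -> None:
    # Hypothetical fixed workload: one "unit" is one loop iteration.
    total = 0
    for i in range(100_000):
        total += i * i


if __name__ == "__main__":
    elapsed, rate = run_benchmark(sum_of_squares, units_per_run=100_000)
    print(f"speed: {elapsed * 1e3:.2f} ms per run, throughput: {rate:,.0f} units/s")
```

Running the same harness on several machines gives comparable numbers only if the workload, inputs and software environment are identical, which is what benchmark suites standardize.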

Programs to Evaluate Performance
  • Real Applications
  • Modified (or scripted) applications
  • Kernels
  • Toy benchmarks
  • Synthetic benchmarks
Programs to Evaluate Performance (Contd..)
  • Real applications
    • Example: compilers for C, text-processing software, etc.
  • Modified (or scripted) applications
    • E.g., CPU-oriented benchmarks, from which I/O may be removed to minimize its impact on execution time
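Programs to Evaluate Performance (Contd..)
  • Kernels
    • Small, key pieces extracted from real programs, used to isolate the performance of individual features of a machine
  • Toy benchmarks
    • Small programs that produce a result the user already knows (e.g., Sieve of Eratosthenes, Quicksort)
  • Synthetic benchmarks
    • Try to match the average frequency of operations and operands of a large set of programs (e.g., Whetstone, Dhrystone)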
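As a concrete toy benchmark, the Sieve of Eratosthenes counts primes below a limit. Because the correct count is known in advance (78,498 primes below one million), the program can verify its own result as well as time itself. This sketch is illustrative, not a standardized benchmark.

```python
import time


def sieve_count(limit: int) -> int:
    """Count primes below `limit` with the Sieve of Eratosthenes."""
    if limit < 3:
        return 0
    is_prime = bytearray([1]) * limit
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross off every multiple of p starting at p * p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit, p)))
    return sum(is_prime)


if __name__ == "__main__":
    start = time.perf_counter()
    count = sieve_count(1_000_000)
    elapsed = time.perf_counter() - start
    # The answer is known in advance, so a wrong result exposes a broken run.
    assert count == 78_498
    print(f"{count} primes below 1,000,000 in {elapsed:.3f} s")
```

Its small size is also its weakness: a toy like this exercises only a narrow slice of the machine, which is why real applications and suites are preferred.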
Benchmark Suites
  • Desktop (CPU- and graphics-intensive)
    • SPEC95, SPEC2000 (12 integer, 14 FP), SPEC2006 (12 integer, 17 FP)
    • Examples: C compiler, router, FEM
  • Server (file servers, web servers, transaction processing)
  • Embedded (EEMBC)
    • 34 kernels
What is SPEC

SPEC is the Standard Performance Evaluation Corporation. SPEC is a non-profit organization whose members include computer hardware vendors, software companies, universities, research organizations, systems integrators, publishers and consultants. SPEC's goal is to establish, maintain and endorse a standardized set of relevant benchmarks for computer systems. Although no one set of tests can fully characterize overall system performance, SPEC believes that the user community benefits from objective tests which can serve as a common reference point.

What does a benchmark measure?
A compute-intensive benchmark such as SPEC CPU2006 emphasizes the performance of:
  • the computer processor (CPU),
  • the memory architecture, and
  • the compilers.

SPEC CPU2006 contains two components that focus on two different types of compute-intensive performance:

  • The CINT2006 suite measures compute-intensive integer performance, and
  • The CFP2006 suite measures compute-intensive floating-point performance.

Source: Standard Performance Evaluation Corporation

Reference Machine (Source: Standard Performance Evaluation Corporation)
  • SPEC uses a historical Sun system, the "Ultra Enterprise 2", which was introduced in 1997, as the reference machine. The reference machine uses a 296 MHz UltraSPARC II processor, as did the reference machine for CPU2000. But the reference machines for the two suites are not identical: the CPU2006 reference machine has substantially better caches, and the CPU2000 reference machine could not have held enough memory to run CPU2006.
  • It takes about 12 days to do a rule-conforming run of the base metrics for CINT2006 and CFP2006 on the CPU2006 reference machine. By contrast, SPEC2000 now takes less than a minute on the latest high-performance machines.
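SPEC reports each benchmark as a ratio of the reference machine's run time to the tested machine's run time, and summarizes a suite with the geometric mean of those ratios. A minimal sketch of that arithmetic follows; the run times are made-up placeholders, not SPEC results, and the function names are illustrative.

```python
import math
from typing import Sequence


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio > 1 means the tested machine is faster than the reference machine."""
    return reference_time_s / measured_time_s


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean computed via logarithms to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical run times (seconds) for three benchmarks; not real SPEC data.
    reference = [1400.0, 1800.0, 9000.0]
    measured = [140.0, 90.0, 1000.0]
    ratios = [spec_ratio(r, m) for r, m in zip(reference, measured)]
    print("ratios:", ratios)
    print(f"overall (geometric mean): {geometric_mean(ratios):.2f}")
```

A useful property of the geometric mean is that the ratio of two machines' geometric means does not depend on which reference machine was chosen.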
Amdahl’s Law
  • The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
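Amdahl's Law is usually written as Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced), where Fraction_enhanced is the fraction of the original execution time that can use the faster mode and Speedup_enhanced is how much faster that mode is. A direct Python transcription follows; the function name is illustrative.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution time
    is sped up by `speedup_enhanced` and the rest is unchanged.

    Speedup_overall = 1 / ((1 - F) + F / S)
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    # 40% of the execution time made 10x faster.
    print(amdahl_speedup(0.4, 10))
    # Even an almost infinitely fast mode is capped near 1 / (1 - 0.4).
    print(amdahl_speedup(0.4, 1e12))
```

Speeding up 40% of the execution time by 10x yields an overall speedup of 1.5625, and no enhancement of that 40% alone can exceed 1 / 0.6 ≈ 1.67.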
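CPU Performance Equations

$$\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles for a program}}{\text{Clock rate}}$$

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

$$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times \frac{\text{IC}_i}{\text{Instruction count}}$$

Example: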
  • Frequency of FP operations = 25%
  • Average CPI of FP operations = 4.0
  • Average CPI of other instructions = 1.33
  • Frequency of FPSQR = 2%
  • CPI of FPSQR = 20
  • Design alternatives: decrease the CPI of FPSQR to 2, or decrease the average CPI of all FP operations to 2.5
  • Compare these two designs using the CPU performance equations
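Example: Solution

$$\text{CPI}_{\text{original}} = (4.0 \times 25\%) + (1.33 \times 75\%) \approx 2.0$$

CPI for enhanced FPSQR

$$\text{CPI}_{\text{new FPSQR}} = \text{CPI}_{\text{original}} - 2\% \times (20 - 2) = 2.0 - 0.36 = 1.64$$

CPI for enhanced FP operations

$$\text{CPI}_{\text{new FP}} = (1.33 \times 75\%) + (2.5 \times 25\%) \approx 1.62$$

Because instruction count and clock rate are the same for both designs, speedup is the ratio of CPIs: 2.0 / 1.64 ≈ 1.22 for the FPSQR enhancement versus 2.0 / 1.62 ≈ 1.23 for enhancing all FP operations, so the second design is slightly better.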
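The arithmetic for the FPSQR example can be checked with a short script. It applies the weighted-CPI formula, CPI = Σ frequency_i × CPI_i, to the listed measurements; the helper name `weighted_cpi` is illustrative. It follows the reading implied by the numbers: the 25% FP share and the 75% share of other instructions form the complete mix, with FPSQR counted inside the FP share.

```python
from typing import Sequence, Tuple


def weighted_cpi(mix: Sequence[Tuple[float, float]]) -> float:
    """CPI = sum of (instruction-class frequency * CPI of that class)."""
    return sum(freq * cpi for freq, cpi in mix)


# Measurements from the example.
FP_FREQ, FP_CPI = 0.25, 4.0
OTHER_FREQ, OTHER_CPI = 0.75, 1.33
FPSQR_FREQ, FPSQR_CPI = 0.02, 20.0

cpi_original = weighted_cpi([(FP_FREQ, FP_CPI), (OTHER_FREQ, OTHER_CPI)])

# Alternative 1: FPSQR CPI drops from 20 to 2; only the FPSQR share changes.
cpi_new_fpsqr = cpi_original - FPSQR_FREQ * (FPSQR_CPI - 2.0)

# Alternative 2: the average CPI of all FP operations drops from 4.0 to 2.5.
cpi_new_fp = weighted_cpi([(FP_FREQ, 2.5), (OTHER_FREQ, OTHER_CPI)])

if __name__ == "__main__":
    # Instruction count and clock rate are unchanged, so speedup is a CPI ratio.
    print(f"CPI original:  {cpi_original:.4f}")
    print(f"CPI new FPSQR: {cpi_new_fpsqr:.4f}, speedup {cpi_original / cpi_new_fpsqr:.2f}")
    print(f"CPI new FP:    {cpi_new_fp:.4f}, speedup {cpi_original / cpi_new_fp:.2f}")
```

Before rounding, the CPIs are 1.9975, 1.6375 and 1.6225, i.e. speedups of about 1.22 and 1.23.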

Another Measure -- MIPS

$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}}$$
Example: An Embedded Processor
  • 120 MIPS for the processor alone.
  • 80 MIPS for the processor-coprocessor combination (the rating as measured for the combined system).
    • I = Number of integer instructions
    • F = Number of floating-point instructions (8M)
    • Y = Number of integer instructions needed to emulate one FP instruction (50)
    • W = Time for choice 1 (4 seconds)
    • B = Time for choice 2
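The slide lists the variables but not the working. The sketch below fills in one plausible reading, an assumption the slide does not state: choice 1 is the processor-coprocessor combination (80 MIPS, W = 4 s) and choice 2 is the processor alone, emulating each FP instruction with Y integer instructions (120 MIPS). It uses MIPS = instruction count / (execution time × 10^6) to recover I and then B.

```python
# Assumed interpretation (not stated on the slide):
#   choice 1 = processor + FP coprocessor, rated at 80 MIPS, takes W seconds
#   choice 2 = processor alone, emulating FP in software, rated at 120 MIPS
MIPS_COPROC = 80.0   # choice 1 rating
MIPS_ALONE = 120.0   # choice 2 rating
F = 8e6              # floating-point instructions in the program
Y = 50               # integer instructions needed to emulate one FP instruction
W = 4.0              # execution time of choice 1, in seconds


def mips(instruction_count: float, seconds: float) -> float:
    """MIPS = instruction count / (execution time * 10**6)."""
    return instruction_count / (seconds * 1e6)


# Choice 1 executes I integer + F floating-point instructions in W seconds,
# so I + F = W * MIPS_COPROC * 10**6.
I = W * MIPS_COPROC * 1e6 - F

# Choice 2 replaces every FP instruction with Y integer instructions.
B = (I + Y * F) / (MIPS_ALONE * 1e6)

if __name__ == "__main__":
    print(f"I = {I / 1e6:.0f}M integer instructions")
    print(f"choice 1: {W:.2f} s at {mips(I + F, W):.0f} MIPS")
    print(f"choice 2: {B:.2f} s at {mips(I + Y * F, B):.0f} MIPS")
```

Under this reading, I = 312M and B ≈ 5.93 s: the configuration with the higher MIPS rating takes longer, which is why MIPS is a risky basis for comparing machines.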