
55:035 Computer Architecture and Organization

Lecture 11


Outline

  • Interrupts

    • Program Flow

    • Multiple Interrupts

    • Nesting

  • I/O

    • Architecture

    • Bus Types

    • Transfer Methods

    • Disks

    • Disk Arrays



Interrupts

  • Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of processing

  • Program

    • e.g. overflow, division by zero

  • Timer

    • Generated by internal processor timer

    • Used in pre-emptive multi-tasking

  • I/O

    • from I/O controller

  • Hardware failure

    • e.g. memory parity error



Interrupt Cycle

  • Added to instruction cycle

  • Processor checks for interrupt

    • Indicated by an interrupt signal

  • If no interrupt, fetch next instruction

  • If interrupt pending:

    • Suspend execution of current program

    • Save context

    • Set PC to start address of interrupt handler routine

    • Process interrupt

    • Restore context and continue interrupted program
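The steps above can be sketched as a toy simulator loop; the instruction strings, the handler, and the single-value "context" are illustrative stand-ins, not any real ISA:

```python
def run(program, pending_irq, handler):
    pc, log = 0, []
    while pc < len(program):
        if pending_irq:                               # check for interrupt signal
            saved = pc                                # save context (just the PC here)
            log.append(handler(pending_irq.pop(0)))   # jump to handler, process interrupt
            pc = saved                                # restore context, resume program
        log.append(program[pc])                       # no interrupt: fetch/execute next
        pc += 1
    return log

trace = run(["i0", "i1", "i2"], ["disk"], handler=lambda dev: "handled:" + dev)
print(trace)   # the interrupt is serviced, then the program resumes where it left off
```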






Multiple Interrupts

  • Disable interrupts

    • Processor will ignore further interrupts whilst processing one interrupt

    • Interrupts remain pending and are checked after first interrupt has been processed

    • Interrupts handled in sequence as they occur

  • Define priorities

    • Low priority interrupts can be interrupted by higher priority interrupts

    • When higher priority interrupt has been processed, processor returns to previous interrupt
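The priority scheme can be sketched as a recursive handler, where only a strictly higher-priority request preempts the one currently being serviced; the device names and priority values here are invented for illustration:

```python
def service(irq, pending, log):
    name, prio = irq
    log.append("start " + name)
    while pending and pending[0][1] > prio:       # strictly higher priority arrived?
        service(pending.pop(0), pending, log)     # nest: preempt the current handler
    log.append("end " + name)

log = []
service(("printer", 1), [("timer", 3)], log)      # timer arrives while printer is serviced
print(log)   # the lower-priority handler finishes only after the nested one
```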


Multiple Interrupts - Sequential



Multiple Interrupts – Nested



Time Sequence of Multiple Interrupts



Input/Output & System Performance Issues

  • System Architecture & I/O Connection Structure

    • Types of Buses/Interconnects in the system.

  • I/O Data Transfer Methods.

  • Cache & I/O: The Stale Data Problem

  • I/O Performance Metrics.

  • Magnetic Disk Characteristics.

  • Designing an I/O System & System Performance:

    • Determining the system performance bottleneck (i.e., which component creates it).



The Von-Neumann Computer Model

  • Partitioning of the computing engine into components:

    • Central Processing Unit (CPU): Control Unit (instruction decode, sequencing of operations), Datapath (registers, arithmetic and logic unit, buses).

    • Memory: Instruction (program) and operand (data) storage.

    • Input/Output (I/O): Communication between the CPU and the outside world.

[Diagram: the computer system partitioned into the CPU (Control + Datapath: registers, ALU, buses), Memory (instructions, data), and the I/O Subsystem (input, output, I/O devices)]

System performance depends on many aspects of the system ("limited by the weakest link in the chain")


Input and Output (I/O) Subsystem

  • The I/O subsystem provides the mechanism for communication between the CPU and the outside world (I/O devices).

  • Design factors:

    • I/O device characteristics (input, output, storage, etc.).

    • I/O Connection Structure (degree of separation from memory operations).

    • I/O interface (the utilization of dedicated I/O and bus controllers).

    • Types of buses (processor-memory vs. I/O buses).

    • I/O data transfer or synchronization method (programmed I/O, interrupt-driven, DMA).



Typical System Architecture

[Diagram: the CPU connects over the System Bus or Front Side Bus (FSB) to the Memory Controller (Chipset North Bridge), which connects in turn to the I/O Controller Hub (Chipset South Bridge) and the isolated I/O of the I/O subsystem]


System Components

[Diagram: CPU with L1/L2/L3 caches (possibly on-chip) connects over the System Bus (FSB) to the memory controller (chipset North Bridge), which drives the memory bus to main memory; a bus adapter (chipset South Bridge) bridges to the main I/O bus, whose I/O controllers attach NICs, disks, displays, keyboards, and networks (the I/O subsystem and its devices)]

Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)

Important issue: Which component creates a system performance bottleneck?

Memory examples:

  • SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, ~900 MB/s (64-bit)

  • Double Data Rate (DDR) SDRAM PC3200: 200 MHz DDR, 64-128 bits wide, 4-way interleaved, ~3.2 GB/s (64-bit)

  • RAMbus DRAM (RDRAM): 400 MHz DDR, 16 bits wide (32 banks), ~1.6 GB/s

Main I/O bus examples:

  • PCI: 33-66 MHz, 32-64 bits wide, 133-528 MB/s

  • PCI-X: 133 MHz, 64 bits wide, 1066 MB/s


I/O Interface

I/O Interface, I/O controller or I/O bus adapter:

  • Specific to each type of I/O device.

  • To the CPU, it consists of a set of control and data registers (usually memory-mapped) within the I/O address space.

  • On the I/O device side, it forms a localized I/O bus which can be shared by several I/O devices

    • (e.g., IDE, SCSI, USB, ...)

  • Handles I/O details (originally done by CPU) such as:

    • Assembling bits into words,

    • Low-level error detection and correction

    • Accepting or providing words in word-sized I/O registers.

    • Presents a uniform interface to the CPU regardless of I/O device.

Processing off-loaded from the CPU.



I/O Controller Architecture

[Diagram: the I/O controller contains an embedded micro-controller or processor with ROM, cache, and buffer memory; a host-side I/O channel interface and a peripheral bus interface/DMA attach it to the peripheral or main I/O bus (PCI, PCI-X, etc., below the chipset North/South Bridge), while the device side drives SCSI, IDE, USB, ...]


Types of Buses in The System (1/2)

  • Processor-Memory Bus (System Bus, Front Side Bus, FSB)

    • Should offer very high-speed (bandwidth) and low latency.

    • Matched to the memory system performance to maximize memory-processor bandwidth.

    • Usually design-specific (not an industry standard).

    • Examples:

      • Alpha EV6 (AMD K7): peak bandwidth = 400 MHz x 8 bytes = 3.2 GB/s

      • Intel GTL+ (P3): peak bandwidth = 133 MHz x 8 bytes ≈ 1 GB/s

      • Intel P4: peak bandwidth = 800 MHz x 8 bytes = 6.4 GB/s

      • HyperTransport 2.0: 200 MHz-1.4 GHz, peak bandwidth up to 22.8 GB/s (a point-to-point system interconnect, not a bus)
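These peak figures follow from effective clock rate times bus width; a quick sanity check, assuming the 8-byte-wide buses listed above:

```python
def peak_bandwidth_gb_s(clock_mhz, width_bytes):
    # Peak bytes/s = effective transfer rate x bus width; 1 GB/s = 1e9 bytes/s here.
    return clock_mhz * 1e6 * width_bytes / 1e9

print(peak_bandwidth_gb_s(400, 8))   # Alpha EV6 (AMD K7): 3.2 GB/s
print(peak_bandwidth_gb_s(133, 8))   # Intel GTL+ (P3): ~1 GB/s
print(peak_bandwidth_gb_s(800, 8))   # Intel P4: 6.4 GB/s
```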


Types of Buses in The System (2/2)

  • I/O buses (sometimes called an interface):

    • Follow bus industry standards.

    • Usually formed by I/O interface adapters to handle many types of connected I/O devices.

    • Wide range in the data bandwidth and latency

    • Not usually interfaced directly to memory; instead connected to the processor-memory bus via a bus adapter (chipset south bridge).

    • Examples:

      • Main system I/O bus: PCI, PCI-X, PCI Express

      • Storage: SATA, IDE, SCSI.



Intel Pentium 4 System Architecture (Using The Intel 925 Chipset)

[Diagram: the CPU (including cache) connects over the System Bus (Front Side Bus, FSB; its bandwidth usually should match or exceed that of main memory) to the Memory Controller Hub (Chipset North Bridge), which drives two 8-byte DDR2 channels to system memory and the graphics I/O bus (PCI Express); the I/O Controller Hub (Chipset South Bridge) provides storage I/O (Serial ATA), the main I/O bus (PCI), and misc. I/O interfaces for the I/O subsystem]



Bus Characteristics

Option             | High performance                      | Low cost/performance
Bus width          | Separate address & data lines         | Multiplexed address & data lines
Data width         | Wider is faster (e.g., 64 bits)       | Narrower is cheaper (e.g., 16 bits)
Transfer size      | Multiple words: less bus overhead     | Single-word transfer: simpler
Bus masters        | Multiple (requires arbitration)       | Single master (no arbitration)
Split transaction? | Yes: separate request and reply packets give higher bandwidth (needs multiple masters) | No: a continuous connection is cheaper and has lower latency
Clocking           | Synchronous                           | Asynchronous


Storage I/O Interfaces/Buses

                 | IDE/Ultra ATA  | SCSI
Data width       | 16 bits        | 8 or 16 bits (wide)
Clock rate       | Up to 100 MHz  | 10 MHz (Fast), 20 MHz (Ultra), 40 MHz (Ultra2), 80 MHz (Ultra3), 160 MHz (Ultra4)
Bus masters      | 1              | Multiple
Max no. devices  | 2              | 7 (8-bit bus), 15 (16-bit bus)
Peak bandwidth   | 200 MB/s       | 320 MB/s (Ultra4)



I/O Data Transfer Methods (1/2)

Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)

  • Programmed I/O (PIO): Polling (For low-speed I/O)

    • The I/O device puts its status information in a status register.

    • The processor must periodically check the status register.

    • The processor is totally in control and does all the work.

    • Very wasteful of processor time.

    • Used for low-speed I/O devices (mice, keyboards etc.)
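A minimal sketch of polling, with the device as a software stub (the register names and data values are invented, not a real hardware interface):

```python
class Device:
    """Software stand-in for a device with a status and a data register."""
    def __init__(self, data):
        self._data = list(data)
    def status_ready(self):        # "status register": is data available?
        return bool(self._data)
    def read_data(self):           # "data register": next item
        return self._data.pop(0)

def poll(dev):
    received = []
    while dev.status_ready():      # the CPU busy-waits and does all the work
        received.append(dev.read_data())
    return received

result = poll(Device([7, 8, 9]))
print(result)
```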



I/O Data Transfer Methods (2/2)

  • Interrupt-Driven I/O (For medium-speed I/O):

    • An interrupt line from the I/O device to the CPU is used to generate an I/O interrupt indicating that the I/O device needs CPU attention.

    • The interrupting device places its identity in an interrupt vector.

    • Once an I/O interrupt is detected, the current instruction is completed and an I/O interrupt handling routine (in the OS) is executed to service the device.

    • Used for moderate-speed I/O (optical drives, storage, networks, ...)

    • Allows overlap of CPU processing time and I/O processing time



I/O Data Transfer Methods

  • Direct Memory Access (DMA) (For high-speed I/O):

    • Implemented with a specialized controller that transfers data between an I/O device and memory independent of the processor.

    • The DMA controller becomes the bus master and directs reads and writes between itself and memory.

    • Interrupts are still used only on completion of the transfer or when an error occurs.

    • Low CPU overhead, used in high speed I/O (storage, network interfaces)

    • Allows more overlap of CPU processing time and I/O processing time than interrupt-driven I/O.



DMA Transfer Steps

  • DMA transfer steps:

    • The CPU sets up DMA by supplying device identity, operation, memory address of source and destination of data, the number of bytes to be transferred.

    • The DMA controller starts the operation. When the data is available it transfers the data, including generating memory addresses for data to be transferred.

    • Once the DMA transfer is complete, the controller interrupts the processor, which determines whether the entire operation is complete.
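The three steps above can be sketched with memory as a plain list and the controller as an object; the method names are illustrative, not a real driver API:

```python
class DMAController:
    def setup(self, src, dst_mem, dst_addr, nbytes):   # step 1: CPU programs the DMA
        self.src, self.mem, self.addr, self.n = src, dst_mem, dst_addr, nbytes
    def run(self, on_complete):                        # steps 2-3
        for i in range(self.n):                        # controller generates the memory
            self.mem[self.addr + i] = self.src[i]      # addresses; no CPU involvement
        on_complete()                                  # interrupt the CPU on completion

memory = [0] * 16
done = []
dma = DMAController()
dma.setup(src=b"ABCD", dst_mem=memory, dst_addr=4, nbytes=4)
dma.run(on_complete=lambda: done.append(True))
print(memory[4:8], done)   # transferred bytes and the completion "interrupt"
```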



Cache & I/O: The Stale Data Problem

  • Three copies of data may exist: in the cache, in memory, and on disk.

    • Similar to cache coherency problem in multiprocessor systems.

  • CPU or I/O (DMA) may modify/access one copy while other copies contain stale (old) data.

  • Possible solutions:

    • Connect I/O directly to CPU cache: CPU performance suffers.

    • With write-back cache, the operating system flushes caches into memory (forced write-back) to make sure data is not stale in memory.

    • Use write-through cache; I/O receives updated data from memory (This uses too much memory bandwidth).

    • The operating system designates memory address ranges involved in I/O DMA operations as non-cacheable.


I/O Connected Directly To Cache

This solution

may slow down CPU

performance

DMA

I/O

A possible solution for

the stale data problem

However:

CPU performance suffers


Factors Affecting Performance

  • I/O processing computational requirements:

    • CPU computations available for I/O operations.

    • Operating system I/O processing policies/routines.

    • I/O data transfer/processing method: polling, interrupt-driven, or DMA

  • I/O Subsystem performance:

    • Raw performance of I/O devices (e.g., magnetic disk performance).

    • I/O bus capabilities.

    • I/O subsystem organization (e.g., number of devices, array level, ...).

    • Loading level of I/O devices (queuing delay, response time).

  • Memory subsystem performance:

    • Available memory bandwidth for I/O operations (For DMA)

  • Operating System Policies:

    • File system vs. Raw I/O.

    • File cache size and write policy.


I/O Performance Metrics: Throughput

  • Throughput is a measure of speed—the rate at which the I/O or storage system delivers data.

  • I/O Throughput is measured in two ways:

  • I/O rate, measured in:

    • Accesses/second,

    • Transactions Per Second (TPS) or,

    • I/O Operations Per Second (IOPS).

  • I/O rate is generally used for applications where the size of each request is small, such as in transaction processing.

  • Data rate, measured in bytes/second or megabytes/second (MB/s).

    • Data rate is generally used for applications where the size of each request is large, such as in scientific and multimedia applications.
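Both metrics can be computed from the same request trace; the trace and the measurement window below are made up for illustration:

```python
# 3000 small (4 KB) requests plus 30 large (1 MB) requests over a 2 s window.
requests = [4096] * 3000 + [1 << 20] * 30
elapsed_s = 2.0

iops = len(requests) / elapsed_s               # I/O rate: accesses per second
mb_per_s = sum(requests) / elapsed_s / 1e6     # data rate: MB/s (1 MB = 1e6 bytes)
print(f"{iops:.0f} IOPS, {mb_per_s:.2f} MB/s")
```

The small requests dominate the I/O rate while the large requests dominate the data rate, which is why the two metrics suit different workloads.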


Magnetic Disks

Characteristics:

  • Diameter (form factor): 2.5 in - 5.25 in

  • Rotational speed: 3,600-15,000 RPM (current drives: 7,200-15,000 RPM)

  • Tracks per surface

  • Sectors per track: outer tracks contain more sectors

  • Recording or areal density: tracks/in x bits/in

  • Cost per megabyte

  • Seek time (2-12 ms): the time needed to move the read/write head arm. Reported values: minimum, maximum, average.

  • Rotational latency or delay (2-8 ms): the time for the requested sector to pass under the read/write head (~ time for half a rotation)

  • Transfer time: the time needed to transfer a sector of bits

  • Type of controller/interface: SCSI, EIDE

  • Disk controller delay or time

Average time to access a sector of data = average seek time + average rotational delay + transfer time + disk controller overhead (ignoring queuing time)

Access time = average seek time + average rotational delay


    Read Access

    • Steps

      • Memory mapped I/O over bus to controller

      • Controller starts access

      • Seek + rotational latency wait

      • Sector is read and buffered (validity check)

      • Controller DMA’s to memory and says ready

    • Access time = queue + controller delay + block size/bandwidth + seek time + transfer time + check delay



    Basic Disk Performance Example

    • Given the following Disk Parameters:

      • Average seek time is 5 ms

      • Disk spins at 10,000 RPM

      • Transfer rate is 40 MB/sec

    • Controller overhead is 0.1 ms

    • Assume that the disk is idle, so no queuing delay exists

    • What is the average disk read or write service time for a 500-byte (0.5 KB) sector?

      Tservice (disk service time for this request)
      = avg. seek + avg. rotational delay (time for half a rotation) + transfer time + controller overhead
      = 5 ms + 0.5/(10,000 RPM / 60) + 0.5 KB / (40 MB/s) + 0.1 ms
      = 5 + 3 + 0.0125 + 0.1 ≈ 8.11 ms

      Here 1 KB = 10^3 bytes, 1 MB = 10^6 bytes, 1 GB = 10^9 bytes. The actual time to process the disk request is greater and may include CPU I/O processing time and queuing time.
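Recomputing the example numerically (note the transfer term is 0.5 KB / 40 MB/s = 0.0125 ms, so the total comes to about 8.11 ms):

```python
# Disk service time, with 1 KB = 1e3 bytes and 1 MB = 1e6 bytes as in the example.
def disk_service_time_ms(seek_ms, rpm, xfer_mb_s, sector_kb, ctrl_ms):
    rot_ms = 0.5 / (rpm / 60) * 1000                   # half a rotation, in ms
    xfer_ms = sector_kb * 1e3 / (xfer_mb_s * 1e6) * 1000
    return seek_ms + rot_ms + xfer_ms + ctrl_ms

t = disk_service_time_ms(seek_ms=5, rpm=10_000, xfer_mb_s=40, sector_kb=0.5, ctrl_ms=0.1)
print(f"{t:.4f} ms")   # ~8.11 ms; seek and rotation dominate, not the transfer
```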


    Disk Arrays

Disk Product Families

[Diagram: conventional disk product families use four disk designs (14", 10", 5.25", 3.5"), spanning high end to low end; a disk array uses a single 3.5" disk design]


    Array Reliability

    • Reliability of N disks = reliability of 1 disk / N

      • e.g., 50,000 hours / 70 disks ≈ 714 hours

    • Disk system MTBF drops from about 6 years to about 1 month!

    • Arrays (without redundancy) are too unreliable to be useful!

    Hot spares support reconstruction in parallel with access: very high media availability can be achieved.
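A quick check of the arithmetic, under the usual assumption that N independent disks fail N times as often as one (50,000/70 ≈ 714 hours, which the slide rounds to roughly 700, i.e., about a month):

```python
# MTTF of an array with no redundancy: one disk's MTTF divided by the disk count.
def array_mttf_hours(disk_mttf_hours, n_disks):
    return disk_mttf_hours / n_disks

mttf = array_mttf_hours(50_000, 70)
print(f"{mttf:.0f} hours (~{mttf / 24 / 30:.1f} months)")
```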


    Redundant Array of Disks

    • Files are "striped" across multiple spindles

    • Redundancy yields high data availability

    Disks will fail; contents are reconstructed from data redundantly stored in the array, at a capacity penalty to store the redundancy and a bandwidth penalty to update it.

    Techniques:

    • Mirroring/shadowing (high capacity cost)

    • Horizontal Hamming codes (overkill)

    • Parity & Reed-Solomon codes

    • Failure prediction (no capacity overhead!): VaxSimPlus; technique is controversial



    Raid 1: Disk Mirroring

    • Each disk is fully duplicated onto its "shadow", forming a recovery group; very high availability can be achieved

    • Bandwidth sacrifice on write: a logical write = two physical writes

    • Reads may be optimized

    • Most expensive solution: 100% capacity overhead

    Targeted for high I/O rate, high availability environments


    Raid 3: Parity Disk

    [Diagram: a logical record (bit patterns 10010011, 11001101, 10010011, ...) is striped bit-by-bit across the data disks as physical records, with a parity disk P computed across the stripe]

    • Parity is computed across the recovery group to protect against hard disk failures

      • 33% capacity cost for parity in this configuration

      • Wider arrays reduce capacity costs but decrease expected availability and increase reconstruction time

    • Arms are logically synchronized, spindles rotationally synchronized

      • Logically a single high-capacity, high-transfer-rate disk

    Targeted for high-bandwidth applications: scientific computing, image processing
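The parity scheme can be demonstrated directly: the parity block is the XOR of the data blocks, so any single lost block is the XOR of the survivors (using the bit patterns from the slide):

```python
from functools import reduce

def parity(blocks):
    # XOR corresponding bytes/words of all blocks to form the parity block.
    return [reduce(lambda a, b: a ^ b, col) for col in zip(*blocks)]

data = [[0b10010011], [0b11001101], [0b10010011]]   # one "word" per data disk
p = parity(data)

# Pretend disk 1 failed; rebuild its block from the survivors plus parity.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])   # True: the lost block is recovered
```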


    Raid 5+: High I/O Rate Parity

    [Layout: data blocks and parity rotate across the disk columns, with logical disk addresses increasing down the columns; one row of blocks is a stripe, one block a stripe unit:

      D0   D1   D2   D3   P
      D4   D5   D6   P    D7
      D8   D9   P    D10  D11
      D12  P    D13  D14  D15
      P    D16  D17  D18  D19
      D20  D21  D22  D23  P
      ...]

    • A logical write becomes four physical I/Os

    • Independent writes are possible because of the interleaved parity

    • Reed-Solomon codes ("Q") are used for protection during reconstruction

    Targeted for mixed applications
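The four physical I/Os of a RAID 5 small write follow from the parity update rule new_parity = old_parity XOR old_data XOR new_data; a sketch with arbitrary 4-bit example values:

```python
def raid5_small_write(old_data, old_parity, new_data):
    # The classic small-write penalty: two reads, then two writes.
    ios = ["read old data", "read old parity", "write new data", "write new parity"]
    new_parity = old_parity ^ old_data ^ new_data   # remove old data, fold in new
    return new_parity, ios

new_p, ios = raid5_small_write(old_data=0b1010, old_parity=0b0110, new_data=0b1100)
print(bin(new_p), len(ios))
```

Only the changed data block and the stripe's parity block are touched, which is why independent small writes to different stripes can proceed in parallel.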


    Subsystem Organization

    [Diagram: host → host adapter (manages the interface to the host, DMA) → array controller (control, buffering, parity logic) → single-board disk controllers (physical device control, often piggy-backed in small-format devices)]

    Striping software is off-loaded from the host to the array controller, with no application modifications and no reduction of host performance.


    System Availability

    [Diagram: an array controller fans out to several string controllers, each driving a string of disks]

    • Data recovery group: unit of data redundancy

    • Redundant support components: fans, power supplies, controller, cables

    • End-to-end data integrity: internal parity-protected data paths


    System-Level Availability

    [Diagram: two hosts with fully dual-redundant I/O controllers and array controllers, and recovery groups of disks reachable over duplicated paths]

    Goal: no single points of failure.

    With duplicated paths, higher performance can be obtained when there are no failures.


    Peripheral Component Interconnect

    • 2 Types of Agents on the Bus

      • Initiator (master)

      • Target

    • 3 Address Spaces

      • Memory

      • IO

      • Configuration

    • Transactions done in 2 (or more) phases

      • Address/Command

      • Data/Byte Enable Phase(s)

    • Synchronous Operation (positive edge of clock)



    Typical PCI Topology

    [Diagram: the host and main memory attach to a PCI bridge; the PCI bus below the bridge connects an Ethernet interface, a disk interface, and a printer interface]


    PCI Signals

    Name         | Function
    CLK          | A 33-MHz or 66-MHz clock.
    FRAME#       | Sent by the initiator to indicate the start and duration of a transaction.
    AD           | 32 address/data lines, which may be optionally increased to 64.
    C/BE#        | 4 command/byte-enable lines (8 for a 64-bit bus).
    IRDY#, TRDY# | Initiator-ready and target-ready signals.
    DEVSEL#      | A response from the device indicating that it has recognized its address and is ready for a data transfer transaction.
    IDSEL#       | Initialization Device Select.


    PCI Read

    [Timing diagram over clock cycles 1-7: the initiator asserts FRAME#; AD carries the address, then data items #1-#4; C/BE# carries the command, then the byte enables; IRDY#, TRDY#, and DEVSEL# handshake each data transfer]