14 332 331 computer architecture and assembly language fall 2003 week 12 buses and i o system n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system PowerPoint Presentation
Download Presentation
14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system

Loading in 2 Seconds...

play fullscreen
1 / 45

14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system - PowerPoint PPT Presentation


  • 106 Views
  • Uploaded on

14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system. [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]. Head’s Up. This week’s material Buses: Connecting I/O devices Reading assignment – PH 8.4

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 12 Buses and I/O system


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
14 332 331 computer architecture and assembly language fall 2003 week 12 buses and i o system

14:332:331Computer Architecture and Assembly LanguageFall 2003Week 12Buses and I/O system

[Adapted from Dave Patterson’s UCB CS152 slides and

Mary Jane Irwin’s PSU CSE331 slides]

head s up
Head’s Up
  • This week’s material
    • Buses: Connecting I/O devices
      • Reading assignment – PH 8.4
    • Memory hierarchies
      • Reading assignment – PH 7.1 and B.5
  • Reminders
    • Next week’s material
    • Basics of caches
      • Reading assignment – PH 7.2
review major components of a computer
Review: Major Components of a Computer

Processor

Devices

Control

Output

Memory

Datapath

Input

Cache

Main Memory

Secondary Memory

(Disk)

input and output devices
Input and Output Devices
  • I/O devices are incredibly diverse wrt
    • Behavior
    • Partner
    • Data rate
magnetic disk
Magnetic Disk
  • Purpose
    • Long term, nonvolatile storage
    • Lowest level in the memory hierarchy
      • slow, large, inexpensive
  • General structure
    • A rotating platter coated with a magnetic surface
    • Use a moveable read/write head to access the disk
  • Advantages of hard disks over floppy disks
    • Platters are more rigid (metal or glass) so they can be larger
    • Higher density because it can be controlled more precisely
    • Higher data rate because it spins faster
    • Can incorporate more than one platter
organization of a magnetic disk
Organization of a Magnetic Disk
  • Typical numbers (depending on the disk size)
    • 1 to 15 (2 surface) platters per disk with 1” to 8” diameter
    • 1,000 to 5,000 tracks per surface
    • 63 to 256 sectors per track
      • the smallest unit that can be read/written (typically 512 to 1,024 B)
    • Traditionally all tracks have the same number of sectors
      • Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density)

Sector

Platters

Track

magnetic disk characteristic

Track

Sector

Cylinder

Platter

Head

Magnetic Disk Characteristic
  • Cylinder: all the tracks under the heads at a given point on all surfaces
  • Read/write data is a three-stage process:
    • Seek time: position the arm over the proper track (6 to 14 ms avg.)
      • due to locality of disk references the actual average seek time may be only 25% to 33% of the advertised number
    • Rotational latency: wait for the desired sectorto rotate under the read/write head (½ of 1/RPM)
    • Transfer time: transfer a block of bits (sector)under the read-write head (2 to 20 MB/sec typical)
    • Controller time: the overhead the disk controller imposes in performing an disk I/O access (typically < 2 ms)
i o system interconnect issues

bus

I/O System Interconnect Issues
  • A bus is a shared communication link (a set of wires used to connect multiple subsystems)
    • Performance
    • Expandability
    • Resilience in the face of failure – fault tolerance

Processor

Receiver

Main

Memory

Keyboard

performance measures
Performance Measures
  • Latency (execution time, response time) is the total time from the start to finish of one instruction or action
    • usually used to measure processor performance
  • Throughput – total amount of work done in a given amount of time
    • aka execution bandwidth
    • the number of operations performed per second
  • Bandwidth – amount of information communicated across an interconnect (e.g., a bus) per unit time
    • the bit width of the operation * rate of the operation
    • usually used to measure I/O performance
i o system expandability
I/O System Expandability
  • Usually have more than one I/O device in the system
    • each I/O device is controlled by an I/O Controller

interrupt signals

Processor

Cache

Memory

Memory - I/O Bus

I/O

Controller

I/O

Controller

I/O

Controller

Main

Memory

Terminal

Disk

Disk

Network

bus characteristics
Bus Characteristics
  • Control lines
    • Signal requests and acknowledgments
    • Indicate what type of information is on the data lines
  • Data lines
    • Data, complex commands, and addresses
  • Bus transaction consists of
    • Sending the address
    • Receiving (or sending) the data

Control Lines

Data Lines

output read bus transaction

Step 1: Processor sends read request and read address to memory

Control

Main Memory

Processor

Data

Step 2: Memory accesses data

Control

Main Memory

Processor

Data

Step 3: Memory transfers data to disk

Control

Main Memory

Processor

Data

Output (Read) Bus Transaction
  • Defined by what they do to memory
    • read = output: transfers data from memory (read) to I/O device (write)
input write bus transaction

Step 1: Processor sends write request and write address to memory

Control

Main Memory

Processor

Data

Step 2: Disk transfers data to memory

Control

Main Memory

Processor

Data

Input (Write) Bus Transaction
  • Defined by what they do to memory
    • write = input: transfers data from I/O device (read) to memory (write)
advantages and disadvantages of buses
Advantages and Disadvantages of Buses
  • Advantages
    • Versatility:
      • New devices can be added easily
      • Peripherals can be moved between computer systems that use the same bus standard
    • Low Cost:
      • A single set of wires is shared in multiple ways
  • Disadvantages
    • It creates a communication bottleneck
      • The bus bandwidth limits the maximum I/O throughput
    • The maximum bus speed is largely limited by
      • The length of the bus
      • The number of devices on the bus
    • It needs to support a range of devices with widely varying latencies and data transfer rates
types of buses
Types of Buses
  • Processor-Memory Bus (proprietary)
    • Short and high speed
    • Matched to the memory system to maximize the memory-processor bandwidth
    • Optimized for cache block transfers
  • I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE)
    • Usually is lengthy and slower
    • Needs to accommodate a wide range of I/O devices
    • Connects to the processor-memory bus or backplane bus
  • Backplane Bus (industry standard, e.g., PCI)
    • The backplane is an interconnection structure within the chassis
    • Used as an intermediary bus connecting I/O busses to the processor-memory bus
a two bus system

Bus

Adaptor

Bus

Adaptor

Bus

Adaptor

I/O

Bus

I/O

Bus

I/O

Bus

A Two Bus System
  • I/O buses tap into the processor-memory bus via Bus Adaptors (that do speed matching between buses)
    • Processor-memory bus: mainly for processor-memory traffic
    • I/O busses: provide expansion slots for I/O devices

Processor-Memory Bus

Processor

Memory

a three bus system

Bus

Adaptor

Bus

Adaptor

I/O Bus

Backplane Bus

I/O Bus

Bus

Adaptor

A Three Bus System
  • A small number of Backplane Buses tap into the Processor-Memory Bus
    • Processor-Memory Bus is used for processor memory traffic
    • I/O buses are connected to the Backplane Bus
  • Advantage: loading on the Processor-Memory Bus is greatly reduced

Processor-Memory Bus

Processor

Memory

i o system example apple mac 7200
I/O System Example (Apple Mac 7200)
  • Typical of midrange to high-end desktop system in 1997

Processor

Processor-Memory Bus

Cache

Memory

Audio I/O

Serial ports

PCI

Interface/

Memory

Controller

Main

Memory

I/O

Controller

I/O

Controller

PCI

CDRom

I/O

Controller

I/O

Controller

SCSI bus

Disk

Graphic

Terminal

Network

Tape

example pentium system organization
Example: Pentium System Organization

Processor-Memory

Bus

Memory controller

(“Northbridge”)

PCI Bus

I/O Busses

http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

a bus transaction

Control: Master initiates requests

Bus

Master

Bus

Slave

Data can go either way

A Bus Transaction
  • A bus transaction includes three parts:
    • Gaining access to the bus - arbitration
    • Issuing the command (and address) - request
    • Transferring the data - action
  • Gaining access to the bus
    • How is the bus reserved by a devices that wishes to use it?
    • Chaos is avoided by a master-slave arrangement
      • The bus master initiates and controls all bus requests
  • In the simplest system:
    • The processor is the only bus master
    • Major drawback - the processor must be involved in every bus transaction
single master bus transaction

Step 1: Disk wants to use the bus so it generates a bus request to processor

Control

Memory

Processor

Data

Step 2: Processor responds and generates appropriate control signals

Control

Memory

Processor

Data

Step 3: Processor gives slave (disk) permission to use the bus

Control

Memory

Processor

Data

Single Master Bus Transaction
  • All bus requests are controlled by the processor
    • it initiates the bus cycle on behalf of the requesting device
multiple potential bus masters arbitration
Multiple Potential Bus Masters: Arbitration
  • Bus arbitration scheme:
    • A bus master wanting to use the bus asserts the bus request
    • A bus master cannot use the bus until its request is granted
    • A bus master must release the bus after its use
  • Bus arbitration schemes usually try to balance two factors:
    • Bus priority - the highest priority device should be serviced first
    • Fairness - Even the lowest priority device should never be completely locked out from using the bus
  • Bus arbitration schemes can be divided into four broad classes
    • Daisy chain arbitration: all devices share 1 request line
    • Centralized, parallel arbitration: multiple request and grant lines
    • Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus
    • Distributed arbitration by collision detection: Ethernet uses this
centralized parallel arbitration

Grant1

Req

Grant2

Req

Bus

Arbiter

GrantN

Req

Centralized Parallel Arbitration
  • Used in essentially all backplane and high-speed I/O busses

Device 1

Device 2

Device N

Control

Data

synchronous and asynchronous buses
Synchronous and Asynchronous Buses
  • Synchronous Bus
    • Includes a clock in the control lines
    • A fixed protocol for communication that is relative to the clock
    • Advantage: involves very little logic and can run very fast
    • Disadvantages:
      • Every device on the bus must run at the same clock rate
      • To avoid clock skew, they cannot be long if they are fast
  • Asynchronous Bus
    • It is not clocked, so requires handshaking protocol (req, ack)
      • Implemented with additional control lines
    • Advantages:
      • Can accommodate a wide range of devices
      • Can be lengthened without worrying about clock skew or synchronization problems
    • Disadvantage: slow(er)
asynchronous handshaking protocol

ReadReq

1

2

addr

data

Data

3

4

Ack

6

5

7

DataRdy

Asynchronous Handshaking Protocol
  • Output (read) data from memory to an I/O device.
  • Memory sees ReadReq, reads addr from data lines, and raises Ack
  • I/O device sees Ack and releases the ReadReq and data lines
  • Memory sees ReadReq go low and drops Ack
  • When memory has data ready, it places it on data lines and raises DataRdy
  • I/O device sees DataRdy, reads the data from data lines, and raises Ack
  • Memory sees Ack, releases the data lines, and drops DataRdy
  • I/O device sees DataRdy go low and drops Ack

I/O device signals a request by raising ReadReq and putting the addr on the data lines

review major components of a computer1
Review: Major Components of a Computer

Processor

Devices

Control

Input

Memory

Datapath

Output

a typical memory hierarchy
A Typical Memory Hierarchy
  • By taking advantage of the principle of locality:
    • Present the user with as much memory as is available in the cheapest technology.
    • Provide access at the speed offered by the fastest technology.

On-Chip Components

Control

eDRAM

Secondary

Memory

(Disk)

Instr

Cache

Second

Level

Cache

(SRAM)

ITLB

Main

Memory

(DRAM)

Datapath

Data

Cache

RegFile

DTLB

Speed (ns): .1’s 1’s 10’s 100’s 1,000’s

Size (bytes): 100’s K’s 10K’s M’s T’s

Cost: highest lowest

characteristics of the memory hierarchy

Inclusive– what is in L1$ is a subset of what is in L2$ is a subset of what is in MM that is a subset of is in SM

4-8 bytes (word)

8-32 bytes (block)

1 block

1,023+ bytes (disk sector = page)

Characteristics of the Memory Hierarchy

Processor

Increasing distance from the processor in access time

L1$

L2$

Main Memory

Secondary Memory

(Relative) size of the memory at each level

memory hierarchy technologies
Memory Hierarchy Technologies
  • Random Access
    • “Random” is good: access time is the same for all locations
    • DRAM: Dynamic Random Access Memory
      • High density (1 transistor cells), low power, cheap, slow
      • Dynamic: need to be “refreshed” regularly (~ every 8 ms)
    • SRAM: Static Random Access Memory
      • Low density (6 transistor cells), high power, expensive, fast
      • Static: content will last “forever” (until power turned off)
    • Size: DRAM/SRAM ­ 4 to 8
    • Cost/Cycle time: SRAM/DRAM ­ 8 to 16
  • “Non-so-random” Access Technology
    • Access time varies from location to location and from time to time (e.g., Disk, CDROM)
classical sram organization square

bit (data) lines

Each intersection represents a

6-T SRAM cell

word (row) select

Classical SRAM Organization (~Square)

r

o

w

d

e

c

o

d

e

r

RAM Cell

Array

Column Selector &

I/O Circuits

column

address

row

address

One memory row holds a block of data, so the column address selects the requested word from that block

data word

classical dram organization square planes

RAM Cell

Array

Classical DRAM Organization (~Square Planes)

bit (data) lines

The column address

selects the requested

bit from the row in each

plane

. . .

r

o

w

d

e

c

o

d

e

r

Each intersection represents a

1-T DRAM cell

word (row) select

column

address

Column Selector &

I/O Circuits

row

address

. . .

data bit

data bit

data bit

data word

ram memory definitions
RAM Memory Definitions
  • Caches use SRAM for speed
  • Main Memory is DRAM for density
    • Addresses divided into 2 halves (row and column)
      • RASor Row Access Strobe triggering row decoder
      • CAS or Column Access Strobe triggering column selector
  • Performance of Main Memory DRAMs
    • Latency: Time to access one word
      • Access Time: time between request and when word arrives
      • Cycle Time: time between requests
      • Usually cycle time > access time
    • Bandwidth: How much data can be supplied per unit time
      • width of the data channel * the rate at which it can be used
classical dram operation

N cols

RAS

Classical DRAM Operation

Column

Address

  • DRAM Organization:
    • N rows x N column x M-bit
    • Read or Write M-bit at a time
    • Each M-bit access requiresa RAS / CAS cycle

DRAM

Row

Address

N rows

M bits

M-bit Output

Cycle Time

1st M-bit Access

2nd M-bit Access

CAS

Row Address

Col Address

Row Address

Col Address

ways to improve dram performance
Ways to Improve DRAM Performance
  • Memory interleaving
  • Fast Page Mode DRAMs – FPM DRAMs
    • www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
  • Extended Data Out DRAMs – EDO DRAMs
    • www.chips.ibm.com/products/memory/88H2011/88H2011.pdf
  • Synchronous DRAMS – SDRAMS
    • www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm
  • Rambus DRAMS
    • www.rambus.com/developer/quickfind_documents.html
    • www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm
  • Double Data Rate DRAMs – DDR DRAMS
    • www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm
  • . . .
increasing bandwidth interleaving

Memory

Bank 0

Memory

Bank 1

CPU

Memory

Bank 2

Memory

Bank 3

Access Bank 1

Access Bank 0

Access Bank 2

Access Bank 3

We can Access Bank 0 again

Increasing Bandwidth - Interleaving

Access pattern without Interleaving:

Cycle Time

CPU

Memory

Access Time

D1 available

Start Access for D1

D2 available

Start Access for D2

Access pattern with 4-way Interleaving:

problems with interleaving
Problems with Interleaving
  • How many banks?
    • Ideally, the number of banks  number of clocks we have to wait to access the next word in the bank
    • Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.)
  • Increasing DRAM sizes => fewer chips => harder to have banks
    • Growth bits/chip DRAM : 50%-60%/yr
  • Only can use for very large memory systems (e.g., those encountered in supercomputer systems)
fast page mode dram operation

N x M “SRAM”

M bits

1st M-bit Access

2nd M-bit

3rd M-bit

4th M-bit

RAS

CAS

Row Address

Col Address

Col Address

Col Address

Col Address

Fast Page Mode DRAM Operation

Column

Address

  • Fast Page Mode DRAM
    • N x M “SRAM” to save a row

N cols

DRAM

Row

Address

  • After a row is read into the SRAM “register”
    • Only CAS is needed to access other M-bit blocks on that row
    • RAS remains asserted while CAS is toggled

N rows

M-bit Output

why care about the memory hierarchy

µProc

60%/year

(2X/1.5yr)

DRAM

9%/year

(2X/10yrs)

Why Care About the Memory Hierarchy?

Processor-DRAM Memory Gap

1000

CPU

“Moore’s Law”

Processor-Memory

Performance Gap:(grows 50% / year)

100

Performance

10

DRAM

1

1980

1981

1982

1983

1984

1985

1986

1987

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

Time

memory hierarchy goals

Probability

of reference

0

2n - 1

Address Space

Memory Hierarchy: Goals
  • Fact: Large memories are slow, fast memories are small
  • How do we create a memory that gives the illusion of being large, cheap and fast (most of the time)?

by taking advantage of

  • The Principle of Locality: Programs access a relatively small portion of the address space at any instant of time.
memory hierarchy why does it work
Memory Hierarchy: Why Does it Work?
  • Temporal Locality (Locality in Time):

=> Keep most recently accessed data items closer to the processor

  • Spatial Locality (Locality in Space):

=> Move blocks consists of contiguous words to the upper levels

Lower Level

Memory

Upper Level

Memory

To Processor

Blk X

From Processor

Blk Y

memory hierarchy terminology

Lower Level

Memory

Upper Level

Memory

To Processor

Blk X

From Processor

Blk Y

Memory Hierarchy: Terminology
  • Hit: data appears in some block in the upper level (Block X)
    • Hit Rate: the fraction of memory accesses found in the upper level
    • Hit Time: Time to access the upper level which consists of

RAM access time + Time to determine hit/miss

  • Miss: data needs to be retrieve from a block in the lower level (Block Y)
    • Miss Rate = 1 - (Hit Rate)
    • Miss Penalty: Time to replace a block in the upper level + Time to deliver the block the processor
    • Hit Time << Miss Penalty
how is the hierarchy managed
How is the Hierarchy Managed?
  • registers <-> memory
    • by compiler (programmer?)
  • cache <-> main memory
    • by the hardware
  • main memory <-> disks
    • by the hardware and operating system (virtual memory)
    • by the programmer (files)
summary
Summary
  • DRAM is slow but cheap and dense
    • Good choice for presenting the user with a BIG memory system
  • SRAM is fast but expensive and not very dense
    • Good choice for providing the user FAST access time
  • Two different types of locality
    • Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon.
    • Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon.
  • By taking advantage of the principle of locality:
    • Present the user with as much memory as is available in the cheapest technology.
    • Provide access at the speed offered by the fastest technology.