
14:332:331 Computer Architecture and Assembly Language, Spring 2005, Week 12: Buses and I/O System





14:332:331 Computer Architecture and Assembly Language, Spring 2005, Week 12: Buses and I/O System

[Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]


Heads Up

  • This week’s material

    • Buses: Connecting I/O devices

      • Reading assignment – PH 8.4

    • Memory hierarchies

      • Reading assignment – PH 7.1 and B.8-9

  • Reminders

    • Next week’s material

    • Basics of caches

      • Reading assignment – PH 7.2


Review: Major Components of a Computer

[Figure: block diagram of a computer: processor (control and datapath), memory (cache, main memory, and secondary memory on disk), and devices (input and output)]


Input and Output Devices

  • I/O devices are incredibly diverse with respect to

    • Behavior

    • Partner

    • Data rate


Magnetic Disk

  • Purpose

    • Long term, nonvolatile storage

    • Lowest level in the memory hierarchy

      • slow, large, inexpensive

  • General structure

    • A rotating platter coated with a magnetic surface

    • Use a moveable read/write head to access the disk

  • Advantages of hard disks over floppy disks

    • Platters are more rigid (metal or glass) so they can be larger

    • Higher density because it can be controlled more precisely

    • Higher data rate because it spins faster

    • Can incorporate more than one platter


Organization of a Magnetic Disk

  • Typical numbers (depending on the disk size)

    • 1 to 15 platters per disk (2 surfaces each), with 1” to 8” diameter

    • 1,000 to 5,000 tracks per surface

    • 63 to 256 sectors per track

      • the smallest unit that can be read/written (typically 512 to 1,024 B)

    • Traditionally all tracks have the same number of sectors

      • Newer disks with smart controllers can record more sectors on the outer tracks (constant bit density)
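As a rough illustration of how the typical numbers above combine, the sketch below multiplies them out to estimate raw capacity. The function name and the sample values are assumptions chosen for illustration, and the sketch assumes the traditional layout in which every track holds the same number of sectors.

```python
def disk_capacity_bytes(platters, tracks_per_surface, sectors_per_track,
                        bytes_per_sector, surfaces_per_platter=2):
    """Estimate raw capacity, assuming every track has the same sector count."""
    surfaces = platters * surfaces_per_platter
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


# Hypothetical disk: 4 platters, 5,000 tracks per surface,
# 256 sectors per track, 512-byte sectors.
capacity = disk_capacity_bytes(4, 5000, 256, 512)
print(f"{capacity / 1e9:.2f} GB")
```

A disk that records more sectors on the outer tracks would need a per-track sector count instead of a single constant.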

[Figure: stacked platters with one track and one sector highlighted]


Magnetic Disk Characteristics

[Figure: disk with a track, sector, cylinder, platter, and head labeled]

  • Cylinder: all the tracks under the heads at a given point on all surfaces

  • Reading or writing data is a three-stage process, plus controller overhead:

    • Seek time: position the arm over the proper track (6 to 14 ms avg.)

      • due to locality of disk references, the actual average seek time may be only 25% to 33% of the advertised number

    • Rotational latency: wait for the desired sector to rotate under the read/write head (on average half a rotation, i.e., 0.5/RPM minutes)

    • Transfer time: transfer a block of bits (sector) under the read/write head (2 to 20 MB/sec typical)

    • Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 2 ms)
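The components above add up to the time for one disk access. The following is a minimal sketch of that sum, assuming 1 MB = 10^6 bytes; the function name and the sample drive parameters are illustrative assumptions, not figures from a particular disk.

```python
def disk_access_ms(seek_ms, rpm, bytes_to_transfer, transfer_mb_per_s,
                   controller_ms):
    """Sum seek, average rotational latency, transfer, and controller time."""
    # On average the desired sector is half a rotation away.
    rotational_ms = 0.5 * 60_000 / rpm
    # Assumes 1 MB = 1,000,000 bytes.
    transfer_ms = bytes_to_transfer / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotational_ms + transfer_ms + controller_ms


# Hypothetical drive: 6 ms average seek, 10,000 RPM, 20 MB/sec transfer,
# 2 ms controller overhead, reading one 512-byte sector.
print(round(disk_access_ms(6, 10_000, 512, 20, 2), 4))
```

With these numbers the seek and rotational terms dominate, and the transfer of a single sector contributes only a few hundredths of a millisecond.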



I/O System Interconnect Issues

  • A bus is a shared communication link (a set of wires used to connect multiple subsystems)

    • Performance

    • Expandability

    • Resilience in the face of failure – fault tolerance

[Figure: processor, receiver, main memory, and keyboard attached to a shared bus]


Performance Measures

  • Latency (execution time, response time) is the total time from the start to finish of one instruction or action

    • usually used to measure processor performance

  • Throughput – total amount of work done in a given amount of time

    • aka execution bandwidth

    • the number of operations performed per second

  • Bandwidth – amount of information communicated across an interconnect (e.g., a bus) per unit time

    • the bit width of the operation * rate of the operation

    • usually used to measure I/O performance
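The bandwidth definition above (bit width times rate) can be sketched as a one-line calculation. The function name and the 32-bit, 33 MHz bus are hypothetical values picked for illustration.

```python
def bandwidth_bytes_per_s(width_bits, transfers_per_s):
    """Peak bandwidth = width of each transfer * number of transfers per second."""
    return width_bits / 8 * transfers_per_s


# Hypothetical bus: 32 bits wide, one transfer per cycle at 33 MHz.
peak = bandwidth_bytes_per_s(32, 33_000_000)
print(f"{peak / 1e6:.0f} MB/sec")
```

This is a peak figure; a real bus delivers less once address cycles, arbitration, and handshaking are counted.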


I/O System Expandability

  • Usually have more than one I/O device in the system

    • each I/O device is controlled by an I/O Controller

[Figure: processor with cache on a memory-I/O bus, together with main memory and three I/O controllers serving a terminal, two disks, and a network; interrupt signals run from the I/O controllers to the processor]


Quiz

  • What is disk seek time, and what is rotational time?


Bus Characteristics

  • Control lines

    • Signal requests and acknowledgments

    • Indicate what type of information is on the data lines

  • Data lines

    • Data, complex commands, and addresses

  • Bus transaction consists of

    • Sending the address

    • Receiving (or sending) the data

[Figure: a bus drawn as a set of control lines and a set of data lines]


Output (Read) Bus Transaction

[Figure: three panels, each showing the processor and main memory connected by control and data lines]

Step 1: Processor sends read request and read address to memory

Step 2: Memory accesses data

Step 3: Memory transfers data to disk

  • Defined by what they do to memory

    • read = output: transfers data from memory (read) to I/O device (write)


Input (Write) Bus Transaction

[Figure: two panels, each showing the processor and main memory connected by control and data lines]

Step 1: Processor sends write request and write address to memory

Step 2: Disk transfers data to memory

  • Defined by what they do to memory

    • write = input: transfers data from I/O device (read) to memory (write)


Advantages and Disadvantages of Buses

  • Advantages

    • Versatility:

      • New devices can be added easily

      • Peripherals can be moved between computer systems that use the same bus standard

    • Low Cost:

      • A single set of wires is shared in multiple ways

  • Disadvantages

    • It creates a communication bottleneck

      • The bus bandwidth limits the maximum I/O throughput

    • The maximum bus speed is largely limited by

      • The length of the bus

      • The number of devices on the bus

    • It needs to support a range of devices with widely varying latencies and data transfer rates


Types of Buses

  • Processor-Memory Bus (proprietary)

    • Short and high speed

    • Matched to the memory system to maximize the memory-processor bandwidth

    • Optimized for cache block transfers

  • I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE)

    • Usually longer and slower

    • Needs to accommodate a wide range of I/O devices

    • Connects to the processor-memory bus or backplane bus

  • Backplane Bus (industry standard, e.g., PCI)

    • The backplane is an interconnection structure within the chassis

    • Used as an intermediary bus connecting I/O buses to the processor-memory bus


A Two Bus System

  • I/O buses tap into the processor-memory bus via Bus Adaptors (that do speed matching between buses)

    • Processor-memory bus: mainly for processor-memory traffic

    • I/O buses: provide expansion slots for I/O devices

[Figure: processor and memory on the processor-memory bus, with three bus adaptors each leading to an I/O bus]


A Three Bus System

  • A small number of Backplane Buses tap into the Processor-Memory Bus

    • Processor-Memory Bus is used for processor-memory traffic

    • I/O buses are connected to the Backplane Bus

  • Advantage: loading on the Processor-Memory Bus is greatly reduced

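[Figure: processor and memory on the processor-memory bus; a bus adaptor connects it to the backplane bus, and further bus adaptors connect the backplane bus to the I/O buses]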


I/O System Example (Apple Mac 7200)

  • Typical of a midrange to high-end desktop system in 1997

[Figure: block diagram labeled with processor, cache, processor-memory bus, PCI interface/memory controller, main memory, PCI bus, and I/O controllers for audio I/O, serial ports, CD-ROM, graphics, terminal, network, and a SCSI bus with disk and tape]


Example: Pentium System Organization

[Figure: Pentium system block diagram showing the processor-memory bus, the memory controller (“Northbridge”), the PCI bus, and the I/O buses]

http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&


Synchronous and Asynchronous Buses

  • Synchronous Bus

    • Includes a clock in the control lines

    • A fixed protocol for communication that is relative to the clock

    • Advantage: involves very little logic and can run very fast

    • Disadvantages:

      • Every device on the bus must run at the same clock rate

      • To avoid clock skew, the bus cannot be long if it is fast

  • Asynchronous Bus

    • It is not clocked, so it requires a handshaking protocol (req, ack)

      • Implemented with additional control lines

    • Advantages:

      • Can accommodate a wide range of devices

      • Can be lengthened without worrying about clock skew or synchronization problems

    • Disadvantage: slow(er)


Asynchronous Handshaking Protocol

[Figure: timing diagram of the ReadReq, Data (addr, then data), Ack, and DataRdy lines, with transitions numbered 1 to 7]

  • Output (read) data from memory to an I/O device.

  • I/O device signals a request by raising ReadReq and putting the addr on the data lines

  • (1) Memory sees ReadReq, reads addr from data lines, and raises Ack

  • (2) I/O device sees Ack and releases the ReadReq and data lines

  • (3) Memory sees ReadReq go low and drops Ack

  • (4) When memory has data ready, it places it on data lines and raises DataRdy

  • (5) I/O device sees DataRdy, reads the data from data lines, and raises Ack

  • (6) Memory sees Ack, releases the data lines, and drops DataRdy

  • (7) I/O device sees DataRdy go low and drops Ack
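The handshake steps listed here can be traced in software. The sketch below is only a sequential walk through the signal changes, not a model of real bus timing; the function name, the dictionary used for the bus lines, and the sample memory contents are all assumptions made for illustration.

```python
def async_read_handshake(memory, addr):
    """Trace one read using the ReadReq / Ack / DataRdy handshake."""
    lines = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "data": None}
    trace = []

    def step(label, **changes):
        # Apply the signal changes and record a snapshot of the bus.
        lines.update(changes)
        trace.append((label, dict(lines)))

    step("0: device raises ReadReq, drives addr", ReadReq=1, data=addr)
    latched_addr = lines["data"]          # memory reads addr from data lines
    step("1: memory raises Ack", Ack=1)
    step("2: device releases ReadReq and data lines", ReadReq=0, data=None)
    step("3: memory drops Ack", Ack=0)
    step("4: memory drives data, raises DataRdy",
         data=memory[latched_addr], DataRdy=1)
    received = lines["data"]              # device reads data from data lines
    step("5: device raises Ack", Ack=1)
    step("6: memory releases data lines, drops DataRdy", data=None, DataRdy=0)
    step("7: device drops Ack", Ack=0)
    return received, trace


value, trace = async_read_handshake({0x10: 0xAB}, 0x10)
for label, snapshot in trace:
    print(label, snapshot)
```

Every control line ends low and the data lines end released, so the bus is ready for the next transaction.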



Review: Major Components of a Computer

[Figure: block diagram of a computer: processor (control and datapath), memory, and devices (input and output)]


A Typical Memory Hierarchy

  • By taking advantage of the principle of locality:

    • Present the user with as much memory as is available in the cheapest technology.

    • Provide access at the speed offered by the fastest technology.

[Figure: on-chip components (control, datapath, RegFile, ITLB, DTLB, instruction cache, data cache), followed by second-level cache (SRAM), eDRAM, main memory (DRAM), and secondary memory (disk)]

Moving away from the processor:

Speed (ns): 0.1’s, 1’s, 10’s, 100’s, 1,000’s

Size (bytes): 100’s, K’s, 10K’s, M’s, T’s

Cost: highest to lowest


Characteristics of the Memory Hierarchy

[Figure: Processor, L1$, L2$, Main Memory, and Secondary Memory drawn at increasing distance from the processor in access time, with the (relative) size of the memory growing at each level]

  • Unit of transfer between levels

    • Processor and L1$: 4-8 bytes (word)

    • L1$ and L2$: 8-32 bytes (block)

    • L2$ and Main Memory: 1 block

    • Main Memory and Secondary Memory: 1,024+ bytes (disk sector = page)

  • Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM


Memory Hierarchy Technologies

  • Random Access

    • “Random” is good: access time is the same for all locations

    • DRAM: Dynamic Random Access Memory

      • High density (1 transistor cells), low power, cheap, slow

      • Dynamic: need to be “refreshed” regularly (~ every 8 ms)

    • SRAM: Static Random Access Memory

      • Low density (6 transistor cells), high power, expensive, fast

      • Static: content will last “forever” (until power turned off)

    • Size: DRAM/SRAM ≈ 4 to 8

    • Cost/Cycle time: SRAM/DRAM ≈ 8 to 16

  • “Not-so-random” Access Technology

    • Access time varies from location to location and from time to time (e.g., Disk, CDROM)


Classical SRAM Organization (~Square)

[Figure: RAM cell array with bit (data) lines and word (row) select lines; the row address feeds a row decoder and the column address feeds the column selector and I/O circuits, which output the data word]

  • Each intersection represents a 6-T SRAM cell

  • One memory row holds a block of data, so the column address selects the requested word from that block


Classical DRAM Organization (~Square Planes)

[Figure: several RAM cell array planes with bit (data) lines and word (row) select lines; the row address feeds a row decoder and the column address feeds the column selector and I/O circuits; each plane contributes one data bit to the data word]

  • Each intersection represents a 1-T DRAM cell

  • The column address selects the requested bit from the row in each plane


RAM Memory Definitions

  • Caches use SRAM for speed

  • Main Memory is DRAM for density

    • Addresses divided into 2 halves (row and column)

      • RAS, or Row Address Strobe, triggers the row decoder

      • CAS, or Column Address Strobe, triggers the column selector

  • Performance of Main Memory DRAMs

    • Latency: Time to access one word

      • Access Time: time between request and when word arrives

      • Cycle Time: time between requests

      • Usually cycle time > access time

    • Bandwidth: How much data can be supplied per unit time

      • width of the data channel * the rate at which it can be used
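To make the row/column split concrete, here is a minimal sketch that divides an address into two halves. It assumes an even number of address bits, with the upper half used as the row address and the lower half as the column address; that assignment and the function name are assumptions for illustration, since the slide only says the address is divided into two halves.

```python
def split_dram_address(addr, addr_bits):
    """Split an address into (row, column) halves.

    Assumes addr_bits is even, the row is the upper half,
    and the column is the lower half.
    """
    half = addr_bits // 2
    row = addr >> half                 # latched when RAS is asserted
    col = addr & ((1 << half) - 1)     # latched when CAS is asserted
    return row, col


row, col = split_dram_address(0b1011_0110, addr_bits=8)
print(bin(row), bin(col))
```

Sending the two halves one after the other is what lets a DRAM chip use half as many address pins as it has address bits.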


Classical DRAM Operation

  • DRAM Organization:

    • N rows x N columns x M bits

    • Read or write M bits at a time

    • Each M-bit access requires a RAS / CAS cycle

[Figure: DRAM array of N rows by N cols, M bits deep, with row address, column address, and M-bit output; timing diagram in which RAS and CAS latch a row address and then a col address for the 1st M-bit access and again for the 2nd M-bit access, one cycle time apart]


Ways to Improve DRAM Performance

  • Memory interleaving

  • Fast Page Mode DRAMs – FPM DRAMs

    • www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm

  • Extended Data Out DRAMs – EDO DRAMs

    • www.chips.ibm.com/products/memory/88H2011/88H2011.pdf

  • Synchronous DRAMS – SDRAMS

    • www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm

  • Rambus DRAMS

    • www.rambus.com/developer/quickfind_documents.html

    • www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm

  • Double Data Rate DRAMs – DDR DRAMS

    • www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA.htm

  • . . .


Increasing Bandwidth - Interleaving

Access pattern without interleaving:

[Figure: CPU and a single memory; timeline showing the start of the access for D1, D1 available after the access time, and the access for D2 starting only after the full cycle time, followed by D2 available]

Access pattern with 4-way interleaving:

[Figure: CPU connected to memory banks 0 to 3; timeline in which accesses to bank 0, bank 1, bank 2, and bank 3 are started one after another, after which bank 0 can be accessed again]


Problems with Interleaving

  • How many banks?

    • Ideally, the number of banks ≥ number of clocks we have to wait to access the next word in the bank

    • Only works for sequential accesses (i.e., first word requested in first bank, second word requested in second bank, etc.)

  • Increasing DRAM sizes => fewer chips => harder to have banks

    • Growth bits/chip DRAM : 50%-60%/yr

  • Can only be used for very large memory systems (e.g., those encountered in supercomputer systems)
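The effect of the bank count on sequential accesses can be explored with a small timing model. This is a simplified sketch under stated assumptions: word i lives in bank i mod n_banks, at most one access can be started per clock, and a bank is busy for cycle_clocks after an access starts. The function name and the clock counts are hypothetical.

```python
def sequential_read_clocks(n_words, n_banks, access_clocks, cycle_clocks):
    """Return the clock at which the last of n_words sequential words arrives."""
    bank_free_at = [0] * n_banks   # earliest clock each bank can start again
    clock = 0                      # earliest clock the next request can issue
    last_data = 0
    for word in range(n_words):
        bank = word % n_banks                      # low-order interleaving
        start = max(clock, bank_free_at[bank])     # wait if the bank is busy
        bank_free_at[bank] = start + cycle_clocks
        last_data = start + access_clocks
        clock = start + 1                          # one new request per clock
    return last_data


# Hypothetical timing: access time 3 clocks, cycle time 4 clocks, 8 words.
for banks in (1, 2, 4):
    print(banks, "bank(s):", sequential_read_clocks(8, banks, 3, 4), "clocks")
```

With four banks and a four-clock cycle time, each bank has recovered by the time the access pattern returns to it, which is the condition the first bullet describes.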


Fast Page Mode DRAM Operation

  • Fast Page Mode DRAM

    • N x M “SRAM” to save a row

  • After a row is read into the SRAM “register”

    • Only CAS is needed to access other M-bit blocks on that row

    • RAS remains asserted while CAS is toggled

[Figure: DRAM array of N rows by N cols feeding an N x M “SRAM” row register with M-bit output; timing diagram in which RAS stays asserted for one row address while CAS toggles through four col addresses for the 1st to 4th M-bit accesses]
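A back-of-the-envelope comparison shows why keeping the row open helps. The sketch below assumes each classical access pays one row (RAS) phase and one column (CAS) phase, and that fast page mode pays the row phase once for all accesses to the same row; precharge and other timing details are ignored, and the function names and nanosecond values are hypothetical.

```python
def classical_dram_ns(n_accesses, ras_ns, cas_ns):
    """Every M-bit access needs a full RAS / CAS cycle."""
    return n_accesses * (ras_ns + cas_ns)


def fast_page_mode_ns(n_accesses, ras_ns, cas_ns):
    """One RAS opens the row; each access to that row then needs only CAS."""
    return ras_ns + n_accesses * cas_ns


# Hypothetical timing: 30 ns row phase, 20 ns column phase, 4 accesses in one row.
print(classical_dram_ns(4, 30, 20), "ns vs", fast_page_mode_ns(4, 30, 20), "ns")
```

The saving applies only while accesses stay within the same row; moving to a different row requires a new RAS.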


Why Care About the Memory Hierarchy?

[Figure: Processor-DRAM Memory Gap. Performance (axis ticks at 1, 10, 100, 1000) versus time (1980 to 2000). µProc (CPU, “Moore’s Law”): 60%/year (2X/1.5yr). DRAM: 9%/year (2X/10yrs). Processor-memory performance gap: grows 50% / year]
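The gap figure follows from the two growth rates quoted on this slide. The sketch below is a simple compound-growth calculation using those two rates; the function names are illustrative assumptions.

```python
def annual_gap_growth(cpu_rate, dram_rate):
    """Yearly growth of the CPU/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1


def gap_after(years, cpu_rate, dram_rate):
    """CPU/DRAM performance ratio after a number of years, starting from 1."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


# Rates quoted on the slide: 60%/year for processors, 9%/year for DRAM.
print(f"gap grows about {annual_gap_growth(0.60, 0.09):.0%} per year")
print(f"ratio after 20 years: about {gap_after(20, 0.60, 0.09):.0f}x")
```

The computed yearly growth comes out slightly under the rounded 50% on the slide, and compounding it over two decades yields a ratio in the thousands.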


Memory Hierarchy: Goals

[Figure: probability of reference plotted across the address space, from 0 to 2^n - 1]

  • Fact: Large memories are slow, fast memories are small

  • How do we create a memory that gives the illusion of being large, cheap and fast (most of the time)?

    by taking advantage of

  • The Principle of Locality: Programs access a relatively small portion of the address space at any instant of time.


Memory Hierarchy: Why Does it Work?

  • Temporal Locality (Locality in Time):

    => Keep most recently accessed data items closer to the processor

  • Spatial Locality (Locality in Space):

    => Move blocks consisting of contiguous words to the upper levels
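Both kinds of locality can be observed with a toy model of an upper level that holds a few recently used blocks. This is a sketch under stated assumptions: addresses are word addresses, a miss brings in the whole block, and the least recently used block is evicted when the upper level is full. The function name, block size, and capacity are hypothetical.

```python
from collections import OrderedDict


def hit_rate(addresses, block_words=4, upper_blocks=8):
    """Fraction of accesses found in a small upper level with LRU replacement."""
    upper = OrderedDict()          # block number -> None, ordered by recency
    hits = 0
    for addr in addresses:
        block = addr // block_words
        if block in upper:
            hits += 1
            upper.move_to_end(block)          # mark as most recently used
        else:
            upper[block] = None               # bring the whole block up
            if len(upper) > upper_blocks:
                upper.popitem(last=False)     # evict least recently used
    return hits / len(addresses)


print("sequential words :", hit_rate(list(range(64))))          # spatial locality
print("same word again  :", hit_rate([0] * 10))                  # temporal locality
print("one word per block:", hit_rate(list(range(0, 256, 4))))  # no locality
```

Sequential accesses hit on every word after the first in each block, repeated accesses hit after the first one, and a stride equal to the block size never hits.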

[Figure: upper-level memory holding Blk X and lower-level memory holding Blk Y, with data moving to and from the processor through the upper level]


Memory Hierarchy: Terminology

[Figure: upper-level memory holding Blk X and lower-level memory holding Blk Y, with data moving to and from the processor through the upper level]

  • Hit: data appears in some block in the upper level (Block X)

    • Hit Rate: the fraction of memory accesses found in the upper level

    • Hit Time: Time to access the upper level which consists of

      RAM access time + Time to determine hit/miss

  • Miss: data needs to be retrieved from a block in the lower level (Block Y)

    • Miss Rate = 1 - (Hit Rate)

    • Miss Penalty: Time to replace a block in the upper level + Time to deliver the block to the processor

    • Hit Time << Miss Penalty
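These terms are commonly combined into an average access time: hit time plus miss rate times miss penalty. That formula is the standard textbook combination rather than something stated on this slide, and the function name and sample values below are assumptions for illustration.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    miss_rate = 1 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Hypothetical upper level: 1-cycle hit time, 95% hit rate, 100-cycle miss penalty.
print(average_access_time(1, 0.95, 100), "cycles on average")
```

Because the miss penalty is so much larger than the hit time, even a small miss rate dominates the average.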


How is the Hierarchy Managed?

  • registers <-> memory

    • by compiler (programmer?)

  • cache <-> main memory

    • by the hardware

  • main memory <-> disks

    • by the hardware and operating system (virtual memory)

    • by the programmer (files)


Summary

  • DRAM is slow but cheap and dense

    • Good choice for presenting the user with a BIG memory system

  • SRAM is fast but expensive and not very dense

    • Good choice for providing the user FAST access time

  • Two different types of locality

    • Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon.

    • Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon.

  • By taking advantage of the principle of locality:

    • Present the user with as much memory as is available in the cheapest technology.

    • Provide access at the speed offered by the fastest technology.

