
Computer Architecture Peripherals


Presentation Transcript


  1. Computer Architecture: Peripherals. By Dan Tsafrir, 6/6/2011. Presentation based on slides by Lihu Rappoport

  2. Memory: reminder

  3. Not so long ago… [Figure: CPU vs. DRAM performance, 1980–2000, log scale; the gap grew 50% per year] • CPU: 60% per year (2× in 1.5 years) • DRAM: 9% per year (2× in 10 years)

  4. Not so long ago… • In 1994, in their paper “Hitting the Memory Wall: Implications of the Obvious”, William Wulf & Sally McKee said: “We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed – each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs. The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one.”

  5. More recently (2008)… [Figure: performance in seconds (lower = slower) vs. number of processor cores; a conventional architecture goes from fast to slow as cores are added] The memory wall in the multicore era

  6. Memory Trade-Offs • Large (dense) memories are slow • Fast memories are small, expensive, and consume high power • Goal: give the processor the feeling that it has a memory that is large (dense), fast, low-power, and cheap • Solution: a hierarchy of memories (see the sketch below) [Diagram: CPU → L1 Cache → L2 Cache → L3 Cache → Memory (DRAM); moving away from the CPU, speed: fastest → slowest, size: smallest → biggest, cost: highest → lowest, power: highest → lowest]
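A quick way to see why the hierarchy gives the processor that "feeling" is the standard average-memory-access-time (AMAT) calculation. The sketch below uses made-up but plausible numbers (1 ns L1 hit, 100 ns DRAM access, 95% L1 hit rate); none of these figures come from the slides:

```c
#include <stdio.h>

/* Average memory access time (AMAT) for a two-level hierarchy.
 * All numbers are illustrative assumptions, not from the slides. */
int main(void)
{
    double l1_ns   = 1.0;     /* assumed L1 hit latency         */
    double dram_ns = 100.0;   /* assumed DRAM access latency    */
    double l1_hit  = 0.95;    /* assumed fraction of L1 hits    */

    double amat = l1_ns + (1.0 - l1_hit) * dram_ns;
    printf("AMAT = %.1f ns (vs. %.0f ns if every access went to DRAM)\n",
           amat, dram_ns);
    return 0;
}
```

With these assumptions the processor sees ~6 ns on average, even though most of its data lives in 100 ns DRAM.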

  7. Typical levels in mem hierarchy

  8. DRAM & SRAM

  9. DRAM basics • DRAM • Dynamic random-access memory • Random access = access cost is the same for every location (well, not really) • The CPU thinks of DRAM as 1-dimensional • Simpler • But DRAM is actually arranged as a 2-D grid • Need row & col addresses to access • Given a “1-D address”, the DRAM interface splits it into row & col • Some time must elapse between the row & col accesses (10s of ns)
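To make the "1-D address split into row & col" concrete, here is a minimal sketch in C. The array geometry (8192 columns, hence 13 column bits) is an assumption for illustration; real parts differ:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical geometry: an array with 8192 columns, so the low 13 address
 * bits pick the column and the remaining upper bits pick the row. The two
 * halves are then sent over the same pins, row first, column second. */
#define COL_BITS 13
#define COL_MASK ((1u << COL_BITS) - 1)

int main(void)
{
    uint32_t addr = 0x00ABCDEF;          /* the "1-D" address the CPU uses */
    uint32_t row  = addr >> COL_BITS;    /* upper bits select the row      */
    uint32_t col  = addr & COL_MASK;     /* lower bits select the column   */
    printf("addr 0x%08X -> row %u, col %u\n",
           (unsigned)addr, (unsigned)row, (unsigned)col);
    return 0;
}
```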

  10. DRAM basics • Why 2D? Why delayed row & col accesses? • Every address bit requires a physical pin • DRAMs are large (GBs nowadays) => need many pins => more expensive • A DRAM array has • A row decoder • Extracts the row number from the memory address • A column decoder • Extracts the column number from the memory address • Sense amplifiers • Hold the row when it is (1) written to, (2) read from, or (3) refreshed (see next slide)

  11. DRAM basics • One transistor-capacitor pair • Per bit • Capacitors leak • => Need to be refreshed every few ms • DRAM spends ~1% of its time refreshing • “Opening” a row • = fetching it to the sense amplifiers • = refreshing it • Is it worth it to make the DRAM array a rectangle (rather than a square)?
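A back-of-the-envelope check of the "~1% of time refreshing" claim. The row count, retention window, and per-row refresh time below are assumed, illustrative values, not figures from the slides:

```c
#include <stdio.h>

/* Rough refresh-overhead estimate. Assumed numbers: 8192 rows per bank,
 * a 64 ms retention window, ~50 ns to open+close (refresh) one row. */
int main(void)
{
    double rows       = 8192.0;
    double period_ms  = 64.0;    /* every row refreshed at least this often */
    double per_row_ns = 50.0;    /* time to refresh one row                 */

    double busy_ms = rows * per_row_ns * 1e-6;   /* total refresh time      */
    printf("refresh overhead: %.2f%%\n", 100.0 * busy_ms / period_ms);
    return 0;
}
```

With these assumptions the overhead is about 0.6%, the same order of magnitude as the ~1% the slide quotes.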

  12. x1 DRAM [Diagram: a single memory array; the row decoder selects one of the rows, the sense amplifiers latch it, the column decoder selects one bit from it, and the data in/out buffers drive that bit to the pins]

  13. DRAM banks • Each DRAM memory array outputs one bit • DRAMs use multiple arrays to output multiple bits at a time • xN indicates a DRAM with N memory arrays • Typical today: x16, x32 • Each collection of N arrays forms a DRAM bank • Can read/write from/to each bank independently

  14. x4 DRAM [Diagram: four copies of the x1 structure side by side: four memory arrays, each with its own row decoder, sense amplifiers, column decoder, and data in/out buffers, each contributing one bit]

  15. Ranks & DIMMs • DIMM • (Dual in-line) memory module (the unit we connect to the motherboard) • Increase bandwidth by delivering data from multiple banks • The bandwidth of one bank is limited • => Put multiple banks on a DIMM • The bus has a higher clock frequency than any one DRAM • The bus controller switches between banks to achieve a high data rate • Increase capacity by utilizing multiple ranks • Each rank is an independent set of banks that can be accessed at the full data bit-width • 64 bits for non-ECC; 72 for ECC (error correction code) • Ranks cannot be accessed simultaneously • As they share the same data path
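The capacity arithmetic falls straight out of the rank/chip organization. As a hedged sketch, the part below is assumed to be a "1GB 2Rx8" non-ECC DIMM built from 512Mbit x8 chips (matching the example on the next slide); the specific chip density is an illustrative assumption:

```c
#include <stdio.h>

/* How DIMM capacity follows from its organization. Assumed part: a
 * "1GB 2Rx8" non-ECC DIMM built from 512Mbit x8 chips (illustrative). */
int main(void)
{
    int ranks           = 2;    /* the "2R" in 2Rx8                     */
    int chip_width      = 8;    /* the "x8": data bits per chip         */
    int bus_width       = 64;   /* non-ECC data path (72 with ECC)      */
    int chips_per_rank  = bus_width / chip_width;       /* = 8 chips    */
    long chip_mbit      = 512;                          /* per chip     */

    long total_mb = (long)ranks * chips_per_rank * chip_mbit / 8;
    printf("%d ranks x %d chips x %ldMbit = %ld MB\n",
           ranks, chips_per_rank, chip_mbit, total_mb);
    return 0;
}
```

Two ranks of eight x8 chips fill the 64-bit data path and yield 1024 MB, i.e., the 1GB module on the next slide.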

  16. Ranks & DIMMs [Image: a 1GB 2Rx8 DIMM (2 ranks of x8 DRAM chips)]

  17. Modern DRAM organization • A system has multiple DIMMs • Each DIMM has multiple DRAM banks • Arranged in one or more ranks • Each bank has multiple DRAM arrays • Concurrency in banks increases memory bandwidth

  18. Memory controller [Diagram: a memory controller connected to the DRAM over an address/command bus and a data bus, with separate chip-select signals (chip select 1, chip select 2) picking which rank responds]

  19. Memory controller • Functionality • Executes the processor's memory requests • In earlier systems • A separate off-processor chip • In modern systems • Integrated on-chip with the processor • Interconnect with the processor • A bus, but can be point-to-point, or through a crossbar

  20. Lifetime of a memory access • The processor orders & queues memory requests • Request(s) are sent to the memory controller • The controller queues & orders the requests • For each request in the queue, when the time is right • The controller waits until the requested DRAM is ready • The controller breaks the address bits into rank, bank, row, and column fields (see the sketch below) • The controller sends a chip-select signal to select the rank • The selected bank is pre-charged in order to activate the selected row • Activate the row within the selected DRAM bank • Using “RAS” (the row-address strobe signal) • Send the (entire) row to the sense amplifiers • Select the desired column • Using “CAS” (the column-address strobe signal) • Send the data back
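One plausible way the controller "breaks the address bits into rank, bank, row, and column fields" is a fixed bit-slice, sketched below. The field widths and the example address are invented for illustration; real controllers choose (and often hash) the split to spread traffic across banks:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical field widths: 10 column, 3 bank, 14 row, 1 rank bit. */
#define COL_BITS  10
#define BANK_BITS 3
#define ROW_BITS  14
#define RANK_BITS 1

/* Extract `bits` bits starting at `shift` from a physical address. */
static unsigned field(uint64_t a, int shift, int bits)
{
    return (unsigned)((a >> shift) & ((1u << bits) - 1));
}

int main(void)
{
    uint64_t addr = 0x3FACB1240ULL;   /* example physical address */
    unsigned col  = field(addr, 0, COL_BITS);
    unsigned bank = field(addr, COL_BITS, BANK_BITS);
    unsigned row  = field(addr, COL_BITS + BANK_BITS, ROW_BITS);
    unsigned rank = field(addr, COL_BITS + BANK_BITS + ROW_BITS, RANK_BITS);
    printf("rank %u, bank %u, row %u, col %u\n", rank, bank, row, col);
    return 0;
}
```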

  21. Basic DRAM array [Diagram: the address is latched into a row latch feeding the row-address decoder and a column latch feeding the column-address decoder; RAS# strobes the row and CAS# strobes the column of the memory array, and the data comes out] • Timing (2 phases) • Decode the row address + assert RAS# • Wait for the “RAS-to-CAS delay” • Decode the column address + assert CAS# • Transfer DATA

  22. DRAM timing • CAS latency • The number of clock cycles to access a specific column of data • From the moment the memory controller issues a column address for the current row until the data is read out of memory • RAS-to-CAS delay • The number of cycles between the row and column accesses • Row pre-charge time • The number of cycles needed to close the open row & open the next row
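These cycle counts only mean something relative to the clock. A hedged worked example: assume "9-9-9" timings (CL, tRCD, tRP) on a ~666.67 MHz command clock, roughly a DDR3-1333 part; both the part and the numbers are illustrative:

```c
#include <stdio.h>

/* Translate cycle-count DRAM timings into nanoseconds.
 * Assumed example: 9-9-9 (CL-tRCD-tRP) at ~666.67 MHz (about DDR3-1333). */
int main(void)
{
    double clock_mhz = 666.67;
    double cycle_ns  = 1000.0 / clock_mhz;   /* ~1.5 ns per cycle */
    int cl = 9, trcd = 9, trp = 9;

    /* Worst case: the wrong row is open, so we pay row pre-charge +
     * RAS-to-CAS delay + CAS latency. A hit in the already-open row
     * pays only the CAS latency. */
    printf("row miss: %.1f ns, row hit: %.1f ns\n",
           (trp + trcd + cl) * cycle_ns, cl * cycle_ns);
    return 0;
}
```

Under these assumptions a row miss costs ~40.5 ns while an open-row hit costs ~13.5 ns, which is why the paged-mode schemes on the next slides matter.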

  23. Addressing sequence [Timing diagram: RAS# and CAS# waveforms over A[0:7], showing the precharge delay, the RAS-to-CAS delay, the CAS latency, and the overall access time as row i and column n are strobed and data n is returned] • Access sequence • Put the row address on the address bus and assert RAS# • Wait for the RAS#-to-CAS# delay (tRCD) • Put the column address on the address bus and assert CAS# • DATA transfer • Pre-charge

  24. Improved DRAM Schemes • Paged Mode DRAM • Multiple accesses to different columns of the same row (spatial locality) • Saves the time it takes to bring in a new row (but might be unfair) [Timing diagram: a single RAS# assertion followed by CAS# assertions for columns n, n+1, n+2, each returning its data word] • Extended Data Output RAM (EDO RAM) • A data-output latch makes it possible to overlap the next column address with the current column's data [Timing diagram: as above, but data n is still being transferred while column n+1 is already strobed]
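A toy model of why paged (open-row) mode pays off: walk an access stream and count how many accesses hit the row currently held in the sense amplifiers versus forcing a new row open. The row size and the access stream are made up for illustration:

```c
#include <stdio.h>

/* Count open-row hits vs. misses over a small access stream.
 * COLS_PER_ROW (the assumed row size, in words) is illustrative. */
#define COLS_PER_ROW 1024

int main(void)
{
    unsigned addrs[] = { 0, 4, 8, 12, 5000, 5004, 16, 20 }; /* word indices */
    int n = (int)(sizeof(addrs) / sizeof(addrs[0]));
    int open_row = -1, hits = 0, misses = 0;

    for (int i = 0; i < n; i++) {
        int row = (int)(addrs[i] / COLS_PER_ROW);
        if (row == open_row) hits++;          /* served from sense amps */
        else { misses++; open_row = row; }    /* must open a new row    */
    }
    printf("row hits %d, row misses %d\n", hits, misses);
    return 0;
}
```

The stream above scores 5 hits against 3 misses; the more spatial locality, the more accesses dodge the expensive row-open path (and, as the slide notes, the more a stream with good locality can starve others, hence "might be unfair").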

  25. Improved DRAM Schemes (cont) • Burst DRAM • Generates consecutive column addresses by itself [Timing diagram: one RAS# assertion and a single CAS# assertion for column n; data n, n+1, n+2 then stream out on consecutive cycles]

  26. Synchronous DRAM (SDRAM) • Asynchrony in DRAM • Due to RAS & CAS arriving at any time • Synchronous DRAM • Uses a clock to deliver requests at regular intervals • More predictable DRAM timing • => Less skew • => Faster turnaround • SDRAMs support burst-mode access • Initial performance similar to BEDO (= burst + EDO) • Clock scaling enabled higher transfer rates later • => DDR SDRAM => DDR2 => DDR3
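The "clock scaling enabled higher transfer rates" point reduces to a simple formula: peak rate = clock × transfers-per-clock × bus width. The example part below (DDR3-1333 on a single 64-bit channel) is an illustrative assumption, not taken from the slides:

```c
#include <stdio.h>

/* Peak transfer rate of a DDR-style SDRAM channel.
 * Assumed example part: DDR3-1333 on one 64-bit non-ECC channel. */
int main(void)
{
    double clock_mhz        = 666.67; /* I/O bus clock                   */
    int transfers_per_clock = 2;      /* "double data rate": both edges  */
    int bus_bits            = 64;     /* one non-ECC channel             */

    double mt_s = clock_mhz * transfers_per_clock;   /* ~1333 MT/s       */
    double mb_s = mt_s * (bus_bits / 8);             /* megabytes/s      */
    printf("%.0f MT/s -> %.1f GB/s peak\n", mt_s, mb_s / 1000.0);
    return 0;
}
```

The same formula explains the DDR => DDR2 => DDR3 progression: each generation raises the effective transfer rate per pin rather than widening the bus.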

  27. DRAM vs. SRAM (Random access = access time the same for all locations)
