2. 2 COMP 206: Computer Architecture and Implementation Montek Singh
Thu, April 16, 2009
Topic: Main Memory (DRAM) Organization
3. 3 Outline Introduction
SRAM (briefly)
DRAM Organization
Challenges
Bandwidth
Granularity
Performance
4. 4 Structure of SRAM Cell Control logic
One memory cell per bit
Cell consists of one or more transistors
Not really a latch made of logic
Logic equivalent
5. 5 Bit Slice Cells connected to form 1 bit position
Word Select, decoded from the address lines, gates one latch
Note that Word Select gates reads as well as writes
B (and B not) set by R/W, Data In and BitSelect
6. 6 Bit Slice can Become Module Basically, a bit slice is a x1 memory
7. 7 16 X 1 RAM Now shows decoder
8. 8 Row/Column If RAM gets large, there is a large decoder
Impossibly large!
Also run into chip layout issues
Larger memories usually “2D” in a matrix layout
9. 9 16 X 1 as 4 X 4 Array Two decoders
Row
Column
Address just broken up
Not visible from outside
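The row/column split can be sketched in a few lines of Python. This is only an illustration: the function name `split_address` is hypothetical, and using the high-order bits for the row is an assumption, since the slides do not say which half goes to which decoder.

```python
def split_address(addr, row_bits=2, col_bits=2):
    """Split a flat address into (row, column) for a 2D cell array.

    Assumption: the high-order bits select the row and the low-order
    bits select the column.
    """
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = addr >> col_bits                # upper bits feed the row decoder
    col = addr & ((1 << col_bits) - 1)    # lower bits feed the column decoder
    return row, col

# 16 x 1 RAM arranged as a 4 x 4 array: 4 address bits, 2 per decoder
for addr in (0, 5, 15):
    print(addr, split_address(addr))
```

Two 2-to-4 decoders replace one 4-to-16 decoder, and the saving grows quickly as the address gets wider.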
10. 10 Dynamic RAM Capacitor can hold charge
Transistor acts as gate
No charge is a 0
Can add charge to store a 1
Then open switch (disconnect)
Can read by closing switch
Explanation next
11. 11 Precharge and Sense Amps You’ll see “precharge time”
B is precharged to ½ V
Charge/no-charge on C will increase or decrease voltage
Sense amps detect this
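The size of the voltage change the sense amps must detect follows from charge conservation when the cell capacitor is connected to the precharged bit line. The sketch below assumes made-up values for the supply voltage and the two capacitances (none are given in the slides); the function name `bitline_voltage` is also hypothetical.

```python
def bitline_voltage(v_cell, v_precharge, c_cell, c_bitline):
    """Bit-line voltage after the cell capacitor shares charge with it.

    Charge is conserved:
    V = (C_cell * V_cell + C_bl * V_pre) / (C_cell + C_bl)
    """
    return (c_cell * v_cell + c_bitline * v_precharge) / (c_cell + c_bitline)

VDD = 1.8          # assumed supply voltage, volts
C_CELL = 30e-15    # assumed cell capacitance, farads
C_BL = 300e-15     # assumed bit-line capacitance, farads

v_one = bitline_voltage(VDD, VDD / 2, C_CELL, C_BL)    # cell held a 1
v_zero = bitline_voltage(0.0, VDD / 2, C_CELL, C_BL)   # cell held a 0
print(f"stored 1: {v_one - VDD / 2:+.3f} V relative to precharge")
print(f"stored 0: {v_zero - VDD / 2:+.3f} V relative to precharge")
```

Because the bit line has much more capacitance than the cell, the swing is a small fraction of the supply voltage, which is why a sense amplifier is needed.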
12. 12 DRAM Characteristics Destructive Read
When cell read, charge removed
Must be restored after a read
Refresh
Also, there’s steady leakage
Charge must be restored periodically
13. 13 DRAM Logical Diagram
14. 14 DRAM Refresh Many strategies w/ logic on chip
Here a row counter
15. 15 Timing Say need to refresh every 64ms
Distributed refresh
Spread refresh out evenly over 64ms
Say on a 4M x 4 DRAM with 4096 rows, refresh one row every 64 ms / 4096 = 15.6 µs
Total time spent is 0.25ms, but spread
Burst refresh
Same 0.25ms, but all at once
May not be good in a computer system
Refresh takes 1 % or less of total time
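The refresh arithmetic can be checked with a short calculation. The per-row refresh time is not given in the slides; the 61 ns used here is an assumption chosen so that the total comes out near the quoted 0.25 ms.

```python
REFRESH_PERIOD_S = 64e-3   # every row must be refreshed within 64 ms
ROWS = 4096                # rows in the 4M x 4 example
ROW_REFRESH_S = 61e-9      # assumed time to refresh one row

interval_s = REFRESH_PERIOD_S / ROWS   # distributed refresh: gap between rows
busy_s = ROWS * ROW_REFRESH_S          # total refresh time per 64 ms period
overhead = busy_s / REFRESH_PERIOD_S   # fraction of time the DRAM is busy refreshing

print(f"one row every {interval_s * 1e6:.3f} us")
print(f"total refresh time {busy_s * 1e3:.3f} ms per period")
print(f"overhead {overhead:.2%}")
```

Distributed and burst refresh spend the same total time; they differ only in whether the memory is unavailable in many short slices or one long one.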
16. 16 Summary: DRAM vs. SRAM DRAM (Dynamic RAM)
Used mostly in main memory
Capacitor + 1 transistor/bit
Need refresh every 4-8 ms
5% of total time
Read is destructive (need for write-back)
Access time < cycle time (because of writing back)
Density (25-50):1 to SRAM
Address lines multiplexed
pins are scarce!
SRAM (Static RAM)
Used mostly in caches (I, D, TLB, BTB)
1 flip-flop (4-6 transistors) per bit
Read is not destructive
Access time = cycle time
Speed (8-16):1 to DRAM
Address lines not multiplexed
high-speed decoding is important
17. 17 Chip Organization Chip capacity (= number of data bits) tends to quadruple
1K, 4K, 16K, 64K, 256K, 1M, 4M, …
In early designs, each data bit belonged to a different address (x1 organization)
Starting with 1Mbit chips, wider chips (4, 8, 16, 32 bits wide) began to appear
Advantage: Higher bandwidth
Disadvantage: More pins, hence more expensive packaging
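The pin trade-off can be made concrete with a small calculation for a 64 Mbit chip in several widths. This is a sketch under assumptions: the helper `organization` is hypothetical, and it assumes the row and column halves of the address split evenly over shared pins, which real parts do not always do.

```python
import math

def organization(capacity_bits, width):
    """Return (depth, address_bits, multiplexed_address_pins).

    depth is the number of addressable locations, each `width` bits wide.
    Assumes depth is a power of two and an even row/column split.
    """
    depth = capacity_bits // width
    address_bits = depth.bit_length() - 1        # log2 of a power of two
    address_pins = math.ceil(address_bits / 2)   # row and column share pins
    return depth, address_bits, address_pins

CAPACITY = 64 * 2**20   # 64 Mbit
for width in (1, 4, 8, 16):
    depth, bits, pins = organization(CAPACITY, width)
    print(f"{depth // 2**20}M x {width}: {bits} address bits, "
          f"{pins} address pins, {width} data pins")
```

Widening the chip removes a few address pins but adds many data pins, so the total pin count rises with width.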
18. 18 Chip Organization Example: 64Mb DRAM
19. 19 Memory Performance Characteristics Latency (access time)
The time interval between the instant at which the data is called for (READ) or requested to be stored (WRITE), and the instant at which it is delivered or completely stored
Cycle time
The time between the instant the memory is accessed, and the instant at which it may be validly accessed again
Bandwidth (throughput)
The rate at which data can be transferred to or from memory
Reciprocal of cycle time
“Burst mode” bandwidth is of greatest interest
Cycle time > access time for conventional DRAM
Cycle time < access time in “burst mode” when a sequence of consecutive locations is read or written
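Since bandwidth is the reciprocal of cycle time scaled by the amount of data moved per cycle, the effect of burst mode is easy to quantify. The transfer width and both cycle times below are assumed values for illustration, not figures from the slides.

```python
def bandwidth_mb_per_s(bytes_per_transfer, cycle_time_ns):
    """Bandwidth = data per transfer / cycle time.

    Bytes per nanosecond equals GB/s, so multiply by 1000 for MB/s
    (1 MB = 10**6 bytes).
    """
    return bytes_per_transfer * 1000 / cycle_time_ns

WIDTH_BYTES = 8          # assumed 64-bit data path
RANDOM_CYCLE_NS = 100    # assumed cycle time for isolated accesses
BURST_CYCLE_NS = 10      # assumed cycle time within a burst

print(bandwidth_mb_per_s(WIDTH_BYTES, RANDOM_CYCLE_NS), "MB/s for random accesses")
print(bandwidth_mb_per_s(WIDTH_BYTES, BURST_CYCLE_NS), "MB/s in burst mode")
```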
20. 20 Improving Performance Latency can be reduced by
Reducing access time of chips
Using a cache (“cache trades latency for bandwidth”)
Bandwidth can be increased by using
Wider memory (more chips)
More data pins per DRAM chip
Increased bandwidth per data pin
21. 21 Two Recent Problems DRAM chip sizes quadrupling every three years
Main memory sizes doubling every three years
Thus, the main memory of the same kind of computer is being constructed from fewer and fewer DRAM chips
This results in two serious problems
Diminishing main memory bandwidth
Increasing granularity of memory systems
22. 22 Increasing Granularity of Memory Systems Granularity of memory system is the minimum memory size, and also the minimum increment in the amount of memory permitted by the memory system
Too large a granularity is undesirable
Increases cost of system
Restricts its competitiveness
Granularity can be decreased by
Widening the DRAM chips
Increasing the per-pin bandwidth of the DRAM chips
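A rough way to see why wider chips reduce granularity: the smallest memory you can build is one set of chips that together fill the data bus. The sketch below assumes a 64-bit bus and 16 Mbit chips; both figures and the helper name `granularity_bytes` are illustrative assumptions.

```python
def granularity_bytes(bus_width_bits, chip_width_bits, chip_capacity_bits):
    """Minimum memory size (and minimum increment) in bytes.

    One group of chips must supply the full bus width, so the number of
    chips is bus width / chip width.
    """
    chips = bus_width_bits // chip_width_bits
    return chips * chip_capacity_bits // 8

BUS_BITS = 64            # assumed data bus width
CHIP_BITS = 16 * 2**20   # assumed 16 Mbit chips

for width in (1, 4, 16):
    size = granularity_bytes(BUS_BITS, width, CHIP_BITS)
    print(f"x{width} chips: {BUS_BITS // width} chips, {size // 2**20} MB minimum")
```

Higher per-pin bandwidth helps in a similar way: if each pin delivers more bits per second, a narrower bus (and so fewer chips) can supply the same total bandwidth.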
23. 23 Granularity Example
24. 24 Granularity Example (2)
25. 25 Improving Memory Chip Performance Several techniques to get more bits/sec from a DRAM chip:
Allow repeated accesses to the row buffer without another row access time
burst mode, fast page mode, EDO mode, …
Simplify the DRAM-CPU interface
add a clock to reduce overhead of synchronizing with the controller
= synchronous DRAM (SDRAM)
Transfer data on both rising and falling clock edges
double data rate (DDR)
Each of the above adds a small amount of logic to exploit the high internal DRAM bandwidth
26. 26 Block Diagram
27. 27 Activate Row
28. 28 Read (Select column)
29. 29 Basic Mode of Operation Slowest mode
Uses only single row and column address
Row access is slow (60-70ns) compared to column access (5-10ns)
Leads to three techniques for DRAM speed improvement
Getting more bits out of DRAM on one access given timing constraints
Pipelining the various operations to minimize total time
Segmenting the data in such a way that some operations are eliminated for a given set of accesses
30. 30 Nibble (or Burst) Mode Several consecutive columns are accessed
Only first column address is explicitly specified
Rest are internally generated using a counter
31. 31 Fast Page Mode Accesses arbitrary columns within same row
Static column mode is similar
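The benefit of staying within one open row can be estimated with a simple timing model. It uses 60 ns and 10 ns from the row and column access ranges quoted for the basic mode, and it ignores precharge and other overheads, so treat the numbers as a rough sketch only.

```python
T_ROW_NS = 60   # row access time, low end of the quoted 60-70 ns
T_COL_NS = 10   # column access time, high end of the quoted 5-10 ns

def basic_mode_ns(n):
    """Every access supplies a fresh row address and column address."""
    return n * (T_ROW_NS + T_COL_NS)

def page_mode_ns(n):
    """One row access, then n column accesses within the same open row."""
    return T_ROW_NS + n * T_COL_NS

for n in (1, 4, 8):
    print(f"{n} accesses: basic {basic_mode_ns(n)} ns, page mode {page_mode_ns(n)} ns")
```

For a single access the two modes cost the same; the gain appears only when several accesses fall in the same row.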
32. 32 EDO Mode Arbitrary column addresses
Pipelined
EDO = Extended Data Out
Has other modes like “burst EDO”, which allows reading of a fixed number of bytes starting with each specified column address
33. 33 Evolutionary DRAM Architectures SDRAM (Synchronous DRAM)
Interface retains a good part of conventional DRAM interface
addresses multiplexed in two halves
separate data pins
two control signals
All address, data, and control signals are synchronized with an external clock (100-150 MHz)
Allows decoupling of processor and memory
Allows pipelining a series of reads and writes
Peak speed per memory module: 800-1200 MB/sec
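The quoted peak speeds are consistent with one transfer per clock over a 64-bit module. The 64-bit width is an assumption (the slides do not state it), and `peak_mb_per_s` is a hypothetical helper.

```python
def peak_mb_per_s(clock_mhz, bus_width_bits=64):
    """Peak bandwidth with one transfer per clock cycle.

    MHz x bytes per transfer gives MB/s (1 MB = 10**6 bytes).
    bus_width_bits=64 is an assumed module width.
    """
    return clock_mhz * bus_width_bits // 8

for clock in (100, 150):
    print(f"{clock} MHz -> {peak_mb_per_s(clock)} MB/s")
```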
34. 34 Synchronous DRAM (SDRAM) Common type in PCs since late-90s
Clocked
Addresses multiplexed in two halves
Burst transfers
Multiple banks
Pipelined
Start read in one bank after another
Come back and read the resulting values one after another
35. 35 DDR DRAM Double Data Rate SDRAM
Transfers data on both edges of the clock
Currently popular
DDRx, where x refers to voltage and signaling specs: DDR1 was 2.5 V, DDR2 1.8 V, DDR3 1.5 V
Graphics cards now using GDDR4 (Graphics Double Data Rate) memory chips
Memory clocks of about 900 MHz (transfer rate equivalent to 1800 MHz)
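The doubling follows directly from transferring on both clock edges. In the sketch below, the 32-pin chip width is an assumption for illustration only, and both function names are hypothetical.

```python
def ddr_transfer_rate_mt_s(clock_mhz):
    """Two transfers per clock: one on the rising edge, one on the falling edge."""
    return 2 * clock_mhz

def chip_bandwidth_mb_s(clock_mhz, data_pins):
    """Peak MB/s for one chip: transfers per second x bits per transfer / 8."""
    return ddr_transfer_rate_mt_s(clock_mhz) * data_pins // 8

CLOCK_MHZ = 900   # memory clock quoted for graphics memory
DATA_PINS = 32    # assumed chip width

print(ddr_transfer_rate_mt_s(CLOCK_MHZ), "million transfers per second per pin")
print(chip_bandwidth_mb_s(CLOCK_MHZ, DATA_PINS), "MB/s per chip")
```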
36. 36 RAMBUS DRAM (RDRAM, XDR) RDRAM
Another attempt to alleviate pinout limits
Many (16-32), smaller banks per chip
Made to be read/written in packet protocol
Each chip has more of a controller
Did not do well in the market; high latency
XDR
A newer technology
Differential, low voltage swing signaling
Used in PS3, 65 GB/s xfer rate
37. 37 DRAM Controllers Very common to have a circuit that controls memory
Handles banks
Handles refresh
Multiplexes column and row addresses
RAS and CAS timing
Located in the Northbridge of a PC chipset
38. 38 Memory Interleaving Goal: Try to take advantage of bandwidth of multiple DRAMs in memory system
Memory address A is converted into (b,w) pair, where
b = bank index
w = word index within bank
Logically a wide memory
Accesses to B banks staged over time to share internal resources such as memory bus
Interleaving can be on
Low-order bits of address (cyclic)
b = A mod B, w = A div B
High-order bits of address (block)
Combination of the two (block-cyclic)
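The cyclic and block mappings can be compared side by side. This is a minimal sketch: the bank count, the words per bank, and the function names `cyclic` and `block` are assumptions chosen for illustration.

```python
def cyclic(addr, num_banks):
    """Low-order interleaving: b = A mod B, w = A div B."""
    return addr % num_banks, addr // num_banks

def block(addr, words_per_bank):
    """High-order interleaving: each bank holds one contiguous block."""
    return addr // words_per_bank, addr % words_per_bank

NUM_BANKS = 4        # assumed number of banks
WORDS_PER_BANK = 8   # assumed bank size

for addr in range(6):
    print(addr, "cyclic", cyclic(addr, NUM_BANKS),
          "block", block(addr, WORDS_PER_BANK))
```

With cyclic interleaving, consecutive addresses land in different banks and can overlap in time; with block interleaving, a sequential run stays in one bank.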
39. 39 Low-order Bit Interleaving
40. 40 Mixed Interleaving Memory address register is 6 bits wide
Most significant 2 bits give bank address
Next 3 bits give word address within bank
LSB gives (parity of) module within bank
6 = 000110₂ = (00, 011, 0) = (0, 3, 0)
41 = 101001₂ = (10, 100, 1) = (2, 4, 1)
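The 6-bit decoding can be reproduced with shifts and masks. The function name `decode_mixed` is hypothetical; the field layout follows the slide (2 bank bits, 3 word bits, 1 module bit).

```python
def decode_mixed(addr):
    """Decode a 6-bit address into (bank, word, module).

    Bits 5-4: bank, bits 3-1: word within bank, bit 0: module within bank.
    """
    if not 0 <= addr < 64:
        raise ValueError("address must fit in 6 bits")
    bank = (addr >> 4) & 0b11
    word = (addr >> 1) & 0b111
    module = addr & 0b1
    return bank, word, module

for addr in (6, 41):
    print(addr, format(addr, "06b"), decode_mixed(addr))
```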
41. 41 Other types of Memory ROM = Read-only Memory
Flash = ROM which can be written once in a while
Used in embedded systems, small microcontrollers
Offer IP protection, security
Other?