
Memory eXpansion Technology

Krishan Swarup Gupta

Rabie A. Ramadan

Supervised By:

Prof. El-Rewini


Memory eXpansion Technology Agenda

  • Introduction (Krishan)
  • Motivation (Krishan)
  • A Breakthrough (Krishan)
  • Requirements (Krishan)
  • Terminology (Rabie)
  • Architecture (Rabie)
  • Shared Cache Subsystem Requirements (Krishan)
  • C-RAM Architecture (Krishan)
  • Compression Technique (Rabie)
  • Main Memory Subsystem (Rabie)
  • Operating System Software (Rabie)
  • Performance (Rabie)

Introduction

“Adding memory is often the most effective way to improve system performance, but it's a costly proposition.”

Mark Dean, IBM Fellow and Vice President of Systems Research.


  • MXT is a hardware technology for compressing main memory contents.
  • MXT doubles the effective size of the main memory.
  • 512 MB installed memory appears as 1 GB.
  • This is done entirely in hardware, transparent to the CPUs, I/O devices, peripherals, and all software (applications, device drivers, and the kernel), with the exception of fewer than one hundred lines of code added to the base kernel.
Motivation

  • Memory seems cheap, but it is not, especially when a system uses 512 MB or more.
  • Why bother with MXT to double the size of memory?
  • Simple: MXT saves money, and lots of it.
  • Try the online price configurators of Compaq, IBM, Dell, etc., to see what doubling the system memory costs.

A Breakthrough

  • Large technology installations can save millions of dollars.
  • The savings can be significant for both small and large customers, as memory comprises 40-70 percent of the total cost of most NT-based server configurations.
  • MXT is a hardware implementation that automatically stores frequently accessed data and instructions close to a computer's microprocessors so they can be accessed immediately.
  • MXT incorporates a new level of cache, designed to handle data and instructions efficiently, on a memory controller chip.
  • It is real: the IBM eServer x330 with MXT was released on 11 February 2002.

Requirements

  • Very fast compression/decompression hardware is required, permitting operation at main-memory bandwidth.
  • Since the logical main-memory size may vary dynamically under compression, changes must be made to the operating system's memory management.
  • A way must be found to efficiently store and access the variable-length objects produced by compression.

Terminology

ASCI Machine

  • ASCI is the US Department of Energy's Accelerated Strategic Computing Initiative, a collaboration among three US national defense laboratories.
  • It aims to give researchers a five-order-of-magnitude increase in computing performance over current technology.
  • MIMD distributed memory
  • message-passing supercomputer
  • The architecture is scalable
    • communication bandwidth
    • main memory
    • internal disk storage capacity
    • I/O
RAM Terminology
  • Conventional DRAM
  • Synchronous DRAM (SDRAM)
  • DDR SDRAM
  • SIMM
  • DIMMs
  • Interleaving

MXT Architecture

  • A collection of processors is connected to a common SDRAM-based main memory through a memory controller chip.
  • MXT incorporates a two-level architecture consisting of a large shared cache coupled with a typical main memory.
  • Three ways to manage the memory M:
    • Organizing M as a linear space, where variable-length intervals are allocated and deallocated.
    • Organizing M as a collection of blocks of possibly multiple sizes, where space for a variable-length object is allocated as an integral number of such blocks.
    • Organizing M as a collection of blocks, but permitting a variable amount of space to be allocated within a block.
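The second scheme above (fixed-size blocks, with each variable-length object taking an integral number of them) can be sketched as a toy free-list allocator. The 256-byte block size, the `BlockStore` name, and the API are illustrative assumptions, not MXT's actual implementation:

```python
import math

BLOCK_SIZE = 256  # bytes per fixed-size block (illustrative choice)

class BlockStore:
    """Toy model of scheme 2: memory M is a pool of fixed-size blocks;
    a variable-length object occupies an integral number of blocks."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))  # free-list of block indices
        self.objects = {}                    # object id -> list of block indices

    def allocate(self, obj_id, size_bytes):
        # Round the object size up to a whole number of blocks.
        needed = math.ceil(size_bytes / BLOCK_SIZE)
        if needed > len(self.free):
            raise MemoryError("out of blocks")
        self.objects[obj_id] = [self.free.pop() for _ in range(needed)]
        return self.objects[obj_id]

    def deallocate(self, obj_id):
        # Return the object's blocks to the free-list.
        self.free.extend(self.objects.pop(obj_id))

store = BlockStore(num_blocks=16)
blocks = store.allocate("line0", 600)  # 600 bytes -> 3 blocks of 256 B
```

The round-up in `allocate` is exactly where the fragmentation discussed later comes from: the last block of each object is, on average, only partly used.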
Cyclic Redundancy Code (CRC)
  • A number derived from a data block.
  • A CRC is more complicated than a checksum.
  • Calculated by division, using shifts and exclusive ORs.
  • Based on a generator polynomial.
  • CRCs treat blocks of input bits as coefficient sets for polynomials.
    • E.g., 10100000

1*x^7 + 0*x^6 + 1*x^5 + 0*x^4 + 0*x^3 + 0*x^2 + 0*x^1 + 0*x^0

  • The remainder of the division is the checksum.
  • For more information, see http://www.4d.com/docs/CMU/CMU79909.HTM
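The shift-and-XOR division above can be shown concretely. This is a minimal bitwise CRC-8 sketch, assuming the common generator polynomial x^8 + x^2 + x + 1 (0x07); MXT itself may use a different polynomial and width:

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8: treat the message bits as polynomial coefficients
    and compute the remainder of division by the generator polynomial
    (x^8 + x^2 + x + 1, written 0x07 with the x^8 term implicit),
    using only shifts and exclusive ORs."""
    crc = 0
    for byte in data:
        crc ^= byte                 # bring the next 8 message bits in
        for _ in range(8):
            if crc & 0x80:          # top bit set: subtract (XOR) the generator
                crc = ((crc << 1) ^ poly) & 0xFF
            else:                   # top bit clear: just shift
                crc = (crc << 1) & 0xFF
    return crc                      # the remainder is the checksum

check = crc8(b"123456789")          # CRC of the conventional test string
```

Running the loop by hand for one byte makes it clear why this is polynomial division: each shift moves to the next coefficient, and each XOR cancels the leading term.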
[Diagram: three processors, each with private L1 and L2 caches, share a common L3 cache; a compressor/decompressor unit sits between the shared L3 cache and the compressed main memory.]


Shared Cache Subsystem

  • The shared L3 cache provides low-latency processor and I/O-subsystem access to frequently accessed uncompressed data.
  • The cache is partitioned into a number of cache lines, each an associative storage unit equal in size to the 1 KB uncompressed data block.
  • A cache directory keeps track of the real-memory tag addresses corresponding to the cached addresses that can be stored within each line.

  • There are three primary architectures:
  • The independent cache array scheme:
    • A large independent data-cache memory is implemented in low-cost double-data-rate SDRAM outside the memory controller chip, while the associated cache directory is implemented on the chip.
    • The cache size is limited primarily by the size of the on-chip cache directory.
    • The cache interface can be optimized for lowest-latency access by the processor.

  • The compressed main-memory partition scheme:
    • The cache controller and the memory controller share the same storage array via the same physical interface.
    • Data is shuttled between the compressed main-memory region and the uncompressed cache through the compression hardware during cache-line replacement.
    • The compressed cache size can readily be optimized for a specific system application.
    • Drawback: contention for the main-memory physical interface by the latency-sensitive cache controller.

  • The distributed cache scheme:
    • The cache is distributed throughout the compressed memory as a number of uncompressed lines; only the most recently used n lines make up the cache.
    • Data is shuttled in and out of the compressed memory, changing its compressed state as it passes through the compression logic during cache-line replacement.
    • The effective cache size may be dynamically optimized during system operation simply by changing the maximum number of uncompressed lines.
    • Drawback: contention for the main-memory physical interface.
    • Drawback: greater average latency associated with cache directory references.

C-RAM Architecture

  • Logically, the memory M consists of a collection of randomly accessible fixed-size lines, where L is the line size.
  • Internally, the ith line is stored in a compressed format as L(i) bytes, where L(i) <= L, and where L(i) may change on each cache cast-out of the line.

  • M comprises a standard random-access memory with a minimum access size (granule) of g bytes. We will generally assume that g is 32.
  • Memory accesses invoke a translation between a logical line address and an internal address. This correspondence is stored in a directory D contained in M.
  • Translation, fetching, and memory management within the C-RAM are carried out by a memory controller rather than by operating system (OS) software.
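A minimal sketch of this translation path, with a plain dict standing in for the directory D and `zlib` standing in for the compression hardware (all names and the 256-byte block size are illustrative assumptions):

```python
import zlib

LINE_SIZE = 1024   # uncompressed line size from the text
BLOCK_SIZE = 256   # fixed physical block size (illustrative)

def handle_miss(directory, memory, line_no):
    """Translate a logical line address through the directory, gather the
    line's physical blocks, and decompress them back to a full line."""
    entry = directory[line_no]                  # {"len": ..., "blocks": [...]}
    data = b"".join(memory[b] for b in entry["blocks"])[: entry["len"]]
    line = zlib.decompress(data)
    assert len(line) == LINE_SIZE
    return line

# Build a toy compressed memory holding one highly compressible 1 KB line.
line = bytes(LINE_SIZE)
comp = zlib.compress(line)
memory = {i // BLOCK_SIZE: comp[i : i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
          for i in range(0, len(comp), BLOCK_SIZE)}
directory = {0: {"len": len(comp), "blocks": sorted(memory)}}

recovered = handle_miss(directory, memory, 0)
```

The key point mirrored from the text: the translation and decompression happen below the line interface, so a caller asking for line 0 never sees the blocks or the compressed length.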
L3 and C-RAM organization

[Diagram: the L3 directory and L3 cache lines sit above a compressor/decompressor pair. On a miss, the requested address is translated, the line's blocks (A1-A4) are read from the compressed memory M and decompressed into an L3 cache line; on a store, the cast-out line is compressed and written back into blocks.]


  • Each directory entry contains:
    • Flags.
    • Fragment-combining information.
    • Pointers for up to four blocks.
  • On an L3 cache miss, the memory controller and decompression hardware find the blocks allocated to the compressed line and dynamically decompress the line to handle the miss.

  • When a new or modified line is stored, the blocks currently allocated to the line are made free, and the line is then compressed and stored in the C-RAM by allocating the required number of blocks.

  • Example:
  • Page size is 4 KB. The L3 cache immediately above the C-RAM has a line size of 1 KB. Each line compresses to 1, 2, 3, ..., or 1024 bytes with equal likelihood.
  • The expected compressed line size is then 512.5 bytes, i.e., a compression ratio of roughly 50%.
  • But there is a problem:
  • The block size is 256 bytes, which causes
  • FRAGMENTATION: “left-over” space in the last block.
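The cost of that fragmentation can be worked out directly from the slide's assumptions (uniform compressed sizes of 1..1024 bytes, 256-byte blocks):

```python
import math

LINE = 1024    # uncompressed line size (bytes)
BLOCK = 256    # physical block size (bytes)

# Compressed sizes 1..1024 bytes, each equally likely (from the slide).
sizes = range(1, LINE + 1)

avg_size = sum(sizes) / LINE                                  # 512.5 bytes
avg_blocks = sum(math.ceil(s / BLOCK) for s in sizes) / LINE  # 2.5 blocks
avg_allocated = avg_blocks * BLOCK                            # 640 bytes

# Without fragmentation a line needs ~512.5 B on average, but whole-block
# allocation consumes 640 B: about 127.5 B per line is lost, and the
# effective expansion drops from ~2x to 1024/640 = 1.6x.
```

This is why the next slides spend so much effort on fragment combining: recovering even part of that 127.5 bytes per line is a large fraction of the promised gain.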

  • Approaches to the fragmentation problem:
    • Make the block size smaller; however, the size of a directory entry then increases dramatically.
    • Combine two or more fragments, that is, the “left-over” pieces in the last blocks used to store compressed lines, into single blocks.
  • The set of lines among which fragment combining is allowed is called a “cohort”.

  • Cohort size: to keep a small upper bound on the time required for directory scans, cohorts should ideally be small.
  • There are two ways in which cohorts are determined.
  • Partitioned cohort:
  • Lines are divided into a number of disjoint sets, each of which is a cohort. For example, with a cohort size of 2, the first two 1 KB lines in each 4 KB page could form one cohort and the last two lines another.

  • Sliding cohort:
  • Cohorts are not disjoint, but overlap. For example, with a cohort size of 4, the cohort corresponding to any given line could consist of that line and the previous three lines, and similarly for other cohort sizes. This yields less fragmentation than the partitioned cohort.
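The two cohort definitions reduce to simple index arithmetic. This sketch uses the slide's example sizes (2 for partitioned, 4 for sliding); function names are invented for illustration:

```python
COHORT_SIZE = 2  # partitioned-cohort example size from the slide

def partitioned_cohort(line_no, size=COHORT_SIZE):
    """Disjoint cohorts: lines {0,1} form one cohort, {2,3} the next, etc."""
    start = (line_no // size) * size
    return list(range(start, start + size))

def sliding_cohort(line_no, size=4):
    """Overlapping cohorts: a line's cohort is itself plus the previous
    size - 1 lines (clamped at line 0)."""
    start = max(0, line_no - (size - 1))
    return list(range(start, line_no + 1))
```

Note that two adjacent lines get the same set under `partitioned_cohort` but different (overlapping) sets under `sliding_cohort`, which is exactly why the sliding variant finds more combining opportunities.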

  • The method by which fragments are combined:
    • The number of fragments that can be combined into a block:
    • 2-way combining (two fragments per block)
    • 3-way combining (three fragments per block)
  • Which fragment (or fragments) to choose:
    • First fit, best fit
    • Fragment contention, optimal fit
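One of these policies, 2-way combining with first fit, can be sketched in a few lines. The list-of-lists representation and 256-byte block size are illustrative assumptions:

```python
BLOCK = 256  # block size in bytes (illustrative)

def first_fit_combine(blocks, new_frag):
    """2-way combining with first fit: place new_frag into the first block
    that still holds only one fragment and has room for it; otherwise
    open a new block. `blocks` is a list of blocks, each a list of
    fragment sizes."""
    for block in blocks:
        if len(block) < 2 and sum(block) + new_frag <= BLOCK:
            block.append(new_frag)
            return blocks
    blocks.append([new_frag])
    return blocks

blocks = []
for frag in [100, 120, 200, 60]:
    first_fit_combine(blocks, frag)
# 100 and 120 share a block; 200 opens a second block; 60 cannot join
# 200 (260 > 256), so it opens a third block.
```

Best fit would instead scan all candidate blocks and pick the one leaving the least slack; the trade-off is scan time versus packing quality, which is the same tension the cohort-size bullet raises.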

  • Design of the directory structure:
    • Static directory:
      • Configured to have the number of entries required to support a maximum compression factor F. That is, if the C-RAM has a capacity of N uncompressed lines, the directory contains entries for FN lines.
      • A possible problem with this design is that the maximum compression is limited to a predetermined value.

  • Dynamic Directory :
    • Using a dynamic directory structure, directory entries are created (deleted) whenever real addresses are allocated (deallocated). In this case, free main-memory blocks could be allocated (deallocated) and used for the directory entries for one or more pages whenever the pages were created (deleted).
MXT Main Memory Subsystem

LZ77 Compression Technique
  • The LZ77 output is a series of byte values interspersed with (index, length) pairs. Each byte value is written as-is to the output. The (index, length) pairs are written to the output as a pair of integers (index first, then length), each of which has 256 added to its value. This allows the index and length values to be distinguished from the byte values.
  • LZ77 in operation
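A decoder for the token format described above can be sketched as follows, assuming the index is an absolute position into the output produced so far (the slide does not say whether it is absolute or a backward distance):

```python
def lz77_decode(tokens):
    """Decode an LZ77 token stream in the format described above:
    values < 256 are literal bytes; values >= 256 come in (index, length)
    pairs, each offset by 256, referring back into the output."""
    out = bytearray()
    i = 0
    while i < len(tokens):
        t = tokens[i]
        if t < 256:
            out.append(t)                    # literal byte, copied as-is
            i += 1
        else:
            index = tokens[i] - 256          # remove the +256 marker
            length = tokens[i + 1] - 256
            for k in range(length):          # byte by byte, so a match may
                out.append(out[index + k])   # overlap the region being written
            i += 2
    return bytes(out)

# Literals 'a', 'b', then a pair copying 2 bytes from position 0: "abab".
decoded = lz77_decode([97, 98, 256 + 0, 256 + 2])
```

The byte-by-byte copy loop is deliberate: it lets a match extend into bytes it is itself producing (e.g. a run of a repeated character), which a bulk slice copy would get wrong.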
IBM Implementation of the Compression Technique
  • Divide the data into n partitions.
  • One compression engine per partition.
  • Shared dictionary.
  • Typically:
    • 4 compression engines
    • 256 B partitions (a quarter of the 1 KB uncompressed data block)
    • 1 B/cycle per engine (4 B/cycle aggregate), or 2 B/cycle per engine (8 B/cycle aggregate) when double-clocked.
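The partitioning and the resulting aggregate throughput can be illustrated with a short sketch; the function name is invented, and this models only the data split and the bandwidth arithmetic, not the shared-dictionary compression itself:

```python
ENGINES = 4          # typical configuration from the slide
BYTES_PER_CYCLE = 1  # per engine; 2 when double-clocked

def partition(line: bytes, n: int = ENGINES):
    """Split an uncompressed line into n equal partitions, one per
    compression engine (256 B each for a 1 KB line and 4 engines)."""
    step = len(line) // n
    return [line[i * step : (i + 1) * step] for i in range(n)]

parts = partition(bytes(1024))
aggregate = ENGINES * BYTES_PER_CYCLE  # 4 B/cycle (8 B/cycle double-clocked)
```

Since all four engines consume their partitions in parallel, the line-level throughput is the per-engine rate times the engine count, which is where the 4 B/cycle and 8 B/cycle figures come from.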
Uncompressed Memory
  • The unsectored region is used by the SST for additional and future needs.
Main Memory Subsystem
  • Comprises SDRAM dual in-line memory modules (DIMMs).
  • The controller supports two separate DIMMs.
  • Can be configured to operate with compression disabled, enabled for specific address ranges, or completely enabled.
  • Sector Translation Table (SST)
  • Sectored memory
Cont.
  • Data:
    • A 1 KB line that compresses to 120 bits or fewer is stored directly in its SST entry.
    • A 1 KB line that compresses to more than 120 bits is stored in sectors, with pointers to the sectors kept in the SST entry.
    • Uncompressed data is accessed directly, without an SST reference.
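The placement rules above amount to a three-way decision per line. A toy sketch, with invented names and return labels (the real controller does this in hardware):

```python
SST_INLINE_BITS = 120  # compressed data up to 120 bits fits in the SST entry

def sst_placement(compressed_bits, compression_enabled=True):
    """Decide where a 1 KB line's data lives, per the rules above."""
    if not compression_enabled:
        return "direct"        # uncompressed: accessed without an SST reference
    if compressed_bits <= SST_INLINE_BITS:
        return "in-entry"      # stored inside the SST entry itself
    return "sectors"           # SST entry holds pointers to memory sectors
```

The in-entry case matters more than it looks: lines of zeros (freed pages, zero-filled buffers) compress below 120 bits and so consume no sectors at all.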
Reliability-Availability-Serviceability (RAS)
  • Sector translation table entry parity checking.
  • Sector free-list parity checking.
  • Sector out-of-range checking.
  • Sectored memory-overrun detection.
  • Sectors-used threshold detection (two thresholds).
  • Compressor/decompressor validity checking.
  • Compressed-memory CRC protection.
Commodity Duplex Memory
  • A fault-tolerance technique not found in earlier systems.
Operating System Software
  • Software cannot distinguish between an MXT and a conventional-memory hardware environment.
  • If memory becomes over-utilized, the system fails:
    • Unsectored memory is exhausted.
    • Paging management is needed to free sectors.
  • In UNIX, the OS kernel must be changed.
  • In Windows, the source code is not public, so external driver software is needed.
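A toy sketch of the watermark policy such kernel support might implement, driven by the sectors-used threshold interrupts mentioned under RAS. The sector count and watermark values are invented for illustration:

```python
TOTAL_SECTORS = 4096     # hypothetical physical sector count
HIGH_WATERMARK = 0.95    # hypothetical trigger points; the hardware
LOW_WATERMARK = 0.80     # exposes two sectors-used thresholds

def on_sectors_used(used):
    """Toy policy: when physical sector usage crosses the high threshold,
    the OS must page out (freeing sectors); once usage drops below the
    low threshold, normal operation resumes."""
    utilization = used / TOTAL_SECTORS
    if utilization >= HIGH_WATERMARK:
        return "page-out"    # reclaim pages before sectors run out
    if utilization < LOW_WATERMARK:
        return "normal"
    return "monitor"         # between the two thresholds: watch closely
```

The point of the two thresholds is hysteresis: paging starts well before the sectored memory overruns, and does not thrash on and off around a single boundary.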
References
  • MXT

1. A. K. Nanda, A.-T. Nguyen, M. M. Michael, and D. J. Joseph, "High-throughput coherence control and hardware messaging in Everest," p. 229.

2. P. A. Franaszek, P. Heidelberger, D. E. Poff, and J. T. Robinson, "Algorithms and data structures for compressed-memory machines," p. 245.

3. P. A. Franaszek and J. T. Robinson, "On internal organization in compressed random-access memories," p. 259.

4. R. B. Tremaine, P. A. Franaszek, J. T. Robinson, C. O. Schulz, T. B. Smith, M. E. Wazlowski, and P. M. Bland, "IBM Memory Expansion Technology (MXT)," p. 271.

5. B. Abali, H. Franke, D. E. Poff, R. A. Saccone, Jr., C. O. Schulz, L. M. Herger, and T. B. Smith, "Memory Expansion Technology (MXT): Software support and performance," p. 287.

6. T. B. Smith, B. Abali, D. E. Poff, and R. B. Tremaine, "Memory Expansion Technology (MXT): Competitive impact," p. 303.

  • Memory Compression

http://domino.research.ibm.com/comm/wwwr_thinkresearch.nsf/pages/memory200.html

  • Memory Guide

http://www.pcguide.com/ref/ram/tech.htm