Evaluating memory compression and deduplication


Presentation Transcript


  1. Evaluating memory compression and deduplication Yuhui Deng, Liangshan Song, Xinyu Huang Department of Computer Science, Jinan University

  2. Agenda • Motivation • Memory compression • Memory deduplication • Characteristics of memory data • Evaluation • Conclusions

  3. Motivation • Many programs require more RAM (e.g. in-memory databases). • For example, the maximum number of virtual machines that can run on a physical machine is in most cases limited by the amount of RAM on that physical machine. • Furthermore, the runtime of programs that page or swap is likely to be dominated by disk access time when the amount of physical RAM is less than what the programs require. • Therefore, we need more RAM!!!

  4. Data compression • Data compression transforms a string of characters into a new string that contains the same information but whose length is as small as possible. • The compression algorithms are categorized as lossless compression and lossy compression.

  5. Memory compression • Memory compression reserves some memory space that would normally be used directly by programs. • It compresses relatively unused memory pages and stores the compressed pages in the reserved space. • This method enlarges the effective memory space. • We employ six lossless compression algorithms (arithmetic coding, Huffman coding, LZ77, LZ78, LZW, and RLE) to compress memory data.
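The slides contain no code, so the following is a minimal sketch of the memory-compression idea, assuming Python with zlib (a DEFLATE/LZ77-family codec) standing in for the six evaluated algorithms; the 4 KB page size and the compressed_store dictionary are illustrative assumptions, not part of the original evaluation.

```python
import zlib

PAGE_SIZE = 4096  # assumed 4 KB memory page

# Reserved memory region for compressed pages (illustrative: a plain dict).
compressed_store = {}

def compress_page(page_id: int, page: bytes) -> int:
    """Compress a relatively unused page into the reserved store.

    Returns the number of bytes the compressed copy occupies."""
    assert len(page) == PAGE_SIZE
    compressed = zlib.compress(page, 6)  # lossless, LZ77-style
    compressed_store[page_id] = compressed
    return len(compressed)

def decompress_page(page_id: int) -> bytes:
    """Restore a page on access; lossless compression returns identical bytes."""
    return zlib.decompress(compressed_store[page_id])

if __name__ == "__main__":
    page = bytes(PAGE_SIZE - 512) + bytes(range(256)) * 2  # mostly-zero page
    stored = compress_page(0, page)
    assert decompress_page(0) == page
    print(f"{PAGE_SIZE} bytes stored in {stored} compressed bytes")
```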

  6. Data deduplication • Data deduplication involves chunking and duplicate detection. • The chunking phase splits data into non-overlapping data blocks (chunks). • The duplicate detection phase uses hash algorithms to detect whether another chunk with exactly the same content has already been stored. • The chunking phase can be classified into four categories: Whole file chunking (WFC), Fixed-size partition (FSP), Content-defined chunking (CDC), and Sliding block (SB).
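A minimal sketch of the two simplest chunking schemes plus hash-based duplicate detection, assuming Python, SHA-1 digests, and a 4 KB chunk size; CDC and SB additionally rely on rolling hashes and are omitted here.

```python
import hashlib

def whole_file_chunking(data: bytes) -> list[bytes]:
    """WFC: the whole file forms a single chunk."""
    return [data]

def fixed_size_partition(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """FSP: split data into non-overlapping fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def detect_duplicates(chunks: list[bytes]) -> tuple[int, int]:
    """Duplicate detection: hash every chunk and count the unique ones."""
    seen = {hashlib.sha1(chunk).hexdigest() for chunk in chunks}
    return len(chunks), len(seen)

if __name__ == "__main__":
    data = (b"A" * 4096) * 3 + bytes(4096)  # three identical chunks + one zero chunk
    total, unique = detect_duplicates(fixed_size_partition(data))
    print(f"{total} chunks, {unique} unique")  # -> 4 chunks, 2 unique
```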

  7. Memory deduplication • Memory deduplication periodically calculates a unique hash number for every physical memory page by using hash algorithms such as MD5 and SHA-1. • The calculated hash number is then compared against the existing hash numbers in a database dedicated to storing page hash numbers. • If the hash number is already in the database, the memory page does not need to be stored again; a pointer to the first instance is inserted in place of the duplicated memory page. Otherwise, the new hash number is inserted into the database and the new memory page is stored.
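A minimal sketch of that logic, assuming Python, SHA-1, and a plain dictionary playing the role of the hash-number database; the class and method names are illustrative.

```python
import hashlib

class DedupMemory:
    """Illustrative page-level deduplication store."""

    def __init__(self):
        self.hash_db = {}     # page hash -> index of the first stored instance
        self.frames = []      # unique page contents
        self.page_table = {}  # page id -> index into self.frames

    def store_page(self, page_id: int, page: bytes) -> bool:
        """Store a page; return True if it duplicated an existing page."""
        digest = hashlib.sha1(page).hexdigest()
        if digest in self.hash_db:
            # Duplicate: insert a pointer to the first instance instead.
            self.page_table[page_id] = self.hash_db[digest]
            return True
        self.hash_db[digest] = len(self.frames)
        self.page_table[page_id] = len(self.frames)
        self.frames.append(page)
        return False

    def load_page(self, page_id: int) -> bytes:
        return self.frames[self.page_table[page_id]]

if __name__ == "__main__":
    mem = DedupMemory()
    zero_page = bytes(4096)
    mem.store_page(0, zero_page)
    assert mem.store_page(1, zero_page)  # second zero page is deduplicated
    assert mem.load_page(1) == zero_page
```

In practice a deduplication system would also compare page contents byte-for-byte before merging, since two different pages can in principle produce the same hash value.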

  8. Evaluation environment Table 1. Features of memory page traces

  9. Table 2. Configuration of the experimental platform

  10. Statistical results Table 3. Statistical results of the page image traces • Entropy is normally employed as a measure of redundancy. • The entropy of a source is the average number of bits required to encode each symbol present in the source. • Compressibility grows as the entropy value decreases.
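As a reminder of the definition, the per-symbol Shannon entropy is H = sum over symbols of p_i * log2(1/p_i). The sketch below (Python, byte-level symbols, made-up sample pages) computes it for a memory page.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: H = sum(p_i * log2(1 / p_i))."""
    counts = Counter(data)
    n = len(data)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

if __name__ == "__main__":
    print(byte_entropy(bytes(4096)))             # 0.0: all zeros, highly compressible
    print(byte_entropy(bytes(range(256)) * 16))  # 8.0: uniform bytes, incompressible
```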

  11. The Zero column indicates that a large volume of memory data is zero bytes across the seven traces. • The Continuous zero column summarizes the percentage of pages that contain runs of continuous zeros longer than 32 bytes. • The Bound column gives the percentage of memory data consisting of continuous zero runs longer than 32 bytes that start or end at a page boundary. • The Low column shows the percentage of memory data with low values ranging between 1 and 9. • The Power(2,n) column covers the integral power-of-two values; it indicates the decimal values 2, 4, 8, 16, 32, 64, 128, 255.
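A sketch of how such per-page statistics could be gathered, assuming Python, 4 KB pages, and a byte-level scan; the function name and the sample page are illustrative, the thresholds (runs longer than 32 bytes, low values 1-9) are carried over from the slide, and the Bound column's page-boundary check is omitted for brevity.

```python
def page_statistics(page: bytes) -> dict:
    """Fraction of page bytes falling into the Table 3 categories (illustrative)."""
    n = len(page)
    zeros = sum(1 for b in page if b == 0)
    low = sum(1 for b in page if 1 <= b <= 9)
    pow2 = sum(1 for b in page if b != 0 and (b & (b - 1)) == 0)  # power-of-two values

    # Bytes covered by runs of continuous zeros longer than 32 bytes.
    in_long_runs, run = 0, 0
    for b in page:
        if b == 0:
            run += 1
            continue
        if run > 32:
            in_long_runs += run
        run = 0
    if run > 32:
        in_long_runs += run

    return {
        "zero": zeros / n,
        "continuous_zero": in_long_runs / n,
        "low_1_to_9": low / n,
        "power_of_two": pow2 / n,
    }

if __name__ == "__main__":
    page = bytes(2048) + bytes([1, 2, 4, 7, 200]) * 16 + bytes(1968)  # 4 KB sample
    print(page_statistics(page))
```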

  12. Compression ratio • The compression ratio is defined as the size of the compressed memory data divided by the size of the uncompressed memory data (lower is better). • The block size in this evaluation is 8 KB, which is equal to two memory pages. • It shows that the LZ algorithms (LZ77, LZ78, LZW) achieve significant compression ratios (around 0.4). • The gnuplot trace achieves the best compression ratio across the six algorithms.
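A minimal sketch of that metric, assuming Python and zlib as a stand-in for the LZ family; the 8 KB block mirrors the evaluation's block size, while the sample data is made up.

```python
import zlib

BLOCK_SIZE = 8192  # 8 KB block = two 4 KB memory pages

def compression_ratio(block: bytes) -> float:
    """Compressed size divided by uncompressed size; lower means better compression."""
    return len(zlib.compress(block)) / len(block)

if __name__ == "__main__":
    block = (b"memory page data " * 512)[:BLOCK_SIZE]
    print(f"ratio = {compression_ratio(block):.3f}")
```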

  13. Compression/decompression time (compression time vs. decompression time) • The compression time of LZ77 and the decompression time of LZW are over 70 milliseconds and 50 milliseconds, respectively. • Not acceptable! • For comparison, the latest Hitachi Ultrastar 15K disk takes about 2 milliseconds per access. • LZ78 strikes a good balance between compression ratio, compression time, and decompression time.

  14. Impact of block size (LZ78): (a) compression ratio, (b) compression time, (c) decompression time • It shows that the compression ratio decreases as the block size increases (from 4 KB to 128 KB) across the seven traces. • Figs. (b) and (c) reveal that the bigger the block size is, the higher the compression and decompression times are. • This pattern is reasonable, because a larger data block is more compressible but also requires more time to compress and decompress. However, the performance degradation is not linearly proportional to the block size.
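The sketch below reproduces the shape of that experiment under assumptions: Python, zlib standing in for LZ78, synthetic semi-repetitive data instead of the real page traces, and block sizes from 4 KB to 128 KB; the measured numbers will differ from those in the paper.

```python
import time
import zlib

def measure(block: bytes) -> tuple[float, float, float]:
    """Return (compression ratio, compression time in s, decompression time in s)."""
    t0 = time.perf_counter()
    compressed = zlib.compress(block)
    t1 = time.perf_counter()
    zlib.decompress(compressed)
    t2 = time.perf_counter()
    return len(compressed) / len(block), t1 - t0, t2 - t1

if __name__ == "__main__":
    base = bytes(range(256)) * 8 + bytes(2048)  # synthetic 4 KB building block
    for kb in (4, 8, 16, 32, 64, 128):
        block = (base * (kb // 4))[:kb * 1024]
        ratio, ct, dt = measure(block)
        print(f"{kb:3d} KB  ratio={ratio:.3f}  "
              f"compress={ct * 1e3:.3f} ms  decompress={dt * 1e3:.3f} ms")
```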

  15. Deduplication ratio • This evaluation adopts three schemes: FSP, CDC, and SB. • It shows that FSP-4K and SB-4K achieve the best deduplication ratio across the seven traces. • When the chunking size of FSP is increased from 4 KB to 32 KB, the deduplication ratio increases significantly. • The deduplication ratio of CDC is close to 1.
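A sketch of how the FSP deduplication ratio behaves as the chunk size grows, assuming Python, SHA-1 chunk hashes, a synthetic trace in place of the real ones, and a deduplication ratio defined as deduplicated size divided by original size (so lower is better, and a ratio close to 1 means little is saved).

```python
import hashlib

def fsp_dedup_ratio(data: bytes, chunk_size: int) -> float:
    """Deduplicated size / original size under fixed-size partition chunking."""
    unique = {}
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        unique.setdefault(hashlib.sha1(chunk).hexdigest(), len(chunk))
    return sum(unique.values()) / len(data)

if __name__ == "__main__":
    # Synthetic trace: 32 distinct 4 KB pages interleaved with repeated zero pages.
    zero = bytes(4096)
    distinct = [bytes([i + 1]) * 4096 for i in range(32)]
    data = b"".join(zero if i % 2 else distinct[i // 2] for i in range(64))
    for kb in (4, 8, 16, 32):
        print(f"FSP-{kb}K: {fsp_dedup_ratio(data, kb * 1024):.3f}")
```

With this synthetic trace, FSP-4K merges the repeated zero pages, while FSP-8K and larger chunk sizes no longer find identical chunks, so the ratio climbs to 1, matching the trend reported on the slide.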

  16. Deduplication/restore time • From a deduplication ratio standpoint, FSP-4K and SB-4K are the best candidates to perform memory deduplication. • However, the deduplication time of SB-4K is about 40 times higher than that of FSP-4K, although the restore time is comparable. • FSP-4K is therefore the best candidate policy for memory deduplication. • Please note that the Y axis of the two figures is in microseconds.

  17. The chunking size has opposite impacts on the performance of compression and deduplication. • This is because the probability of finding identical characters within a chunk grows as the chunk size increases, while the probability that two chunks are exactly identical decreases as the chunk size grows.

  18. Conclusion • Memory deduplication greatly outperforms memory block compression. • Fixed-size partition (FSP) achieves the best performance compared with Content-defined Chunking (CDC) and Sliding Block (SB). • The optimal chunking size of FSP is equal to the size of a memory page. • We believe the analysis results in this paper can provide useful insights for designing or implementing systems that require abundant memory resources to enhance system performance.

  19. Thanks!
