Presentation on Compression Store

Presentation Transcript


  1. Presentation on Compression Store Dec 1, 2011

  2. Agenda • Overview of RAM Cloud • Overview of HANA • Overview of CS-B+Tree • Compression utilities for in-memory compressed blocks • Prototype on HDFS • FUSE filesystem and the working of LZ77

  3. RAM Cloud • Problem: Disk-based systems are not able to meet the needs of large-scale web applications • Solutions (three candidates): - New approaches to disk-based storage - Replacing disks with flash memory devices - RAM Cloud

  4. RAM Cloud contd. • RAM Cloud stores all information in main memory • RAM Cloud aggregates the DRAM of hundreds to thousands of servers to create a large-scale storage system • Reduced latency: data in RAM is readily accessible, with far lower latency than disk • Need for durability: RAM Cloud uses replication and backups to provide durability

  5. RAM Cloud Concept (Scaling/Performance) • Applications • Generating web pages • Enforcing business rules • Storage • Shared storage for applications • Traditionally RDBMS and files • Newer storage models: Bigtable, memcached

  6. How RAM Cloud differs from memcached • Not a cache but the data store itself, so there is no caching policy • Disk is used only as a backup device • Reduced latency: I/O is on the order of microseconds, whereas disk I/O is milliseconds • RAM Cloud will typically have around 64 GB of DRAM in each server

  7. Motivation for RAM Cloud (Applications/Storage) • Motivation (Applications) • A single database cannot meet the needs of a popular web application • Data is partitioned across multiple databases to meet throughput requirements • As workload grows, ad hoc techniques have to be adopted • Case study (Facebook) • As of Aug 2009, Facebook deployed 2,000 memcached servers on top of 4,000 database servers to increase throughput • The DB tier is MySQL with 4,000 servers; the memcached tier amounts to 50% of the DB server count

  8. Motivation (Storage) • New storage systems like Bigtable, Dynamo, and PNUTS solve scalability issues • These techniques give up the benefits of traditional DBs [greedy techniques] • RAM Cloud provides a generic storage solution • Disk density has increased, but disk access rate has not increased much • Disks are moving towards an archival role [like tapes] • Accessing large blocks of data from disk is fine • However, what counts as a "large" block keeps growing: 100 KB was large in the 80s, now it is 10 MB, and it is likely to increase further (a rough calculation follows this slide) • Only video data gets good throughput with such large disk blocks
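The block-size point can be made concrete with a back-of-the-envelope calculation. The numbers below (10 ms positioning cost, 100 MB/s sequential transfer rate) are illustrative assumptions, not figures from the slides; the sketch only shows how small random reads are dominated by positioning time.

/* Effective disk throughput for different block sizes under assumed numbers. */
#include <stdio.h>

int main(void) {
    const double seek_ms = 10.0;             /* assumed seek + rotational delay */
    const double transfer_mb_per_s = 100.0;  /* assumed sequential transfer rate */
    const double block_kb[] = { 100.0, 10240.0 };  /* 100 KB (1980s "large") vs 10 MB */

    for (int i = 0; i < 2; i++) {
        double mb = block_kb[i] / 1024.0;
        double transfer_ms = mb / transfer_mb_per_s * 1000.0;
        double total_ms = seek_ms + transfer_ms;
        double effective_mb_s = mb / (total_ms / 1000.0);
        printf("block %8.0f KB: %6.1f ms per access, %6.1f MB/s effective\n",
               block_kb[i], total_ms, effective_mb_s);
    }
    return 0;
}

With these assumptions, a 100 KB read yields roughly 9 MB/s effective throughput (almost all of the 11 ms goes to positioning), while a 10 MB read yields roughly 91 MB/s, which is why only large-block workloads such as video use disks efficiently.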

  9. Caching • The effect of caching is getting diluted as more and more data must be kept in RAM for the cache to be effective • Applications like Facebook have little or no locality due to complex links between data • Caching systems do not offer guaranteed performance; it varies with the cache hit/miss ratio. RAM Cloud costs more but offers guaranteed performance • Latency issues • DB queries that do not match the disk layout have to do numerous seeks (iterative searches like tree walks are very costly when each step is a disk access) • Specialized database architectures like array stores, column stores, and stream processing engines have been developed to reduce latency for particular sets of queries • RAM Cloud offers a generic layout because it provides low latency for all access patterns • Flash-based techniques • RAM Cloud could be built from flash, a cheaper solution than DRAM • However, DRAM gives higher throughput than flash, so it is a much better fit • New technologies like phase-change memory might turn out better than flash

  10. Issues and disadvantages of RAM Cloud • Low-latency RPC is required • Network switches are a bottleneck (each switch introduces delay) • OS overhead of processing interrupts for the network stack • Virtualization architectures slow it down further • Durability needs to be provided • Replicate all objects to the memories of several machines • Buffered logging could be one technique, where data is written to disk as a log (log-structured-filesystem-style recovery); a sketch follows this slide • Data model for RAM-based storage (3 aspects) • Nature of data objects: blobs or fixed-size records, data structures as in C++/Java • How the basic objects are organized into higher-level objects • Key-value stores do not provide any aggregation • In an RDBMS, rows are organized into tables, and indices can be built on tables to speed up queries • Mechanism for naming and indexing • RDBMS: rows are identified by a value, the primary key • Key-value stores: each object is identified by a key • RDBMSs have scaling issues (no database scales to 1,000 servers); key-value stores are highly scalable but not as feature-rich as an RDBMS
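A minimal sketch of the buffered-logging idea mentioned above, assuming a simple append-only log file and small records; it illustrates the technique, not RAM Cloud's actual recovery mechanism. Updates accumulate in an in-memory buffer and are flushed as one sequential write, so durability does not cost a random disk I/O per update.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define LOG_BUF_SIZE 4096

static char log_buf[LOG_BUF_SIZE];
static size_t log_used = 0;
static FILE *log_file;

static void log_flush(void) {
    if (log_used > 0) {
        fwrite(log_buf, 1, log_used, log_file);
        fflush(log_file);          /* a real system would also fsync the descriptor */
        log_used = 0;
    }
}

/* Append one record (assumed smaller than the buffer) to the in-memory log. */
static void log_append(const void *rec, size_t len) {
    if (log_used + len > LOG_BUF_SIZE)
        log_flush();               /* buffer full: write it out as one sequential I/O */
    memcpy(log_buf + log_used, rec, len);
    log_used += len;
}

int main(void) {
    log_file = fopen("backup.log", "ab");
    if (!log_file) return 1;
    for (int i = 0; i < 1000; i++) {
        char rec[64];
        int n = snprintf(rec, sizeof rec, "put key%d value%d\n", i, i);
        log_append(rec, (size_t)n);
    }
    log_flush();
    fclose(log_file);
    return 0;
}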

  11. Contd. • Proposed data model: an intermediate/hybrid approach where the data type is a BLOB (impose no structure on the data but support indexing), and BLOB objects are kept in tables (a minimal sketch follows this slide) • Data placement: newly created data needs to be placed on one of the servers in the cluster • Small tables should be kept on a single server • Big tables need to be balanced evenly across servers • Addressing concurrency, transactions, and consistency • The ACID properties provided by an RDBMS are not scalable • Bigtable does not support transactions involving more than one row • Dynamo does not guarantee immediate, consistent updates of replicas • With reduced latency the execution of a transaction is no longer prohibitive, so could ACID be made scalable on a RAM-based system?
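As a rough illustration of the hybrid data model described above (uninterpreted BLOBs grouped into named tables and addressed by a key), here is a minimal sketch. The structures and the table_put/table_get helpers are hypothetical and not part of any RAM Cloud API.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct {
    uint64_t key;        /* object identifier within its table */
    size_t   len;        /* BLOB length in bytes */
    uint8_t *data;       /* uninterpreted bytes; any structure is up to the app */
} Blob;

typedef struct {
    char   name[64];     /* table name, e.g. "users" */
    Blob  *objects;      /* naive array here; a real store would index this */
    size_t count, cap;
} Table;

/* Store a copy of `data` under `key` in table `t` (no duplicate handling). */
static void table_put(Table *t, uint64_t key, const void *data, size_t len) {
    if (t->count == t->cap) {
        t->cap = t->cap ? t->cap * 2 : 8;
        t->objects = realloc(t->objects, t->cap * sizeof(Blob));
    }
    Blob b = { key, len, malloc(len) };
    memcpy(b.data, data, len);
    t->objects[t->count++] = b;
}

/* Look up the BLOB stored under `key`, or return NULL if absent. */
static const Blob *table_get(const Table *t, uint64_t key) {
    for (size_t i = 0; i < t->count; i++)
        if (t->objects[i].key == key)
            return &t->objects[i];
    return NULL;
}

int main(void) {
    Table users = { "users", NULL, 0, 0 };
    table_put(&users, 42, "alice", 6);        /* store a small opaque BLOB */
    const Blob *b = table_get(&users, 42);
    if (b)
        printf("key 42 -> %zu-byte blob: %s\n", b->len, (const char *)b->data);
    return 0;
}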

  12. HANA • Key enablers of the SAP in-memory computing database • Large amounts of addressable memory plus growing processor caches: from 16 GB DIMMs to 32 GB DIMMs, and 24 MB to 30 MB of processor cache • Faster processing: higher clock rates, Intel's Hyper-Threading architecture, from 8 cores to 10 cores • Faster interconnect between processors: Intel QuickPath

  13. SAP in-memory computing database • Row and column stores that provide ACID guarantees • Calculation and planning engine with a data repository • Data management services that include MDX and SQL interfaces

  14. HANA based on H. Plattner's paper • Complex business requirements need transactional (OLTP) systems • Analytical and financial applications need OLAP systems • Both OLTP and OLAP systems are based on relational theory • In OLTP systems, tuples are arranged in rows, which are stored in blocks (contrast with the column layout sketched after this slide) • Indexing allows faster access to tuples • Data access becomes slower as the number of tuples increases • OLAP data is organized in star schemas • An optimization is to compress columns with the help o…
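The row-versus-column distinction on this slide can be shown with a small layout sketch. The relation and attribute names below are made up for illustration; the point is that a column store keeps each attribute contiguous, which is what enables per-column compression and fast scans.

#include <stdint.h>
#include <stdio.h>

#define N_TUPLES 1000

/* Row store: each tuple's attributes are stored together. */
typedef struct {
    uint32_t order_id;
    uint32_t customer_id;
    uint32_t amount;
} RowTuple;
static RowTuple row_store[N_TUPLES];

/* Column store: one contiguous array per attribute. */
static uint32_t col_order_id[N_TUPLES];
static uint32_t col_customer_id[N_TUPLES];
static uint32_t col_amount[N_TUPLES];

/* A typical OLAP aggregate ("total amount") touches only one attribute:
 * in the column layout this is one sequential, cache-friendly scan, and the
 * amount column alone can be compressed (e.g. dictionary or run-length). */
static uint64_t sum_amount_columnar(void) {
    uint64_t sum = 0;
    for (int i = 0; i < N_TUPLES; i++)
        sum += col_amount[i];
    return sum;
}

int main(void) {
    for (int i = 0; i < N_TUPLES; i++) {
        row_store[i] = (RowTuple){ (uint32_t)i, (uint32_t)(i % 10), 100 };
        col_order_id[i] = (uint32_t)i;
        col_customer_id[i] = (uint32_t)(i % 10);
        col_amount[i] = 100;
    }
    printf("total amount = %llu\n", (unsigned long long)sum_amount_columnar());
    return 0;
}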

  15. Contd. • A column store is best suited for modern CPUs • Enterprise applications are memory bound • Vertical (column-wise) compression achieves a better compression ratio than horizontal (row-wise) compression • A row store cannot compete with a column store • Based on "Cache Sensitive Search on B+ Trees", which stores all the children of a node contiguously: child nodes are reached by storing only the address of the first child and adding an offset to get to subsequent children (see the sketch after this slide)
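A minimal sketch of the cache-sensitive B+-tree idea described above: the children of a node are stored contiguously, so the node keeps only a pointer to its first child and reaches the i-th child by offset arithmetic instead of one pointer per child. Node size and key type below are assumptions for illustration.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define KEYS_PER_NODE 14   /* assumed: chosen so a node spans only a few cache lines */

typedef struct CSBNode {
    uint16_t nkeys;                    /* number of valid keys in this node */
    uint32_t keys[KEYS_PER_NODE];      /* sorted separator keys */
    struct CSBNode *first_child;       /* all children live contiguously from here */
} CSBNode;

/* Child i is reached by offsetting from first_child; no per-child pointers. */
static CSBNode *child(const CSBNode *n, int i) {
    return n->first_child + i;
}

/* Descend one level: find the first key greater than k, follow that slot. */
static CSBNode *descend(const CSBNode *n, uint32_t k) {
    int i = 0;
    while (i < n->nkeys && k >= n->keys[i])
        i++;
    return child(n, i);
}

int main(void) {
    /* Toy two-level tree: a root with 2 separator keys and 3 contiguous children. */
    CSBNode *children = calloc(3, sizeof(CSBNode));
    CSBNode root = { 2, { 10, 20 }, children };
    CSBNode *leaf = descend(&root, 15);            /* 10 <= 15 < 20 -> child 1 */
    printf("key 15 falls into child %ld\n", (long)(leaf - children));
    free(children);
    return 0;
}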

  16. Analysis of compression algorithms for data
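The agenda also lists the working of LZ77. As a preview of the compression discussion, here is a simplified, unoptimized sketch of the core LZ77 step; the window size, minimum match length, and textual output format are toy assumptions. It scans a sliding window of already-seen bytes for the longest match with the upcoming bytes and emits (offset, length, next-literal) triples.

#include <stdio.h>
#include <string.h>

#define WINDOW 4096   /* how far back a match may reference */
#define MAX_MATCH 18  /* longest match we encode */

static void lz77_compress(const unsigned char *in, size_t n) {
    size_t pos = 0;
    while (pos < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;
        for (size_t cand = start; cand < pos; cand++) {
            size_t len = 0;
            while (len < MAX_MATCH && pos + len < n &&
                   in[cand + len] == in[pos + len])
                len++;
            if (len > best_len) {
                best_len = len;
                best_off = pos - cand;
            }
        }
        if (best_len >= 3) {         /* only encode matches worth referencing */
            unsigned char next = pos + best_len < n ? in[pos + best_len] : 0;
            printf("(%zu,%zu,'%c')\n", best_off, best_len, next);
            pos += best_len + 1;
        } else {                      /* no useful match: emit a literal */
            printf("(0,0,'%c')\n", in[pos]);
            pos++;
        }
    }
}

int main(void) {
    const char *s = "abcabcabcabd";
    lz77_compress((const unsigned char *)s, strlen(s));
    return 0;
}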
