Cost-effective Hybrid Storages

Flash Group, Cao Qingling


Motivation

  • Get the job done for the least money!


Motivation

  • SSDs have high cost, low density, and low reliability.

  • Replacing HDDs outright with SSDs is not recommended, especially when the data volume is very large.

  • As a cache in front of the HDD, an SSD bridges the gap between RAM and HDD, but the extra write traffic hurts its lifetime.

  • As a permanent store at the same level as the HDD, an SSD holds selected data.


Introduction

  • Proposes a Hybrid HBase that uses an SSD.

  • The HBase system components are stored on the SSD, which sits at the same level as the HDD.

  • In a quantitative assessment, Hybrid HBase performs 1.5 to 2 times better.


HBase

  • Column-oriented key-value store.

  • Each region server has a write-ahead log (WAL).

  • A write goes first to the WAL and then to the in-memory memstore.

  • A region is a horizontal partition of a table.

  • A region can split.

  • Data on disk is stored as log-structured merge (LSM) trees.
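The write path described above can be illustrated with a small sketch. This is not HBase code: the class `RegionSketch`, its `flush_threshold` parameter, and the in-memory lists standing in for the WAL and the on-disk files are all hypothetical, and discarding the log at flush time is a simplification.

```python
class RegionSketch:
    """Toy model of a region: WAL first, then memstore, then sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.wal = []          # stand-in for the write-ahead log
        self.memstore = {}     # in-memory buffer of recent writes
        self.store_files = []  # each flush yields one sorted, immutable run

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the write first
        self.memstore[key] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as one sorted run (LSM style).
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        self.wal.clear()  # simplification: flushed entries no longer need the log

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.store_files):  # newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None


region = RegionSketch(flush_threshold=3)
for k, v in [("row2", "b"), ("row1", "a"), ("row3", "c"), ("row1", "a2")]:
    region.put(k, v)
```

After these four writes the first three have been flushed into one sorted run, and the newer value for `row1` still sits in the memstore and the log.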


HBase System Component

ZooKeeper:

  • Clients contact it to locate the -ROOT- table.

  • The master contacts it to learn which region servers are available.

  • Region servers contact it through a heartbeat keep-alive mechanism.

  • ZooKeeper is I/O intensive.

Catalog Tables:

  • The -ROOT- and .META. tables.

  • Mostly read-intensive and not updated frequently.


HBase System Component

Write-ahead log (WAL):

  • Every write is first recorded in the WAL.

  • Its size grows with: i) WAL commits; ii) the write rate; iii) the size of each key-value pair.

Temporary Storage:

  • Used when a region is split or merged.

  • Read and written sequentially.


Assessment

Price ratio, HDD to SSD: 1:10

SSD capacity: 1% of the database size. Performance gain: more than 10%.
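One way to read these numbers is as a cost-per-throughput argument. The sketch below assumes the 1:10 ratio is the per-gigabyte price of HDD versus SSD and that the baseline is an HDD holding the whole database; the function names are hypothetical, and the 1.5x speedup is the lower bound quoted in the Introduction.

```python
def relative_cost(ssd_fraction, ssd_price_ratio):
    """Storage cost of the hybrid setup relative to an HDD-only setup.

    ssd_fraction:    SSD capacity as a fraction of the database size
    ssd_price_ratio: per-GB SSD price divided by per-GB HDD price (assumed)
    """
    return 1.0 + ssd_fraction * ssd_price_ratio


def relative_cost_per_throughput(ssd_fraction, ssd_price_ratio, speedup):
    """Below 1.0 means the hybrid setup is cheaper per unit of throughput."""
    return relative_cost(ssd_fraction, ssd_price_ratio) / speedup


cost = relative_cost(0.01, 10)  # about 1.10, i.e. roughly 10% extra cost
break_even_speedup = cost       # speedup at which cost per throughput is unchanged
hybrid = relative_cost_per_throughput(0.01, 10, speedup=1.5)
```

Under these assumptions the SSD adds about 10% to the storage cost, so any performance gain above 10% lowers the cost per unit of throughput.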


Experimental Evaluation

  • Setup: Intel processor (4 cores, 4 threads, 3 GHz) with 8 GB RAM, a Western Digital 1 TB HDD, and a Kingston 128 GB SSD.

  • Benchmark: Yahoo! Cloud Serving Benchmark (YCSB).

  • Workloads: 1 million queries on a database with 60 million records. Record size is 1 KB. 72 regions in total.




Introduction

  • Approximate membership query (AMQ) data structures, for example the Bloom filter.

  • Once the filter grows larger than RAM, performance degrades.

  • Quotient Filter: better data locality, sequential operations, support for deletion, dynamic resizing, and space savings.

  • Buffered Quotient Filter (BQF) and Cascade Filter (CF) are designed for flash.


Quotient Filter

  • $f_r = f \bmod 2^r$

  • $f_q = \lfloor f / 2^r \rfloor$

  • $T[f_q] = f_r$

  • Fingerprint: $f = f_q \cdot 2^r + f_r$.
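The quotient and remainder split above amounts to two bit operations. In this minimal sketch the function name `split_fingerprint` and the example values are illustrative, and the table `T` ignores collisions between fingerprints that share a quotient.

```python
def split_fingerprint(f, r):
    """Split fingerprint f into (quotient, remainder) using r remainder bits."""
    f_q = f >> r              # floor(f / 2^r): the high bits
    f_r = f & ((1 << r) - 1)  # f mod 2^r: the low r bits
    return f_q, f_r


r = 3
f = 0b101101                  # 45
f_q, f_r = split_fingerprint(f, r)

T = [None] * 8                # one slot per possible 3-bit quotient
T[f_q] = f_r                  # store only the remainder in slot f_q

# The fingerprint can be rebuilt from the slot index and its contents.
rebuilt = f_q * 2**r + T[f_q]
```

Only the `r` remainder bits are stored; the quotient is implied by the slot position, which is where the space saving comes from.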


Quotient Filter

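  • is_occupied: set if slot i is the canonical slot of some stored fingerprint, that is, some fingerprint has quotient i.

  • is_shifted: set if the remainder stored in slot i is not in its canonical slot.

  • is_continuation: set if the remainder in slot i belongs to the same run as the one in slot i-1.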

[Figure: a run of remainders and its layout in physical storage]


Quotient Filter

  • Check whether f is in the QF:

    step1:

    step2: scan left to the beginning of the cluster.

    step3: scan right to the start of the run.

    step4: search the run for f.

  • Insert a fingerprint f.

  • Delete a fingerprint f.
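The lookup steps above can be sketched as follows. This follows the commonly described quotient filter lookup rather than code from the slides: the `Slot` class, the function `qf_contains`, and the hand-built 8-slot table are illustrative, and the step 1 check on is_occupied is an assumption, since the slide leaves step 1 blank. The sketch also assumes the table is not completely full.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    remainder: int = 0
    is_occupied: bool = False
    is_continuation: bool = False
    is_shifted: bool = False


def qf_contains(table, f_q, f_r):
    m = len(table)
    # Step 1 (assumed): if no stored fingerprint has quotient f_q, stop early.
    if not table[f_q].is_occupied:
        return False
    # Step 2: walk left to the beginning of the cluster.
    b = f_q
    while table[b].is_shifted:
        b = (b - 1) % m
    # Step 3: walk right, pairing each occupied quotient with its run,
    # until the run that belongs to f_q is reached.
    s = b
    while b != f_q:
        s = (s + 1) % m
        while table[s].is_continuation:
            s = (s + 1) % m
        b = (b + 1) % m
        while not table[b].is_occupied:
            b = (b + 1) % m
    # Step 4: search the run for the remainder.
    while True:
        if table[s].remainder == f_r:
            return True
        s = (s + 1) % m
        if not table[s].is_continuation:
            return False


def example_table():
    # Quotient 1 holds remainders 10 and 11, quotient 2 holds 20 (pushed
    # one slot to the right), quotient 4 holds 40.
    table = [Slot() for _ in range(8)]
    table[1] = Slot(10, is_occupied=True)
    table[2] = Slot(11, is_occupied=True, is_continuation=True, is_shifted=True)
    table[3] = Slot(20, is_shifted=True)
    table[4] = Slot(40, is_occupied=True)
    return table


table = example_table()
found = qf_contains(table, 2, 20)    # stored in slot 3, one past its canonical slot
missing = qf_contains(table, 3, 20)  # no fingerprint has quotient 3
```

All the slots touched by one lookup are adjacent, which is the data-locality advantage claimed for the quotient filter.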


Quotient Filters on Flash

  • Buffered Quotient Filter (BQF)

    - One QF in RAM serves as the buffer; another QF resides on the SSD.

    - Optimized for lookup performance.

  • Cascade Filter (CF)

    - Optimized for insertion.

    - Offers a trade-off between lookup and insertion performance.
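The buffering idea behind the BQF can be sketched in a few lines. This is only an outline of the control flow under stated assumptions: Python sets stand in for the two quotient filters (so the sketch is exact rather than approximate), and the class name, the `buffer_capacity` parameter, and the flush-when-full policy are hypothetical.

```python
class BufferedFilterSketch:
    """Small in-RAM filter that absorbs inserts and is merged to SSD in bulk."""

    def __init__(self, buffer_capacity):
        self.buffer_capacity = buffer_capacity
        self.ram_buffer = set()  # stand-in for the in-RAM quotient filter
        self.on_ssd = set()      # stand-in for the quotient filter on SSD
        self.flushes = 0         # number of bulk merges to the SSD

    def insert(self, fingerprint):
        self.ram_buffer.add(fingerprint)
        if len(self.ram_buffer) >= self.buffer_capacity:
            # One bulk merge replaces many small random writes to flash.
            self.on_ssd |= self.ram_buffer
            self.ram_buffer = set()
            self.flushes += 1

    def may_contain(self, fingerprint):
        # A lookup checks RAM first and touches the SSD structure at most once.
        return fingerprint in self.ram_buffer or fingerprint in self.on_ssd


bqf = BufferedFilterSketch(buffer_capacity=3)
for fp in range(7):
    bqf.insert(fp)
```

Seven inserts with a capacity of three cause two bulk merges and leave one fingerprint in the RAM buffer.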


Quotient Filter on Flash

  • Cascade Filter

    - Based on the cache-oblivious lookahead array (COLA).
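The level structure that a COLA relies on can be sketched as a list of sorted arrays whose sizes double from level to level. This is a simplified illustration, not the Cascade Filter itself: it stores plain keys in Python lists, omits the lookahead pointers that give the COLA its name, and the class name `ColaSketch` is hypothetical.

```python
from bisect import bisect_left
from heapq import merge


class ColaSketch:
    """Level k is either empty or a sorted list of exactly 2**k keys."""

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]
        for k in range(len(self.levels)):
            if not self.levels[k]:
                self.levels[k] = carry  # first empty level absorbs the carry
                return
            # Level is full: merge it into the carry and empty it,
            # like a carry in binary addition.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
        self.levels.append(carry)

    def contains(self, key):
        # Without lookahead pointers, each level needs its own binary search.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False


cola = ColaSketch()
for key in [5, 3, 1, 4, 2]:
    cola.insert(key)
level_sizes = [len(level) for level in cola.levels]
```

Inserts only ever merge sorted arrays sequentially, which suits flash, while a lookup may have to probe every level. That is the insertion versus lookup trade-off noted for the Cascade Filter.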



Conclusions

  • Bloom filters are widely used in key-value stores.

  • Change the way of thinking.

  • Draw inspiration from traditional database algorithms.

  • Design the hybrid system to suit the application.


Bloom Filter

  • Initial state:

  • Insert: H(1), H(b).

  • Cannot be expanded, does not support deletion, and has poor data locality.
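A minimal Bloom filter makes these limitations concrete. The sketch below is illustrative only: the class name, the default sizes, and the choice of salted SHA-256 digests as the hash functions are assumptions, not details from the slides.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits      # fixed at creation: the filter cannot grow
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits    # initial state: every bit is 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def may_contain(self, item):
        # True can be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


bloom = BloomFilter()
empty_answer = bloom.may_contain("a")  # nothing inserted yet
bloom.insert("a")
bloom.insert("b")
```

Clearing a bit to delete one item could also erase another item that hashed to the same position, and the positions for a single item are scattered across the whole bit array, which is why deletion is unsupported and locality is poor.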
