Performance tradeoffs in read optimized databases
This presentation is the property of its rightful owner.
Sponsored Links
1 / 22

Performance Tradeoffs in Read-Optimized Databases PowerPoint PPT Presentation


  • 52 Views
  • Uploaded on
  • Presentation posted in: General

Performance Tradeoffs in Read-Optimized Databases. Stavros Harizopoulos MIT CSAIL joint work with: Velen Liang, Daniel Abadi, and Sam Madden. massachusetts institute of technology. 1 Joe 45. 2 Sue 37. … … …. Read-optimized databases. 1. 2. Joe. …. SQL Server DB2 Oracle.

Download Presentation

Performance Tradeoffs in Read-Optimized Databases

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Performance tradeoffs in read optimized databases

Performance Tradeoffs in Read-Optimized Databases

Stavros Harizopoulos

MIT CSAIL

joint work with:

Velen Liang, Daniel Abadi, and Sam Madden

massachusetts institute of technology


Read optimized databases

1 Joe 45

2 Sue 37

… … …

Read-optimized databases

1

2

Joe

SQL Server

DB2

Oracle

Sybase IQ

MonetDB

CStore

Sue

45

37

row stores

column stores

Read optimizations:

Materialized views, multiple indices, compression

How does column-orientation affect performance?

massachusetts institute of technology


Rows vs columns

Joe 45

Joe 45

reconstruct

45

project

Joe

1

2

Joe

Sue

45

37

3 files

Rows vs. columns

row data

column data

seek

1 Joe 45

2 Sue 37

single

file

… … …

Study performance tradeoffs solely in data storage

massachusetts institute of technology


Performance study

Performance study

  • Methodology

    • Built storage manager from scratch

    • Sequential scans

    • Analyze CPU, disk, memory

  • Findings

    • Columns are generally more I/O efficient

    • Competing traffic favors columns

    • Conditions where columns are CPU-constrained

    • Conditions where rows are MemBW-constrained

massachusetts institute of technology


Talk outline

Talk outline

  • System architecture

  • Workload and Experiments

  • Analysis

  • Conclusions

massachusetts institute of technology


System architecture

System architecture

  • Block-iterator operators

    • Single-threaded, C++, Linux AIO

  • No buffer pool

    • Use filesystem, bypass OS cache

  • Compression

  • Dense-pack

100% full

60% full

massachusetts institute of technology


Compression methods

Compression methods

  • Dictionary

  • Bit-pack

    • Pack several attributes inside a 4-byte word

    • Use as many bits as max-value

  • Delta

    • Base value per page

    • Arithmetic differences

… ‘low’ …

… ‘high’ …

… ‘low’ …

… ‘normal’ …

… 00 …

… 10 …

… 00 …

… 01 …

massachusetts institute of technology


Storage engine

Storage engine

row scanner

column scanner

SELECT name, age

WHERE age > 40

Joe 45

… …

Joe 45

… …

S

apply

predicate(s)

S

#POS 45

#POS …

name

apply

predicate #1

S

age

massachusetts institute of technology


Platform

direct IO

read 128 bytes

100ms

read

L2 cache

prefetching

10ms

seek

Platform

CPU

L2

RAM

DISKS

(striped)

3.2 GB/sec

3.2GHz

180 MB/sec

1GB

1MB

prefetching:

massachusetts institute of technology


Workload

Workload

  • LINEITEM (wide)

    • 60m rows → 9.5 GB

  • ORDERS (narrow)

    • 60m rows → 1.9 GB

  • Query

150 bytes

50 bytes

32 bytes

12 bytes

SELECT a1, a2, a3, …

WHERE a1 yields variable selectivity

massachusetts institute of technology


Wide tuple 10 selectivity

25B

10B

69B

text

text

text

int

4B

char

1B

Wide tuple: 10% selectivity

Column

  • Large prefetch hides disk seeks in columns

Row

time (sec)

Column (CPU only)

Row (CPU only)

selected bytes per tuple

massachusetts institute of technology


Wide tuple 10 sel cpu

time (sec)

row store

Wide tuple: 10% sel. (CPU)

# attributes selected

column store

  • Row-CPU suffers from memory stalls

massachusetts institute of technology


Wide tuple 10 sel cpu1

time (sec)

row store

Wide tuple: 10% sel. (CPU)

0.1%

# attributes selected

column store

  • Column-CPU efficiency with lower selectivity

massachusetts institute of technology


Narrow tuple 10 selectivity

Narrow tuple: 10% selectivity

  • Memory stalls disappear in narrow tuples

  • Compression: similar to narrow (not shown)

time (sec)

row store

column store

selected bytes per tuple

# attributes selected

massachusetts institute of technology


Varying prefetch size

Varying prefetch size

no competingdisk traffic

  • No prefetching hurts columns in single scans

Column 2

Column 8

time (sec)

Column 16

Column 48 (x 128KB)

Row (any prefetch size)

selected bytes per tuple

massachusetts institute of technology


Varying prefetch size1

Varying prefetch size

no competingdisk traffic

with competing disk traffic

  • No prefetching hurts columns in single scans

  • Under competing traffic, columns outperform rows for any prefetch size

time (sec)

selected bytes per tuple

massachusetts institute of technology


Analysis

Analysis

  • Central parameter in analysis:

    cycles per disk byte (cpdb)

  • What can it model:

    • More / fewer disks

    • More / fewer CPUs

    • CPU / disk competing traffic

  • Trends in cpdb:

    • 10 → 30 from 1995 to 2006

    • Further increase with multicore chips

  • massachusetts institute of technology


    Analysis1

    Analysis

    speedup of

    cols over rows

    • Rows favored by narrow tuples and low cpdb

      • Disk-bound workloads have higher cpdb

    2

    10% selectivity

    50% projection

    1.6 – 2

    1.2 – 1.6

    (cpdb)

    cycles per disk byte

    0.8 – 1.2

    0.4 – 0.8

    tuple width

    massachusetts institute of technology


    See our paper for the rest

    See our paper for the rest

    • CPU time breakdowns, L2 prefetcher

    • Disk prefetching implementation

    • Compression results

    • Non-pipelined column scanner

    • Analysis

    massachusetts institute of technology


    Conclusions

    Conclusions

    • Given enough space for prefetching, columns outperform rows in most workloads

    • Competing traffic favors columns

    • Memory-bandwidth bottleneck in rows

    • Future work

      • Column scanners, random I/O, write performance

    massachusetts institute of technology


    Performance tradeoffs in read optimized databases

    Thank you

    db.csail.mit.edu/projects/cstore

    massachusetts institute of technology


    Analysis2

    Analysis

    parameter

    what it can model

    massachusetts institute of technology


  • Login