Performance tradeoffs in read optimized databases
Sponsored Links
This presentation is the property of its rightful owner.
1 / 22

Performance Tradeoffs in Read-Optimized Databases PowerPoint PPT Presentation


  • 59 Views
  • Uploaded on
  • Presentation posted in: General

Performance Tradeoffs in Read-Optimized Databases. Stavros Harizopoulos MIT CSAIL joint work with: Velen Liang, Daniel Abadi, and Sam Madden. massachusetts institute of technology. 1 Joe 45. 2 Sue 37. … … …. Read-optimized databases. 1. 2. Joe. …. SQL Server DB2 Oracle.

Download Presentation

Performance Tradeoffs in Read-Optimized Databases

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Performance Tradeoffs in Read-Optimized Databases

Stavros Harizopoulos

MIT CSAIL

joint work with:

Velen Liang, Daniel Abadi, and Sam Madden

massachusetts institute of technology


1 Joe 45

2 Sue 37

… … …

Read-optimized databases

1

2

Joe

SQL Server

DB2

Oracle

Sybase IQ

MonetDB

CStore

Sue

45

37

row stores

column stores

Read optimizations:

Materialized views, multiple indices, compression

How does column-orientation affect performance?

massachusetts institute of technology


Joe 45

Joe 45

reconstruct

45

project

Joe

1

2

Joe

Sue

45

37

3 files

Rows vs. columns

row data

column data

seek

1 Joe 45

2 Sue 37

single

file

… … …

Study performance tradeoffs solely in data storage

massachusetts institute of technology


Performance study

  • Methodology

    • Built storage manager from scratch

    • Sequential scans

    • Analyze CPU, disk, memory

  • Findings

    • Columns are generally more I/O efficient

    • Competing traffic favors columns

    • Conditions where columns are CPU-constrained

    • Conditions where rows are MemBW-constrained

massachusetts institute of technology


Talk outline

  • System architecture

  • Workload and Experiments

  • Analysis

  • Conclusions

massachusetts institute of technology


System architecture

  • Block-iterator operators

    • Single-threaded, C++, Linux AIO

  • No buffer pool

    • Use filesystem, bypass OS cache

  • Compression

  • Dense-pack

100% full

60% full

massachusetts institute of technology


Compression methods

  • Dictionary

  • Bit-pack

    • Pack several attributes inside a 4-byte word

    • Use as many bits as max-value

  • Delta

    • Base value per page

    • Arithmetic differences

… ‘low’ …

… ‘high’ …

… ‘low’ …

… ‘normal’ …

… 00 …

… 10 …

… 00 …

… 01 …

massachusetts institute of technology


Storage engine

row scanner

column scanner

SELECT name, age

WHERE age > 40

Joe 45

… …

Joe 45

… …

S

apply

predicate(s)

S

#POS 45

#POS …

name

apply

predicate #1

S

age

massachusetts institute of technology


direct IO

read 128 bytes

100ms

read

L2 cache

prefetching

10ms

seek

Platform

CPU

L2

RAM

DISKS

(striped)

3.2 GB/sec

3.2GHz

180 MB/sec

1GB

1MB

prefetching:

massachusetts institute of technology


Workload

  • LINEITEM (wide)

    • 60m rows → 9.5 GB

  • ORDERS (narrow)

    • 60m rows → 1.9 GB

  • Query

150 bytes

50 bytes

32 bytes

12 bytes

SELECT a1, a2, a3, …

WHERE a1 yields variable selectivity

massachusetts institute of technology


25B

10B

69B

text

text

text

int

4B

char

1B

Wide tuple: 10% selectivity

Column

  • Large prefetch hides disk seeks in columns

Row

time (sec)

Column (CPU only)

Row (CPU only)

selected bytes per tuple

massachusetts institute of technology


time (sec)

row store

Wide tuple: 10% sel. (CPU)

# attributes selected

column store

  • Row-CPU suffers from memory stalls

massachusetts institute of technology


time (sec)

row store

Wide tuple: 10% sel. (CPU)

0.1%

# attributes selected

column store

  • Column-CPU efficiency with lower selectivity

massachusetts institute of technology


Narrow tuple: 10% selectivity

  • Memory stalls disappear in narrow tuples

  • Compression: similar to narrow (not shown)

time (sec)

row store

column store

selected bytes per tuple

# attributes selected

massachusetts institute of technology


Varying prefetch size

no competingdisk traffic

  • No prefetching hurts columns in single scans

Column 2

Column 8

time (sec)

Column 16

Column 48 (x 128KB)

Row (any prefetch size)

selected bytes per tuple

massachusetts institute of technology


Varying prefetch size

no competingdisk traffic

with competing disk traffic

  • No prefetching hurts columns in single scans

  • Under competing traffic, columns outperform rows for any prefetch size

time (sec)

selected bytes per tuple

massachusetts institute of technology


Analysis

  • Central parameter in analysis:

    cycles per disk byte (cpdb)

  • What can it model:

    • More / fewer disks

    • More / fewer CPUs

    • CPU / disk competing traffic

  • Trends in cpdb:

    • 10 → 30 from 1995 to 2006

    • Further increase with multicore chips

  • massachusetts institute of technology


    Analysis

    speedup of

    cols over rows

    • Rows favored by narrow tuples and low cpdb

      • Disk-bound workloads have higher cpdb

    2

    10% selectivity

    50% projection

    1.6 – 2

    1.2 – 1.6

    (cpdb)

    cycles per disk byte

    0.8 – 1.2

    0.4 – 0.8

    tuple width

    massachusetts institute of technology


    See our paper for the rest

    • CPU time breakdowns, L2 prefetcher

    • Disk prefetching implementation

    • Compression results

    • Non-pipelined column scanner

    • Analysis

    massachusetts institute of technology


    Conclusions

    • Given enough space for prefetching, columns outperform rows in most workloads

    • Competing traffic favors columns

    • Memory-bandwidth bottleneck in rows

    • Future work

      • Column scanners, random I/O, write performance

    massachusetts institute of technology


    Thank you

    db.csail.mit.edu/projects/cstore

    massachusetts institute of technology


    Analysis

    parameter

    what it can model

    massachusetts institute of technology


  • Login