performance tradeoffs in read optimized databases
Download
Skip this Video
Download Presentation
Performance Tradeoffs in Read-Optimized Databases

Loading in 2 Seconds...

play fullscreen
1 / 22

Performance Tradeoffs in Read-Optimized Databases - PowerPoint PPT Presentation


  • 90 Views
  • Uploaded on

Performance Tradeoffs in Read-Optimized Databases. Stavros Harizopoulos MIT CSAIL joint work with: Velen Liang, Daniel Abadi, and Sam Madden. massachusetts institute of technology. 1 Joe 45. 2 Sue 37. … … …. Read-optimized databases. 1. 2. Joe. …. SQL Server DB2 Oracle.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Performance Tradeoffs in Read-Optimized Databases' - iokina


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
performance tradeoffs in read optimized databases

Performance Tradeoffs in Read-Optimized Databases

Stavros Harizopoulos

MIT CSAIL

joint work with:

Velen Liang, Daniel Abadi, and Sam Madden

massachusetts institute of technology

read optimized databases

1 Joe 45

2 Sue 37

… … …

Read-optimized databases

1

2

Joe

SQL Server

DB2

Oracle

Sybase IQ

MonetDB

CStore

Sue

45

37

row stores

column stores

Read optimizations:

Materialized views, multiple indices, compression

How does column-orientation affect performance?

massachusetts institute of technology

rows vs columns

Joe 45

Joe 45

reconstruct

45

project

Joe

1

2

Joe

Sue

45

37

3 files

Rows vs. columns

row data

column data

seek

1 Joe 45

2 Sue 37

single

file

… … …

Study performance tradeoffs solely in data storage

massachusetts institute of technology

performance study
Performance study
  • Methodology
    • Built storage manager from scratch
    • Sequential scans
    • Analyze CPU, disk, memory
  • Findings
    • Columns are generally more I/O efficient
    • Competing traffic favors columns
    • Conditions where columns are CPU-constrained
    • Conditions where rows are MemBW-constrained

massachusetts institute of technology

talk outline
Talk outline
  • System architecture
  • Workload and Experiments
  • Analysis
  • Conclusions

massachusetts institute of technology

system architecture
System architecture
  • Block-iterator operators
    • Single-threaded, C++, Linux AIO
  • No buffer pool
    • Use filesystem, bypass OS cache
  • Compression
  • Dense-pack

100% full

60% full

massachusetts institute of technology

compression methods
Compression methods
  • Dictionary
  • Bit-pack
    • Pack several attributes inside a 4-byte word
    • Use as many bits as max-value
  • Delta
    • Base value per page
    • Arithmetic differences

… ‘low’ …

… ‘high’ …

… ‘low’ …

… ‘normal’ …

… 00 …

… 10 …

… 00 …

… 01 …

massachusetts institute of technology

storage engine
Storage engine

row scanner

column scanner

SELECT name, age

WHERE age > 40

Joe 45

… …

Joe 45

… …

S

apply

predicate(s)

S

#POS 45

#POS …

name

apply

predicate #1

S

age

massachusetts institute of technology

platform

direct IO

read 128 bytes

100ms

read

L2 cache

prefetching

10ms

seek

Platform

CPU

L2

RAM

DISKS

(striped)

3.2 GB/sec

3.2GHz

180 MB/sec

1GB

1MB

prefetching:

massachusetts institute of technology

workload
Workload
  • LINEITEM (wide)
    • 60m rows → 9.5 GB
  • ORDERS (narrow)
    • 60m rows → 1.9 GB
  • Query

150 bytes

50 bytes

32 bytes

12 bytes

SELECT a1, a2, a3, …

WHERE a1 yields variable selectivity

massachusetts institute of technology

wide tuple 10 selectivity

25B

10B

69B

text

text

text

int

4B

char

1B

Wide tuple: 10% selectivity

Column

  • Large prefetch hides disk seeks in columns

Row

time (sec)

Column (CPU only)

Row (CPU only)

selected bytes per tuple

massachusetts institute of technology

wide tuple 10 sel cpu

time (sec)

row store

Wide tuple: 10% sel. (CPU)

# attributes selected

column store

  • Row-CPU suffers from memory stalls

massachusetts institute of technology

wide tuple 10 sel cpu1

time (sec)

row store

Wide tuple: 10% sel. (CPU)

0.1%

# attributes selected

column store

  • Column-CPU efficiency with lower selectivity

massachusetts institute of technology

narrow tuple 10 selectivity
Narrow tuple: 10% selectivity
  • Memory stalls disappear in narrow tuples
  • Compression: similar to narrow (not shown)

time (sec)

row store

column store

selected bytes per tuple

# attributes selected

massachusetts institute of technology

varying prefetch size
Varying prefetch size

no competingdisk traffic

  • No prefetching hurts columns in single scans

Column 2

Column 8

time (sec)

Column 16

Column 48 (x 128KB)

Row (any prefetch size)

selected bytes per tuple

massachusetts institute of technology

varying prefetch size1
Varying prefetch size

no competingdisk traffic

with competing disk traffic

  • No prefetching hurts columns in single scans
  • Under competing traffic, columns outperform rows for any prefetch size

time (sec)

selected bytes per tuple

massachusetts institute of technology

analysis
Analysis
  • Central parameter in analysis:

cycles per disk byte (cpdb)

  • What can it model:
      • More / fewer disks
      • More / fewer CPUs
      • CPU / disk competing traffic
  • Trends in cpdb:
      • 10 → 30 from 1995 to 2006
      • Further increase with multicore chips

massachusetts institute of technology

analysis1
Analysis

speedup of

cols over rows

  • Rows favored by narrow tuples and low cpdb
    • Disk-bound workloads have higher cpdb

2

10% selectivity

50% projection

1.6 – 2

1.2 – 1.6

(cpdb)

cycles per disk byte

0.8 – 1.2

0.4 – 0.8

tuple width

massachusetts institute of technology

see our paper for the rest
See our paper for the rest
  • CPU time breakdowns, L2 prefetcher
  • Disk prefetching implementation
  • Compression results
  • Non-pipelined column scanner
  • Analysis

massachusetts institute of technology

conclusions
Conclusions
  • Given enough space for prefetching, columns outperform rows in most workloads
  • Competing traffic favors columns
  • Memory-bandwidth bottleneck in rows
  • Future work
    • Column scanners, random I/O, write performance

massachusetts institute of technology

slide21

Thank you

db.csail.mit.edu/projects/cstore

massachusetts institute of technology

analysis2
Analysis

parameter

what it can model

massachusetts institute of technology

ad