1 / 22

Performance Tradeoffs in Read-Optimized Databases: from a Data Layout Perspective

Performance Tradeoffs in Read-Optimized Databases: from a Data Layout Perspective. Stavros Harizopoulos MIT CSAIL Modified by Jianlin Feng. massachusetts institute of technology. 1 Joe 45. 2 Sue 37. … … …. Read-optimized databases. 1. 2. Joe. …. SQL Server DB2 Oracle.

malik-rojas
Download Presentation

Performance Tradeoffs in Read-Optimized Databases: from a Data Layout Perspective

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Performance Tradeoffs in Read-Optimized Databases:from a Data Layout Perspective Stavros Harizopoulos MIT CSAIL Modified by Jianlin Feng massachusetts institute of technology

  2. 1 Joe 45 2 Sue 37 … … … Read-optimized databases 1 2 Joe … SQL Server DB2 Oracle Sybase IQ MonetDB CStore Sue 45 … 37 … row stores column stores Read optimizations: Materialized views, multiple indices, compression How does column-orientation affect performance? massachusetts institute of technology

  3. Joe 45 Joe 45 reconstruct 45 project Joe 1 2 … Joe Sue 45 37 … 3 files … Rows vs. columns row data column data seek 1 Joe 45 2 Sue 37 single file … … … Study performance tradeoffs solely in data storage massachusetts institute of technology

  4. Target Questions (1) • As the number of columns accessed by a query increase, how does that affect the performance of a column store? • How is performance affected by the use of disk and L2 cache prefetching? • On a modern workstation, under what workloads are column and row stores I/O bound? massachusetts institute of technology

  5. Target Questions (2) • How do parameters such as selectivity, number of projected attributes, tuple width, and compression affect column store performance? • How are the relative performance tradeoffs of column and row stores affected by the presence of competition for I/O and memory bandwidth along with CPU cycles from competing queries? massachusetts institute of technology

  6. Performance study Methodology • Built both a row- and column-oriented storage manager from scratch • Measure their performance with an identical set of relational operators • i.e., no column-wise optimization • Mainly consider sequential scans on the fact table in a star-schema. • Analyze time spent in CPU, disk and memory massachusetts institute of technology

  7. Performance Consideration in Read-Optimized Databases • An important goal is to minimize the number of bytes read from the disk when scanning a relation. • For a given acess plan, two ways to achieve the goal • Minimize unnecessary data read. • Densepack a data page • Store data in a compressed form. massachusetts institute of technology

  8. Implementing a Read-Optimized Engine • Block-iterator operators • Single-threaded, C++, Linux AIO • No buffer pool • Use filesystem, bypass OS cache • Three major components • Disk Storage for Columns and Rows • Row and Column Table Scanners • Query Engine and I/O Architecture massachusetts institute of technology

  9. massachusetts institute of technology

  10. Compression methods • Dictionary • Bit-pack • Pack several attributes inside a 4-byte word • Use as many bits as max-value • Delta • Base value per page • Arithmetic differences • No Run-Length Encoding … ‘low’ … … ‘high’ … … ‘low’ … … ‘normal’ … … 00 … … 10 … … 00 … … 01 … massachusetts institute of technology

  11. massachusetts institute of technology

  12. I/O Architecture • Use the Asynchronous I/O (AIO) interface to implement • A non-blocking prefetching mechanism • Using the libaio library on Linux 2.6 • AIO performs reads at the granularity of an I/O unit of 128KB • Depth of prefetching • How many I/O units massachusetts institute of technology

  13. direct IO read 128 bytes 100ms read L2 cache prefetching 10ms seek Platform CPU L2 RAM DISKS (striped) 3.2 GB/sec 3.2GHz 180 MB/sec 1GB 1MB prefetching: massachusetts institute of technology

  14. Workload • LINEITEM (wide) • 60m rows → 9.5 GB • ORDERS (narrow) • 60m rows → 1.9 GB • Query 150 bytes 52 bytes 32 bytes 12 bytes SELECT a1, a2, a3, … WHERE a1 yields variable selectivity massachusetts institute of technology

  15. 25B 10B 69B text text text int 4B char 1B Wide tuple: 10% selectivity Column • Large prefetch hides disk seeks in columns Row time (sec) Column (CPU only) Row (CPU only) selected bytes per tuple massachusetts institute of technology

  16. time (sec) row store Wide tuple: 10% sel. (CPU) # attributes selected column store • Row-CPU suffers from memory stalls massachusetts institute of technology

  17. time (sec) row store Wide tuple: 10% sel. (CPU) 0.1% # attributes selected column store • Column-CPU efficiency with lower selectivity massachusetts institute of technology

  18. Narrow tuple: 10% selectivity • Memory stalls disappear in narrow tuples • Compression: similar to narrow (not shown) time (sec) row store column store selected bytes per tuple # attributes selected massachusetts institute of technology

  19. Varying prefetch size no competingdisk traffic • No prefetching hurts columns in single scans Column 2 Column 8 time (sec) Column 16 Column 48 (x 128KB) Row (any prefetch size) selected bytes per tuple massachusetts institute of technology

  20. Varying prefetch size no competingdisk traffic with competing disk traffic • No prefetching hurts columns in single scans • Under competing traffic, columns outperform rows for any prefetch size time (sec) selected bytes per tuple massachusetts institute of technology

  21. Conclusions • Given enough space for prefetching, columns outperform rows in most workloads • Competing traffic favors columns • Memory-bandwidth bottleneck in rows • Future work • Column scanners, random I/O, write performance massachusetts institute of technology

  22. References • Stavros Harizopoulos, Velen Liang, Daniel Abadi, and Samuel Madden.Performance Tradeoffs in Read-Optimized Databases. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, September 2006.   PDF  [354K]   PPT (Slides)  [340K] massachusetts institute of technology

More Related