
Scaling up analytical queries with column-stores



Presentation Transcript


  1. Scaling up analytical queries with column-stores. Ioannis Alagiannis, Manos Athanassoulis, Anastasia Ailamaki. École Polytechnique Fédérale de Lausanne

  2. Drinking from a data firehose • Fast, high-quality data analysis for smart business decisions • Data warehouses: 1/3 of the database market ($$$) • Column-stores are here to stay! • Need for multiple concurrent users: 100s to 1000s of queries* • Many concurrent queries + column-stores = ??? *"High-performance data warehousing", TDWI best practices report

  3. Multiple concurrent queries • Example: "Find all restaurants with rating over 3.5 and close to East Village" (pasta? steak? vegan? Indian?) [Figure: many clients sending queries to a DBMS running on 8 cores that share memory and disk] • High contention for resources

  4. Response time and throughput

  5. Throughput (memory-resident workload) [Figure: throughput of a TPC-H (SF 30) workload as concurrency grows, with the saturation point marked at the total number of HW contexts] • Concurrency can hurt performance

  6. Experimental setup • Column-stores: System-A and System-B (commercial), System-C (open-source) • Hardware: dual-socket Intel(R) Xeon(R) CPU E5-2660, 2 sockets x 8 cores x 2 threads (32 HW contexts) • 128 GB RAM, 1600 MHz DIMMs • L1: 64 KB and L2: 256 KB (per core), L3: 20 MB (shared)
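The 32 HW contexts follow from multiplying sockets, physical cores per socket and hardware threads per core. A tiny sketch of that arithmetic; `hw_contexts` is a hypothetical helper, and `os.cpu_count()` reports the logical CPUs of whatever machine runs the script, not of the authors' server.

```python
import os


def hw_contexts(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Hardware contexts = sockets x physical cores per socket x SMT threads per core."""
    return sockets * cores_per_socket * threads_per_core


if __name__ == "__main__":
    # Configuration of the test machine on the slide: 2 x 8 x 2.
    print(hw_contexts(2, 8, 2))  # 32
    # Logical CPUs visible to the OS on the machine running this script (may be None).
    print(os.cpu_count())
```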

  7. Workloads • TPC-H: scale factor 30 (32 GB on disk), Q_tpch = {10 query templates} • SSB (Star Schema Benchmark): scale factor 30 (18 GB on disk), Q_ssb = {all 13 query templates} • Throughput experiments with 25 query instances • Memory-resident, hot runs
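Throughput experiments keep a fixed number of clients busy and count completed queries per second. The following is a minimal sketch of such a closed-loop driver, assuming each client submits its next query as soon as the previous one returns (zero think time). `run_query` is a hypothetical stand-in for a call into the DBMS; this is not the authors' harness.

```python
import threading
import time
from typing import Callable


def closed_loop_throughput(run_query: Callable[[], None], n_clients: int,
                           queries_per_client: int) -> float:
    """Return completed queries per second for n_clients issuing queries back to back."""

    def client() -> None:
        for _ in range(queries_per_client):
            run_query()  # zero think time: the next query starts as soon as this returns

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    return n_clients * queries_per_client / elapsed


if __name__ == "__main__":
    # Hypothetical stand-in for one query: 10 ms spent waiting on the database.
    print(closed_loop_throughput(lambda: time.sleep(0.01), n_clients=8, queries_per_client=5))
```

Python threads are adequate here only because a real `run_query` would spend most of its time waiting on the database server; the client side does almost no work itself.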

  8. Experiment 1: How does increased concurrency affect response time?

  9. Scaling up TPC-H Q1: linear increase in response time
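A deliberately idealized model helps connect linear response-time growth to throughput. Assume (this is an assumption, not a claim about any of the tested systems) that each query occupies one hardware context and that contexts are shared fairly once clients outnumber them. Response time then grows linearly beyond the number of contexts, and Little's law for a closed system with zero think time, X = N / R, gives the resulting throughput. The function names are hypothetical.

```python
def response_time(n_clients: int, single_query_s: float, hw_contexts: int) -> float:
    """Idealized processor sharing: up to hw_contexts queries run at full speed;
    beyond that, each query gets hw_contexts / n_clients of a context."""
    return single_query_s * max(1.0, n_clients / hw_contexts)


def throughput(n_clients: int, single_query_s: float, hw_contexts: int) -> float:
    """Little's law for a closed system with zero think time: X = N / R."""
    return n_clients / response_time(n_clients, single_query_s, hw_contexts)


if __name__ == "__main__":
    # Hypothetical 2 s query on a machine with 32 HW contexts.
    for n in (8, 32, 64, 128):
        print(n, response_time(n, 2.0, 32), throughput(n, 2.0, 32))
```

In this model throughput flattens at the saturation point instead of falling. Any measured drop beyond it therefore reflects costs the model leaves out, presumably the resource contention the talk highlights.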

  10. Scaling up SSB Q3.1: similar behavior in SSB

  11. Experiment 2: What is the variability of query response time?

  12. Variability of System-A, TPC-H (64 clients): groups of short-, medium- and long-running queries

  13. Variability of System-B, TPC-H (64 clients): balanced resource allocation → lower variation
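The slides describe variability qualitatively. One common way to put a number on it is the coefficient of variation (standard deviation divided by mean) of per-query response times. The sketch below is offered as an illustration and is not the metric the authors report. `coefficient_of_variation` is a hypothetical helper built on Python's `statistics` module.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(response_times_s: Sequence[float]) -> float:
    """Population standard deviation over the mean; 0 means every query took equally long."""
    mean = statistics.mean(response_times_s)
    return statistics.pstdev(response_times_s) / mean


if __name__ == "__main__":
    # Hypothetical response times (seconds) for the same query template under load.
    print(coefficient_of_variation([4.1, 4.3, 3.9, 4.0]))   # balanced: small value
    print(coefficient_of_variation([1.0, 1.2, 6.5, 12.0]))  # grouped short/long: large value
```

A balanced allocation like System-B's would show up as a lower value than a system whose queries split into short- and long-running groups.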

  14. Variability of System-C, TPC-H (64 clients): System-C uses an admission control mechanism
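The slide does not describe how System-C's admission control works. The sketch below only illustrates the general idea, assuming a fixed cap on concurrently executing queries (for example, the number of hardware contexts). Queries beyond the cap block until a slot frees up. `AdmissionController`, `run` and `simulate` are hypothetical names.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class AdmissionController:
    """Admit at most max_active queries at a time; later arrivals block until a slot frees."""

    def __init__(self, max_active: int) -> None:
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0  # queries currently executing
        self.peak = 0    # highest concurrency observed so far

    def run(self, query: Callable[[], T]) -> T:
        with self._slots:  # blocks while all max_active slots are taken
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return query()
            finally:
                with self._lock:
                    self.active -= 1


def simulate(n_clients: int, max_active: int, work_s: float = 0.01) -> int:
    """Run n_clients threads through one controller; return the peak concurrency seen."""
    controller = AdmissionController(max_active)

    def client() -> None:
        controller.run(lambda: threading.Event().wait(work_s))  # stand-in for query work

    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return controller.peak


if __name__ == "__main__":
    print(simulate(n_clients=64, max_active=32))  # never exceeds 32
```

Capping concurrency trades queueing delay for less contention among the queries that do run. Note that Python semaphores do not guarantee which waiting thread is admitted next, so this sketch is not a FIFO queue.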

  15. Experiment 3: How does increasing concurrency affect throughput?

  16. Throughput, TPC-H [Figure annotations: 48%, 32% drop, 35% drop] • Throughput decreases after the saturation point
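Drops such as those annotated on the slide can be computed from a throughput-vs-clients curve: locate the peak, then measure how far throughput falls from it at the highest concurrency tested. The following is a minimal sketch; `saturation_summary` and the sample numbers are hypothetical and do not reproduce the slide's measurements.

```python
from typing import Sequence, Tuple


def saturation_summary(measurements: Sequence[Tuple[int, float]]) -> Tuple[int, float]:
    """Given (clients, throughput) pairs sorted by client count, return the client
    count at peak throughput and the % drop from that peak at the highest concurrency."""
    peak_clients, peak_tp = max(measurements, key=lambda m: m[1])  # first peak wins ties
    last_tp = measurements[-1][1]
    drop_pct = 100.0 * (peak_tp - last_tp) / peak_tp
    return peak_clients, drop_pct


if __name__ == "__main__":
    # Hypothetical (clients, queries/hour) measurements.
    print(saturation_summary([(1, 10.0), (8, 70.0), (32, 100.0), (64, 68.0)]))  # (32, 32.0)
```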

  17. Throughput, SSB: throughput plateaus • Exploiting sharing → sustain peak performance
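The talk does not say how System-B exploits sharing. The simplest form of the idea is a shared scan, where one pass over a column evaluates the predicates of every concurrent query instead of scanning once per query. The sketch below is a simplified illustration of that technique, assuming each query is a `COUNT(*)` with a single-column predicate. `shared_scan_count` and the ratings column are hypothetical.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[float], bool]


def shared_scan_count(column: Sequence[float], predicates: Sequence[Predicate]) -> List[int]:
    """Answer many concurrent COUNT(*) ... WHERE <predicate> queries with one pass over the column."""
    counts = [0] * len(predicates)
    for value in column:  # each value is read once, however many queries are attached
        for i, pred in enumerate(predicates):
            if pred(value):
                counts[i] += 1
    return counts


if __name__ == "__main__":
    ratings = [1.0, 3.6, 4.2, 2.9, 3.5]  # hypothetical restaurant-rating column
    queries = [lambda r: r > 3.5, lambda r: r >= 3.5, lambda r: r < 3.0]
    print(shared_scan_count(ratings, queries))  # [2, 3, 2]
```

Without sharing, k queries would read the column k times. With a shared scan, the cost of moving the column through memory is paid once, which is one way throughput can hold steady as concurrency grows.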

  18. When concurrency in column-stores is increased: • Response time increases linearly • … with high variability • After saturation, peak performance is not sustained (except for System-B on SSB)

  19. Where do we go from here? • QPipe, DataPath, CJoin, SharedDB, Blink • Recycler (MonetDB), cooperative scans, CCM (cracking) • Adaptive resource (re)allocation • Work-sharing techniques • Contention-aware scheduling • Thank you!
