Adaptive Query Processing in the Looking Glass

Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ. of Wisconsin, Madison)

Adaptive Query Processing (AQP) Systems:Publication Timeline STREAM Tukwila POP Re-Opt River Eddies Query Scrambling CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. Introduction

Motivation • Plenty of recent work on Adaptive Query Processing (AQP) in different contexts • Conventional DBMS query processing, data integration, continuous queries in stream systems • No exhaustive, in-depth categorization and comparison of AQP systems to date • Difficult to answer questions like: • Will techniques from one system work on another? • What are the shortcomings of each system? • Which system is best for a new application domain? Introduction

Our Contributions • Detailed study of current AQP systems • Classification of AQP systems into 3 families • Comparison across families in terms of AQP tasks • Identification of shortcomings & new approaches to address them Introduction

Roadmap • Introduction to AQP • The three AQP system families • Comparison across families in terms of AQP tasks • Summary of what we learned

Uses stats to cost plans Chosen plan Catalog (table sizes, histograms) Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats Primer on Traditional Query Processing Query Optimizer: Chooses best plan Introduction

Need for Adaptive Query Processing Detect plan suboptimality, re-optimize Errors in stats estimates, optimizer mistakes Correlated & skewed data distributions Stats & system conditions may change while query is running Monitor for changes, re-optimize Continuous queries, long-running queries AQP is integral to the current CS-wide push towards autonomic computing Introduction

Our Focus: AQP for a Single Query • AQP System: • A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query Introduction

AQP System Families • Plan-based AQP systems • AQP for traditional plan-based DBMSs • Continuous-Query-based (CQ-based) AQP systems • AQP for long-running continuous queries over data streams • Routing-based AQP systems • AQP for DBMSs and continuous queries based on adaptive tuple routing AQP Families

Uses stats to cost plans + Extra operators Catalog (table sizes, histograms) Collected stats AQP in Plan-based Systems Query Optimizer: Chooses best plan Chosen plan Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats AQP Families

Uses stats to cost plans + Re-optimize Extra operators Catalog (Original + observed stats) Collected stats AQP in Plan-based Systems Query Optimizer: Chooses best plan Chosen plan Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats AQP Families

Example Plan-based AQP Systems STREAM Tukwila POP Re-Opt River Eddies Query Scrambling CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. AQP Families

Primer on Continuous Query Processing Chosen packets • Continuous Queries (CQs) are long-running queries usually over data streams • Example CQ: Filtering packet streams • Stream properties or system conditions may change while query is running  best plan may change σ3 σ2 σ1 Packets AQP Families

Uses stats to cost plans Chosen plan Catalog (table sizes, histograms) Executor: Runs chosen plan AQP in CQ-based Systems Query Optimizer: Chooses best plan Statistics Tracker: Creates/updates stats Runstats AQP Families

Uses stats to cost plans Chosen plan Executor: Runs chosen plan AQP in CQ-based Systems Continuous Query Optimizer: Chooses best plan Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions AQP Families

Chosen plan Executor: Runs chosen plan AQP in CQ-based Systems Continuous Query Optimizer: Ensures that plan is best for current stats Uses stats to cost plans Catalog (stream rates, data distr.) Statistics Tracker: Monitors stream stats and system conditions AQP Families

Chosen plan Executor: Runs chosen plan Combined in-part for efficiency AQP in CQ-based Systems Continuous Query Optimizer: Ensures that plan is best for current stats Uses stats to cost plans Re-optimize Catalog (stream rates, data distr.) Stats to track Statistics Tracker: Monitors stream stats and system conditions AQP Families

Example CQ-based AQP Systems STREAM Tukwila POP Re-Opt River Eddies Query Scrambling CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. AQP Families

Chosen packets Chosen packets σ3 σ2 σ3 σ2 σ1 Tuple Router σ1 Packets Packets Using tuple routing Using a plan Primer on Routing-based Processing • Non-plan-based architecture where tuples are routed individually through operators • No optimizer • Exemplified by Eddies [AH00] AQP Families

Uses stats to cost plans Chosen plan Catalog (table sizes, histograms) Executor: Runs chosen plan Runstats Statistics Tracker: Creates/updates stats AQP in Routing-based Systems Query Optimizer: Chooses best plan AQP Families

Uses stats to choose efficient routes Selective routing of tuples Chosen plan In-memory catalog (operator costs, selectivities, etc.) Executor: Pool of operators Executor: Runs chosen plan AQP in Routing-based Systems Query or Continuous Query Tuple Router: Integrated Optimizer & Stats Tracker AQP Families

Example Routing-based AQP Systems STREAM Tukwila POP Re-Opt River Query Scrambling Eddies CAPE NiagaraCQ Parametric opt. 1976 1977 … 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Pipeline sch. Ingres DEC-Rdb DQE Corrective processing RedBrick Conquest Expected cost opt. Memory adap. AQP Families

Comparison Across AQP System Families • Goal: To bring out AQP algorithms and features, not performance numbers • Models, assumptions, and approach • Techniques for tracking statistics • Re-optimization subtasks • When and how to re-optimize • Switching between plans • Pros & cons of using a conventional optimizer • Performance issues • Quality of re-optimization • Run-time overhead & thrashing • Scalability Comparison

Techniques for Tracking Statistics • Observation • Mostly in Plan-based systems • Competition • Mostly in Plan-based systems • Profiling • Mostly in CQ-based systems • Exploration • In Routing-based systems Comparison

Chosen packets σ3 σ2 Selectivity of s1 on input stream can be observed here σ1 Packets Tracking Statistics: Observation [KD98] • Collect statistics on operator behavior or intermediate subexpressions in a plan Comparison

Chosen packets σ3 σ2 Selectivity of s1 on input stream Selectivity of s2 on input stream σ1 σ2 Packets Tracking Statistics: Competition [A93] • Extra processing to collect statistics Comparison

Tracking Statistics: Profiling [BMM+04] • Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics • Builds a “statistical profile” that can be used to estimate many individual statistics σ3 σ2 σ1 Profiled tuples Comparison

Tracking Statistics: Exploration [AH00] Chosen packets • A fraction of tuples are routed along routes different from the current best route to track statistics along those routes • No redundant processing σ2 σ3 σ1 Tuple Router Packets Comparison

Comparing Statistics-Tracking Techniques: Extra Overhead Introduced • Observation • Exploration (inefficient routes for some tuples) Increasing overhead • Profiling (extra processing on some tuples) • Competition (lots of extra work) Comparison

Comparing Statistics-Tracking Techniques: Coverage of Different Statistics • Observation & Competition (limited by plan) Increasing coverage • Exploration (limited by large number of routes) • Profiling (highest since it builds statistics profile) Comparison

Comparing Statistics-Tracking Techniques: Accuracy of Estimation Increasing accuracy • Exploration (but, susceptible to routing bias) • Profiling (depends on sampling fraction) • Observation & Competition Comparison

⋈ s INLJ S Unclustered index R Example Query: sp1and p2 (R) S ⋈ What have we learned? (1) • Many similarities in internals of different AQP families • Can re-use many current (and new) AQP techniques across families • Ex: Profiling from CQ-based systems • Enables, e.g., faster detection of plan suboptimality in Plan-based systems • Generates more accurate statistics at lower cost in Routing-based systems New Ideas

⋈ ⋈ Example Query: sp1and p2 (R) S ⋈ s s Hash Join INLJ S INLJ S Cost Unclustered index R Hash Join R |σ(R)| What have we learned? (2) • Current AQP systems are reactive • E.g., do not consider sensitivity to errors/changes in stats Proactive Re-optimization New Ideas

What have we learned? (3) • Challenging meta problems in AQP for continuous queries need to be addressed • Larger and more complex plan spaces  higher costs for statistics tracking and re-optimization • Tracking “Return-of-Investment” on AQP • Avoiding thrashing, e.g., on bursty changes in statistics Proposal: Plan Logging for Continuous Queries New Ideas

P1 P2 (R,S) Rate(R) Plan Logging for Continuous Queries • Log the statistics and re-optimization history • Query is long-running • Example view over log for R S T ⋈ ⋈ time Plans lying in a high-dimensional space of statistics New Ideas

Summary • AQP is becoming important: • New data and application trends • CS-wide push towards Autonomic Computing • Significant amount of work on AQP in recent years • Our contributions: • In-depth categorization and comparison of AQP systems and techniques • Identified current shortcomings and new approaches to AQP Conclusions

Adaptive Query Processing in the Looking Glass

Adaptive Query Processing in the Looking Glass

Presentation Transcript

Eddies: Continuously Adaptive Query Processing

Adaptive Query Processing: Progress and Challenges

Adaptive Query Processing

Adaptive Query Processing in the Looking Glass

Adaptive Query Processing with Eddies

Adaptive Query Processing for Data Aggregation:

Query Processing

Query Processing

Adaptive Query Processing

Adaptive Query Processing

Adaptive Query Processing

DM211 Query Processing Enhancements in Adaptive Server Enterprise 12.0

BI515: Query Processing in Adaptive Server IQ

Adaptive Query Processing

Eddies: Continuously Adaptive Query Processing

Query Processing

Eddies: Continuously Adaptive Query processing

Adaptive Query Processing

Adaptive Query Processing: Progress and Challenges

Adaptive Query Processing in Data Stream Systems

Using State Modules for Adaptive Query Processing