
The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm

Elisa Bertino (Purdue University, bertino@cs.purdue.edu), Rimma V. Nehme (Microsoft Jim Gray Systems Lab, rimman@microsoft.com), Elke A. Rundensteiner (Worcester Polytechnic Institute, rundenst@cs.wpi.edu).


Presentation Transcript


  1. The Query Mesh Project: A Powerful Multi-Route Query Processing Paradigm
     Elisa Bertino, Purdue University, bertino@cs.purdue.edu
     Rimma V. Nehme, Microsoft Jim Gray Systems Lab, rimman@microsoft.com
     Elke A. Rundensteiner, Worcester Polytechnic Institute, rundenst@cs.wpi.edu
     New England Database Summit 2010
     Thanks go to NSF 0917017 for partial support of this project.

  2. Motivation
     • A variety of modern applications face data with non-uniform characteristics: ubiquitous healthcare, location-based services, financial tickers, network monitoring, …
     • Typically, ONE execution plan is used for ALL data
     • The user's view: “I want my results quickly. I don’t care how exactly they are computed.”
     [Figure labels: Query (SELECT * FROM …), Query Optimizer, Overall Statistics, Plan Cost, Query Execution Plan, Query Executor, Database Engine, Data Sources, Data, Query Results]

  3. Concrete Example: Network Monitoring
     • The example here uses streaming data
     • Similar examples can be found with static data
     • Opportunity for improvement: it may be more efficient to use different plans for different subsets of data
     • Single-plan query processing versus multi-plan (multi-route) query processing with Plan 1, Plan 2, Plan 3
     [Figure labels: Network packets, Data Streams, Continuous Query (SELECT * FROM …), Query Optimizer, DSMS, Query Execution Plan, Query Results]

  4. Outline
     • Introduction & Motivation
     • Background: Query Mesh
       - Model
       - Optimization
       - Execution
     • Dynamic Re-Optimization with Query Mesh
       - Challenges
       - Architecture
       - Details
     • Experimental Evaluation
     • Ongoing and Future Work
     • Conclusion

  5. Multi-Plan Query Processing Using Query Mesh
     Query Mesh provides a middle ground between systems with a single pre-computed route and systems with multiple runtime routes. (Here, route = execution plan.)
     • Traditional query optimization: single “route-oriented” solution; coarse optimization; small overhead
     • Eddies and its descendants: multi “route-less” solution; fine-granularity optimization; significant overhead
     • Query Mesh: multi “route-oriented” solution; fine-granularity optimization; less overhead
     [Figure: Physical Architecture of the Query Mesh Framework, with a classifier feeding multiple routes]

  6. Query Mesh Search Space
     • Search space: the set of all possible solutions
     • Search space complexity: the Bell number B_n = S(n,1) + S(n,2) + … + S(n,n), the sum of the Stirling numbers of the second kind
     • The Stirling number of the second kind S(n,k) is the number of ways to partition a set of cardinality n into exactly k nonempty subsets
     • The set of training tuples {1,2,3,4} has cardinality n = 4 (we denote {{1},{2,3}} as “1/23” for brevity)
     • Each subset has an individual route
     Query Mesh lattice-shaped search space:
       1234 (one plan for all data)
       14/23   1/234   124/3   13/24   123/4   134/2   12/34
       1/23/4   14/2/3   1/24/3   13/2/4   12/3/4   1/2/34
       1/2/3/4
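The size of this search space can be checked with a short Python sketch. It computes S(n,k) with the standard recurrence, sums them into the Bell number, and enumerates the partitions of {1,2,3,4} in the slide's "1/23" notation. The function names (`stirling2`, `bell`, `partitions`, `label`) are illustrative and do not come from the Query Mesh system.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to split n items into exactly k nonempty subsets."""
    if n == k:
        return 1  # covers S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Item n either joins one of the k existing subsets or forms its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n: int) -> int:
    """Total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(n + 1))


def partitions(items):
    """Yield every set partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        yield [[first]] + smaller  # `first` in a block of its own
        for i in range(len(smaller)):  # or added to an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]


def label(partition):
    """Render a partition in the slide's shorthand, e.g. '1/23'."""
    blocks = ("".join(str(x) for x in sorted(b)) for b in partition)
    return "/".join(sorted(blocks))


if __name__ == "__main__":
    print([stirling2(4, k) for k in range(1, 5)], bell(4))
    print(sorted(label(p) for p in partitions([1, 2, 3, 4])))
```

The per-level counts match the lattice rows: one partition with a single block, seven with two blocks, six with three, and one with four. Because Bell numbers grow very quickly with n, enumerating every partition is only practical for tiny training sets.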

  7. Query Mesh Optimization Problem
     Query Mesh cost model (main idea):
       Cost(QM) = Cost of classifier + Cost of routes + Multi-route overhead
     Query Mesh search algorithms:
     • Optimal Query Mesh Search (Opt-QM). Main idea:
       (1) Form all possible sets for the given powerset
       (2) Form partitions out of the above sets
       Too expensive! Need heuristics!
     • Query Mesh search heuristics. Three components:
       (1) Start solution
           - 5 different approaches: extreme-1, extreme-N, random, content-driven, route-driven
           - Experimentally evaluated
       (2) Search strategy
           - Randomized algorithms: iterative improvement, simulated annealing
       (3) Stop condition
           - Largely depends on the search strategy employed
           - K-iterations, plateau, time-bounded, resource-bounded
     [Figure labels: start solution, explored solutions, final solution]
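The heuristic search can be sketched in Python as an iterative-improvement loop with a K-iterations stop condition. Everything numeric here is an assumption made for illustration: the classifier cost and multi-route overhead are modelled as constants per route, `spread_cost` is a made-up per-route cost, and the neighbour move (relocate one tuple) is one possible choice rather than the move used by the authors. The start solutions reflect one reading of "extreme-1" (a single route for all tuples) and "extreme-N" (one route per tuple).

```python
import random


def qm_cost(partition, route_cost, classifier_unit=1.0, overhead_unit=0.5):
    """Cost(QM) = classifier cost + cost of routes + multi-route overhead.

    The per-route constants are hypothetical placeholders.
    """
    k = len(partition)
    return classifier_unit * k + sum(route_cost(b) for b in partition) + overhead_unit * k


def spread_cost(block):
    """Toy route cost: a shared route fits the block's mean, so spread is penalised."""
    m = sum(block) / len(block)
    return sum(abs(x - m) for x in block)


def extreme_1(items):
    return [list(items)]  # one route for all training tuples


def extreme_n(items):
    return [[x] for x in items]  # one route per training tuple


def neighbor(partition, rng):
    """Move one randomly chosen tuple into another block or a new block."""
    blocks = [list(b) for b in partition]
    src = rng.randrange(len(blocks))
    item = blocks[src].pop(rng.randrange(len(blocks[src])))
    targets = [i for i in range(len(blocks)) if i != src] + [len(blocks)]
    dst = rng.choice(targets)
    if dst == len(blocks):
        blocks.append([item])
    else:
        blocks[dst].append(item)
    return [b for b in blocks if b]  # drop a block emptied by the move


def iterative_improvement(start, route_cost, k_iterations=500, seed=0):
    """Accept a neighbour only if it lowers the cost; stop after K iterations."""
    rng = random.Random(seed)
    best, best_cost = start, qm_cost(start, route_cost)
    for _ in range(k_iterations):
        candidate = neighbor(best, rng)
        cost = qm_cost(candidate, route_cost)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    tuples = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
    print(iterative_improvement(extreme_1(tuples), spread_cost))
```

Simulated annealing would differ only in the acceptance rule: it occasionally accepts a worse neighbour, with a probability that shrinks over time, to escape local minima.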

  8. Query Mesh Optimization Overview
     • The QM Optimizer takes samples of tuples (training dataset) from the data stream (t1, t2, …, t12, …)
     • Compute routes (i.e., plans): r1, r2, r3, r4, …
     • Induce the classifier
     • The resulting Query Mesh is used by the QM Executor
     [Figure labels: Query Optimizer (QM Optimizer), Query Executor (QM Executor), Data Stream, Sample of Tuples (training dataset), Compute Routes, Induce Classifier, Query Mesh]
     [NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009.

  9. Query Mesh Execution Overview
     • Tuples from the data stream (t1, t2, …, t12, …) are classified within a classification window (tumbling window)
     • After classification, the data tuples are grouped by route (the groups are labelled “rusters” on the slide), each group carrying an r-token, and sent to the Self-Routing Fabric:
       route   r-token      data tuples
       r1      <1,4,3,2>    t1, t3, t4, t5
       r2      <2,4,3,1>    t2, t6, t9
       r3      <3,4,1,2>    t7, t8, t10
     [Figure labels: Query Optimizer (QM Optimizer), Query Executor (QM Executor), Data Stream, Classification Window, Self-Routing Fabric]
     [NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo), In VLDB 2009.
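The classification step can be sketched in Python as follows. The stream is cut into tumbling windows, each tuple in a window is assigned a route by a classifier, and tuples sharing a route are grouped under that route's r-token. The r-tokens and the tuple-to-route assignment are copied from the slide. The lookup-table classifier is a stand-in for a learned classifier, the window size of 10 is an assumption, and the r-token is treated as an opaque label.

```python
from collections import defaultdict
from itertools import islice

# r-tokens as shown on the slide, keyed by route name.
R_TOKENS = {"r1": (1, 4, 3, 2), "r2": (2, 4, 3, 1), "r3": (3, 4, 1, 2)}

# Tuple-to-route assignment shown on the slide; stands in for a real classifier.
SLIDE_ASSIGNMENT = {
    "t1": "r1", "t3": "r1", "t4": "r1", "t5": "r1",
    "t2": "r2", "t6": "r2", "t9": "r2",
    "t7": "r3", "t8": "r3", "t10": "r3",
}


def tumbling_windows(stream, size):
    """Yield consecutive, non-overlapping windows of at most `size` tuples."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def classify_window(window, classifier):
    """Group the tuples of one window by route; return route -> (r-token, tuples)."""
    groups = defaultdict(list)
    for t in window:
        groups[classifier(t)].append(t)
    return {route: (R_TOKENS[route], tuples) for route, tuples in groups.items()}


if __name__ == "__main__":
    stream = [f"t{i}" for i in range(1, 11)]
    for window in tumbling_windows(stream, size=10):
        for route, (token, tuples) in sorted(classify_window(window, SLIDE_ASSIGNMENT.get).items()):
            print(route, token, tuples)
```

Grouping per window means the classifier runs once per tuple, while routing decisions are carried once per group rather than once per tuple.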

  10. But… data characteristics may change…
     [Figure: snapshots of the data at time T, T + 1, T + 2, and T + 3]

  11. Dynamic Re-optimization with Query Mesh
     Can we have an execution strategy that
     • is plan-based
     • supports different plans for distinct subsets of data
     • is as adaptive “as Eddies”?
     Self-Tuning Query Mesh (ST-QM)
     [NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.

  12. Outline
     • Introduction & Motivation
     • Background: Query Mesh
       - Model
       - Optimization
       - Execution
     • Dynamic Re-Optimization with Query Mesh
       - Challenges
       - Architecture
       - Details
     • Conclusion
     • Current and Future Work

  13. Challenges and Contributions
     Challenges:
     • What should be monitored to determine whether the current QM solution is no longer adequate?
     • How to determine if the current QM solution should be adapted?
     • How to efficiently execute the physical migration from the current QM to a new QM solution while the query is being executed?
     Contributions:
     • Data and statistics monitoring
     • Concept drift analysis, QM cost model, improvement measure
     • Single lightweight operation to physically adapt QM
     [Figure labels: Query Mesh (classifier, multiple routes), Self-Tuning Query Mesh]
     [NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.

  14. ST-QM Architecture
     • Static QM framework: Query Optimizer, Query Executor
     • Adaptive QM framework: Query Optimizer, Query Executor, ST-QM
     [NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing, In EDBT 2009.

  15. ST-QM Components
     • ST-QM Monitor continuously samples data and execution statistics that will be used to determine if a concept drift has occurred (i.e., QM needs to be adapted)
     • ST-QM Analyzer determines if a concept drift has actually occurred and makes recommendations on whether and how the QM solution should be adapted
     • ST-QM Actuator takes these recommendations and physically adapts the QM solution
     [Figure labels: Query Mesh, sampling, ST-QM Monitor, measurements, ST-QM Analyzer, recommendations, ST-QM Actuator, actuation, New Query Mesh]
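The division of labour among the three components can be sketched as a minimal Python pipeline. This is an illustration only: the every-other-value sampling, the mean-shift test standing in for concept drift analysis, the threshold, and the dictionary used to represent a query mesh are all assumptions and are not taken from ST-QM.

```python
from statistics import mean


def monitor(stream, every=2):
    """Monitor stand-in: sample every `every`-th observed value."""
    return [v for i, v in enumerate(stream) if i % every == 0]


def analyze(measurements, baseline, threshold):
    """Analyzer stand-in: recommend adaptation if the sample mean has shifted.

    A plain mean-shift check replaces real concept drift analysis here.
    Returns a recommendation dict, or None if no adaptation is needed.
    """
    if not measurements:
        return None
    shift = abs(mean(measurements) - baseline)
    return {"shift": shift} if shift > threshold else None


def actuate(current_qm, recommendation, build_qm):
    """Actuator stand-in: swap in a new query mesh only when recommended."""
    if recommendation is None:
        return current_qm
    return build_qm(recommendation)


if __name__ == "__main__":
    qm = {"version": 1}

    def rebuild(rec):
        # Hypothetical rebuild step; a real system would re-run the QM optimizer.
        return {"version": 2, "reason": rec}

    for stream in ([10, 11, 9, 10, 10, 11], [20, 21, 19, 22, 20, 21]):
        rec = analyze(monitor(stream), baseline=10, threshold=2)
        qm = actuate(qm, rec, rebuild)
        print(qm)
```

Keeping the analyzer's output a plain recommendation object means the actuator needs no knowledge of how drift was detected, which mirrors the measurements, recommendations, and actuation hand-offs named on the slide.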

  16. ST-QM Actuator: Physical Query Mesh Adaptation
     • All possible recommendations:
       - Case 1: Virtual Concept Drift Recommendation
       - Case 2: Real Concept Drift Recommendation
       - Case 3: Hybrid Concept Drift Recommendation
     • Resulting query meshes:
       - R1: New Classifier + Old Routes
       - R2: Old Classifier + New Routes
       - R3: New Classifier + New Routes
     • The beauty of the proposed design!!!
     [Figure labels: Data, Online Classifier, Current Classifier, New Classifier, Classifier Modification, rusters, routes r1, r2, r3, Self-Routing Fabric, OI-array (entries 0 to 4), Op-modules (op_i, op_k, op_l), Query results]

  17. Experimental Evaluation
     • ST-QM was implemented inside a Java-based continuous query engine called CAPE
     • Its relative performance was compared against competitor systems, namely:
       - Static (non-adaptive) QM
       - Adaptive “plan-less” Eddies
       - Adaptive “plan-less” Eddies with CBR-based routing policy
     • Results can be found in EDBT 2010.

  18. Summary of ST-QM Experimental Results
     • ST-QM gave up to 44% improvement in execution time and output rate compared to non-adaptive QM, Eddy, and the single-plan execution approach.
     • The runtime overhead of ST-QM relative to query execution is small (on average 2%).
     • The actuation cost of physical adaptivity is nearly negligible, amounting to 0.02% of total execution cost.
     • Even if no adaptivity is needed, ST-QM’s performance in the worst case is at most 2-3% slower than static QM.

  19. Conclusion
     • Query Mesh is a practical query optimization approach
       - Eliminates the single-plan assumption
       - Feasibility shown
       - Has low overhead and high potential benefit
       - Easily implemented and integrated with existing systems
     • Query Mesh leads to novel solutions
       - Use of machine learning in query optimization and query processing
       - Use of network-inspired techniques in query optimization and query processing

  20. Next Steps in QM Project
     • Consider state caching and indexing in the QM stream context
     • Work with alternative classification methods for route decisions
     • Design customized query optimization and processing strategies
     • Study multi-query processing and optimization
     • Scale by applying distributed processing technologies
     • Do QM principles also apply in a static DB context?

  21. Thank you to current and past DSRG members for stream engine development, feedback, collaboration, and much more.
     Thank you for listening!
