1 / 26

Generalized Hash Teams for Join and Group-By

Generalized Hash Teams for Join and Group-By. Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany. Outline. Motivating Example Standard Hash Teams Generalized Hash Teams for Joins Generalized Hash Teams for Joins/Grouping False Drops Analysis

sheryl
Download Presentation

Generalized Hash Teams for Join and Group-By

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Generalized Hash Teams for Join and Group-By Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany

  2. Outline • Motivating Example • Standard Hash Teams • Generalized Hash Teams for Joins • Generalized Hash Teams for Joins/Grouping • False Drops Analysis • Application Examples (TPC-D) • Performance Evaluation A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  3. Traditional Join Plan R T S R A Result S A T A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  4. T R S A A A A Traditional Hash Team Join Plan[Graefe, Bunker, Cooper: VLDB 98] R A R.A Result S A S.A T T.A A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  5. T R S A B Generalized Hash Teams A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  6. T T R R S S A A B B Generalized Hash Teams 6 mod 5 =1 Partition on B odd: yellow even: green A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  7. Generalized Hash Team for Grouping/Aggregation • select c.City, sum(o.Value)from Customer c, Order owhere c.C# = o.C#group by c.City Join and grouping team Agg Agg Ptn on City Bit- maps (BM) Ptn on BM Ptn on C# Ptn on C# Ptn on City Order Customer Customer Order A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  8. City C# Group (Customer Order ) Partition on City and generate bitmaps for C# City Customer C# Partition with bitmaps for C# Order C# A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  9. Group (Customer Order Lineitem) City C# O# Partition on City and generate bitmaps for C# City Customer C# Partition with bitmaps for C# and generate bitmaps for O# C# Order O# O# Partition with bitmaps for O# Lineitem A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  10. T T R R S S A A B B False Drops A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  11. (Customer Order Lineitem) C# O# Overlapping Partitions Partition on B and generate bitmaps for A T Customer Partition on C# and generate bitmaps for O# S Order Partition based on the bitmaps for A Lineitem R Partition with Bitmaps A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  12. Applicability ofGeneralized Hash Teams Partition on B • for partitioning hierarchical structures A B • but it is also correct for non-strict hierarchies A B (but performance deteriorates) Partition on bitmaps for A A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  13. T S R T T R R S S A A B B Non-strict hierarchyA B A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  14. False Drops Estimation b: cardinality of the bitmaps n: number of partitions probability that some s sets a bit leading to a false drop of an r into a particular partition: total number of false drops: conservative approximation: A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  15. Implementation Details:Fine Tuning the Partitioning Bloom-Filter [Bratbergsengen] [Valduriez] Bitmaps A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  16. City C# Implementation Details:Teaming up Join and Grouping Group (Customer Order ) Partition on City and generate bitmaps for C# City Customer C# C# Order Partition with bitmaps for C# A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  17. Teaming Up Join and Grouping: Build Phase 5 25 13 23 PA M A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  18. Teaming Up Join and Grouping: Probe Phase 5 25 13 23 10 1 PA M A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  19. City C# Performance Comparison: Group (Customer Order ) Memory [MB] A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  20. False Drops Estimation and Measurement A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  21. Group (Customer Order Lineitem) City C# O# Performance Comparison: A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams Memory [MB]

  22. False Drops Estimation and Measurement A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  23. Conclusion and Future Work • Look-Ahead Partitioning for Joins and Grouping • Applicable for hierarchical data structures • correctness does not depend on strict hierarchies • Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18 • Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ) A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  24. TPC-D Q5 SELECT N_NAME, SUM(L_EXTENDEDPRICE * ( 1 - L_DISCOUNT)) AS REVENUE FROM CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION WHERE C_CUSTKEY = O_CUSTKEY AND O_ORDERKEY = L_ORDERKEY AND L_SUPPKEY = S_SUPPKEY AND C_NATIONKEY = S_NATIONKEY AND S_NATIONKEY = N_NATIONKEY AND N_REGIONKEY = R_REGIONKEY AND R_NAME = '[region]' AND O_ORDERDATE >= DATE '[date]' AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR GROUP BY N_NAME ORDER BY REVENUE DESC; A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  25. TPC-D Q10 SELECT C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE, C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT FROM CUSTOMER, ORDER, LINEITEM, NATION WHERE C_CUSTKEY = O_CUSTKEY AND L_ORDERKEY = O_ORDERKEY AND O_ORDERDATE >= DATE '[date]' AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH AND L_RETURNFLAG = 'R' AND C_NATIONKEY = N_NATIONKEY GROUP BY C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS, C_COMMENT ORDER BY REVENUE DESC; A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

  26. Indirectly Partitioning a Hierarchical Structure Customer City C# C# Order O# O# Lineitem Partition 1 Partition 2 Partition 3 A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

More Related