Generalized hash teams for join and group by
Download
1 / 26

Generalized Hash Teams for Join and Group-By - PowerPoint PPT Presentation


  • 84 Views
  • Uploaded on

Generalized Hash Teams for Join and Group-By. Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany. Outline. Motivating Example Standard Hash Teams Generalized Hash Teams for Joins Generalized Hash Teams for Joins/Grouping False Drops Analysis

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Generalized Hash Teams for Join and Group-By' - sheryl


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Generalized hash teams for join and group by

Generalized Hash Teams for Join and Group-By

Alfons Kemper

Donald Kossmann

Christian Wiesner

Universität Passau

Germany


Outline
Outline

  • Motivating Example

  • Standard Hash Teams

  • Generalized Hash Teams for Joins

  • Generalized Hash Teams for Joins/Grouping

  • False Drops Analysis

  • Application Examples (TPC-D)

  • Performance Evaluation

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Traditional join plan
Traditional Join Plan

R

T

S

R

A

Result

S

A

T

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


T

R

S

A

A

A

A

Traditional Hash Team Join Plan[Graefe, Bunker, Cooper: VLDB 98]

R

A

R.A

Result

S

A

S.A

T

T.A

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams

T

R

S

A

B

Generalized Hash Teams

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams1

T

T

R

R

S

S

A

A

B

B

Generalized Hash Teams

6 mod 5

=1

Partition

on B

odd: yellow

even: green

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash team for grouping aggregation
Generalized Hash Team for Grouping/Aggregation

  • select c.City, sum(o.Value)from Customer c, Order owhere c.C# = o.C#group by c.City

Join and

grouping

team

Agg

Agg

Ptn on City

Bit-

maps

(BM)

Ptn on BM

Ptn on C#

Ptn on C#

Ptn on City

Order

Customer

Customer

Order

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


City

C#

Group (Customer Order )

Partition on City

and generate

bitmaps for C#

City

Customer

C#

Partition with

bitmaps for C#

Order

C#

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Group (Customer Order Lineitem)

City

C#

O#

Partition on City

and generate

bitmaps for C#

City

Customer

C#

Partition with

bitmaps for C#

and generate

bitmaps for O#

C#

Order

O#

O#

Partition with

bitmaps for O#

Lineitem

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


False drops

T

T

R

R

S

S

A

A

B

B

False Drops

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Overlapping partitions

(Customer Order Lineitem)

C#

O#

Overlapping Partitions

Partition

on B and

generate

bitmaps for A

T

Customer

Partition on

C# and generate

bitmaps for O#

S

Order

Partition

based on the

bitmaps for A

Lineitem

R

Partition with

Bitmaps

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Applicability of generalized hash teams
Applicability ofGeneralized Hash Teams

Partition

on B

  • for partitioning hierarchical structures A B

  • but it is also correct for non-strict hierarchies A B (but performance deteriorates)

Partition on

bitmaps

for A

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Non strict hierarchy a b

T

S

R

T

T

R

R

S

S

A

A

B

B

Non-strict hierarchyA B

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


False drops estimation
False Drops Estimation

b: cardinality of the bitmaps

n: number of partitions

probability that some s sets a bit leading to a false drop of an r into a particular partition:

total number of false drops:

conservative approximation:

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Implementation details fine tuning the partitioning
Implementation Details:Fine Tuning the Partitioning

Bloom-Filter

[Bratbergsengen]

[Valduriez]

Bitmaps

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Implementation details teaming up join and grouping

City

C#

Implementation Details:Teaming up Join and Grouping

Group (Customer Order )

Partition on City

and generate

bitmaps for C#

City

Customer

C#

C#

Order

Partition with

bitmaps for C#

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Teaming up join and grouping build phase
Teaming Up Join and Grouping: Build Phase

5

25

13

23

PA

M

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Teaming up join and grouping probe phase
Teaming Up Join and Grouping: Probe Phase

5

25

13

23

10

1

PA

M

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Performance comparison

City

C#

Performance Comparison:

Group (Customer Order )

Memory [MB]

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


False drops estimation and measurement
False Drops Estimation and Measurement

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Performance comparison1

Group (Customer Order Lineitem)

City

C#

O#

Performance Comparison:

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

Memory [MB]


False drops estimation and measurement1
False Drops Estimation and Measurement

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Conclusion and future work
Conclusion and Future Work

  • Look-Ahead Partitioning for Joins and Grouping

  • Applicable for hierarchical data structures

  • correctness does not depend on strict hierarchies

  • Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18

  • Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ)

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Tpc d q5
TPC-D Q5

SELECT

N_NAME, SUM(L_EXTENDEDPRICE * ( 1 - L_DISCOUNT)) AS REVENUE

FROM

CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION

WHERE

C_CUSTKEY = O_CUSTKEY

AND O_ORDERKEY = L_ORDERKEY

AND L_SUPPKEY = S_SUPPKEY

AND C_NATIONKEY = S_NATIONKEY

AND S_NATIONKEY = N_NATIONKEY

AND N_REGIONKEY = R_REGIONKEY

AND R_NAME = '[region]'

AND O_ORDERDATE >= DATE '[date]'

AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR

GROUP BY

N_NAME

ORDER BY

REVENUE DESC;

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Tpc d q10
TPC-D Q10

SELECT

C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE *

(1 - L_DISCOUNT)) AS REVENUE,

C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT

FROM

CUSTOMER, ORDER, LINEITEM, NATION

WHERE

C_CUSTKEY = O_CUSTKEY

AND L_ORDERKEY = O_ORDERKEY

AND O_ORDERDATE >= DATE '[date]'

AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH

AND L_RETURNFLAG = 'R'

AND C_NATIONKEY = N_NATIONKEY

GROUP BY

C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS,

C_COMMENT

ORDER BY

REVENUE DESC;

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Indirectly partitioning a hierarchical structure
Indirectly Partitioning a Hierarchical Structure

Customer

City

C#

C#

Order

O#

O#

Lineitem

Partition 1

Partition 2

Partition 3

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


ad