Generalized hash teams for join and group by
Download
1 / 26

Generalized Hash Teams for Join and Group-By - PowerPoint PPT Presentation


  • 84 Views
  • Uploaded on

Generalized Hash Teams for Join and Group-By. Alfons Kemper Donald Kossmann Christian Wiesner Universität Passau Germany. Outline. Motivating Example Standard Hash Teams Generalized Hash Teams for Joins Generalized Hash Teams for Joins/Grouping False Drops Analysis

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Generalized Hash Teams for Join and Group-By' - sheryl


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Generalized hash teams for join and group by

Generalized Hash Teams for Join and Group-By

Alfons Kemper

Donald Kossmann

Christian Wiesner

Universität Passau

Germany


Outline
Outline

  • Motivating Example

  • Standard Hash Teams

  • Generalized Hash Teams for Joins

  • Generalized Hash Teams for Joins/Grouping

  • False Drops Analysis

  • Application Examples (TPC-D)

  • Performance Evaluation

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Traditional join plan
Traditional Join Plan

R

T

S

R

A

Result

S

A

T

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams for join and group by

T

R

S

A

A

A

A

Traditional Hash Team Join Plan[Graefe, Bunker, Cooper: VLDB 98]

R

A

R.A

Result

S

A

S.A

T

T.A

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams

T

R

S

A

B

Generalized Hash Teams

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams1

T

T

R

R

S

S

A

A

B

B

Generalized Hash Teams

6 mod 5

=1

Partition

on B

odd: yellow

even: green

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash team for grouping aggregation
Generalized Hash Team for Grouping/Aggregation

  • select c.City, sum(o.Value)from Customer c, Order owhere c.C# = o.C#group by c.City

Join and

grouping

team

Agg

Agg

Ptn on City

Bit-

maps

(BM)

Ptn on BM

Ptn on C#

Ptn on C#

Ptn on City

Order

Customer

Customer

Order

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams for join and group by

City

C#

Group (Customer Order )

Partition on City

and generate

bitmaps for C#

City

Customer

C#

Partition with

bitmaps for C#

Order

C#

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Generalized hash teams for join and group by

Group (Customer Order Lineitem)

City

C#

O#

Partition on City

and generate

bitmaps for C#

City

Customer

C#

Partition with

bitmaps for C#

and generate

bitmaps for O#

C#

Order

O#

O#

Partition with

bitmaps for O#

Lineitem

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


False drops

T

T

R

R

S

S

A

A

B

B

False Drops

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Overlapping partitions

(Customer Order Lineitem)

C#

O#

Overlapping Partitions

Partition

on B and

generate

bitmaps for A

T

Customer

Partition on

C# and generate

bitmaps for O#

S

Order

Partition

based on the

bitmaps for A

Lineitem

R

Partition with

Bitmaps

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Applicability of generalized hash teams
Applicability ofGeneralized Hash Teams

Partition

on B

  • for partitioning hierarchical structures A B

  • but it is also correct for non-strict hierarchies A B (but performance deteriorates)

Partition on

bitmaps

for A

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Non strict hierarchy a b

T

S

R

T

T

R

R

S

S

A

A

B

B

Non-strict hierarchyA B

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


False drops estimation
False Drops Estimation

b: cardinality of the bitmaps

n: number of partitions

probability that some s sets a bit leading to a false drop of an r into a particular partition:

total number of false drops:

conservative approximation:

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Implementation details fine tuning the partitioning
Implementation Details:Fine Tuning the Partitioning

Bloom-Filter

[Bratbergsengen]

[Valduriez]

Bitmaps

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Implementation details teaming up join and grouping

City

C#

Implementation Details:Teaming up Join and Grouping

Group (Customer Order )

Partition on City

and generate

bitmaps for C#

City

Customer

C#

C#

Order

Partition with

bitmaps for C#

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Teaming up join and grouping build phase
Teaming Up Join and Grouping: Build Phase

5

25

13

23

PA

M

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Teaming up join and grouping probe phase
Teaming Up Join and Grouping: Probe Phase

5

25

13

23

10

1

PA

M

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Performance comparison

City

C#

Performance Comparison:

Group (Customer Order )

Memory [MB]

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


False drops estimation and measurement
False Drops Estimation and Measurement

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Performance comparison1

Group (Customer Order Lineitem)

City

C#

O#

Performance Comparison:

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams

Memory [MB]


False drops estimation and measurement1
False Drops Estimation and Measurement

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Conclusion and future work
Conclusion and Future Work

  • Look-Ahead Partitioning for Joins and Grouping

  • Applicable for hierarchical data structures

  • correctness does not depend on strict hierarchies

  • Applicable for several TPC-D (TPC-H and TPC-R) queries: e.g., Q5, Q10, Q18

  • Combining Generalized Hash Teams and Order Preserving Hash Joins (OHJ)

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Tpc d q5
TPC-D Q5

SELECT

N_NAME, SUM(L_EXTENDEDPRICE * ( 1 - L_DISCOUNT)) AS REVENUE

FROM

CUSTOMER, ORDER, LINEITEM, SUPPLIER, NATION, REGION

WHERE

C_CUSTKEY = O_CUSTKEY

AND O_ORDERKEY = L_ORDERKEY

AND L_SUPPKEY = S_SUPPKEY

AND C_NATIONKEY = S_NATIONKEY

AND S_NATIONKEY = N_NATIONKEY

AND N_REGIONKEY = R_REGIONKEY

AND R_NAME = '[region]'

AND O_ORDERDATE >= DATE '[date]'

AND O_ORDERDATE < DATE '[date]' + INTERVAL 1 YEAR

GROUP BY

N_NAME

ORDER BY

REVENUE DESC;

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Tpc d q10
TPC-D Q10

SELECT

C_CUSTKEY, C_NAME, SUM(L_EXTENDEDPRICE *

(1 - L_DISCOUNT)) AS REVENUE,

C_ACCTBAL, N_NAME, C_ADDRESS, C_PHONE, C_COMMENT

FROM

CUSTOMER, ORDER, LINEITEM, NATION

WHERE

C_CUSTKEY = O_CUSTKEY

AND L_ORDERKEY = O_ORDERKEY

AND O_ORDERDATE >= DATE '[date]'

AND O_ORDERDATE < DATE '[date]' + INTERVAL 3 MONTH

AND L_RETURNFLAG = 'R'

AND C_NATIONKEY = N_NATIONKEY

GROUP BY

C_CUSTKEY, C_NAME, C_ACCTBAL, C_PHONE, N_NAME, C_ADDRESS,

C_COMMENT

ORDER BY

REVENUE DESC;

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams


Indirectly partitioning a hierarchical structure
Indirectly Partitioning a Hierarchical Structure

Customer

City

C#

C#

Order

O#

O#

Lineitem

Partition 1

Partition 2

Partition 3

A. Kemper, D. Kossmann, C. Wiesner: Generalized Hash Teams