
Experimental Evidence on Partitioning in Parallel Data Warehouses

Presentation Transcript


  1. Experimental Evidence on Partitioning in Parallel Data Warehouses. Pedro Furtado, Professor at the University of Coimbra and Researcher at CISUC, DEI/CISUC, Universidade de Coimbra, Portugal.

  2. Context
  • Parallelism is used for major performance improvement in large data warehouses
  • Using a simple, low-cost shared-nothing architecture
  • Without any efficiency requirements on the network or the nodes
  NODE-PARTITIONED DATA WAREHOUSE:
  • Minimize inter-node data-exchange requirements
  • Horizontally fully partition the facts (the largest relations); the rest of the relations are replicated
  • Hope to obtain near-to-linear speedup

  3. "Divide to conquer": to run a query n times faster, horizontally partition the large fact table (randomly) into n nodes, and replicate the other relations (the small dimensions?).
  [Diagram: the Sales fact is split across Node 1..Node n; dimensions D1-D4 are replicated on every node.]
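
A minimal sketch of this placement in Python (illustrative only: the relation names, row layout, and in-memory "nodes" are our assumptions, not the paper's code):

```python
import random

def place(fact_rows, dimension_tables, n_nodes):
    """Randomly partition the fact table across n nodes and
    replicate every dimension table on every node."""
    nodes = [{"fact": [], "dims": {}} for _ in range(n_nodes)]

    # Horizontal partitioning: each fact row lands on one random node.
    for row in fact_rows:
        nodes[random.randrange(n_nodes)]["fact"].append(row)

    # Replication: every node gets a full copy of each dimension.
    for name, rows in dimension_tables.items():
        for node in nodes:
            node["dims"][name] = list(rows)
    return nodes

# Example: a tiny Sales fact and one dimension D1 over 3 nodes.
nodes = place([{"d1_id": 1, "amount": 10.0}] * 6, {"D1": [{"id": 1}]}, 3)
```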

  4. Why Replicate Dimensions?
  • We replicated the dimensions because then we never need to repartition them…
  • …and each node can also run other operations independently
  • This wouldn't work with partitioned dimensions: a join between a fact fragment and a dimension fragment would require exchanging rows between nodes

  5. Query Processing
  • Each of the n nodes computes SUM(X) over its 1/n fragment of FACT, joined with the replicated dimensions Ds, GROUP BY dims
  • The submitter then computes SUM(SUMs) over the partial results
  • The merged result equals SUM(X) over the full FACT, GROUP BY dims
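
A minimal executable sketch of this decomposition (assuming in-memory rows and a single grouping column, both simplifications of ours):

```python
from collections import defaultdict

def partial_sum(fragment):
    """On each node: SUM(x) over its 1/n fact fragment, GROUP BY dim."""
    groups = defaultdict(float)
    for row in fragment:
        groups[row["dim"]] += row["x"]
    return groups

def merge(partials):
    """On the submitter: SUM of the partial SUMs, per group."""
    totals = defaultdict(float)
    for part in partials:
        for dim, s in part.items():
            totals[dim] += s
    return totals

fragments = [[{"dim": "a", "x": 1.0}],
             [{"dim": "a", "x": 2.0}, {"dim": "b", "x": 5.0}]]
print(dict(merge(partial_sum(f) for f in fragments)))  # {'a': 3.0, 'b': 5.0}
```

SUM is distributive, so merging partial sums gives exactly SUM(X) over the full fact table; an AVG would instead ship SUM and COUNT pairs and divide at the submitter.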

  6. Query Processing Steps
  1. Rewrite query (submitter node); 2. Send query to the computing nodes; 3. Compute partial results (computing nodes); 4. Repartition; 5. Redistribute; 6. Send partial results; 7. Apply merge query (submitter node).

  7. Problem (TPC-H case study)
  • Many typical schemas are "complex": many large relations may exist
  [Diagram: TPC-H schema with Lineitem (very large), Orders and PartSupp (large), Part, Customer and Supplier (medium); which ones should be partitioned?]

  8. Problem Statement • Divide by N … we would expect N times faster: linear speedup (LS) • However, we don't get the LS

  9. Our Major Contributions
  • Show these problems experimentally, using the TPC-H performance evaluation benchmark: we EXPLAIN AND ILLUSTRATE the LARGE RELATIONS problem
  • Identify simple modifications that improve the results
  • Analyze the modifications experimentally

  10. Partitioning the Facts (Largest): LI and PS are partitioned.
  [Diagram: Node 1 .. Node N each hold one LI fragment and one PS fragment, plus full replicas of O, P, C and S.]

  11. Generated TPC-H 50 GB onto 1 node and onto 25 nodes
  • Used PCs (Pentium III 866 MHz CPU) with 512 MB RAM
  • Oracle 9i, tuned initial settings
  • TPC-H 22-query set
  • Measured response time: 1 node against 25 nodes
  • We show that the speedup underachievement is explained mostly by the size of the replicated dimensions

  12. Experimental Results • Only a few queries exhibited near-to-LS!
  • LS speedup: 25-30
  • Medium speedup: 6-15
  • Low speedup: 2-6
  • Very low speedup: 0.4-1.9

  13. Some had linear speedup… LS speedup 25-30: Q1, Q6, Q15
  [Diagram: access patterns touching only the LI fragments plus S.]
  • These queries access only fragments (LI/N)
  • S is reasonably small relative to LI/N

  14. Others had smaller speedup… Medium speedup 6-15: Q11, Q14, Q19
  [Diagram: access patterns touching LI or PS fragments plus P and S.]
  • P is not small relative to the fragment (LI/N)
  • S is not small relative to PS/N

  15. What Happened…
  • With N nodes we would like to process 1/N of the data and get about N times speedup
  • However, we have replicated relations…
  • The amount of speedup degradation depends on the size of R2 relative to R1/N
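
To see why, consider a simple linear-work model (our illustration, not a formula from the slides): if each node's work is proportional to its fragment |R1|/N plus the full replicated size |R2|, the speedup is (|R1| + |R2|) / (|R1|/N + |R2|).

```python
def speedup(r1, r2, n):
    """Speedup under a linear-work model: R1 is partitioned, R2 replicated."""
    return (r1 + r2) / (r1 / n + r2)

print(speedup(100, 0, 25))    # 25.0 -> linear speedup when R2 is negligible
print(speedup(100, 4, 25))    # 13.0 -> R2 = R1/N already halves the speedup
print(speedup(100, 100, 25))  # ~1.9 -> a large replicated R2 kills the speedup
```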

  16. Low-Speedup Queries: speedup 2-5.5
  • Q16: P is large relative to PS/N
  • Q9 and Q3, Q5, Q7, Q10, Q12: O is large relative to LI/N
  [Diagram: access patterns that include large replicated relations next to the fragments.]

  17. Very Low or No Speedup Queries: speedup 0.4-2
  • Q13, Q22: process only replicated relations
  • Q8 and Q2, Q4, Q21: scenarios similar to the "slow queries", but including all the replicated relations
  [Diagram: access patterns over replicated relations only, or over all replicated relations plus the fragments.]

  18. What Happened… • These queries not only include replicated relations… • but the replicated relations they include are also very large in comparison to the fragments!

  19. The same in pictures…
  [Diagram: four access patterns, from large speedup through medium and low speedup to no speedup at all; in the slower cases the replicated O is large relative to LI/N.]

  20. Back to Partitioning Alternatives…
  • Placement alternatives for each relation (Lineitem (LI), Orders (O), PartSupp (PS), Customer (C), …): single node vs. replicated (all nodes) vs. partitioned
  • Partitioning function: round-robin/random, range, or hash
  • Choice of partitioning attributes (C_KEY? O_KEY? PS_KEY?)
  • Repartitioning = re-hashing by exchanging rows between nodes
  • When you partition more than one relation, you will probably need to repartition (see the sketch below)
  • E.g., if you partition LI and O by O_KEY they are "equi-partitioned", but LI join PS needs repartitioning of LI, and O join C needs repartitioning of O
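
A small sketch of hash partitioning and of the repartitioning it sometimes forces (the row layout and helper names are ours):

```python
def hash_partition(rows, attr, n_nodes):
    """Hash-partition rows on `attr`: a row goes to node hash(value) mod N."""
    parts = [[] for _ in range(n_nodes)]
    for row in rows:
        parts[hash(row[attr]) % n_nodes].append(row)
    return parts

def repartition(parts, attr, n_nodes):
    """Re-hash on a new attribute, exchanging rows between nodes."""
    return hash_partition([r for p in parts for r in p], attr, n_nodes)

# LI and O both partitioned by O_KEY are equi-partitioned: matching rows
# land on the same node, so LI join O is purely local.
li = hash_partition([{"O_KEY": 1, "P_KEY": 7}], "O_KEY", 4)
# But LI join PS matches on P_KEY, so LI must be repartitioned first.
li_by_pkey = repartition(li, "P_KEY", 4)
```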

  21. Let's Review Related Work…
  • Replicate all but one relation: PRS [Yu et al., TKDE 1989]. Similar to what we did: we replicated all relations except LI. [Yu et al., TKDE 1989]: "Partition Strategy for Distributed Query Processing in Fast Local Networks".
  • Partition using dependencies: PLACEMENT DEPENDENCY [Liu et al., ICDE 1996]. E.g., partition ORDERS and co-locate its LINEITEM rows (LI is the dependent relation). [Liu et al., ICDE 1996]: "A Distributed Query Processing Strategy Using Placement Dependency". [Chen et al., ICPADS 2000]: "An Efficient Algorithm for Distributed Queries Using Partition Dependency".
  • Parallel hash join and optimization: PHJ. Relations are hash-partitioned; repartitioning is required to re-hash in order to join. [DeWitt and Gerber, VLDB 1985]: "Multiprocessor Hash-Based Join Algorithms". [Liu et al., EDBT 1996]: "A Hash Partition Strategy for Distributed Query Processing". [Kitsuregawa et al., 1983]: "Application of Hash to Data Base Machine and Its Architecture". [Shasha and Wang, TODS 1991]: "Optimizing Equijoin Queries in Distributed Databases Where Relations Are Hash Partitioned".
  • Workload-based partitioning and placement: determine the best partitioning attributes automatically, based on the workload. [Zilio et al., 1994]: "Partitioning Key Selection for a Shared-Nothing Parallel Database System". [Rao et al., SIGMOD 2002]: "Automating Physical Database Design in a Parallel Database".

  22. Local Replicated Join
  • Join a fragment to a replicated relation locally; no data is exchanged
  • One of the relations must be replicated
  • E.g., LI(O_KEY) joined with a replicated O()
  • Cost(local replicated join) ≈ α·(|R1|/N + |R2|), with N nodes, relation sizes |R|, and α a constant processing weight

  23. Local Partitioned Join
  • Join the fragments locally; no data is exchanged
  • The relations must be equi-partitioned
  • E.g., LI(O_KEY) joined with O(O_KEY)
  • Cost(local partitioned join) ≈ α·(|R1|/N + |R2|/N), with N nodes, relation sizes |R|, and α a constant processing weight

  24. Repartition Join
  • Re-hash with data exchange, then join locally
  • Needed when the relation partitions are not co-located
  • E.g., O(O_KEY) joined with C(C_KEY)
  • Cost(repartition join) ≈ α·(|R1|/N + |R2|/N) + β·|R2|·(N-1)/N, with α and β constant weight factors; β depends on the network configuration, since roughly (N-1)/N of the repartitioned rows cross the network
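
A tiny sketch comparing the three costs under this linear model (the formulas above and the weights α=1, β=2 are illustrative assumptions of ours):

```python
def local_replicated_join(r1, r2, n, alpha=1.0):
    """R1 partitioned, R2 replicated: each node scans R1/N and all of R2."""
    return alpha * (r1 / n + r2)

def local_partitioned_join(r1, r2, n, alpha=1.0):
    """R1 and R2 equi-partitioned: each node joins its two fragments."""
    return alpha * (r1 / n + r2 / n)

def repartition_join(r1, r2, n, alpha=1.0, beta=2.0):
    """R2 must be re-hashed first: ~(N-1)/N of its rows cross the network."""
    return alpha * (r1 / n + r2 / n) + beta * r2 * (n - 1) / n

for f in (local_replicated_join, local_partitioned_join, repartition_join):
    print(f.__name__, f(r1=1000, r2=200, n=25))  # 240.0, 48.0, 432.0
```

Under these toy numbers the equi-partitioned local join is cheapest and the repartition join is the most expensive, which is why the placement tries to make the frequent joins local.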

  25. Proposed Solution
  • "Very small" dimensions: replicate them ("very small" depends on the relation sizes and on the number of nodes)
  • Non-small dimensions: hash-partition by PRIMARY KEY, because they "always" join with the facts on the PK; as in placement dependency, we take advantage of this invariant
  • Facts: find the hash-partitioning attribute that minimizes repartitioning costs; a reasonable approximation is the most frequent equi-join attribute (sketched below)
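
A sketch of this placement rule in Python (the "very small" threshold, the table descriptor, and the workload representation are our assumptions):

```python
def choose_placement(table, fact_size, n_nodes, join_freq):
    """Apply the proposed rules to one relation.

    table:     {'name', 'size', 'is_fact', 'pk'}
    fact_size: size of the largest fact relation
    join_freq: {attribute: frequency of equi-joins on it in the workload}
    """
    fragment = fact_size / n_nodes
    if not table["is_fact"]:
        if table["size"] < 0.1 * fragment:        # "very small" dimension
            return ("replicate", None)
        return ("hash-partition", table["pk"])    # non-small: partition by PK
    # Fact: hash-partition on the most frequent equi-join attribute,
    # which approximates minimizing the repartitioning cost.
    return ("hash-partition", max(join_freq, key=join_freq.get))

print(choose_placement({"name": "LI", "size": 6000, "is_fact": True,
                        "pk": "LI_KEY"},
                       fact_size=6000, n_nodes=25,
                       join_freq={"O_KEY": 12, "P_KEY": 7}))
# -> ('hash-partition', 'O_KEY')
```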

  26. Result of Partitioning (TPC-H)
  [Diagram: the partitioned placement of LI, P, C, O, PS and S, with P_KEY and O_KEY as partitioning attributes; edges are marked as Local Join (equi-partitioned), Replicated Join (with the small dimension S), or Repartitioned Join.]

  27. Experimental Results
  [Chart: runtime of LI join P; only the selected rows from LI are shipped when repartitioning.]

  28. Repartition vs. Total Runtime
  • TC = total runtime, RC = repartition time
  • The repartition time is reasonably small…
  • It depends on the number of nodes and on the selectivities
  • (It can depend heavily on the selection conditions of the specific query)

  29. Conclusions
  • We analyzed a basic partitioning strategy (PRS-like): the largest relation is partitioned and the others are replicated
  • The speedup is totally unsatisfactory for many queries
  • We analyzed why this happens: it is explained by the access patterns to the replicated relations
  • We tried a very simple partitioning alternative: only very small relations are replicated; dimensions are partitioned by primary key; facts are hash-partitioned, with the most frequent join attribute as partitioning key
  • We showed that it works well: it prevents very low speedup and provides near-to-linear speedup for most queries

  30. Thank You! • Questions? • www.eden.dei.uc.pt/~pnf • pnf@dei.uc.pt
