1 / 36

Distributed Database Management Systems

Distributed Database Management Systems. Lecture 33. In the previous lecture. Final phase of QD Data Localization: for HF, VF and DF. In today’s Lecture. Data Localization for Hybrid Fragmentation Query Optimization. Reduction for HyF. HyF contains both types of Fragmentations

raya-walls
Download Presentation

Distributed Database Management Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Database Management Systems Lecture 33

  2. In the previous lecture • Final phase of QD • Data Localization: for HF, VF and DF.

  3. In today’s Lecture • Data Localization for Hybrid Fragmentation • Query Optimization.

  4. Reduction for HyF • HyF contains both types of Fragmentations • EMP1=eNo ≤ E4 (eNo, eName (EMP)) • EMP2=eNo > E4 (eNo, eName (EMP)) • EMP3=eNo, title (EMP).

  5. eName eName eNo = E5 eNo = E5 ⋈eNo U EMP2 Reduced Query- EMP3 EMP1 EMP2 • Select eName from EMP where eNo = E5

  6. Summary of what we have done so far

  7. Query Decomposition: generates an efficient query in relational algebra • Normalization, Analysis, Simplification, Rewriting • Data Localization: applies global query to fragments; increases optimization level-

  8. So, next is the cost-based optimization • Mainly concentrates on the order of performing joins • Characteristics of relations like cardinalities are considered

  9. First QO in general • QO refers to producing a Query Execution plan (QEP) that represents execution strategy.

  10. Components of Optimizer • Search Space: set of eq. alternative exec plans • Cost Model: predicts cost of a execution plan • Search Strategy: produces best plan

  11. Search Space • Search space consists of eq. Query Trees produced using Tr Rules • Optimizer concentrates on join trees, since join cost is the most effective

  12. Example: • Select eName, resp From EMP, ASG, PROJ where EMP.eNo = ASG. eNo and ASG.pNo = PROJ.pNo.

  13. ⋈pNo ⋈eNo ⋈eNo ⋈pNo PROJ EMP EMP ASG PROJ ASG ⋈pNo, eNo x ASG EMP PROJ

  14. Alternatives with N relations are O(N!) based on properties of relations • So, restrictions are applied

  15. 1- Heuristics - Selection and projection on base relations - Avoid Cartesian product

  16. 2- Shape of Tree • Linear Tree: At least one node for each operand is a base relation • Bushy tree: May have operators with interm tables only; allows parallel execution

  17. Search Strategy • Most popular is Dynamic Programming • That starts with base relations and keeps on adding relations calculating cost

  18. DP is almost exhaustive so produces best plan • Too expensive with more than 5 relations • Other option is Randomized strategy • Do not guarantee best

  19. Cost Model • Cost of operators, statistics of base data to predict size of intermediate tables • Cost considered as Total Time and Response Time.

  20. Total time = CPU time + I/O time + tr time • In WAN, major cost is tr time • Initially ratios were 20:1 for tr and I/O, for LAN it is 1:1.6

  21. Response time = CPU time + I/O time + tr time • Difference.?

  22. TCPU = time for a CPU inst • TI/O = a disk I/O • TMSG = fixed time for initiating and recv a msg • TTR = transmit a data unit from one site to another.

  23. X units Site 1 Site 3 Site 2 Y units • TT = 2TMSG + TTR*(x+y) • RT = max{TMSG + TTR*X, TMSG + TTR*Y}

  24. Database Statistics • Major factor is interm tabs • If the interm results are to be transmitted, then estimation about size is a must • More precise statistics cost more

  25. For each relation R[A1, A2, …, An] fragmented as R1, …, Rr • length of each attribute: length(Ai) • the number of distinct values for each attribute in each fragment: card(Ai(Rj)) • maximum and minimum values in the domain of each attribute: min(Ai), max(Ai).

  26. 4.The cardinalities of each domain: card(dom[Ai]) and the cardinalities of each fragment: card(Rj) 5.Join selectivity factor for some of the relationsSFJ (R,S) = card(R ⋈ S)/ (card(R) ∗ card(S))-

  27. Cardinalities of Intermediate Results

  28. Selection Operation • Card(F(R))=SFS(F) * card(R) • SFS(A = value) = 1/card(A(R)) • SFS(A > value) = max(A) – value /(max(A) – min(A)) • SFS(A < value) = value - min(A) /(max(A) – min(A))

  29. SFS(A < value) = max(A) – value /(max(A) – min(A)) • SFS(p(Ai) ^ p(Aj)) = SFS(p(Ai)) * (SFSp(Aj)) • SFS(p(Ai) v p(Aj)) = SFS(p(Ai)) + SFS(p(Aj))–(SFS(p(Ai))* SFS(p(Ai))).

  30. Cardinality of Projection • Hard to determine precisely • Two cases when it is trivial 1- When a single attribute A, card(A(R)) = card (A) 2- When PK is included card(A(R)) = card (R)

  31. Cartesian Product • card(RxS) = card (R) * card(S). • Cardinality of Join • No general way to test without additional information • In case of PK/FK combination Card(R ⋈ S) = card (S)

  32. Semi Join: SFSJ(R ⋉AS)= card(A(S))/ card(dom[A]) card(R ⋉AS) = SFSJ(S.A) * card(R).

  33. Union: Hard to estimate • Limits possible which are card(R) + card(S) and max{card (R) + card (S)) • Difference: Like Union, card (R) for (R-S), and 0

  34. Centralized Query Optimization

  35. Why to Study • Distributed Query is transformed into local ones • Issues are related and more complex in DD • Easier to understand • Two famous ones.

  36. INGRES • Dynamic • Recursively breaks into smaller ones • System R • static • exhaustive search

More Related