1 / 9

6.830 Lecture 10

6.830 Lecture 10 . Query Optimization Recap Column Stores Group meeting signup: http:// bit.ly /ZuyF5I. Selinger Optimizer Algorithm. algorithm : compute optimal way to generate every sub- join: size 1, size 2, ... n (in that order ) e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC}

dolan
Download Presentation

6.830 Lecture 10

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 6.830 Lecture 10 Query Optimization Recap Column Stores Group meeting signup: http://bit.ly/ZuyF5I

  2. Selinger Optimizer Algorithm • algorithm: compute optimal way to generate every sub-join: size 1, size 2, ... n (in that order) e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC} R set of relations to join For iin {1...|R|}: for S in {all length isubsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Precomputed in previous iteration!

  3. More Explicitly R set of relations to join For iin {1...|R|}: for S in {all length isubsets of R}: optcosts = ∞ optjoinS = ø for a in S: //a is a relation csa = cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a if csa < optcosts optcosts= csa optjoins = optjoin(S-a) joined optimally w/ a This is the same algorithm as on the previously slide, written differently Precomputed in previous iteration!

  4. Example 4 Relations: ABCD (only consider NL join) Optjoin: A = best way to access A (e.g., sequential scan, or predicate pushdown into index...) B = " " " " B C = " " " " C D = " " " " D {A,B} = AB or BA {A,C} = AC or CA {B,C} = BC or CB {A,D} {B,D} {C,D} Optjoin R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a

  5. Example (con’t) Optjoin {A,B,C} = remove A: compare A({B,C}) to ({B,C})A remove B: compare ({A,C})B to B({A,C}) remove C: compare C({A,B}) to ({A,B})C {A,C,D} = … {A,B,D} = … {B,C,D} = … … {A,B,C,D} = remove A: compare A({B,C,D}) to ({B,C,D})A remove B: compare B({A,C,D}) to ({A,C,D})B remove C: compare C({A,B,D}) to ({A,B,D})C remove D: compare D({A,C,C}) to ({A,B,C})D Optjoin R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a

  6. Complexity • Number of subsets of set of size n = |power set of n| = 2n (here, n is number of relations) • How much work per subset? Have to iterate through each element of each subset, so this at most n n2n complexity (vs n!) n=12  49K vs 479M Optjoin R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a

  7. C-Store Performance • How much do these optimizations matter? • Wanted to compare against best you could do with a commercial system

  8. Emulating a Column Store • Two approaches: • Vertical partitioning: for n column table, store n two-column tables, with ith table containing a tuple-id, and attribute i • Sort on tuple-id • Merge joins for query results • Index-only plans • Create a secondary index on each column • Never follow pointers to base table FAT SLOW

  9. Bottom Line • SSBM (Star Schema Benchmark -- O’Neil et al ICDE 08) • Data warehousing benchmark based on TPC-H • Scale 100 (60 M row table) • Average across 12 queries • Row store is a commercial DB, tuned by professional DBA vs C-Store Commercial System Does Not Benefit From Vertical Partitioning Time (s)

More Related