1 / 25

Constraint Processing Techniques for Improving Join Computation: A Proof of Concept

Constraint Processing Techniques for Improving Join Computation: A Proof of Concept. Anagh Lal & Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science & Engineering University of Nebraska-Lincoln. An illustrative example. Join query SELECT R1.A,R1.B,R1.C

alea-durham
Download Presentation

Constraint Processing Techniques for Improving Join Computation: A Proof of Concept

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Constraint Processing Techniques for Improving Join Computation: A Proof of Concept Anagh Lal & Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science & Engineering University of Nebraska-Lincoln

  2. An illustrative example • Join query SELECT R1.A,R1.B,R1.C FROM R1,R2 WHERE R1.A=R2.A AND R1.B=R2.B AND R1.C=R2.C • 10 tuples in 3 nested tuples

  3. Advantages • Direct • Savings of number of tuple comparisons • Savings in I/O for next operator • Space reduction of materialized join queries • Future applications • Use for query size estimation • Assist in high-level analysis of data & in data mining

  4. Our contributions • A new representation of a join query as a Constraint Satisfaction Problem (CSP) • A new sorting-based bundling algorithm • Suitable for CSPs with fewer and larger constraints (i.e., join) • Improves memory usage • A new sort-merge join algorithm for producing (dynamically) bundled tuples • Yields compact representation, saves memory space • Identification of possible applications • Data analysis • Materialized views • Assisting query-size estimation  Suggested, not yet demonstrated

  5. V1 V2 {c, d, e, f} {d} V4 V3 {a, b, d} {a, b, c} Constraint Satisfaction Problem • Given P= (V, D, C) • V = {Vi}, a set of variables • D = {DVi}, the set of their respective domains • C is a set of constraints restricting the acceptable combination of values for variables. • Solution is a consistent assignment of values to variables • Query: find 1 solution, all solutions, etc.

  6. V1 V2 {c, d, e, f} {d} S S S V1 V1 V1 d d d V4 V3 V2 V2 V2 {a, b, d} {a, b, c} c e f d c e, f d c d, e, f Solving CSPs • Typically, DFS & backtracking • Improvement • Static bundling [Freuder 91] • Dynamic bundling [our group] • Based on dynamically identifying symmetries • Guaranteed never less efficient than non-bundling, static bundling Dynamic bundling Without bundling Static bundling

  7. Modeling Join as a CSP • Attributes of relations  CSP variables • Attribute values  variable domains • Relations  relational constraints • Join conditions  join-condition constraints • SELECT R1.A,R1.B,R1.C • FROM R1,R2 • WHERE R1.A=R2.A • AND R1.B=R2.B • AND R1.C=R2.C

  8. Sorting-based bundling R1.A • Heuristic for variable ordering Place variables linked by join conditions as close to each other as possible R2.A R1 R1.B R2.B R2 R1.C R2.C • Sort relations using above ordering • Next: Compute bundles of variable ahead in variable ordering (R1.A)

  9. Bundling an attribute • Partition of a constraint Tuples of the relation having the same value of R1.A • Compare projected tuples of first partition with those of another partition • Compare with every other partition to get complete bundle Partition Unequal partitions Symmetric partitions Bundle {1, 5}

  10. Join using dynamic bundling Select next- variable Compute next valid bundle Start Move to previous variable Found bundle? No Output one tuple Undo previous assignment No Yes 1st in Ordering? Yes Assign bundle Last variable? Yes No Stop

  11. Finding the valid bundle Common {1, 5} • Compute a bundle for the attribute • Check bundle validity with future constraints • If no common value found GOTO 1  Assign variable with the surviving values in the bundle {1, 5, x} {1, 5, y, z}

  12. Analysis of overheads • For Bundling • Additional data structures: 2 arrays, 1 pointer • Only 1 array may become cumbersome • Array size is largest • when all the values of a variable are in one bundle • But, this case also leads to best savings! • Improved implementation • Use of Bitmaps?

  13. ProgressiveMerge Join • PMJ: A sort-merge algorithm by [Dittrich et al. 03] • Provides early results • Assists in query size-estimation • Two main phases • Sorting: starts producing results in this phase • Merging phase: merges sorted runs • We use the framework of the PMJ for our external join. • Implemented & evaluated with the XXL library • We use the same library for our implementation

  14. Preliminary experiments • Data sets • Random: 2 relations R1, R2 with same schema as example • Each relation: 10’000 tuples • Memory size: 4’000 tuples • Page size 200 tuples • Real-world problem: 3 relations, 4 attributes • Compaction rate achieved • Random problem: 1.48 • Savings compensate for even worst case (of the current experimental implementation) • Real-world problem: 2.26 (69 tuples in 32 nested tuples)

  15. Related work • Join algorithms • Well established algorithms • Do not focus on exploiting symmetry • Database compression • Output results are not compressed • Compression at value level, not tuple level

  16. Related work (contd) • [Mamoulis & Papadias 1998] • Join using FC for spatial DB • Restricted to binary constraints • No compaction of solution space • [Bayardo et al. 1996] • Reduce the number of the intermediate tuples of a sequence of joins • [Rich et al. 1993] • Do not compact join attribute values • Does not detect redundancy present in the grouped sub-relations

  17. Future work • Refine implementation • Use of lighter data structures • Test usefulness in the context of Constraint DBs • Values are continuous intervals, e.g. spatial database • Conduct thorough evaluations of overall performance & overhead (memory & CPU) on different data distributions • Investigate benefit of using bundling • query size estimation • materialized views

  18. Research supported by CAREER Award #0133568 from NSF

  19. DB vs. CSP terminology

  20. Bundling relations: Data structures • Considering the portion of the relation in memory • Current-Inst: To store the current instantiations of past variables Vpof R1. • Current-Constraint: selection of R’: • Past variable values equal Current-Inst • Current variable Vc > all previous instantiations of Vc

  21. Bundling relations:Computing bundles (Algorithm 1) • NEXT-PARTITION(p) returns the first unchecked partition in Current-Constraint following the partition p. • Sorted constraints Checking equality of tuples is efficient

  22. Bundling relations: Data structures • Processed-Values: Cumulatively stores non-representative values of bundles • Computing bundles of Vc Values of Vc in it are ignored • Partition pis marked as checked when: • Value(p) is in an instantiation bundle • p is selected for comparing with other partitions to check for bundles

  23. Join computation: In memory • Two subsets of relations (some pages) in memory: • Algorithm to find result of joining the two. • Join computed as a search • Finding all solutions • After finding one solution, search resumes from same depth • Algorithm shown can be entered at any “depth” in the search • Uses Algorithm 1 to find bundles for assigning to variables

  24. Expanded on next slide Join computation: In memory • Join as a search (Algo. 2) • BACKTRACK • Variable[depth] in Current-Inst reset • Processed-Values for the variable emptied • Value in Current-Solution reset • Current-Constraint re-computed • Undoes the effects of the previous instantiation.

  25. Join computation: In memory • COMMON(bi, bundles) subset of bi consistent using join-condition constraints • For equality COMMON Intersection • Empty result of COMMON inconsistency BACKTRACK

More Related