1 / 29

Unit-testing Query Transformation Rules

Unit-testing Query Transformation Rules. Mostafa Elhemali Leo Giakoumakis. Outline. Problem definition QRel system overview Case Study Conclusion. Problem definition – QP background. Query processing in SQL Server:. ?. Profit. π T.a,R.z. π T.a,R.z. σ T.c <5. Merge ⋈ T.b = R.y.

hamlet
Download Presentation

Unit-testing Query Transformation Rules

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Unit-testing Query Transformation Rules Mostafa Elhemali Leo Giakoumakis

  2. Outline • Problem definition • QRel system overview • Case Study • Conclusion

  3. Problem definition– QP background • Query processing in SQL Server: ? Profit πT.a,R.z πT.a,R.z σT.c<5 Merge ⋈T.b=R.y Parse Optimize SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5 ⋈T.b=R.y σT.c<5 Sort on y T R Scan Index T.Ib Scan Table R

  4. Problem definition – QP background Transformation rules πT.a,R.z πT.a,R.z σT.c<5 σT.c<5 Merge ⋈T.b=R.y ⋈ ⋈ Merge ⋈ ⋈ ⋈T.b=R.y σT.c<5 σT.c<5 Sort on y ⋈ Hash ⋈ T R Scan Index T.Ib Scan Table R

  5. Problem definition– Input mismatch What we want to test System Output System Input πT.a,R.z σT.c<5 Merge ⋈T.b=R.y SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5 ⋈ ⋈ σT.c<5 σT.c<5 Sort on y Scan Index T.Ib Scan Table R

  6. Problem definition - Example • Rule to test: Push Aggregates Below Join ⋈ G G ⋈

  7. Problem definition - Example • Manual Approach – Construct equivalent SQL • Manually construct equivalent SQL constructs • Think of which tables to use • Think of exact join predicate, grouping columns, aggregates, etc. ⋈ G G ⋈ SELECT T.a,SUM(T.c) FROM T JOIN R on T.a= R.y GROUP BY T.a

  8. Problem definition - Example • Manual Approach – Variation on basic case • Project between Aggregation and Join SELECT M.Tb,SUM(M.p) FROM ( SELECT T.b Tb, (T.a + T.c) p FROM T JOIN R on T.b = R.y) M GROUP BY M.Tb ⋈ G π π G ⋈

  9. Problem definition– Challenges • Manual construction of SQL test cases is hard • Manual variation of these cases is harder • SQL test cases are over-specified • Harder to maintain Is the goal to test Aggregation over asum? SELECT M.Tb,SUM(M.p) FROM ( SELECT T.b Tb, (T.a + T.c) p FROM T JOIN R on T.b = R.y) M GROUP BY M.Tb Is the choice of T & R significant?

  10. Goal Allow testers to write test cases in abstract relational trees: σT.c<5 • Yet present SQL Server with concrete SQL queries: ⋈ SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5

  11. QRel overview RelOps OpFilter Expression OpJoin Predicate … SqlGen RelGen σ T.x=2 σ T.x=2 SELECT … FROM T JOIN T2 ON T.a = T2.b WHERE T.x = 2 σ T.x=2 ⋈T.a=T2.b ⋈T.a=T2.b T T T2 T2 T

  12. RelOps– Goal • Allow testers to write their relational tree test cases OpFilter σT.c<5 PredCompare(LT) OpInnerJoin Represented as ⋈ ExpColumn(T.c) ExpScalar(5)

  13. RelOps • .NET classes for all relational and scalar operators expressible in SQL • Relational: Join, Selection, Sort, … • Scalar: Predicates, Expressions • Does not represent any operator not expressible in SQL, e.g. semi-join • Metadata extraction • Basic property derivation • Output columns for relational operators • Data type for scalar expressions

  14. RelGen– Goal • Allow testers to write only the essential parts of the relational tree, and automatically fill out the rest πT.a,R.z σT.c<5 σT.c<5 ⋈ ⋈ T.b=R.y T R

  15. RelGen– Overall algorithm • Top-down generation of relational tree skeleton, • Followed by bottom-up filling out phase

  16. RelGen– Example Predicate Generators Possible operators CompareTwoColumns T.a,R.z → π π CompareColumnAndScalar InSubquery σ σ σ ⋈ ⋈ T.c<5 G … T.b=R.y … a,b,c x,y Available relations R R T T

  17. RelGen– Targeted generation • Tree generation is highly customizable for targeted testing • Can start from a partially filled out tree • Customizable probability distributions for any random parts, e.g. choice of relational operators , choice of predicates • Use of constraints to influence the tree generation

  18. RelGen– Considerations • Trees should be free of logical errors • E.g. No Aggregation over XML columns • Trees (and subtrees) should not be trivially optimized away • Avoid contradicting predicates • Operators should yield (many) rows if executed • Reasoning: makes them more expensive, lures optimizer into deeper optimizations. Also we use QRel to test all of QP at times • Use real data in predicates, e.g. Country = ‘England’ • Use PK-FK columns in joins

  19. SqlGen– Goal • Presents the server with proper SQL queries from the relational tree test cases πT.a,R.z SELECT T.a,R.z FROM T JOIN R on T.b = R.y WHERE T.c < 5 σT.c<5 ⋈T.b=R.y T R

  20. SqlGen– Challenges • Subquery and derived table formation • SQL clause formation • Table and column aliasing (inverse of binding)

  21. SqlGen– Example π SUM(T.a),R.z ⋈ T.b=R.y SELECT SUM(T1_1.C2) AS C1, T1_2.z AS C2 FROM ( σ R T.b<5 SELECT T2_1.b AS C1, SUM(T2_1.a) AS C2 FROM T AS T2_1 GROUP BY T2_1.b G T.b,SUM(T.a) HAVING T2_1.b < 5 ) AS T1_1 JOIN R AS T1_2 ON T1_1.C1 = T1_2.y T

  22. Case Study • Rule to test: Cannot be represented U σ ApplySJ Preliminary rule Target rule ApplySJ ApplySJ R UA OR R σ1 σ2 R R σ1 σ2 EXISTS EXISTS σ1 σ2 ApplySJ = Apply-Semi-Join UA = Union All U = Union

  23. Case Study– Basic test case • RelOps code: PredExists a = new PredExists(new OpFilter(new OpGet(), null)); PredExists b = new PredExists(new OpFilter(new OpGet(), null)); PredBinaryOp p = new PredBinaryOp(a, b, PredBinaryOp.LogicOp.Or); OpFiltermainFilter = new OpFilter(new OpGet(), p); • … And that’s the basic test case

  24. Case Study – Basic test case • Goes into RelGen (using TPC-H database): Predicate Generators σ O_SHIPPRIORITY, O_COMMENT, … CorrelationPredicate OR EXISTS EXISTS Available relations σ σ S_NATIONKEY >= O_SHIPPRIORITY S_ADDRESS < O_COMMENT ORDERS ORDERS S_NATIONKEY, S_ADDRESS, … SUPPLIER SUPPLIER SUPPLIER …

  25. Case Study – Basic test case • Goes into SqlGen: σ SELECT * FROM [ORDERS] AS T1_1 WHERE (EXISTS ( SELECT 1 AS C1 FROM [SUPPLIER] AS T2_1 WHERE T2_1.S_NATIONKEY >= T1_1.O_SHIPPRIORITY)) OR (EXISTS ( SELECT 1 AS C1 FROM [SUPPLIER] AS T2_1 WHERE T2_1.S_ADDRESS < T1_1.O_COMMENT)) ORDERS OR EXISTS EXISTS σ σ S_NATIONKEY >= O_SHIPPRIORITY S_ADDRESS < O_COMMENT SUPPLIER SUPPLIER

  26. Case Study– Beyond the basic case • More random exploration • Embed the basic tree pattern into completely random trees • Systematic exploration of various dimensions • Number of subquery disjunctives • Add scalar disjunctives alongside the relational ones • Add more operators in the relational disjunctives • …

  27. Verification • How do we verify the correct behavior for semi-random queries? • Be creative!

  28. Verification – Example approaches • Rule modeling • Do the same transformation in the input query (usually very easy in RelOps) • Present the two queries to the optimizer • Should get the same plan • Only works when the rule output is expressible in SQL • Rules on & off • Turn the rule under test off • Should get a different plan, but same results • Only works for non-essential exploration rules

  29. Conclusion • Testing the Query Optimizer transformation rules using abstract relational trees • Utilizing QRel to go from abstract relational trees to concrete SQL queries • Future directions • Libraries of abstract relational trees • More advanced customizations of tree generation • Combinatorial techniques for systematic exploration of various trees

More Related