Rewriting Procedures for Batched Bindings

Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008

Motivation Queries/updates are often invoked repeatedly. E.g.: • Nested subqueries (with correlated evaluation) • Queries embedded in user-defined functions (UDFs) that are invoked from other queries • Queries/updates in stored procedures that are invoked repeatedly (e.g. batch jobs) • Queries/updates invoked inside loops in application programs

Example: A Nested Query List the orders of high worth (> 10K) SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid); IterativeExecution: For each record in the result of the outer query block - bind the parameters for the nested sub-query - evaluate the nested sub-query - process the results (check predicate, output)

Optimizing Repeated Invocations • Iterative execution of queries often performs poorly • Redundant I/O • Random I/O and poor buffer effects • Network round-trip delays • Decorrelation is widely used for optimizing nested queries

Query Decorrelation Original Query SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid); After Decorrelation(Most systems do this automatically) SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUPBY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;

Limitations of Decorrelation Tricky at times… • COUNT aggregate • Non-equality correlation predicates The solutions may not produce the best plan Decorrelation techniques are not applicable for: • Nested invocation of procedures with complex logic and embedded query invocations • Iterative invocation of queries/updates from application code

Example: UDF Invoked from a Query SELECT * FROM category WHEREcount_items(category-id) > 50; // Count the items in a given category and its sub-categories int count_items(int categoryId) { … while(…) { … … SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat; … } } Procedural logic with embedded queries

Key Idea: Parameter Batching • Repeated invocation of an operation is replaced by a single invocation of its batched form • Batched form: Works on a set of parameters • Benefits • Choice of efficient set-oriented plans • Repeated selection → Join • Efficient integrity checks • Efficient disk access (sort RIDs before fetch) • Reduced network round-trip delay

Batched Forms of Basic Operations • Insert • insert into <table1> select … from <table2> … • Bulk load (SQLServer: bcp, Oracle: sqlldr, DB2: load) • Update • update <table1> from <table2> where … (equivalent to SQL:2003 merge statement) • Queries • Make use of join or outer-join (seen in decorrelation)

SQL Merge GRANTMASTER GRANTLOAD merge into GRANTMASTER GM using GRANTLOAD GL on GM.empid=GL.empidwhen matched then update set GM.grants=GL.grants; Notation:Mc1=c1’,c2=c2’,… cn=cn’(r, s)

Effect of Batch Size on Inserts Bulk Load: 1.3 min

Iterative and Set-Oriented Updates on the Server Side • TPC-H PARTSUPP (800,000 records), Clustering index on (partkey, suppkey) • Iterative update of all the records using T-SQL script (each update has an index lookup plan) • Single commit at the end of all updates Takes 1 minute Same update processed as a merge (update … from …) Takes 15 seconds

The Challenge Given a procedure, how to obtain its batched form? Possible to manually rewrite, but time consuming and error-prone.

Our Work • Automatic generation of batched forms of UDFs/stored procedures using rewrite rules • Automatic rewrite of programs to replace looping with batched invocation

Batched Forms • Batched form qb of a pure function q • Returns results for a set of parameters • Result in the form {(parameter, result)} • For queries: standard techniques for creating batched forms (from work on decorrelation) Example: Original query: SELECT item-id FROM item WHERE category-id=? Batched form: SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;

Batch Safe Operations • Batched forms – no guaranteed order of parameter processing • Can be a problem for operations having side-effects Batch-Safe operations • All operations without side effects • Also a few operations with side effects • E.g.: INSERT on a table with no constraints • Operations inside unordered loops (e.g., cursor loops with no order-by)

Generating Batched Forms of Procedures Step 1: Create trivial batched form. Transform: procedure p(r) { body of p} To procedure p_batched(pb) { for each record r in pb { < body of p> < collect the return value > } return the collected results paired with corrsp. params; } Step 2: Optimize query invocations in the trivial batched form

Rule 1A: Rewriting a Simple Set Iteration Loop where q is any batch-safe operation with qb as its batched form Rule 1B Handles return values

Condition for Invocation Return values Operation to Batch // Batched invocation Let s = // Merge the results s where Rule 1C: Batching Conditional Statements

Table(T) t; while(p) { ss1 modified to save local variables as a tuple in t } Collect the parameters for each r in t { sq modified to use attributes of r; } Can apply Rule 1A-1C and batch. for each r in t { ss2 modified to use attributes of r; } Process the results Rule 2: Splitting a Loop while (p) { ss1; sq; ss2; } * Conditions Apply

Rule 2: Pre-conditions • The conditions make use of the data dependence graph • Data Dependence Graph • Nodes: program statements • Edges: dependencies between statements that read/write same location • Types of Dependencies • Flow (Write  Read), Anti (Read  Write) and Output (Write  Write) • Loop-carried flow/anti/output • Dependencies across iterations • Pre-conditions for Rule-2 • No loop-carried flow/output dependencies cross the points at which the loop is split • No loop-carried dependencies through external data (e.g., DB)

Data Dependencies Flow Dependence Anti Dependence Output Dependence Control Dependence Loop-Carried Need for Reordering Statements (s1) while (category != null) { (s2) item-count = q1(category); (s3) sum = sum + item-count; (s4) category = getParent(category); }

Reordering Statements to Enable Rule 2 while (category != null) { int item-count = q1(category); // Query to batch sum = sum + item-count; category = getParent(category); } Splitting made possible after reordering while (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count; }

Cycles of Flow Dependencies

Rule 4: Control Dependencies while (…) { item = …; qty = …; brcode = …; if (brcode == 58) { brcode = 1; q(item, qty, brcode); } } Remember the branching decision in a boolean variable while (…) { item = …; qty = …; brcode = …; boolean cv = (brcode == 58); cv? brcode = 1; cv? q(item, qty, brcode); }

Cascading of Rules After applying Rule 2 Table(…) t; while (…) { r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv? r.brcode = 1; t.addRecord(r); } for each r in t { r.cv? q(r.item, r.qty, r.brcode); } Rule 1C qb(Pitem,qty,brcode(scv=true(t))

Batching Across Multiple Levels while(…) { …. while(…) { ... q(v1, v2, … vn); … } … } while(…) { …. Table t (…); while(…) { ... } qb(t); … } Batch q w.r.t inner loop

Parameter Batches as Nested Tables

Nest and Unnest Operations • μc(T) : Unnest T w.r.t. table-valued column c • vS s(T) : Group T on columns other than S and nest the columns in S under the name s

Rule 6: Unnesting Nested Batches

Implementation and Evaluation • Conceptually the techniques can be used with any language (PL/SQL, Java, C#-LINQ) • We implemented for Java using the SOOT framework for program analysis Evaluation • No benchmarks for procedural SQL • Scenarios from three real-world applications, which faced performance problems • Data Sets: TPC-H and synthetic

Application 1: ESOP Management App Process records from a file in custom format. Repeatedly called a stored procedure with mix of queries, updates and business logic. - Validate inputs - Lookup existing record - Update or Insert Rewritten program used outer-join and merge.

Application 2: Category Traversal Find the maximum size of any part in a given category and its sub-categories. Clustered Index CATEGORY (category-id) Secondary Index PART (category-id) Original Program Repeatedly executed a query that performed selection followed by grouping. Rewritten Program Group-By followed by Join

Application 3: Value Range Expansion Expand records of the form: (start-num, end-num, issued-to, …) Performed repeated inserts. Rewritten program Pulled the insert stmt out of the loop and replaced it with batched insert. ~75% improvement ~10% overhead Log scale

Related Work • Query unnesting • E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo Legaria et al. [SIGMOD01] • We extend the benefits of unnesting to procedural nested blocks • Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work) • Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92] • Also perform program rewriting, but • Do not address batching of queries/procedure calls within the loop • Limited language constructs - No WHILE loops, IF-THEN-ELSE • Parallelizing compilers Kennedy[90], Padua[95] • We borrow and extend the techniques

Conclusion • Automatic rewrite of programs for set-orientation is possible • Combining query rewrite with program analysis is the key • Our experiments on real-world scenarios show significant benefits due to batching Future Work • Cost-based selection of operations to batch • Handling exceptions • Automatically deciding whether an operation is batch-safe • Implementing rewriting for PL/SQL

Questions?

Rewriting Procedures for Batched Bindings

Rewriting Procedures for Batched Bindings

Presentation Transcript

Rewriting Academic Standing for Success

Ideas for Article Rewriting

Rewriting Service

Bindings

Rewriting Services

Rewriting Services

Revising = Rewriting

Rewriting Services

Rewriting sentences

Rewriting Formulas

Rewriting Formulas

Rewriting Formulas

C++ Bindings

2010/2011 BINDINGS

EAP Channel Bindings

URL Rewriting

Names and Bindings

Rewriting Procedures for Batched Bindings

Bindings

Rewriting Equations 

Ride snowboard bindings