Download Presentation
## Optimizing recursive queries in sql

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Optimizing recursive queries in sql**BY: Mark Gruszecki**Overview**• What is a Recursive Query? • Definition(s) and Algorithm(s) • Optimization Techniques • Practical Issues • Impact of each Optimization • Conclusion**In order to understand a Recursive Query, you must first**understand a Recursive Query… CREATE RECURSIVE VIEW R(d, i, j, v) AS ( SELECT 1, i, j, v FROM T UNION ALL SELECT d+1, R.i, T.j, R.v + T.v FROM R JOIN T ON R.j = T.i WHERE d < 8);**What is a Recursive Query used for?**• Any kind of Hierarchical Data is more easily accessed using a Recursive Query. • For example, displaying Employees in an orginizational chart.**Definition(s) and Algorithm(s)**• A base table T is defined as T(i, j, v) with primary key (i, j) and v representing a numerical value. • Define the result table from a recursive query by R(d, i, j, v) with primary key (d, i, j). • The queries of interest are of the form • R[k] = T T T … T where is the recursive query. • Let G = (V, E) be a directed graph with n vertices and m edges. • The algorithm to evaluate optimizations is: • R =**Optimization Techniques**• Early evaluation of row select conditions. • Deleting Duplicate Rows. • Indexing base and result tables for efficient joins.**Early Evaluation of Row Select Conditions**• Used on Queries of the form: SELECT i , j, v FROM R WHERE <condition> • Two Cases: • If the “WHERE” condition doesn’t participate in the JOIN condition of the recursion. • The “WHERE” condition is part of the JOIN in the recursion.**Deleting Duplicate Rows**• Suppose we want to know which vertices of a graph are connected to a specific node (e.g. all dependants of a particular person). • What if G is complete, dense, sparse, or even a tree? • Queries are optimized by deleting duplicate rows at each step.**Indexing Base and Result Tables for Efficient Joins**• Two Choices: • Index based on i and j. • Gives rise to an optimal Hash Join • Cannot uniquely identify rows • Index based on primary key of T and R • Can uniquely identify rows**Practical Issues**• The possible size of a result table of a recursive query is |T| x |T| x … |T|. • In general, if there are n recursive steps and T has m entries, then there are m^n.**Impact of each Optimization**• Early Evaluation of Row Selection Conditions • Good in all cases • Deleting Duplicate Rows • Good in certain cases, but not necessary in all • Indexing Base and Result Tables for Efficient Joins • If G is highly connected, slight loss of performance (Hash Collisions). • Otherwise, this is the best optimization.**Conclusion**• Recursive Queries are an important tool to be exposed to when dealing with databases. • There are three types of optimizations of these queries discussed: • Early evaluation of row selections • Deleting duplicate rows • Indexing for more efficient joins • Every optimization is good, but some are more applicable in certain circumstances than others.**Questions(?) and References**• Carlos Ordonez (2005) Optimizing Recursive Queries in SQL. Retrieved February 10, 2009 from IEEE Computer Science Digital Library • http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=/rzajq/rzajqrcteexample.htm