
Advanced SQLProgramming Mark Holm Centerfield Technology
Goals • Introduce some useful advanced SQL programming techniques • Show you how to let the database do more work to reduce programming effort • Go over some basic techniques and tips to improve performance 2
Notes • V4R3 and higher syntax used in examples • Examples show only a small subset of what can be done! 3
Agenda • Joining files - techniques, do’s and don’ts • Query within a query - Subqueries • Stacking data - Unions • Simplifying data with Views • Referential Integrity and constraints • Performance, performance, performance 4
Joining files • Joins are used to relate data from different tables • Data can be retrieved with one “open file” rather than many • Concept is identical to join logical files without an associated permanent object (except if the join is done with an SQL view) 5
Join types • Inner Join • Used to find related data • Left Outer (or simply Outer) Join • Used to find related data and ‘orphaned’ rows • Exception Join • Used to only find ‘orphaned’ rows • Cross Join • Join all rows to all rows 6
Sample tables Employee table Department table
Inner Join • Method #1 - Using the WHERE Clause SELECT LastName, Division FROM Employee, Department WHERE Employee.Dept = Department.Dept • Method #2 - Using the JOIN Clause SELECT LastName, Division FROM Employee INNER JOIN Department ON Employee.Dept = Department.Dept NOTE: This method is useful if you need to influence the order of the tables are joined in for performance reasons. Only works on releases prior to V4R4. 8
Results • Return list of employees that are in a valid department. • Employee ‘Smith’ is not returned because she is not in a department listed in the ‘Department’ table Result table 9
Left Outer Join • Must use Join Syntax SELECT LastName, Area FROM Employee LEFT OUTER JOIN Department ON Employee.Dept = Department.Dept 10
Results • Return list of employees even if they are not in a valid department • Employee ‘Smith’ has a NULL Area because it could not be associated with a valid Dept Result table 11
Exception Join • Must use Join Syntax SELECT LastName, Area FROM Employee EXCEPTION JOIN Department ON Employee.Dept = Department.Dept 12
Results • Return list of employees only if they are NOT in a valid department • Employee ‘Smith’ is only one without a valid department Result table 13
WARNING! • The order tables are listed in the FROM clause is important • For OUTER and EXCEPTION joins, the database must join the tables in that order. • The result may be horrible performance…more on this topic later 14
Observations • Joins provide one way to bury application logic in the database • Each join type has a purpose and can be used to not only get the data you want but identify “incomplete” information • With some exceptions, if joined properly performance should be at least as good as an application 15
Subqueries • Subqueries are a powerful way to select only the data you need without separate statements. • Example: List employees making a higher than average salary 16
Subquery Example SELECTFNAME, LNAME FROM EMPLOYEE WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE) SELECTFNAME, LNAME FROM EMPLOYEE WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE WHERE LNAME = ’JONES’) 17
Subqueries - types • Correlated • Inner select refers to part of the outer (parent) select (multiple evaluations) • Non-Correlated • Inner select does not relate to outer query (one evaluation) 18
Subquery Tips 1 • Subquery optimization (2nd statement will be faster) • SELECT name FROM employee WHERE salary > ALL (SELECT salary FROM salscale) • SELECT name FROM employee WHERE salary > (SELECT max(salary) FROM salscale) 19
Subquery Tips 2 • Subquery optimization (2nd statement will be faster) • SELECT name FROM employee WHERE salary IN (SELECT salary FROM salscale) • SELECT name FROM employee WHERE EXISTS (SELECT salary FROM salscale WHERE employee.salid = salscale.salid) 20
UNIONs • Unions provide a way to append multiple row sets files in one statement • Example: Process all of the orders from January and February SELECT * FROM JanOrders WHERE SKU = 199976 UNION SELECT * FROM FebOrders WHERE SKU = 199976 21
Unions • Each SELECT statement that is UNIONed together must have the same number of result columns and have compatible types • Two forms of syntax • UNION ALL -- allow duplicate records • UNION -- return only distinct rows 22
Views • Views provide a convenient way to permanently put SQL logic • Create once and use many times • Also make the database more understandable to users • Can put simple business rules into views to ensure consistency 23
Views • Example: Make it easy for the human resources department to run a report that shows ‘new’ employees. CREATE VIEW HR/NEWBIES (EMPLOYEE_NAME, DEPARTMENT, HIRE_DATE)AS SELECTconcat(concat(strip(last_name),','),strip(first_name)), department, hire_date FROMHR/EMPLOYEE WHERE (year(current date)-year(hire_date)) < 2 24
Performance • SQL performance is harder to predict and tune than native I/O. • SQL provides a powerful way to manipulate data but you have little control over HOW it does it. • Query optimizer takes responsibility for doing it ‘right’. 25
Performance - diagnosis • Getting information about how the optimizer processed a query is crucial • Can be done via one or all of the following: • STRDBG: debug messages in job log • STRDBMON: optimizer info put in file • QAQQINI: can be used to force messages • CHGQRYA: messages put out when time limit set to 0 26
Performance tips • Create indexes • Over columns that significantly limit data in WHERE clause • Over columns that join tables together • Over columns used in ORDER BY and GROUP BY clauses 27
Performance tips • Create Encoded Vector Indexes (EVI’s) • Most useful in heavy query environments with a lot of data (e.g. large data warehouses) • Helps queries that process between 20-60% of a table’s data • Create over columns with a modest number of distinct values and those with data skew • EVI’s bridge the gap between traditional indexes and table scans 28
Performance tips • Encourage optimizer to use indexes • Use keyed columns in WHERE clause if possible • Use ANDed conditions as much as possible • OPTIMIZE FOR n ROWS • Don’t do things that eliminate index use • Data conversion (binary-key = 1.5) • LIKE clause w/leading wildcard (NAME LIKE ‘%JOE’) 29
Performance tips • Keep statements simple • Complex statements are much more difficult to optimize • Provide more opportunity for the optimizer to choose a sub-optimal plan of attack 30
Performance tips • Enable DB2 to use parallelism • Query processed by many tasks (CPU parallelism) or by getting data from many disks at once (I/O parallelism) • CPU parallelism requires IBM’s SMP feature and a machine with multiple processors • Enabled via the QQRYDEGREE system value, CHGQRYA, or the QAQQINI file 31
Other useful features • CASE clause - conditional calculations • ALIAS - access to multi-member files • Primary/Foreign keys - referential integrity • Constraints 32
CASE • Conditional calculations with CASE SELECTWarehouse, Description, CASERegionCode WHEN'E'THEN'East Region' WHEN'S'THEN'South Region' WHEN'M'THEN'Midwest Region' WHEN'W'THEN'West Region' END FROMLocations 33
CASE • Avoiding calculation errors (e.g. division by 0) SELECT Warehouse, Description, CASE NumInStock WHEN 0 THEN NULL ELSE CaseUnits/NumInStock END FROM Inventory 34
ALIAS names • The CREATE ALIAS statement creates an alias on a table, view, or member of a database file. • CREATE ALIAS alias-name FOR table member • Example: Create an alias over the second member of a multi-member physical file • CREATE ALIAS February FOR MonthSales February 35
Referential Integrity • Keeps two or more files in synch with each other • Ensures that children rows have parents • Can also be used to automatically delete children when parents are deleted 36
Referential Integrity Rules • A row inserted into a child table must have a parent row (typically in another table). • Parent rules • A parent row can not be deleted if there are dependent children (Restrict rule) OR • All children are also deleted (Cascade rule) OR • All children’s foreign keys are changed (Set Null and Set Default rules) 37
Primary key must be unique Primary Key Foreign Key Parent table Child table 38
Referential Integrity syntax • ALTER TABLE Hr/Employee ADD CONSTRAINT EmpPK PRIMARY KEY (EmployeeId) • ALTER TABLE Hr/Department ADD CONSTRAINT EmpFK FOREIGN KEY (EmployeeId) REFERENCES Hr/Employee (EmployeeId) ON DELETE CASCADE ON UPDATE RESTRICT 39
Check Constraints • Rules which limit the allowable values in one or more columns: CREATE TABLE Employee (FirstName CHAR(20), LastName CHAR(30), Salary CHECK (Salary>0 AND Salary<200000)) 40
Check Constraints • Effectively does data checking at the database level. • Data checking done with display files or application logic can now be done at the database level. • Ensures that it is always done and closes “back doors” like DFU, ODBC, 3-rd party utilities…. 41
Other resources • Database Design and Programming for DB2/400 - book by Paul Conte • SQL for Smarties - book by Joe Celko • SQL Tutorial - www.as400network.com • AS/400 DB2 web site at http://www.as400.ibm.com/db2/db2main.htm • Publications at http://publib.boulder.ibm.com/pubs/html/as400/ • Our web site at http://www.centerfieldtechnology.com 42
Summary • SQL is a powerful way to access and process data • Used effectively, it can reduce the time it takes to build applications • Once tuned, it can perform very close (and sometimes better) than HLL’s alone 43