Joe Carola, Siemens HS

Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and TuningJoe CarolaSiemens Medical Solutions, Health Services

Joe Carola, Siemens HS • Bio • 30+ years in Information Technology • 26 of them dedicated to Relational Database, covering all areas of database design, implementation, support, performance, etc. • Full History: • Development: Prog Trainee, Prog/Analyst, Sys Analyst • DBA trainee, DBA, Mgr-DBA (“Actor on the Scene”) • Lead DB Consultant for Codd and Date Consulting • Director-DBA • Technical Database Architect (Currently) • DB2, Microsoft SQL Server, Oracle, Sybase SQL Server • Mainframe, Unix, Wintel • 1993 recipient of an International Database User Group Award for Information Excellence, based on his contributions in the area of Relational Database technology, and has presented locally and internationally on a variety of Relational Database topics. • Chairman, Delaware Valley DB2 User Group

Agenda – “Practical Stuff” • DB Design • In the simplest terms - Logical to Physical • Optimization and Indexing • Optimizer • Index Types and how they are used • Monitoring and Tuning • Monitoring Process and what to monitor • Tuning Steps The above is where I see that the rubber meets the road……

DB Design

DB Design • Logical Design • Provides complete understanding of data and it usage • Defining data entities and their attributes • Provides primary and foreign key definitions • Physical Design • Based on information gathered during Logical design • Data must be understood to do this correctly and efficiently • Provides physical aspects to enhance data usage • Data types, data lengths, row sizes, • Provides precise access paths (Indexes) to rows of data • Support primary and foreign keys • Secondary indexes • Poor Logical and Physical Database design can be the largest reason for performance issues • The price for poor DB Design must be paid at execution time

DB Design • Normalization: A synthesis of data design 1st Normal Form – Data is dependent on…………..The Key 2nd Normal Form – Data is dependent onThe whole Key 3rd Normal Form - “ “ “And nothing but the Key, “so help me Codd” • Edgar F. (Ted) Codd – Developed the Relational Model “A Relational Model for Large Shared Data Banks” (1970) • Solid, yet complex, mathematical foundation • Relational Algebra • Domains, Attributes, Tuples, and Relations (OMG!) • Re-stated to simpler terms….. • Simple to understand tables, rows, and columns • The Simplicity is partially the reason for the performance issues being addressed every day • Too many shortcuts are taken • Too many non-experienced data designers are designing and implementing database applications

DB Design • 3rd Normal Form is basically 1st cut physical • Next step after 3rd NF in Physical DB Design is a very important step for Performance, Concurrency, Operations, etc. • De-Normalization takes place here • Storing of data in summary or derived format • If it doesn’t happen, it takes place at execution time Result: High processing costs – Materialization of the result data Administration costs – Maintenance of the data Low Currency – Concurrent Access to the data • However……. • Anomalies are created as a result of De-normalization • Insert, Delete, Update • They all cost extra processing also • Must strike a balance based on requirements on performance, availability, storage, administration

Optimization&Indexing

Optimization and Indexing • Must understand the basics of indexes and performance statistics. • As a general rule, indexes should be kept as narrow as possible, most likely following a business use requirement, to reduce the amount of processing overhead associated with each query. • Being familiar with how optimization works will improve the accuracy of your decision making when designing indexes • Understanding how the optimizer works is the first step toward the establishment of a truly optimized database environment • As the sophistication of your database implementation increases……. • The need to optimize performance will also increase.

Optimization and Indexing • SQL Query tuning is one of the most important tasks to improve application performance • Biggest bang for the performance dollar over everything else (“IMO”): • Network, Storage, Memory, Processor • Should be done in the design and testing phases • However, no amount of Database tuning or SQL statement tuning can make up for inefficient application design/coding • 60% to 80% of Application Problems come from poorly written SQL or the code around it i.e. Prog101 abuse can wreck an application too!!

Optimizer • Responsible for choosing the least costly way to execute SQL (DML). • Creates an access path with it’s decision • Performed at plan compilation time • Determines Access Methods • Index Usage • Table Scan • Join Method • Sort • Determines if Data and/or Index pages can be read in advance • Asynchronous Pre-fetch

The Importance of Statistics • Statistics provide the optimizer with the information to make decisions Table Indexspace RDBMS Catalog Or Dictionary Generation Tablespace As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less than optimal decisions on how to process a query.

Statistical Terms/Concepts • Cardinality • Measures how many unique values exist in the table • Density • Measures the uniqueness of values within a table. • Helps the optimizer determine how many rows will be returned for a given key value • Indexes with high densities will likely be ignored by the optimizer • i.e. the index is highly non-unique • Selectivity • Measures the number of rows that will be returned by a particular query. • Needed by Optimizer to calculate the relative cost of a query plan

STAGE 2 PREDICATES RELATIONAL DATA SERVICES STAGE 2 - Evaluated after data retrieval via the relational (NON-SARGABLE, Residual) data services which is more expensive than the Data Manager. DATA MANAGER STAGE 1 PREDICATES STAGE 1 - Evaluated at the time the data rows are retrieved (SARGABLE). Performance advantage in using STAGE 1 PREDICATES because this stage eliminates ROWS passed to STAGE 2 via the Data Manager. BUFFER MANAGER From Request to Response REQUEST RESPONSE PREDICATE ANY OTHER WITH INDEX (ES) INDEX KEY Non Indexed PREDICATE APPLIES REQUESTED DATA I/O

Indexing • A very necessary part of Successful Database Implementation I wonder what queries will be run ? What indexes will be needed? What columns will be used as predicates? What ORDER BY will be used most often? Why do some of my queries run so slow!

“Thanks for fixing my query, what did you do?” #!*#!!! Indexes are a good thing to add, however there is something to avoid….. ”I added an index to one of the columns “Great! Then add indexes to all the columns in my table

Types of Indexes • There are two types of indexes: clustered and non-clustered, each with unique advantages depending on the data set. • Clustered index • Dictates the storage order of the data in a table. Because the data is sorted, clustered indexes are more efficient on columns of data that are most often searched for ranges of values. This index type also excels at finding a specific row when the indexed value is unique. • Non-clustered index • Similar to an index in a textbook where the data is stored in one place and the data value in another. A query searches for the data value by first searching the non-clustered index to find the location of the data value in the table and then retrieves the data directly from that location. The non-clustered index is useful for queries resulting in exact matches.

1 2 3 Basic Index Usage Matching Index Scan Root Page Non-Leaf Page Leaf Page Data Page Data Page Data Page Select * From TABLE1 Where INDEXED_COL1 = 12345

2 1 Basic Index Usage Non-Matching Index Scan Root Page Non-Leaf Page Leaf Page Data Page Data Page Data Page Select * From TABLE1 Where INDEXED_COL1 > 00001

1 Basic Index Usage Index Only Root Page Non-Leaf Page Leaf Page Select COL1 From TABLE1 Where INDEXED_COL1 > 00001

Join Methods • Nested Loop Join SELECT A,B,X,Y FROM OUTER, INNER WHERE A=10 AND B=X Tables: OUTER INNER COMPOSITE Columns: A B X Y A B X Y 10 3 10 1 10 2 10 6 10 1 5 A 3 B 2 C 1 D 2 E 9 F 7 G 10 3 3 B 10 1 1 D 10 2 2 C 10 2 2 E 10 1 1 D 1.) Scan the outer table, For each qualifying row……… 2.) find all matching rows in the inner table, via table space scan or index access. The nested loop join produces this result

Join Methods • Merge Scan Join SELECT A,B,X,Y FROM OUTER, INNER WHERE A=10 AND B=X 1.) Condense and sort the outer table, or access it through an index on column B…... Condense and sort the inner table. Tables: OUTER INNER COMPOSITE Columns: A B X Y A B X Y 10 1 10 1 10 2 10 3 10 6 1 D 2 C 2 E 3 B 5 A 7 G 9 F 10 1 1 D 10 1 1 D 10 2 2 C 10 2 2 E 10 3 3 B The merge scan join produces this result 2.) Scan the outer table, For each qualifying row….… 3.) Scan a group if matching rows in the inner table.

Join Methods SELECT C2,C33 FROM OUTER, INNER WHERE C1 = A AND C2 = C22 • Hybrid / Hash Join 4.) List Prefetch inner table rows and complete partial rows 1.) Apply local predicates and organize qualifying rows in join column sequence by either sorting or accessing via join column index…. PARTIAL ROWS RESULT INNER OUTER R O W 1 2 3 4 5 6 R I D P1 P2 P3 P4 P5 P6 • C1 C2 • 1 • 1 • 2 • 3 • 6 • . • C22 C33 • D • C • E • B • A • G • C2 RID • P1 • P1 • P2 • P3 • P4 • C2 C33 • D • D • C • E • B RID LIST P1 P1 P2 P3 P4 3.) Create partial rows, and sort in RID sequence...…. 2.) Obtain only inner table RIDs via index access using sequenced join column key values...….

An Ounce of Prevention…., • Make your queries simple and efficient, ensuring the least costly access path available. • Try not to overload your tables with indexes • Try not to overload your indexes • Try not to overload your queries • Keep the Database healthy • Reorganization • Eliminates empty space, and fragmentation • Reduces I/O • Generate Statistics (if they are not automatic) • The Optimizer is very smart, but data attributes are always changing • DB Size/Volume, Data Skewness, Data Content • Analyze SQL Query and access path selection prior to implementing into a production environment. • Execute the Explain Plan periodically to determine what method the Optimizer is selecting for an access path.

“Explain” Plan / SHOWPLAN • Phase of the optimizer that captures information used in selecting the query access plan • Why use an Explain Plan? • Gives clues as to why the optimizer made access decisions • Can be used in advance of execution • Can be used to maintain a history of problem query access • Before/After new indexes additions • Before/After Statistics are Generated/Re-Generated • Before/After Data additions/changes/deletions • Problem determination is easier by comparing reference plans

Example • Graphical SHOWPLAN

Monitoring and Tuning

Monitor and Tuning • A Constant Process • A very necessary part of successful database implementation • Must be there to guarantee ongoing, optimal Database Performance Design Data Object Data Activity Data Redesign Tune 3.) Consider Fixes 1.) Collect Data Real time Periodic Historical Repeat 4.) Apply Fixes 2.) Analyze Data

Monitoring and Tuning • What to monitor • Healthiness of Database Objects • Growth • Fragmentation • Exists when TS and/or indexes have pages in which the logical ordering, based on key or link value, does not match the physical ordering of the pages inside the file • Causes additional I/O and additional storage • Causes of Fragmentation • DML (Insert, Delete, Update) • Inserts/Updates cause Page Splits • Delete/Updates cause holes

Monitoring and Tuning • What to monitor • Fragmentation illustrated • Reorganization • Reorders pages, compresses entries on a page • Always be sure to run new Statistics collection (for the Optimizer) Uniform pages in order Index 1 Page 8 Index 1 Page 1 Index 1 Page 7 Index 1 Page 2 Index 1 Page 4 Index 1 Page 3 Index 1 Page 6 Index 1 Page 5 Non-uniform pages, out of order Index 1 Page 8 Index 1 Page 1 Index 1 Page 5 Index 3 Page 1 Index 1 Page 2 Index 2 Page 1 Index 2 Page 2 Index 1 Page 4

Monitoring and Tuning • What to monitor • Object Usage • Access Patterns (Random, Sequential, Indexed, Non-Indexed) • I/O (Volume, Latency) • They tend to change over time as users learn the application • Memory Usage • Buffer Hit Ratio • Data/Index pages in the Buffer will avoid an I/O • Processing Activity • CPU utilization • Will indicate excessive searching and/or sorting • Parallel, Non-Parallel • Can speed up large searches • Can also monopolize all the processors • Locking • Timeouts • Deadlocks

Performance DB Monitoring and Tuning • How to monitor – Tool usage SQL Request Tool to Collect & Interpret Alerts Statistical Generation DBMS Reports Result

Monitoring and Tuning Steps • Find the statements that consume the most resources • “Heavy Hitters” • Physical Reads will indicate SQL requiring disk access to get queries • Most expensive part of a Query!!! • Buffer Gets indicate the amount of searching going on within a query High Buffer Gets = Lots of Searching = Lots of Processing • Sorts information will indicate if SQL is doing an excessive amount of sorting • Find the offending statements without adding to the performance problem • Use simple top down approach • Avoid heavy tracing • Know the Database Design and Usage • Run Explain Plan on SQL

AdditionalQuestions?

Joe Carola, Siemens HS

Joe Carola, Siemens HS

Presentation Transcript

SIEMENS

SIEMENS DRIVES

Siemens PLC

Siemens

Siemens Cranes

-Huntsville HS -Grissom HS -JO Johnson HS Columbia HS -Butler HS -Lee HS -New Century

SIEMENS

John Elwell Siemens Communications (International) john.elwell@siemens

Siemens Project

Siemens Trains

Siemens

Siemens

Siemens company