slide1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Joe Carola, Siemens HS PowerPoint Presentation
Download Presentation
Joe Carola, Siemens HS

Loading in 2 Seconds...

play fullscreen
1 / 34

Joe Carola, Siemens HS - PowerPoint PPT Presentation


  • 110 Views
  • Uploaded on

Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and Tuning Joe Carola Siemens Medical Solutions, Health Services. Joe Carola, Siemens HS. Bio 30+ years in Information Technology

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Joe Carola, Siemens HS' - berget


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Database Performance Topics: - DB Design - Optimization & Indexing - Monitoring and TuningJoe CarolaSiemens Medical Solutions, Health Services

joe carola siemens hs
Joe Carola, Siemens HS
  • Bio
    • 30+ years in Information Technology
      • 26 of them dedicated to Relational Database, covering all areas of database design, implementation, support, performance, etc.
    • Full History:
      • Development: Prog Trainee, Prog/Analyst, Sys Analyst
      • DBA trainee, DBA, Mgr-DBA (“Actor on the Scene”)
      • Lead DB Consultant for Codd and Date Consulting
      • Director-DBA
      • Technical Database Architect (Currently)
      • DB2, Microsoft SQL Server, Oracle, Sybase SQL Server
      • Mainframe, Unix, Wintel
      • 1993 recipient of an International Database User Group Award for Information Excellence, based on his contributions in the area of Relational Database technology, and has presented locally and internationally on a variety of Relational Database topics.
      • Chairman, Delaware Valley DB2 User Group
agenda practical stuff
Agenda – “Practical Stuff”
  • DB Design
    • In the simplest terms - Logical to Physical
  • Optimization and Indexing
    • Optimizer
    • Index Types and how they are used
  • Monitoring and Tuning
    • Monitoring Process and what to monitor
    • Tuning Steps

The above is where

I see that the rubber

meets the road……

db design1
DB Design
  • Logical Design
    • Provides complete understanding of data and it usage
    • Defining data entities and their attributes
    • Provides primary and foreign key definitions
  • Physical Design
    • Based on information gathered during Logical design
      • Data must be understood to do this correctly and efficiently
    • Provides physical aspects to enhance data usage
      • Data types, data lengths, row sizes,
    • Provides precise access paths (Indexes) to rows of data
      • Support primary and foreign keys
      • Secondary indexes
  • Poor Logical and Physical Database design can be the largest reason for performance issues
    • The price for poor DB Design must be paid at execution time
db design2
DB Design
  • Normalization: A synthesis of data design

1st Normal Form – Data is dependent on…………..The Key

2nd Normal Form – Data is dependent onThe whole Key

3rd Normal Form - “ “ “And nothing but the Key,

“so help me Codd”

    • Edgar F. (Ted) Codd – Developed the Relational Model

“A Relational Model for Large Shared Data Banks” (1970)

      • Solid, yet complex, mathematical foundation
        • Relational Algebra
        • Domains, Attributes, Tuples, and Relations (OMG!)
      • Re-stated to simpler terms…..
        • Simple to understand tables, rows, and columns
        • The Simplicity is partially the reason for the performance issues being addressed every day
          • Too many shortcuts are taken
          • Too many non-experienced data designers are designing and implementing database applications
db design3
DB Design
  • 3rd Normal Form is basically 1st cut physical
    • Next step after 3rd NF in Physical DB Design is a very important step for Performance, Concurrency, Operations, etc.
      • De-Normalization takes place here
        • Storing of data in summary or derived format
        • If it doesn’t happen, it takes place at execution time

Result:

High processing costs – Materialization of the result data

Administration costs – Maintenance of the data

Low Currency – Concurrent Access to the data

    • However…….
      • Anomalies are created as a result of De-normalization
        • Insert, Delete, Update
        • They all cost extra processing also
      • Must strike a balance based on requirements on performance, availability, storage, administration
optimization and indexing
Optimization and Indexing
  • Must understand the basics of indexes and performance statistics.
    • As a general rule, indexes should be kept as narrow as possible, most likely following a business use requirement, to reduce the amount of processing overhead associated with each query.
  • Being familiar with how optimization works will improve the accuracy of your decision making when designing indexes
    • Understanding how the optimizer works is the first step toward the establishment of a truly optimized database environment
  • As the sophistication of your database implementation increases…….
    • The need to optimize performance will also increase.
optimization and indexing1
Optimization and Indexing
  • SQL Query tuning is one of the most important tasks to improve application performance
  • Biggest bang for the performance dollar over everything else (“IMO”):
    • Network, Storage, Memory, Processor
  • Should be done in the design and testing phases
    • However, no amount of Database tuning or SQL statement tuning can make up for inefficient application design/coding
      • 60% to 80% of Application Problems come from poorly written SQL or the code around it

i.e. Prog101 abuse can wreck an application too!!

optimizer
Optimizer
  • Responsible for choosing the least costly way to execute SQL (DML).
  • Creates an access path with it’s decision
    • Performed at plan compilation time
    • Determines Access Methods
      • Index Usage
      • Table Scan
      • Join Method
      • Sort
    • Determines if Data and/or Index pages can be read in advance
      • Asynchronous Pre-fetch
the importance of statistics
The Importance of Statistics
  • Statistics provide the optimizer with the information to make decisions

Table

Indexspace

RDBMS

Catalog

Or

Dictionary

Generation

Tablespace

As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less than optimal decisions on how to process a query.

statistical terms concepts
Statistical Terms/Concepts
  • Cardinality
    • Measures how many unique values exist in the table
  • Density
    • Measures the uniqueness of values within a table.
    • Helps the optimizer determine how many rows will be returned for a given key value
    • Indexes with high densities will likely be ignored by the optimizer
      • i.e. the index is highly non-unique
  • Selectivity
    • Measures the number of rows that will be returned by a particular query.
    • Needed by Optimizer to calculate the relative cost of a query plan
slide14

STAGE 2

PREDICATES

RELATIONAL DATA SERVICES

STAGE 2 - Evaluated after data retrieval via the relational (NON-SARGABLE, Residual) data services which is more expensive than the Data Manager.

DATA MANAGER

STAGE 1

PREDICATES

STAGE 1 - Evaluated at the time the data rows are retrieved (SARGABLE). Performance advantage in using STAGE 1 PREDICATES because this stage eliminates ROWS passed to STAGE 2 via the Data Manager.

BUFFER MANAGER

From Request to Response

REQUEST

RESPONSE

PREDICATE ANY OTHER

WITH INDEX (ES) INDEX KEY

Non Indexed

PREDICATE APPLIES

REQUESTED DATA

I/O

indexing
Indexing
  • A very necessary part of Successful

Database Implementation

I wonder what queries will

be run ?

What indexes will be needed?

What columns will be used as

predicates?

What ORDER BY will be

used most often?

Why do some of

my queries run so

slow!

slide16

“Thanks for fixing

my query, what did

you do?”

#!*#!!!

Indexes are a good thing to add, however there is something to avoid…..

”I added an index to

one of the columns

“Great! Then add

indexes to all the

columns in my table

types of indexes
Types of Indexes
  • There are two types of indexes: clustered and non-clustered, each with unique advantages depending on the data set.
  • Clustered index
    • Dictates the storage order of the data in a table. Because the data is sorted, clustered indexes are more efficient on columns of data that are most often searched for ranges of values. This index type also excels at finding a specific row when the indexed value is unique.
  • Non-clustered index
    • Similar to an index in a textbook where the data is stored in one place and the data value in another. A query searches for the data value by first searching the non-clustered index to find the location of the data value in the table and then retrieves the data directly from that location. The non-clustered index is useful for queries resulting in exact matches.
basic index usage

1

2

3

Basic Index Usage

Matching Index Scan

Root Page

Non-Leaf Page

Leaf Page

Data

Page

Data

Page

Data Page

Select * From TABLE1 Where INDEXED_COL1 = 12345

basic index usage1

2

1

Basic Index Usage

Non-Matching Index Scan

Root Page

Non-Leaf Page

Leaf Page

Data

Page

Data

Page

Data

Page

Select * From TABLE1 Where INDEXED_COL1 > 00001

basic index usage2

1

Basic Index Usage

Index Only

Root Page

Non-Leaf Page

Leaf Page

Select COL1 From TABLE1 Where INDEXED_COL1 > 00001

join methods
Join Methods
  • Nested Loop Join

SELECT A,B,X,Y

FROM OUTER, INNER

WHERE A=10 AND B=X

Tables: OUTER INNER COMPOSITE

Columns: A B X Y A B X Y

10 3

10 1

10 2

10 6

10 1

5 A

3 B

2 C

1 D

2 E

9 F

7 G

10 3 3 B

10 1 1 D

10 2 2 C

10 2 2 E

10 1 1 D

1.) Scan the outer table,

For each qualifying row………

2.) find all matching rows in the inner table, via table space scan or index access.

The nested loop join produces this result

join methods1
Join Methods
  • Merge Scan Join

SELECT A,B,X,Y

FROM OUTER, INNER

WHERE A=10 AND B=X

1.) Condense and sort the outer table, or access it through an index on column B…...

Condense and sort the inner table.

Tables: OUTER INNER COMPOSITE

Columns: A B X Y A B X Y

10 1

10 1

10 2

10 3

10 6

1 D

2 C

2 E

3 B

5 A

7 G

9 F

10 1 1 D 10 1 1 D

10 2 2 C

10 2 2 E

10 3 3 B

The merge scan join produces this result

2.) Scan the outer table,

For each qualifying row….…

3.) Scan a group if matching rows in the inner table.

join methods2
Join Methods

SELECT C2,C33

FROM OUTER, INNER

WHERE C1 = A AND C2 = C22

  • Hybrid / Hash Join

4.) List Prefetch inner table rows and complete partial rows

1.) Apply local predicates and organize qualifying rows in join column sequence by either sorting or accessing via join column index….

PARTIAL ROWS

RESULT

INNER

OUTER

R

O

W

1

2

3

4

5

6

R

I

D

P1

P2

P3

P4

P5

P6

  • C1 C2
  • 1
  • 1
  • 2
  • 3
  • 6
  • .
  • C22 C33
  • D
  • C
  • E
  • B
  • A
  • G
  • C2 RID
  • P1
  • P1
  • P2
  • P3
  • P4
  • C2 C33
  • D
  • D
  • C
  • E
  • B

RID LIST

P1 P1 P2 P3 P4

3.) Create partial rows, and sort in RID sequence...….

2.) Obtain only inner table RIDs via index access using sequenced join column key values...….

an ounce of prevention
An Ounce of Prevention….,
  • Make your queries simple and efficient, ensuring the least costly access path available.
    • Try not to overload your tables with indexes
    • Try not to overload your indexes
    • Try not to overload your queries
  • Keep the Database healthy
    • Reorganization
      • Eliminates empty space, and fragmentation
      • Reduces I/O
  • Generate Statistics (if they are not automatic)
    • The Optimizer is very smart, but data attributes are always changing
      • DB Size/Volume, Data Skewness, Data Content
  • Analyze SQL Query and access path selection prior to implementing into a production environment.
    • Execute the Explain Plan periodically to determine what method the Optimizer is selecting for an access path.
explain plan showplan
“Explain” Plan / SHOWPLAN
  • Phase of the optimizer that captures information used in selecting the query access plan
  • Why use an Explain Plan?
    • Gives clues as to why the optimizer made access decisions
    • Can be used in advance of execution
    • Can be used to maintain a history of problem query access
      • Before/After new indexes additions
      • Before/After Statistics are Generated/Re-Generated
      • Before/After Data additions/changes/deletions
    • Problem determination is easier by comparing reference plans
example
Example
  • Graphical SHOWPLAN
monitor and tuning
Monitor and Tuning
  • A Constant Process
    • A very necessary part of successful database implementation
    • Must be there to guarantee ongoing, optimal Database Performance

Design Data

Object Data

Activity Data

Redesign

Tune

3.) Consider Fixes

1.) Collect Data

Real time

Periodic

Historical

Repeat

4.) Apply Fixes

2.) Analyze Data

monitoring and tuning1
Monitoring and Tuning
  • What to monitor
    • Healthiness of Database Objects
      • Growth
      • Fragmentation
        • Exists when TS and/or indexes have pages in which the logical ordering, based on key or link value, does not match the physical ordering of the pages inside the file
        • Causes additional I/O and additional storage
        • Causes of Fragmentation
          • DML (Insert, Delete, Update)
          • Inserts/Updates cause Page Splits
          • Delete/Updates cause holes
monitoring and tuning2
Monitoring and Tuning
  • What to monitor
    • Fragmentation illustrated
    • Reorganization
      • Reorders pages, compresses entries on a page
    • Always be sure to run new Statistics collection (for the Optimizer)

Uniform pages in order

Index 1

Page 8

Index 1

Page 1

Index 1

Page 7

Index 1

Page 2

Index 1

Page 4

Index 1

Page 3

Index 1

Page 6

Index 1

Page 5

Non-uniform pages, out of order

Index 1

Page 8

Index 1

Page 1

Index 1

Page 5

Index 3

Page 1

Index 1

Page 2

Index 2

Page 1

Index 2

Page 2

Index 1

Page 4

monitoring and tuning3
Monitoring and Tuning
  • What to monitor
    • Object Usage
      • Access Patterns (Random, Sequential, Indexed, Non-Indexed)
      • I/O (Volume, Latency)
      • They tend to change over time as users learn the application
    • Memory Usage
      • Buffer Hit Ratio
      • Data/Index pages in the Buffer will avoid an I/O
    • Processing Activity
      • CPU utilization
        • Will indicate excessive searching and/or sorting
      • Parallel, Non-Parallel
        • Can speed up large searches
        • Can also monopolize all the processors
    • Locking
      • Timeouts
      • Deadlocks
monitoring and tuning4

Performance

DB

Monitoring and Tuning
  • How to monitor – Tool usage

SQL Request

Tool to

Collect &

Interpret

Alerts

Statistical

Generation

DBMS

Reports

Result

monitoring and tuning5
Monitoring and Tuning

Steps

  • Find the statements that consume the most resources
    • “Heavy Hitters”
      • Physical Reads will indicate SQL requiring disk access to get queries
        • Most expensive part of a Query!!!
      • Buffer Gets indicate the amount of searching going on within a query

High Buffer Gets = Lots of Searching = Lots of Processing

      • Sorts information will indicate if SQL is doing an excessive amount of sorting
  • Find the offending statements without adding to the performance problem
    • Use simple top down approach
      • Avoid heavy tracing
      • Know the Database Design and Usage
      • Run Explain Plan on SQL