Resource mapping a wait time based methodology for database performance analysis
Download
1 / 40

Resource Mapping A Wait Time Based Methodology for Database Performance Analysis - PowerPoint PPT Presentation


  • 313 Views
  • Uploaded on

Resource Mapping A Wait Time Based Methodology for Database Performance Analysis. Prepared for NoCOUG, Fall Conference, 2004 Presented by Matt Larson Chief Technology Officer Confio Software. Presentation Agenda. Introduction Conventional Tuning vs. Wait-based Tuning

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Resource Mapping A Wait Time Based Methodology for Database Performance Analysis' - becca


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Resource mapping a wait time based methodology for database performance analysis l.jpg

Resource MappingA Wait Time Based Methodology for Database Performance Analysis

Prepared for NoCOUG, Fall Conference, 2004Presented by Matt LarsonChief Technology Officer Confio Software


Presentation agenda l.jpg
Presentation Agenda

  • Introduction

  • Conventional Tuning vs. Wait-based Tuning

  • Foundation: Resource Mapping Methodology

  • 5 Key Steps of Applying RMM

  • RMM Advantages

  • Conclusion


Who am i l.jpg
Who am I?

  • Former DBA consultant specializing in Oracle performance tuning

  • Co-author of three Oracle books (Oracle Development Unleashed, Oracle Unleashed

    2nd Edition, Oracle8 Server Unleashed)

  • Co-author of two other database related books

  • CTO and founder of Oracle performance software company



Conventional tuning l.jpg
Conventional Tuning Under the Streetlight

  • Art, not a science

  • Ratio-based (cache hit ratios, etc.)

  • Sometimes fruitless

  • It’s “tuned” (I guess?)

  • Different tuning/investigation process for each DBA/DBA Team/Company


Problems with conventional tuning tools l.jpg
Problems with Conventional Tuning Tools Under the Streetlight

  • Optimize systems, not business results

  • Conventional tools:

    • V$ Views: limited visibility & granularity

    • Statspack: averages across entire database

    • Tracing: Limited to session, excessive volume

  • Incorrect Data hides real results

    • System-wide averages

    • Event counters

    • Incomplete visibility


What problems are you trying to solve l.jpg
What Problems are you Trying to Solve? Under the Streetlight

  • Methodology addresses a common problem space:

    • I spend the whole week monitoring and optimizing Oracle configurations, but I have no demonstrable results to show for it - why?

    • Will more hardware make my application run faster? By how much?

    • Will the new application run efficiently on the production server?

    • Why does one application keep impacting my SLA compliance?

    • If I could make one (or 2, 3, or 4) changes to my database to have the biggest impact, what would they be?


You know you are working on the wrong thing when l.jpg
You know you are working on the wrong thing when… Under the Streetlight

  • After spending an agonizing week tuning Oracle buffers to minimize I/O operations, management typically rewards you with:

    • A. An all expense paid vacation

    • B. A free lunch

    • C. A stale donut

    • D. Reward? Nobody even noticed!


You know you have a visibility problem when l.jpg
You know you have a visibility problem when… Under the Streetlight

  • You measure database performance based on:

    • A. Increasing trends in user response time

    • B. Increasing system down time

    • C. Increasing help desk calls

    • D. Increasing decibel levels from irate users


Your role is sub essential to the business of your organization when l.jpg
Your role is sub-essential to the business of your organization when…

  • Your role in the rollout of a new customer facing application results in:

    • A. Keys to drive the CEO’s Porsche

    • B. Keys to use the executive restroom

    • C. A mop to use in the executive restroom

    • D. Your office has been moved to the restroom


You know you are accustomed to measuring the wrong thing when l.jpg
You know you are accustomed to measuring the wrong thing when…

  • You measure the commute time to work based on:

    • A. The time it takes to get there

    • B. Counting the times your wheels rotate

    • C. Monitoring your tachometer

    • D. The number of speeding tickets


Wait based performance tuning l.jpg
Wait-based Performance Tuning when…

  • Emerging best-practice for database tuning

  • Proponents include leading consultants, trainers and authors

  • Oracle is starting to build wait-based tuning tools into the database particularly in 10g

  • Tune by determining where processing time is spent


Oracle 10g moving towards wait based l.jpg
Oracle 10g - Moving towards wait-based when…

  • Adding wait-based columns to existing views

  • New wait-based views

    Example:

    v$session_wait_history

    • Provides the last 10 wait events for a session

    • Session ID, Username, Event, Wait_Time, etc.

    • Used to provide wait_time for only a few events


Resource mapping methodology l.jpg
Resource Mapping Methodology when…

A set of requirements that define what data must be captured to effectively make tuning decisions and a process for applying the data to achieve the optimal outcome

RMM

Wait Based Tuning


Dba success stories using rmm l.jpg
DBA Success Stories using RMM when…

  • DBA solves a “Cold Case”. Problem unresolved for 1 year with traditional tools; Solution identified in 10 minutes during hands-on training

  • DBA ends “Crit Sit” 2 week situation ends quickly after identification of Library Cache pin wait and load locks. Metalink identifies Oracle bug, patch successfully applied

  • DBA saves $700K. 90% CPU capacity initiates expansion from 12 to 24 CPU server. DBA identifies parallel queries across 16 parallel threads as source of bottleneck. CPU eliminated as constraint, no new server required.


Resource mapping methodology16 l.jpg
Resource Mapping Methodology when…

  • Three Key Principles of RMM

    1. SQL View: View all statistics atSQL statement level

    2. Time View: MeasureTime, not number of times a resource is utilized

    3. Full View: Separately measureevery resourceto isolate source of problems


Illustrating example sql view principle l.jpg
Illustrating example: SQL View Principle when…

  • Example: ‘CEO’ measuring ‘employee’ output

  • Averaging over entire company gives no useful data

  • Must measure each job separately

  • DBA must manage database similarly

  • Measure and identify bottlenecks for each SQL independently


Illustrating example time view principle l.jpg
Illustrating example: Time View Principle when…

  • Example: ‘CEO’ counting ‘tasks’ vs. ‘time to complete’

  • Counting system statistics not meaningful

  • Must measure Time to complete

  • System stats (buffer size, hit ratios, I/O counts) do not identify where database customers are waiting

  • Identify and optimize Wait Time for each SQL as best indicator of performance


Illustrating example full view principle l.jpg
Illustrating example: Full View Principle when…

  • Example: ‘CEO’ measuring results with blind spot hiding key processes

  • Without direct visibility, valuable info is lost

  • Must have visibility to every process step

  • Distinctly identify and measure each Oracle resource for each distinct SQL


Track sql time not system counters l.jpg
Track SQL Time, when…Not System Counters

5K Packets

216K Writes

30 Minutes

4 M

6 M

4 M

15M

200 Minutes

10 M

200 Minutes

5M

5M

100 Minutes

5M

  • Watching Counters leads to wrong conclusions: Time is more relevant

  • Total System Counters hide information: Need breakdown to individual SQLs

Total System Counter

80K Reads

125 Attempts

SQL 1

5 R

50 A

SQL 2

25 R

35 A

SQL 3

50 Reads

50 A

Resources

I/O

Network

Locks

Redo


Applying rmm for business results l.jpg
Applying RMM for Business Results when…

Five Step Process focusing on what matters

1. Identify

2. Allocate

3. Quantify

4. Prioritize

5. Assign


Step 1 identify l.jpg
Step 1: Identify when…

  • Identify SQL Statements having largest impact

    • (SQL View and Time View principles)

  • Longest wait times = most significant “pain points” for customers

  • Conversely, low cache hit ratios or high latch usage may not impose high wait times for users (so why fix them?)

SQL statements

prioritized by

Total Wait Time


Step 2 allocate l.jpg
Step 2: Allocate when…

  • Allocate impact to real customers (internal or external)

  • Allocate wait time to Program, Session, Machine

    • SQL View principle makes this connection

  • Understanding database customer and application

Programs Prioritized by

Total Wait Time


Step 3 quantify l.jpg
Step 3: Quantify when…

  • How much is save in time/money if fixed?

  • Enabled by Full View and Time View principles

  • Soft dollar savings

    • Data entry clerks

    • DBA time spent in problem resolution

  • Hard dollar savings

    • Reduce hardware upgrades

    • Meet SLA’s avoiding penality

    • Ensure business isn’t lost due to poor performing or unavailable system

Quantifiable benefit of

Tuning a

specific statement


Step 4 prioritize l.jpg
Step 4: Prioritize when…

  • If last step properly executed, this step is fairly straight forward

  • Allow’s DBA to cut through the clutter of potential new projects, investigations, and trials.

  • Better justification for priorities. (e.g. We aren’t working on your problem since this other has a higher demonstrable business impact)


Step 5 assign l.jpg
Step 5: Assign when…

  • Assign the right people to the problem

    • Log_buffer waits

    • Network issues

    • Same query 10,000/hour

  • Enabled by Full View principle

  • Avoid finger-pointing by accurately assigning quickly


Resource mapping methodology27 l.jpg
Resource Mapping Methodology when…

RMM

Wait Based Tuning

Network, Storage, Application, Web, etc.


Silo monitoring l.jpg
Silo Monitoring when…

Business Management

LIMITED VIEW

Software Layers

IT Management

LIMITED VIEW

Web Server

Web Team

Sitescope

Custom Biz Logic

Custom App Team

Often No Commercial Tools

Network

Network Team

HP Openview

Database Server

Database/OS Teams

Wait-based tuning

Storage Box

Storage/OS Teams

EMC Control Center

Each team uses their own tool to partially monitor their non-Oracle layers. No view across layers. Management has no clear view.


The solution integrated vision l.jpg
The Solution - Integrated Vision when…

Business Management

RMM across the stack

IT Management

Web Team

Web Server

Custom App Team

Custom Biz Logic

Network Team

Network

Database/OS Teams

Database Server

Storage/OS Teams

Storage Box

All teams see a complete picture of all layers and dependencies.

Enables more efficient “Umbrella” solution.



Example 1 problem observed l.jpg
Example 1: Problem Observed when…

  • Critical situation: Secure Service Center application performance unsatisfactory

    • Response time between 2400 and 9000 seconds

    • Very high network traffic (3x—4x normal), indicating time-outs and user refreshes

    • “CritSit” declared: major effort to resolve problem


Observations using resource mapping methods l.jpg
Observations using Resource Mapping Methods when…

  • 1: Identify accumulated Waits

  • 2: Identify specific resources used

Lib cache pin wait

Notice scale: > 8000 secs

Lib cache load lock


Results l.jpg
Results when…

Library cache pin nearly unobservable

Notice scale: < 1400 secs max

Library cache load lock no longer observable


Results34 l.jpg
Results when…

  • Response time improvement from 8000 seconds (worst case) to 900 seconds

  • Variance improvement:

    • Before: response time 2400 - 8000 sec

    • After: response time 800 - 900 sec


Example 2 performance drain identify the source l.jpg
Example 2: Performance Drain – Identify the Source when…

  • Slow response reported

  • DBA and database focus of delays

  • Database problem?

  • No – SQL*Net Message identified as source of delay

  • 2nd highest wait event


Rmm drill down identifies source of problem l.jpg
RMM Drill Down identifies source of problem when…

  • Single application generates all SQL*Net Messages

  • App on same server as Oracle!

  • Answer:

  • Misconfiguration – TCP/IP used within server

  • Change to IPC, eliminate NIC traffic and 30% of wait time

Solution requires knowing: Which SQL, What Wait Time, Which Resource


Example 3 scattered reads l.jpg
Example 3: Scattered Reads when…

  • Situation: LINS06 database - Hourly profile identifies high wait anomaly

  • 3-10x higher than other periods – requires investigation

wait time

42,000 seconds

10:00-11:00


Drill down to key rmm parameters l.jpg
Drill Down to Key RMM Parameters when…

Notice scale: > 6000 secs

Db file scattered reads

Db file scattered reads


Conclusion l.jpg
Conclusion when…

  • Look for what has an impact

  • Resource Mapping is more that Wait Time – Analysis must include:

    • SQL level granularity

    • Full Resource granularity

  • Isolating the SQL and Resource allows you to find and fix the Root Cause

  • DBAs can have an impact and be heroes!


Thank you for coming l.jpg
Thank you for coming when…

Matt Larson

Contact Information


ad