slide1 n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Oracle Wait Events That Everyone Should Know Kerry Osborne Senior Oracle Guy PowerPoint Presentation
Download Presentation
Oracle Wait Events That Everyone Should Know Kerry Osborne Senior Oracle Guy

Loading in 2 Seconds...

play fullscreen
1 / 53

Oracle Wait Events That Everyone Should Know Kerry Osborne Senior Oracle Guy - PowerPoint PPT Presentation


  • 106 Views
  • Uploaded on

Oracle Wait Events That Everyone Should Know Kerry Osborne Senior Oracle Guy. What Are Wait Events?. Basically a section of code Uses gettimeofday All time is stuck in a bucket with a name Multiple things can be stuck in the same bucket

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Oracle Wait Events That Everyone Should Know Kerry Osborne Senior Oracle Guy' - dawn-perry


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1
Oracle Wait Events

That Everyone Should Know

Kerry Osborne

Senior Oracle Guy

what are wait events
What Are Wait Events?
  • Basically a section of code
  • Uses gettimeofday
  • All time is stuck in a bucket with a name
  • Multiple things can be stuck in the same bucket
  • Oracle is very well instrumented (but It’s not 100%)
  • There are 41 classes of wait events in 10.2.0.3
  • There are 878 wait events in 10.2.0.3
    • 209 enqueue events
    • 29 latch events
    • 41 I/O events
what do you do with wait events
What do you do with Wait Events?
  • See where Oracle is spending it’s time
  • Create a Resource Profile
    • Aggregate the time by Wait Event
    • Order the profile by time
    • Include # of occurrences and avg. time/event
    • Shows where to focus
    • Shows how much improvement can be made
profiles
Profiles
  • Basic Tool – maybe the most important
  • Shows up in Statspack / AWR reports
  • Shows up in tkprof output (10g)
  • Can be pulled out of trace files
  • Can be pulled from V$ tables (v$system_event)
  • Can be pulled from ash tables

Oracle is very well instrumented

profiles1
Profiles

Top 5 Timed Events

~~~~~~~~~~~~~~~~~~ % Total

Event Waits Time (s) Ela Time

-------------------------------------------- ------------ ----------- --------

CPU time 37,297 52.27

SQL*Net message from dblink 7,065,802 11,312 15.85

async disk IO 943,852 5,265 7.38

db file sequential read 13,042,432 3,493 4.90

direct path write 108,862 3,410 4.78

-------------------------------------------------------------

why should developers know this stuff
Why should developers know this stuff?
  • Because you need to know where your code is spending it’s time
  • Scalability is Important
slide8

Top 10 List of Wait Events

  • 1. These are the events you should know by heart
  • 2. The other 260 or so you can figure out
  • 3. You need to know what causes them
  • 4. You need to know what values are reasonable
  • 5. You need to know what you can do to fix them

select name, parameter1, parameter2, parameter3

from v$event_name

where name like nvl('&event_name',name)

order by name

slide9

CPU

  • Not a wait event – but often included in a profile
  • Time spent but not accounted for in other buckets
  • Primarily time spent doing lio (we hope)
  • We want it to be on top
  • It may be an indicator of inefficient plans
slide10

DB File Sequential Read

  • Single block read
  • Usually index block or data block via rowid
  • Can be done by fts to get chained rows
  • P1=file, P2=block, P3=# of blocks (always 1)
  • Always a user’s server process reading into buffer cache
  • Reasonable e values: <10ms

WAIT #14: nam='db file sequential read' ela= 36745 file#=1 block#=67745 blocks=1 obj#=84920

slide11

DB File Sequential Read – “fixes”

  • Do Less Work
    • tune SQL
      • reduce lio’s 7 / obj rule of thumb
      • Explain plan lies (see notes on this slide)
    • bigger buffer cache
  • Do the Work Faster
    • Speed up I/O
  • Call Jack
slide12

Digression – Explain Plan Lies

  • Explain plan appears to executes a separate code path
  • Even if it didn’t there is no guarantee TEST matches PROD
  • Best bet is to execute and look at V$SQL_PLAN
  • This is easy in 10g with dbms_xplan

select * from table(dbms_xplan.display_cursor('&sql_id','&child_no',''))

  • See an example in the notes section for this slide
slide13

DB File Scattered Read

  • Multi block read
  • Usually full table scan or index fast full scan
  • P1=file, P2=block, P3=# of blocks (always < MBRC)
  • Always a user’s server process reading into buffer cache
  • Reasonable ela values: <10ms
  • Relevant parameters:
    • DBFMBRC, System Stats: MBRC,
    • _db_file_exec_read_count, _db_file_optimizer_read_count

WAIT #14: nam='db file scattered read' ela= 19389 file#=139 block#=44 blocks=5 obj#=83580

slide14

DB File Scattered Read – “fixes”

  • Do Less Work (fewer multi-block reads)
    • MBRC
    • Better Indexes
    • System Stats
    • bigger buffer cache
    • Tune SQL
  • Do the Work Faster
    • Speed up I/O
slide15

Direct Path Read/Write

  • Usually sorting to Temp
  • Can also be parallel query
  • Could also be insert append, etc…
  • Reasonable ela values: <20ms
  • Relevant parameters:
    • DBFMBRC
    • _db_file_direct_io_count (1M)

WAIT #10: nam='direct path write' ela= 4475 p1=401 p2=1518353 p3=57

WAIT #10: nam='direct path read' ela= 16770 p1=401 p2=1482031 p3=63

slide16

Direct Path Read/Write – “fixes”

  • Adjust PGA_AGGREGATE_TARGET
  • Turn off PX query
  • Call Randy
slide17

Log File Sync

  • User process wait for LGWR to flush dirty buffers on commit
  • Synchronous write event
  • P1=log buffer block
  • Reasonable ela values: <4ms

WAIT #16: nam='log file sync' ela= 1286 buffer#=11910 p2=0 p3=0 obj#=84920

slide18

Log File Sync – “fixes”

  • Do Less Work (fewer commits)
    • Autocommit?
    • Row at a time processing with commits?
  • Do the Work Faster
    • Speed Up I/O
      • No RAID 5
      • No Contention (other db, arch, ASM)
      • SS Disk
    • Improve LGWR Scheduling
      • Renice
slide19

Log File Parallel Write

  • Wait for LGWR to write to current log files
  • Not necessarily synchronous write event
  • System I/O event
  • Reasonable ela values: <4ms
  • Occurs as follows:
    • Every 3 seconds
    • Log_buffer is 1/3 full or redo = 1M ( default _log_io_size)
    • Commit or rollback by user session
    • RAC needs to transfer a dirty block
slide20

Log File Parallel Write – “fixes”

  • Often fixed by fixing Log File Sync (i.e. speed up i/o, etc)
  • Force LGWR to write more often
slide21

Log file switch …

  • Happens when database can’t switch to next log file
  • Freezes the whole database
  • Several Flavors
    • Checkpoint incomplete
    • Archiving needed
  • Reasonable ela values: 0ms

WAIT #9: nam='log file switch (checkpoint incomplete)' ela= 986212 p1=0 p2=0 p3=0

slide22

Log file switch … – “fixes”

  • Do less work
  • Make total online log space bigger
  • Force more frequent incremental checkpoints
  • Speed up i/o
slide23

Buffer Busy Waits

  • Contention event for the same block
    • Read by other session (new event in 10g)
    • Held by other session in incompatible mode
  • New event in 10g – read by other session
  • Multiple sections of code throw time in this bucket
  • P1=file, P2=block, P3=reason code
  • See metalink note (34405.1) for details

WAIT #7: nam='read by other session' ela= 3251 file#=4 block#=188557 class#=1 obj#=53707 tim=1183096831514887

slide24

Buffer Busy Waits - “fixes”

  • It’s complicated – but basically - eliminate the contention
  • For Read by Other Session
    • Do Less Work (tune SQL to do less lio)
    • Eliminate i/o – bigger buffer cache, etc…
    • Speed up i/o
  • For Others
    • Find type of hot objects and statements suffering
    • Address issue - ASSM, Freelist groups (RAC), etc..
slide25

Free Buffer Waits

  • No place to put a new block
slide26

Free Buffer Waits - “fixes”

  • Do Less Work
    • Reduce the amount of lio
  • Make more buffer space available
    • Add memory to the buffer cache
    • Speed up DBWR to flush blocks more quickly
      • Async_io
      • More dbwr processes
      • Renice
slide27

SQL*Net message from client

  • Database is idle – waiting on next request from client
  • Often marked as an idle event which it may be
  • Time distribution can be skewed
  • Common technique to ignore anything over 1 sec
  • Reasonable e values: <2ms
  • Protocol should be tcp (or beq for local processes)

WAIT #1: nam='SQL*Net message from client' ela= 336 driver id=1650815232 #bytes=1 p3=0 obj#=83660

slide28

SQL*Net message from client – “fixes”

  • Speed up app server
  • Reduce the number of calls
  • Use the right protocol (tcp to beq)
  • Call network guy
  • Get the users to type faster
slide29

SQL*Net more data from client

  • More than 1 packet required to send SQL statement
slide30

SQL*Net more data from client – “fixes”

  • Don’t send 50K SQL statements
  • Use packages
slide31

SQL*Net message to Client

  • Time to send a SQL*Net message to a client (sort of)
  • Actually time it took to write the return data from Oracle’s userland SDU buffer to OS kernel-land TCP socket buffer (Tanel Poder)
  • Often marked as an idle event which it may be
  • Reasonable e values: <1ms (micro seconds actually)
  • P1=bytes (usually 1 ???)

WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660

slide32

SQL*Net message to Client – “fixes”

  • Use correct protocol
  • Fix network issue
  • Call Andy
slide33

SQL*Net more data to client

  • When more than one packet required
    • Rows spanning block
    • LOB’s
    • Array fetches
  • Often marked as an idle event which it may be
  • Reasonable e values: <1ms
  • P1=bytes (usually 1 ???)

WAIT #1: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=83660

slide34

SQL*Net more data to client – “fixes”

  • Fix chained/migrated rows
  • Don’t use LOBs
  • Nothing to fix
slide35

Enqueues

  • Locks for objects
    • Que up in order
  • 9i lumped them all together
  • 10g has 209 separate events
slide36

Enqueue in 9i

  • P1 has encoded type and mode
slide37

Enqueue in 10g

  • Several Major Categories
    • TM – table modification
    • TX – Transaction locks
    • UL – user lock
    • CI – Cross Instance
    • CU – Cursor Bind
    • HW – High Water
    • RO – Reusable Object
    • ST – Space Transaction
    • TS – Temporary Space
  • Parameters depend on type
slide38

Enqueue: TX row lock

  • User can’t modify block until other user commits
slide39

Enqueue: TX row lock – “fixes”

  • Use ash tables to find statements waiting
  • Fix the code
  • Or do what one of our clients did:

write some code to kill any processes blocking for more than 60 seconds – lot’s easier than changing the app

slide40

Enqueue: SQ

  • Waiting on exclusive access to a sequence
  • Seems minor but can actually be a big issue on some systems
slide41

Enqueue: SQ – “fixes”

  • Cache the sequence
  • Use NOORDER in RAC
slide42

Latches

  • Locks internal memory structures
    • Not ordered
    • Very light weight
    • Kid Asking for Candy Model
    • Spins vs. Sleeps (_spin_count, _latch_class_X)
    • Wait events do not include time spent spinning
  • 9i lumped them all together
  • 10g has 28 separate events
slide43

Latches in 9i

  • All dumped in the Latch Free event
  • P2 has latch#
slide44

Latches in 10g

  • Primarily Deal with major SGA areas
    • Shared Pool
      • Latch: shared pool
      • Latch: library cache
      • Latch row cache
    • Buffer Cache
      • Latch: cache buffer chains
      • Latch: cache buffers lru chain
    • Log Buffer
      • Latch: redo …
  • Parameters depend on type (see v$event_name)
slide45

Latch: cache buffers chains

  • Each block hashes to a chain
  • Several chains per latch
  • Several latches per instance (default based on cache size)
slide46

Latch: cache buffers chains –”fixes”

  • Find hot blocks
  • Fix bad SQL
  • Maybe Increase number of latches
  • Maybe decrease rows per block
slide47

Latch: shared pool

  • Shared Pool Contention
    • Due primarily to parsing
    • See also latch: library cache …

WAIT #28: nam='latch: shared pool' ela= 751 address=1611562656 number=213 tries=1 j#=46024

slide48

Latch: shared pool – “fixes”

  • Use Bind Variables
  • CURSOR_SHARING = FORCE
  • Increase SHARED_POOL

– this is where Darcy came in and sat in my office so

I didn’t get much more done –

You can thank Darcy later!

slide49

Unaccounted for

  • Some tools show this (Hotsos Profiler)
  • Indicates lost time due to rounding (small)
  • Or lost time due to CPU contention (large)
tracing
Tracing
  • alter session set events '10046 trace name context forever, level 8';
    • Spits out wait events, plans (real), stats
    • Level 12 includes bind variables – makes file very big
    • Puts file in user_dump_dest (ls –altr)
    • Can be executed from logon trigger, in-line, etc…
    • Scoping is important (i.e. when it’s turned on and off)
    • Note: bug in early versions of 9.2, fixed in 9.2.0.6
trace file contents
Trace File Contents
  • WAIT
  • FETCH
  • EXEC
  • PARSE
  • STAT
  • XCTEND
  • PARSING IN CURSOR
raw trace file
Raw Trace File
  • /opt/oracle/admin/LAB102/udump/lab102_ora_4207.trc
  • Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
  • ORACLE_HOME = /opt/oracle/product/db/10.2.0/db1
  • System name: Linux
  • Node name: homer
  • Release: 2.6.9-34.ELhugemem
  • Version: #1 SMP Fri Feb 24 17:04:34 EST 2006
  • Machine: i686
  • Instance name: LAB102
  • Redo thread mounted by this instance: 1
  • Oracle process number: 20
  • Unix process pid: 4207, image: oracle@homer (TNS V1-V3)
  • *** ACTION NAME:() 2007-08-16 13:48:14.571
  • *** MODULE NAME:(SQL*Plus) 2007-08-16 13:48:14.571
  • *** SERVICE NAME:(SYS$USERS) 2007-08-16 13:48:14.571
  • *** SESSION ID:(143.189) 2007-08-16 13:48:14.571
  • =====================
  • PARSING IN CURSOR #7 len=68 dep=0 uid=61 oct=42 lid=61 tim=1159462982979740 hv=740818757 ad='30663e48'
  • alter session set events '10046 trace name context forever, level 8'
  • END OF STMT
  • EXEC #7:c=0,e=269,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1159462982979717
  • WAIT #7: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1159462982980778
  • WAIT #7: nam='SQL*Net message from client' ela= 119 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1159462982981008
  • =====================
  • PARSING IN CURSOR #8 len=44 dep=0 uid=61 oct=3 lid=61 tim=1159463023994427 hv=761757617 ad='54738434'
  • select avg(col1) from skew
  • where rownum < 10
  • END OF STMT
  • PARSE #8:c=4000,e=3904,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,tim=1159463023994411
  • EXEC #8:c=0,e=185,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=1159463023994844
  • WAIT #8: nam='SQL*Net message to client' ela= 9 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023994980
  • WAIT #8: nam='db file scattered read' ela= 218 file#=4 block#=21900 blocks=5 obj#=53707 tim=1159463023995544
  • FETCH #8:c=0,e=665,p=5,cr=4,cu=0,mis=0,r=1,dep=0,og=1,tim=1159463023995732
  • WAIT #8: nam='SQL*Net message from client' ela= 157 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996048
  • FETCH #8:c=0,e=7,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=0,tim=1159463023996183
  • WAIT #8: nam='SQL*Net message to client' ela= 7 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996312
  • WAIT #8: nam='SQL*Net message from client' ela= 298 driver id=1650815232 #bytes=1 p3=0 obj#=53707 tim=1159463023996669
  • XCTEND rlbk=0, rd_only=1
  • STAT #8 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=4 pr=5 pw=0 time=736 us)'
  • STAT #8 id=2 cnt=9 pid=1 pos=1 obj=0 op='COUNT STOPKEY (cr=4 pr=5 pw=0 time=846 us)'
  • STAT #8 id=3 cnt=9 pid=2 pos=1 obj=53707 op='TABLE ACCESS FULL SKEW (cr=4 pr=5 pw=0 time=639 us)'
  • WAIT #0: nam='log file sync' ela= 680 buffer#=4862 p2=0 p3=0 obj#=53707 tim=1159463039852003
questions
Questions?

Kerry.Osborne@enkitec.com