诊断工具，等待事件，SQL优化原则

诊断工具，等待事件，SQL优化原则

Statspack的使用 • STATSPACK的安装 • SQL> @?/rdbms/admin/spcreate.sql; • 脚本将自动创建PERFSTAT用户，并提示保存STATSPACK统计信息的表空间以及临时表空间 • 性能快照的采集 • SQL>exec statspack.snap; • 统计报告的生成 • SQL> @?/rdbms/admin/spreport.sql; • 性能快照的自动采集 • SQL> @?/rdbms/admin/spauto.sql;

安装Statspack的准备 • 单独表空间（>90M) • 设置参数timed_statistics=true • 在init<sid>.ora中设置该参数，并让其生效 • SQL>alter system set timed_statistics=true;

Cache size Load profile Instance efficiency Percentages Top timed event Top SQL Instance Activity statistic tablespace IO stats for DB file IO stats for DB buffer pool statistics for DB PGA Aggr Target Stats for DB Statspack报告—构成内容 • Enqueue activity for DB • rollback segment stats for DB • rollback segment storage for DB • Undo Segment Stats for DB • latch activity for DB • dictionary Cache stats for DB • SGA memory summary for DB • init.ora parameters for DB

Statspack报告—分析重点 • Top 5 timed events • 数据库主要都在干什么 • Instance efficiency • 内存命中率 • Top SQL • 80/20原则

DB Name DB Id Instance Inst Num Release Cluster Host ------- -------- -------- -------- --------- ------- ------- HAW1 39997887 haw1 1 9.2.0.1.0 NO HAWKING Snap Id Snap Time Sessions Curs/Sess ------- ------------------ -------- --------- Begin Snap: 32 24-Oct-02 16:45:54 10 5.2 End Snap: 33 24-Oct-02 16:46:32 10 5.2 Elapsed: 0.63 (mins) Cache Sizes (end) ~~~~~~~~~~~~~~~~~ Buffer Cache: 36M Std Block Size: 8K Shared Pool Size: 12M Log Buffer: 512K 生成报告 — Context/Cache Sizes

生成报告 — Load Profile Per Second Per Transaction Redo size: 77,138.42 2,931,260.00 Logical reads: 765.50 29,089.00 Block changes: 565.58 21,492.00 Physical reads: 6.39 243.00 Physical writes: 11.76 447.00 User calls: 0.11 4.00 Parses: 280.47 10,658.00 Hard parses: 266.45 10,125.00 Sorts: 15.29 581.00 Logons: 0.00 0.00 Executes: 293.29 11,145.00 Transactions: 0.03 % Blocks changed per Read: 73.88 Recursive Call %: 99.99 Rollback per trans %: 0.00 Rows per Sort: 18.96

根据v$sysstat 得到的instance负载 与之前的报告进行比较，预先发现问题- IO是否显著增加了? 随软硬件配置不同而变化，无固定上限– 粗略规范为 Logical reads > 10,000 per 100MHz CPU per second Physical reads > 100 per disk per second Hard parses, soft parses > 100, 300 per second 着重检查parse (可调整 cursor_sharing and session_cached_cursors参数) 和 IO 生成报告 — Load Profile

Instance Efficiency Percentages (Target 100%) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Buffer Nowait %:100.00Redo NoWait %:100.00 Buffer Hit %: 99.98 In-memory Sort %: 99.48 Library Hit %: 76.14 Soft Parse %: 5.00 Execute to Parse %: 4.37 Latch Hit %:100.00 Parse CPU to Parse Elapsd %: 97.73 % Non-Parse CPU: 23.35 Underlined items have good corresponding wait events Shared Pool Statistics Begin End ----- ----- Memory Usage %: 94.08 93.54 % SQL with executions>1: 76.37 54.90 % Memory for SQL w/exec>1: 62.10 61.01 生成报告 — Instance Efficiency

如果统计收集时间过短，或者数据库重启，可能得到错误的命中率如果统计收集时间过短，或者数据库重启，可能得到错误的命中率粗略的命中率范围 90-100% Buffer/redo nowaits, Latch, Sorts 50-100% Library Cache 0-100% Parse, Buffer Hit 结合命中率和等待事件进行分析 Shared pool usage 应稳定在 80-90% 如果 >90% 则检查绑定变量和 reloads 生成报告 — Instance Efficiency

数值大(>95%)表示数据缓冲区足够大 否则需要增加数据缓冲区的大小，或调整数据文件IO的速度命中率 Buffer Nowait

数据大(>99%)表示log_buffer足够 否则调整log_buffer的大小命中率 Redo Nowait

足够大(>95%)表示命中率较高 否则可以调整数据缓冲区的大小命中率 Buffer Hit

数字大(>95%)表示大部分数据在内存中进行排序 否则调整sort_area_size的值，或pga_aggregate_target的值命中率 In-memory Sort

数据大(>95%)表示SQL的命中率较高 否则需要调整shared_pool_size的值，或者需要调整SQL，使用bind variable 命中率 Library Hit

数字大(>95%)表示SQL的缓存情况比较好 否则需要增加shared_pool_size的值命中率 Soft Parse

=100 * (1 - Parses/Executions) 数值高表示重新执行的次数较多，在Materialized View较多的系统中值可能很低，因为完全刷新后很多SQL语句执行时需要重新解释,动态的创建表或索引等也会增加Parse的次数. 命中率 Execute to parse

数值大(>99%)表示内部锁比较好 否则需要查询Oracle有关版本的bug说明命中率 Latch Hit

做SQL解释时CPU的利用率 数据越大表示解释的效率高命中率 Parse CPU to Parse elapsed

表示用于SQL 执行的CPU的比重 数值越大越好命中率 Non-Parse CPU

生成报告 — Top 5 Timed Events 报告最有价值的部分 9.2 在wait event基础上，增加了 ‘CPU Time’ Top 5 Timed Events ~~~~~~~~~~~~~~~~~~ % Total Event Waits Time (s) Ela Time ------------------------------ ----- -------- -------- CPU time 30 91.43 direct path read 95 1 3.53 control file sequential read 54 1 2.33 log file parallel write 62 0 .95 db file parallel write 20 0 .68 ---------------------------------------------------------

生成报告 — Wait Events Oracle提供的重要诊断工具每个等待的解释参考联机手册常见的 I/O waits: Db file sequential read – Index reads or scans Db file scattered read – Full table scans Direct path read/write – Temp IO Log related waits - IO, switches, buffer

Db file * read ->SQL by buffer gets/disk reads, File IO stats CPU Time -> Parse rates, Sorts, SQL executions, SQL buffer gets/disk reads, SMP processes(bugs) Direct path reads/writes -> Sorts, Hash joins, hash/sort_area_size, File IO Stats Buffer busy waits -> Buffer pool, Buffer waits, File IO stats, Segment statistics 重要的内部 wait events (e.g. latches, enqueues) 有对应的详细部分生成报告 — 常见Wait Events

读取大量的数据块到Cache中 调整思路：调整大表上的索引收集更新统计信息事件 DB File Scattered Read

一般指读取索引的数据 调整思路：调整索引的设计Rebuild索引，提高索引效率事件 DB File Sequential Read

磁盘上进行的排序，并行的全表扫描 调整思路：增加SORT_AREA_SIZE或 PGA_AGGREGATE_TARGET 事件 Direct Path Read/Write

访问的块正在读取中其他他进程正在将数据读到Cache中访问的块正在读取中其他他进程正在将数据读到Cache中访问的块正在修改中其他进程正在修改Cache中的数据调整思路：将数据文件放在读取速度更快的设备上减少数据争用事件 Buffer Busy Wait

等待Checkpoint操作结束 调整思路：减小log buffer的大小增加Checkpoint的频率将log文件放在更快的磁带设备上,如RAID 事件 Checkpoint Completed

Enqueue是Oracle内部的一种锁，用来进行串行操作Enqueue是Oracle内部的一种锁，用来进行串行操作调整思路：使用本地管理表空间外键建索引及时commit 事件 Enqueue

寻找可用Cache块，如大量的数据被修改，或没有可用的空闲块寻找可用Cache块，如大量的数据被修改，或没有可用的空闲块调整思路增加db_cache_size的值检查DBWR效率事件 Free Buffer Wait

生成日志的速度大于将日志写到磁盘的速度 调整思路增加log_buffer的值将log文件放到空闲的磁盘设备上事件 Log Buffer Space

Archiving needed 等待归档完成调整思路：增加log_archive_processes的数量 Checkpoint not completed 等待切换到下一个日志调整思路：增加日志组的数量，调整大小 Completion 等待日志切换完成调整思路：将log放到更快的磁盘设备上事件 Log File Switch (…)

在用户commit/rollback时，等待将Log buffer写入日志文件的过程原因：LGWR效率低下，系统提交过于频繁调整思路将日志放到更快的磁盘设备上一次提交更多纪录适当使用NOLOGGING/UNRECOVERABLE等选项事件 Log File Sync

根据buffer gets, disk reads, executions, parse counts 排序得到四类 “问题SQL” SQL ordered by Gets for DB: HAW1 Instance: haw1 Snaps: 117 -118 CPU Elapsd Buffer Gets Execs Gets per Exec %Total Time(s) Time(s) Hash Value ----------- ----- ------------- ------ ------- ------- ---------- 13,192 1 13,192 74.2 1.83 8.76 3097336866 Module: SQL*Plus SELECT * FROM policies WHERE policy_type = :b1 生成报告 — SQL Section

生成报告 — SQL Section 非最优SQL通常是数据库问题的主要根源 “如何利用更少的资源得到相同的结果?” 根据 ‘number per execution’ 寻找问题SQL

9.2新增view v$segstat Top 5 Logical Reads per Segment for DB -> End Segment Logical Reads Threshold: 10000 Obj. Logical Owner Tablespace Object Name Type Reads %Total ----- ---------- --------------------- ----- ------- ------ TB TAB1 ANALYSIS_COMMON_RESU TABLE 106,416 24.35 TB TAB1 ANALYSIS_TESTS TABLE 103,744 23.74 TB TAB1 SAMPLES TABLE 40,736 9.32 TB IND1 SAMPLES_UK1 INDEX 18,688 4.28 TB TAB1 ANALYSIS_RESULTS_PK INDEX 18,032 4.13 ------------------------------------------------------------- 生成报告 — Segment Statistics

Tablespace IO Stats for DB: Instance: PAYROLL ->ordered by IOs (Reads + Writes) desc Tablespace Filename ---------- ------------------------------------------------- Av Av Av Av Buffer Av Buf Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms) ------- ------- ------ ------- ------ -------- ------ ------ PAY_6 /u01/oradata/payroll/PAY_6_1.dbf 438,860 638 4.8 7.4 10 0 5,750 9.7 生成报告 — Tablespace and Datafile IO

Buffer Pool Statistics for DB: NETMON Instance: netmon -> Pools D: default pool, K: keep pool, R: recycle pool Free Write Buffer Buffer Consistent Physical Physical Buffer Complete Busy P Gets Gets Reads Writes Waits Waits Waits - --------- ---------- --------- -------- ------ -------- -------- D 4,859,734 4,765,667 4,755,716 1,740 0 4 8,333 ------------------------------------------------------------------ Buffer wait Statistics for DB: NETMON Instance: netmon -> ordered by wait time desc, waits desc Tot Wait Avg Class Waits Time (cs) Time (cs) ------------ ----- --------- --------- data block 8,375 8,000 1 undo block 4 1 0 -------------------------------------- 生成报告 — Buffer Pool and Buffer Waits

9i 分别计算每个pool的命中率 如果free buffer waits or write buffer waits 很高，表示db writer 速度跟不上buffer pool 需求 Busy buffer waits 表示多个进程竞争某个数据块 (解决方案 reverse key indexes, fewer rows per block, freelists, initrans, more rollbacks, etc.) 生成报告 — Buffer Pool and Buffer Waits

Latch Activity for DB: Pct Avg Pct Get Get Slps NoWait NoWait Latch Name Requests Miss /Miss Requests Miss ----------------------- --------- ---- ----- --------- ------ cache buffers lru chain 4,925,313 4.3 0.2 4,749,919 4.4 ------------------------------------------------------------- Latch Sleep breakdown for DB -> ordered by misses desc Get Spin & Latch Name Requests Misses Sleeps Sleeps 1->4 ----------------------- --------- ------- ------ ------------ cache buffers lru chain 4,925,313 211,245 35,178 179031/29608 /2337/269/0 ------------------------------------------------------------- 生成报告 — Latches

Reloads 表示代码移出内存，重复解析。如果已经使用了bind variables，考虑增加 shared_pool大小, keep objects Library Cache Activity for DB: PROD Instance: PROD ->"Pct Misses" should be very low Get Pct Pin Pct Invali- Namespace Requests Miss Requests Miss Reloads dations --------------- --------- ---- --------- ---- ------- ------- BODY 1,074 0.1 559 92.8 518 0 CLUSTER 2,736 0.0 4,056 0.0 0 0 PIPE 0 0 0 0 SQL AREA 1,146,358 84.0 3,434,570 56.4 14,339 0 TABLE/PROCEDURE 1,988,138 0.0 4,940,442 0.9 27,943 0 TRIGGER 0 0 0 0 生成报告 — Library Cache

SQL语句的实现机制 Oracle 如何处理 SQL语句 • Stage 1: Create a Cursor • Stage 2: Parse the Statement (Most expensive step) • Stage 3: Bind Any Variables • Stage 4: Run the Statement (Most expensive step) • Stage 5: Fetch Rows of a Query (can be expensive) • Stage 6: Close the Cursor

RBO和CBO • 优化器决定SQL语句的执行路径，对性能至关重要。 • 基于规则的优化器 RBO (Rule Based Optimizer) • No change since oracle 7.3,desupported in oracle 10 • Stable, generate the execution plan according to fixed rules • Not necessary good • Do not support new oracle features(like IOT, partition, materialized view,…) • 基于成本的优化器 CBO (Cost based optimizer) • Since oracle 7.3 • Support all oracle features • Better than RBO in most case, Can be bad in some case • Choose the best execution plan according to internal algorithm • Need to analyze schema to get the best execution plan

统计信息的采集 • SQL> dbms_stats.gather_schema_stats(ownname=>‘CTAIS',estimate_percent=>20,cascade=>true,degree=>2); • ownname：为用户名 • estimate_percent：为采样频率 • cascade：表示表和索引的统计信息同时采集 • degree：并行度 • SQL> analyze table <table_name> compute statistics;

索引不起作用的情况 存在数据类型隐形转换列上有数学运算使用不等于（<>）运算使用substr字符串函数 % 通配符在第一个字符字符串连接（||）索引的使用（1）

删除不使用的索引 定期重建大量删除操作的索引 SQL>alter index <index_name> REBUILD [online]; 索引的使用（2）

提高查询语句的效率，减慢了DML语句的速度 在全表扫描和索引之间权衡在哪些列建立索引 Where字句中引用的列 Join中引用的列在子表的FK上建立索引防止对父表操作时锁住子表在哪些列上不要建立索引经常有DML操作排它性小索引设计原则（1）

不建议表上的索引超过5个 能够用组合索引则尽量用组合索引组合索引的靠左原则对于大表，尽量利用local的分区索引索引设计原则（2）

建议使用CBO 对于任何SQL语句一定要考虑其性能通过查看执行计划来比较SQL语句的优劣，防止经验主义可以利用10g的新功能调整SQL(在Oracle的4个Pack中) SQL Tuning Advisor SQL Access Advisor 可以利用HINT，SQL profile和query rewrite等功能 SQL编写原则（1）

Where条件中等号左边不要引用函数 Where条件中对于不同的数据类型要做显示数据转换 NOT,!=,<>,||不会用索引 not exists好于not in Exists好于in 用case代替DECODE 用UNION ALL代替UNION 用UNION代替OR SQL编写原则（2）

诊断工具，等待事件，SQL优化原则

诊断工具，等待事件，SQL优化原则

Presentation Transcript