Are Your Statistics Bad Enough?


Verify the effectiveness of gathering optimizer statistics

Jaromir D.B. Nemec

UKOUG 2005

27.10.2005


- The way from ANALYZE TABLE to DBMS_STATS
- Increasing number of parameters and options
- changing defaults
- CBO more intelligent
- a “small change” can have a big effect
- ...and all that stuff only to get some redundant information


In the chain of dependent objects the quantity is increasing


Way 1 – observe (they are not if you encounter problems)

This is a reactive approach.

Way 2 – test it (define test cases and check them)

Define an invariant; the invariant must respect the dynamics of the database!


- Motto: fix the problem at its origin
- Limit the scope to basic table access only (full scan, index access)
- This reduces the complexity significantly and can expose the real origin of problems in the execution plan


select * from ta, tb
where ta.rand = tb.rand
  and ta.norm = 141 and ta.rand <= 770
  and (tb.norm = 1 or tb.norm = -1)

---------------------------------------------------------
| Id | Operation                    | Name     | Rows  |
---------------------------------------------------------
|  0 | SELECT STATEMENT             |          | 65073 |
|* 1 |  HASH JOIN                   |          | 65073 |
|* 2 |   TABLE ACCESS BY INDEX ROWID| TA       |  2771 |
|* 3 |    INDEX RANGE SCAN          | TA_INDEX |  3598 |
|* 4 |   TABLE ACCESS FULL          | TB       | 23580 |
---------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  1 - access("TA"."RAND"="TB"."RAND")
  2 - filter("TA"."RAND"<=770)
  3 - access("TA"."NORM"=141)
  4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770)


- For each table access:
- get owner, object name and alias
- option FULL – get the filter predicate
- option (b-tree) INDEX – get the access and filter predicates from both the table and the index lines
- enclose each predicate in parentheses and connect them with AND
- form a re-estimation query
- execute the query
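The steps above can be sketched in a few lines of Python. This is a hedged illustration, not the author's tool: the function name and the `SCOTT` owner are made-up placeholders, and it assumes the plan rows and predicate strings have already been fetched (e.g. from PLAN_TABLE).

```python
# Hedged sketch: build a re-estimation query from the predicates attached
# to one base-table access of an execution plan.
def build_reestimation_query(owner, table_name, predicates):
    """predicates: access/filter predicate strings for the table access
    line and, for index access, the corresponding index lines."""
    if not predicates:
        # no predicate: the estimate should simply match num_rows
        return f'select count(*) from "{owner}"."{table_name}"'
    # enclose each predicate in parentheses and connect with AND
    where_clause = " and ".join(f"({p})" for p in predicates)
    return (f'select count(*) from "{owner}"."{table_name}" '
            f'where {where_clause}')

# Example from the plan above: the index access on TA combines the table
# filter (plan id 2) and the index access predicate (plan id 3).
q = build_reestimation_query(
    "SCOTT", "TA",  # owner is a hypothetical placeholder
    ['"TA"."NORM"=141', '"TA"."RAND"<=770'])
print(q)
```

Executing the generated `count(*)` query then returns the real cardinality of that access.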


Input from the execution plan:

|* 2 |   TABLE ACCESS BY INDEX ROWID| TA       |  2771 |
|* 3 |    INDEX RANGE SCAN          | TA_INDEX |  3598 |
|* 4 |   TABLE ACCESS FULL          | TB       | 23580 |
---------------------------------------------------------
  2 - filter("TA"."RAND"<=770)
  3 - access("TA"."NORM"=141)
  4 - filter(("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770)

Generated re-estimation queries:

select count(*) from "TB"
where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770);
-- gives 60317 (actual cardinality)

select count(*) from "TA"
where ("TA"."NORM"=141) and ("TA"."RAND"<=770);
-- gives 1 (actual cardinality)

Run the re-estimation query to get the real cardinality and compare it with the original estimate.


Note that the TA access cardinality is overestimated (2771 estimated vs. 1 actual), while the TB access cardinality is underestimated (23580 estimated vs. 60317 actual).


- The simplification is based on the assumption that all problems in the execution plan are caused, directly or indirectly, by the cardinality estimates of the base table accesses.
- Examples:
- cardinality underestimation can lead to an NL join with a big inner table
- cardinality overestimation can switch an NL join to a SORT MERGE join
- an improper assumption of cardinality 1 can lead to a CARTESIAN join


Access Predicate – controls the index access

Filter Predicate – additional filter condition on the index or the table

Availability of predicates


- By selecting a limited set of objects and corresponding predicates we can try to find an answer to some fundamental questions:
- are my statistics precise enough?
- are my statistics up-to-date?
- does a histogram help?
- is the sample size OK?
- is dynamic sampling relevant?
- is the partition design suited for the application?
- ready for migration?


Comparing the estimated and re-estimated cardinality we obtain the relative and absolute difference

Predicate: where (("TB"."NORM"=(-1) OR "TB"."NORM"=1) AND "TB"."RAND"<=770)

ACCE   EST_CARD  RE_EST_CARD   ABS_DIFF   REL_DIFF
----  ---------  -----------  ---------  ---------
same      23580        60317      36737         61

Predicate: where ("TA"."NORM"=141) and ("TA"."RAND"<=770)

ACCE   EST_CARD  RE_EST_CARD   ABS_DIFF   REL_DIFF
----  ---------  -----------  ---------  ---------
same       2771            1      -2770    -277000
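The difference columns can be reproduced with a couple of lines. Note the REL_DIFF formula below (absolute difference as a percentage of the re-estimated cardinality, with a floor of 1 in the denominator) is inferred from the figures on this slide, not taken from the author's code:

```python
def cardinality_diff(est_card, re_est_card):
    # absolute difference: actual minus estimated cardinality
    abs_diff = re_est_card - est_card
    # relative difference in percent of the actual cardinality;
    # max(..., 1) is an assumption that reproduces the -277000 figure
    rel_diff = round(abs_diff / max(re_est_card, 1) * 100)
    return abs_diff, rel_diff

print(cardinality_diff(23580, 60317))  # TB access -> (36737, 61)
print(cardinality_diff(2771, 1))       # TA access -> (-2770, -277000)
```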


By observing the historical view we obtain the dynamic aspect of predicate selectivity


The two basic steps are extracting and processing the predicate information of execution plans.

[Diagram: the example workflow split into an operative task and an administrative task]


- Histogram candidates:
- the column is used in predicates
- the estimation is bad
- the column is accessed with different values
- the real cardinality lies in a wide range

Test the influence of histogram size on the precision of selectivity estimation using re-estimation queries.


[Chart: estimation precision vs. histogram size; target precision marked]

Without a histogram (SIZE=1) the cardinality is, on average, underestimated.

If the histogram size is higher than the number of distinct values the estimation is very precise.

For columns with a high number of distinct values the estimation remains imprecise.


- Re-estimate a predicate on statistics gathered with different estimate_percent parameter
- observe the influence of the sample size
- Possible results:
- relatively small percentage returns good results
- when increasing estimate percent the error is reduced
- the sample size has only a marginal effect on the precision
- Q: is there one optimal single sample size for an object?


- Add a dynamic sampling hint to the re-estimation query
- Trace with the 10053 event

select /*+ dynamic_sampling(tc,2) */ *
from tc where norm300 = 0;

Generated DS query:

SELECT /* some hints */
       NVL(SUM(C1),0),      -- total rows in sample
       NVL(SUM(C2),0)       -- rows satisfying the predicate
FROM (SELECT /*+ FULL("TC") some other hints */
             1 AS C1,
             CASE WHEN "TC"."NORM300"=0 THEN 1 ELSE 0 END AS C2
      FROM "TC" SAMPLE BLOCK (1.076923 , 1)   -- sample size controlled by level
           SEED (1) "TC"
     ) SAMPLESUB


Results of the DS query (10053 trace):

level : 2
sample pct.            : 50.806452   -- DS blocks / table blocks
actual sample size     : 54147       -- total rows in sample
filtered sample card.  : 6018        -- rows satisfying the predicate
orig. card.            : 111358
single table dynamic sel. est. : 0.11114189   -- computed selectivity: filtered / actual sample rows
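The trace values are mutually consistent and easy to check (a sketch; the column meanings are as annotated above):

```python
# Reproduce the selectivity the optimizer derives from the DS query result.
actual_sample_size = 54147    # total rows in the sampled blocks (SUM(C1))
filtered_sample_card = 6018   # rows satisfying the predicate (SUM(C2))

dynamic_sel_est = filtered_sample_card / actual_sample_size
print(round(dynamic_sel_est, 8))  # → 0.11114189, matching the trace
```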


- Dynamic sampling has a neat positive side effect
- Level 10 performs exactly the re-estimation
- no need to execute the re-estimation query; parsing with DS level 10 returns the exact result
- lower levels of DS are relatively cheap and can still give some hints


Dynamic sampling levels map to an absolute number of sampled blocks (except for level 10).

With a block size of 8 KB, even for a moderately sized table the effective sample percentage of DS can be too low.

DS uses block sampling only.


- For a selected predicate:
- build the re-estimation query
- parse it with different dynamic sampling hints (e.g. 0, 2, 5 and 10)
- peek into the PLAN_TABLE and get the cardinality and access option

Predicate: where part_key = 3 and GROW_SHIF > 0 and shif = 0

ACCE   EST_CARD  EST_DS2_CARD  EST_DS5_CARD  RE_EST_CARD       DIFF  DIFF_PREC
----  ---------  ------------  ------------  -----------  ---------  ---------
same       7982            94           116           89      -7893      -8869

Predicate: where part_key = 2 and GROW = 100

ACCE   EST_CARD  EST_DS2_CARD  EST_DS5_CARD  RE_EST_CARD       DIFF  DIFF_PREC
----  ---------  ------------  ------------  -----------  ---------  ---------
same       4281            13             2            0      -4281    -428100


- Two extreme cases can be observed when re-estimating at lower levels of DS:
- dynamic sampling is extremely precise even at lower levels
- dynamic sampling doesn't "work" until a high level

[Chart: real vs. estimated cardinality across DS levels 0-10]


Consider that dynamic sampling itself is subject to evolution. Dynamic sampling on a predicate with an index:

9i Release 2  – FULL TABLE scan with SAMPLE clause
10g Release 1 – INDEX ACCESS with ROWNUM <= 2500
10g Release 2 – FULL TABLE scan with SAMPLE clause


- Constant pstart and pstop values
- Add predicates for range and list partitions / subpartitions

--------------------------------- -----------------
| Operation              | Name | | Pstart| Pstop |
--------------------------------- -----------------
| SELECT STATEMENT       |      | |       |       |
|  PARTITION RANGE SINGLE|      | |     2 |     2 |
|   PARTITION LIST SINGLE|      | |     3 |     3 |
|    TABLE ACCESS FULL   | T2S  | |     6 |     6 |
--------------------------------- -----------------

- Verify the granularity parameter
- Verify partition pruning


For parallel execution plans no additional processing is required when generating re-estimation queries.

----------------------------------------------------
| Id | Operation               | Name     | Rows  |
----------------------------------------------------
|  0 | SELECT STATEMENT        |          | 82647 |
|  1 |  PX COORDINATOR         |          |       |
|  2 |   PX SEND QC (RANDOM)   | :TQ10001 | 82647 |
|* 3 |    HASH JOIN            |          | 82647 |
|  4 |     PX BLOCK ITERATOR   |          |  3519 |
|* 5 |      TABLE ACCESS FULL  | TA       |  3519 |
|  6 |     BUFFER SORT         |          |       |
|  7 |      PX RECEIVE         |          | 23580 |
|  8 |       PX SEND BROADCAST | :TQ10000 | 23580 |
|* 9 |        TABLE ACCESS FULL| TB       | 23580 |


Generating re-estimation queries for bitmap index access must respect the access logic.

| 1 |  TABLE ACCESS BY INDEX ROWID | T1      | 13 |
| 2 |   BITMAP CONVERSION TO ROWIDS|         |    |
| 3 |    BITMAP AND                |         |    |
| 4 |     BITMAP INDEX SINGLE VALUE| IND2_T1 |    |
| 5 |     BITMAP INDEX SINGLE VALUE| IND1_T1 |    |

Predicate Information (identified by operation id):
---------------------------------------------------
  4 - access("T1"."GROUP2_ID"=28)
  5 - access("T1"."STATUS"='X')

Generated predicate:

("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X')

The bitmap index cardinality is missing in the execution plan.
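The branches under a BITMAP AND node translate into an AND of the individual access predicates, in the same style as the basic table-access case. A hypothetical helper (the function name is made up for illustration):

```python
# Hedged sketch: combine the access predicates of the bitmap index
# branches under a BITMAP AND node into one re-estimation predicate.
def combine_bitmap_and(access_predicates):
    # enclose each branch predicate in parentheses and connect with AND
    return " AND ".join(f"({p})" for p in access_predicates)

# Branch predicates of plan ids 4 and 5 above:
pred = combine_bitmap_and(['"T1"."GROUP2_ID"=28', "\"T1\".\"STATUS\"='X'"])
print(pred)  # → ("T1"."GROUP2_ID"=28) AND ("T1"."STATUS"='X')
```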


How can the behavior of an application in a migrated environment be predicted?

A sophisticated but expensive method is to compare execution plans.

The re-estimation method can be used as a lightweight early-warning system.


Limited usability for predicates with bind variables and parse-time partition pruning.

The predicates stored in ACCESS_PREDICATES and FILTER_PREDICATES are not always complete.

A single predicate is limited to a length of 4000 bytes.

Further Work

A weighting function for the re-estimation results

Relevance to SQL profiles

Summary

- Method to verify the “status” of database statistics
- Introduction of “predicate mining”
- A green bar for database statistics
- Dynamic sampling may be used as a verification method

References

Christian Antognini, CBO: A Configuration Roadmap, Hotsos Symposium 2005

Wolfgang Breitling, Using DBMS_STATS in Access Path Optimization, UKOUG 2004

Jonathan Lewis, Strategies for Statistics, UKOUG 2004

Metalink notes:

Note 72539.1 – Interpreting Histogram Information

Note 114671.1 – Gathering Statistics for the Cost Based Optimizer

Note 236935.1 – Global Statistics – An Explanation
