Unlocking the Mysteries Behind Update Statistics
Download
1 / 37

Informix Chat with the Labs - PowerPoint PPT Presentation


  • 151 Views
  • Uploaded on

Unlocking the Mysteries Behind Update Statistics. John F. Miller III. STSM. Informix Chat with the Labs. Throw dice, how many will be 1?. The Dice Problem. How many dice are you throwing? How many sides does each dice have? Are all the dice the same?. Questions about the Dice.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Informix Chat with the Labs' - tabib


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Unlocking the Mysteries Behind Update Statistics

John F. Miller III

STSM

Informix Chat with the Labs



Questions about the dice

How many dice are you throwing?

How many sides does each dice have?

Are all the dice the same?

Questions about the Dice

The better the information,

the more accurate the estimate.


What does update statistics do

Collects information for the optimizer

Statistics LOW

Distributions MEDIUM & HIGH

Drop Distributions

Compile stored procedures

What does Update Statistics do?


Statistics collected

systables

systables

syscolumns

syscolumns

sysindexes

sysindexes

Number of Rows

Number of pages to store the data

Second largest value for a column

Second smallest value for a column

# of unique values for the lead key

How highly clustered the values for the lead key

Statistics Collected


Update statistics low basic algorithm

Walk the leaf pages in each index

Submit btree cleaner requests when deleted items are found causing re-balancing of indexes

Collects the following information

Number of unique items

Number of leave pages

How clustered the data is

Second highest and lowest value

Update Statistics LowBasic Algorithm


How to read distributions

To get the range of values look at the highest value in the previous bin.

How to Read Distributions

# of rows represented in this bin

--- DISTRIBUTION ---

( -1

1: ( 868317, 70, 75)

2: ( 868317, 24, 100)

3: ( 868317, 12, 116)

4: ( 868317, 30, 147)

5: ( 868317, 39, 194)

6: ( 868317, 28, 222)

--- OVERFLOW ---

1: ( 779848, 43)

2: ( 462364, 45)

# of unique values

Highest Value in this bin

The value

# of rows for this value


Example approximating a value
Example - Approximating a Value previous bin

  • There are 868317 rows containing a value between -1 and 75

  • There are 70 unique values in this range

  • The optimizer will deduce 868317 / 70 = 12,404 records for each value between -1 and 75

--- DISTRIBUTION ---

( -1

1: ( 868317, 70, 75)

2: ( 868317, 24, 100)

3: ( 868317, 12, 116)

4: ( 868317, 30, 147)

5: ( 868317, 39, 194)

6: ( 868317, 28, 222)

--- OVERFLOW ---

1: ( 779848, 43)

2: ( 462364, 45)


Example dealing with data skew
Example - Dealing with Data Skew previous bin

  • Data skew

  • For the value 43 how many records will the optimizer estimate will exist?

  • Answer 779848 values

  • Any value that exceeds 25% of the bin size will be placed in an overflow bin

--- DISTRIBUTION ---

( -1

1: ( 868317, 70, 75)

2: ( 868317, 24, 100)

3: ( 868317, 12, 116)

4: ( 868317, 30, 147)

5: ( 868317, 39, 194)

6: ( 868317, 28, 222)

--- OVERFLOW ---

1: ( 779848, 43)

2: ( 462364, 45)


Basic algorithm for distributions
Basic Algorithm for Distributions previous bin

  • Build distributions

  • Begin transaction

    • Delete old columns distributions

    • Insert new columns distributions

  • Commit transaction

  • Develop scan plan based on available resources

  • Scan table

    • High = All rows

    • Medium = Sample of rows

  • Sort each column


Sample size
Sample Size previous bin

  • HIGH

    • All rows in the table

  • Medium

    • Misconception about the number of rows sampled is based on the number of rows in the table, this is incorrect.

    • The number of samples depends on the Confidence and Resolution.

    • If the sample size is greater than the number of row in the table Medium turns into High mode


Update statistics medium sample size
Update Statistics Medium previous binSample Size


How much information is enough
How Much Information is Enough?? previous bin

The better the information,

the more accurate the estimate.


Examining the running query no statistics vs medium statistics
Examining the Running Query previous binNo Statistics VS Medium Statistics

No Statistics

QUERY:

------

select * from t1 where c1 > 20200

Estimated Cost: 20888

Estimated # of Rows Returned: 6760

1) miller3.t1: SEQUENTIAL SCAN

Filters: miller3.t1.c1 > 20200

Medium Statistics

QUERY:

------

select * from t1 where c1 > 20200

Estimated Cost: 21

Estimated # of Rows Returned: 19

1) miller3.t1: INDEX PATH

(1) Index Keys: c1 (Serial, fragments: ALL)

Lower Index Filter: t1.c1 > 20250

Overall performance improved

The estimates were more accurate

The query plan changed


Examining the running query medium statistics vs high statistics
Examining the Running Query previous binMedium Statistics VS High Statistics

Medium Statistics

QUERY:

------

select * from t1 where c1 > 20200

Estimated Cost: 21

Estimated # of Rows Returned: 19

1) miller3.t1: INDEX PATH

(1) Index Keys: c1

Lower Index Filter: t1.c1 > 20250

High Statistics

QUERY:

------

select * from t1 where c1 > 20200

Estimated Cost: 33

Estimated # of Rows Returned: 30

1) miller3.t1: INDEX PATH

(1) Index Keys: c1

Lower Index Filter: t1.c1 > 20250

Overall performance did not change

The estimates were slightly more accurate

The query plan did not change


Version of update statistics improvements

All version of 9.40 and 10.00 previous bin

9.30.UC3

9.21 Not fixed

7.31.UD2

Version of Update Statistics Improvements


Improvements in update statistics

Update statistics can not allocated memory between 4MB and 100MB of sort memory

The default has been raised from 4MB to 15MB

User can now configure the amount of memory

Use DBUPSPACE has been augmented to include memory

Format of DBUPSPACE

{max disk space}:{default memory}

To increase the memory to 35 MB, set DBUPSPACE=0:35.

Allow update statistics to use light scans when scanning a a table

Implemented light scans

Set oriented reads

Improvements in Update Statistics


Improvements in update statistics1

Information about building data distributions is not viewable by the DBA

Set explain will now print the scan path and resource usage when building data distributions

Update statistics low on fragmented tables does not run in parallel

With PDQ turned on each index fragment will be scanned in parallel

PDQ at 1 means 10% of the index fragments scanned in parallel, while PDQ at 10 means all the index fragments will be scanned in parallel

Improvements in update statistics


Improvements in update statistics2

Various errors (126, 312, 100,…) when executing update statistics

Errors when trying to insert the distributions because set lock mode to wait was not handled properly inside update statistics

Range scanning a fragmented index is slow Replace the next loop merge with a binary search merge when ordering items from index fragments

Most noticeable when the number of fragments in an index is large

Improvements in Update Statistics


Update statistics medium memory requirements
Update Statistics Medium statistics Memory Requirements


Update statistics high memory requirements

In memory sort statistics

Approximate Memory = number of rows * sum(column widths + 2 * sizeof(pointer) )

Update Statistics High Memory Requirements


Memory rules

Estimated Update Stats memory is below 100MB statistics

Hard coded limit of 4MB

Attempts to minimize the scans by fitting as many columns into 4MB

Estimated Update Stats memory is above 100MB

Memory is requested from MGM

Attempt to minimize the scans by fitting as many columns in the MGM memory

Memory Rules


Examples
Examples statistics

  • Customer Table

    Cust_id integer

    Fname char(50)

    Lname char(50)

    Address1 char(200)

    Address2 char(200)

    State char(2)

    zipcode integer

  • Number of Rows 500,000


Examples memory for incore sort
Examples statistics Memory for Incore Sort


Examples number of table scans
Examples statistics Number of Table Scans



Resolution

Percentage of data that is represented in a distribution bin medium

Example

100,000 rows in the table

Resolution of 2%

Each bin will represent 2,000 rows

Resolution


Example
Example medium

  • Following Example

    • Table size 215,000 rows

    • Row size 445 bytes

    • Uniprocessor


Example of the current update statistics

Table: jmiller.t9 medium

Mode: HIGH

Number of Bins: 267 Bin size 1082

Sort data 101.4 MB

Sort memory granted 4.0 MB

Estimated number of table scans 10

PASS #1 c9

PASS #2 c5

PASS #3 c7

PASS #4 c6

…..

PASS #10 c4

Completed pass 1 in 0 minutes 24 seconds

Completed pass 2 in 0 minutes 20 seconds

Completed pass 3 in 0 minutes 17 seconds

Completed pass 4 in 0 minutes 17 seconds

Completed pass 5 in 0 minutes 17 seconds

Completed pass 6 in 0 minutes 15 seconds

Completed pass 7 in 0 minutes 14 seconds

Completed pass 8 in 0 minutes 15 seconds

Completed pass 9 in 0 minutes 16 seconds

Completed pass 10 in 0 minutes 14 seconds

Example of the current update statistics

Total Time 146 seconds


The new defaults

Completed pass 1 in 0 minutes 34 seconds medium

Completed pass 2 in 0 minutes 19 seconds

Completed pass 3 in 0 minutes 16 seconds

Completed pass 4 in 0 minutes 14 seconds

Completed pass 5 in 0 minutes 15 seconds

The New Defaults

Table: jmiller.t9

Mode: HIGH

Number of Bins: 267 Bin size 1082

Sort data 101.4 MB

Sort memory granted 15.0 MB

Estimated number of table scans 7

PASS #1 c9,c8,c10,c5,c7

PASS #2 c6,c1

PASS #3 c3

PASS #4 c2

PASS #5 c4

Total Time 98 seconds

New Memory Default


Enabling pdq with update statistics

Table: jmiller.t9 medium

Mode: HIGH

Number of Bins: 267 Bin size 1082

Sort data 101.4 MB

PDQ memory granted 106.5 MB

Estimated number of table scans 1

PASS #1 c1,c2,c3,c4,c5,c6,c7,c8,c9,c10

Index scans disabled

Light scans enabled

Completed pass 1 in 0 minutes 29 seconds

Enabling PDQ with Update Statistics

PDQ Memory

Features Enabled

Total Time 29 seconds


Tuning with the new statistics

Turn on PDQ when running update statistics, but only for tables

Avoid PDQ when updating statistics for procedures

When running high or medium increase the memory update statistics has to work with

Enable parallel sorting (i.e. PSORT_NPROCS)

Tuning with the New Statistics


Considerations

Change the RESOLUTION to 1.5 tables

Increasing the number of bins for the distributions

Increasing the sample size for update statistics medium

Considerations


Old recommendations
Old Recommendations tables

  • Start one update statistics for each column of a table

Fname

Lname

Address

Three sequential

scans of the table


New recommendations
New Recommendations tables

  • Start one update statistics for ALL columns giving it more resources (memory)

  • Requires only one scan of the table to produce distributions on several columns.

Fname

Lname

Address

One scans of the table


Other information

An Overview of the IBM Informix Dynamic Server Optimizer tables

www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0211desai/0211desai.html

Understanding and Tuning Update Statistics

www.ibm.com/developerworks/db2/zones/informix/library/techarticle/miller/0203miller.html

Predicate Inference in Informix Dynamic Server

www.ibm.com/developerworks/db2/zones/informix/library/techarticle/0206goswami/0206goswami.html

IBM Informix Performance Manual

IBM Informix SQL Reference Manual

Other Information


Questions
Questions tables


ad