Session No. DM214 Why, When and How Should I Change the Optimizer Statistics?

Session No. DM214Why, When and How Should I Change the Optimizer Statistics? • Eric Miner • Development Engineer II • ESD • Optimizer Group • eric.miner@sybase.com

Understand the statistics and how they effect the optimizer Understand why you may need/want to add or modify statistics Know when and how to add or modify the statistics Objectives

Know what the optimizer is - and some basics of how it works and what it does Have access to ASE 11.9.2 or above - this presentation does not deal with versions prior to 11.9.2 Know the basics of optimizer analysis Have some experience with traceon 302 and optdiag Assumptions

Adding or changing the statistics is notsomething you must do. It is something you should be aware of and consider doing After full testing of course You will need SA role to do some of this Please test changes to statistics before implementing them You may want to consider using optdiag simulate to test changes to the statistics - see Tech Doc 20472 “Using Optdiag Simulate Statistics Mode” Your mileage may vary Caveats

They describe the table, its indexes and the data in a column to the optimizer Used by the optimizer to estimate the most efficient way to access the data required by a query They are the optimizer’s only view of its universe The statistics are all it knows about your dataset What Are The Statistics?

There are two types of statistics - Table/Index level and Column level Table/Index statistics are now centralized. Stored in systabstats Are maintained dynamically by ASE Should not be written directly - will be quickly overwritten Distribution statistics belong to a column, not an index. Stored in sysstatistics. Are static, need to be updated or written Can be written directly Most statistics stored as varbinary - use optdiag to read and write them Review Of Changes To The Statistics in ASE 11.9.2 & Above

To make composite indexes more selective - adding statistics to inner index columns - Highly recommended To put statistics on non-indexed columns - good for join costing, no statistics on a column means assumptions are made To change the number of steps (cells) in a column’s histogram More granular histogram cells - Highly recommended To change the default selectivity values - writing statistics To change the density values - writing statistics To add statistics to a column’s histogram - writing statistics Why Would You Want To Add or Modify Statistics?

The ‘natural’ statistics are obtained by reading the data in a column. Update statistics gathers and writes the ‘natural’ statistics Statistics based on the data and the state of the table/index Default statistics - Statistics obtained by running update statistics on a table or index. As you’ve done in pre-11.9.2 versions update statistics table_A [index_1] Writes statistics for the leading column of index(es) only Natural Column Statistics vs. Modified Column Statistics

Additional Statistics Statistics on columns other than the leading column of an index - minor index columns and non-indexed columns Changing the number of requested steps Modified statistics - Any statistical value you write directly Usually done via optdiag Only column level (sysstatistics) values should be written optdiag will not allow you to write to systabstats - will be quickly overwritten Natural Statistics vs. Modified Column Level Statistics cont.

The old distribution page is used to establish new statistics - stats from upgrade are less accurate Values of the steps are used as boundary values in the new histogram. Weights are approximated based on the steps of the distribution page FCs are created from duplicate step values, again weights are only approximated Density values are copied over Step counts are based on the number of steps in the old page A Note on Upgrade And The Statistics

DO NOT delete statistics after upgrade You will lose step counts DO run update statistics as soon as possible after the upgrade This will place the new statistics on the leading columns of your indexes - the default statistics A Note on Upgrade And The Statistics cont.

Use update statistics to build/update column level statistics update statistics table_name [ind_name] Same as always, will build/update statistics on the leading columns of indexes on the table, or on the specified index update index statistics table_name [ind_name] This will build/ update column statistics on all columns of all indexes in the table, or on the specified index. Highly recommended, but not absolutely necessary WARNING- This option can take a long time to run. New Update Statistics Flavors

update all statistics table_name This will create or update column level statistics on all columns of the table. It will also run update partition statistics. WARNING - This can take a VERY long time to run It is rarely necessary to run update all statistics. In the majority of cases it’s overkill Maintenance can become overwhelming New Update Statistics Flavors cont.

The more columns with statistics the more maintenance Maintenance considerations include - The time it takes to run update statistics on each column Editing and/or reading in an optdiag file Increased use of tempdb for updating column statistics A worktable will be used to sort all inner index columns and non-indexed columns Proc cache usage for sorts - dependant on size of datatype A Word About Maintaining Statistics

Most column level statistics will be overwritten by update statistics All values based on the data are overwritten - Density values, and all histogram values ‘Range’ and ‘In between’ default selectivity values are persistent You will need to reset non-persistent values after running update statistics - Read in an optdiag file Persistent And Non-Persistent Statistics

Statistics can be added to any column Gives the optimizer more info about a composite index (more selective). On a non-indexed column - able to cost joins on non-indexed columns Not absolutely necessary, but highly recommended update statistics table_name (col_name)- for single column update index statistics table_name [ind_name] - for all columns of composite index(es) Does add maintenance, effects procedure cache Test it before implementing Adding Statistics - Next Step Beyond The Default Statistics

Statistics on the leading column only - col_A Estimated selectivity for col_A, selectivity = 0.900288, upper limit = 0.947370. No statistics available for col_B or col_C, col_B uses the default in between selectivity of 0.25 col_C uses selectivity of 0.10 for equi-SARG with no statistics Estimating selectivity of index 't1_i1', indid 2 scan selectivity 0.900251,filter selectivity 0.056268 5627 rows, 6480 pages Search argument selectivity is 0.005627 Adding Statistics To Composite Index Columns - Example

Statistics on all three columns of the index selectivity of col_C was 0.10 Estimated selectivity for col_C, selectivity = 0.062328, upper limit = 0.158700. Estimating selectivity of index 't1_i1', indid 2 scan selectivity 0.900283, filter selectivity 0.000283 28 rows, 882 pages Search argument selectivity is 0.000028. Table: t1 scan count 1,logical reads:(regular=885 apf=0 total=885), physical reads:(regular=0 apf=0 total=0 Adding Statistics To Composite Index Columns - Example

Useful in costing joins on non-indexed columns Without statistics on the column there is no Total density value to use in costing joins Assumptions made about how many rows will qualify Not usually accurate - based on the join operator Estimated selectivity for col_A, selectivity = 0.100000. Statistics on the column allow the total density value to be used to estimate the number of qualifying rows. Estimated selectivity for col_A, selectivity = 0.000025, upper limit = 0.081425. Adding Statistics To - Non-Indexed Columns

The number of steps has an effect on the optimizer By default new column statistics are built using 20 steps (cells) If statistics exist the same step count will be reused unless you specify a different count Increasing the step count may result in more frequency count cells - know your data May help in optimization of range SARGs because cell granularity is increased - narrower cells (steps) Changing Requested Step Count

Cells consume procedure cache when read, the larger (wider) the datatype the more memory required for each cell The more cells read the more the effect on parse and compile time Implement increased steps only after testing. In some cases you may not need to change the step count. After an upgrade do not delete statistics or change step counts until you’ve tested Changing Requested Step Count cont.

When trying to get Frequency Count Cells (FCs) FCs represent only one value in the column - most accurate since the weight is for only one value. Range Cells represent more than one value, uniform distribution within the cell is assumed FCs can be pulled out of a RC when the value is > 50% of a cell width and there are enough requested steps available Cell width = number of rows/(number of requested steps -1) Changing Requested Step Count How Many Steps?

An example - FCs for each distinct value Table has 100K rows and 35 distinct values, the lowest number of rows occupied by a value is 6 (we want a FC for this value) The number of steps to request = ((rows * .50)/(rows for the value with least rows))+1 ( (100,000*.5)/6)+1 = 8335 The final histogram will have either 36 or 71 cells depending on the type of frequency count cell This is an extreme example. You may not need to have an FC for each value in the table Changing Requested Step Count How Many Steps? Cont.

Adding steps to decrease cell width - for range SARGs (more granular cells) Assumes Range Cells - low degree of duplicates Try doubling requested step count - then work up from there Interpolation will establish how close a range SARG value is to a boundary value of a cell and then estimate the number of rows that qualify for the SARG. The number of steps from upgrade is a good starting point Changing Requested Step Count How Many Steps? Cont.

Extensions to create index and update statistics create index I1 on T1 (colA) using X values update statistics T1 using X values X values = requested steps, seen in optdiag Remember - cells use procedure cache when read You may not need a lot of cells Again, don’t change step counts after an upgrade until you’ve tested create index with 0 values will create index, but will not write the statistics Changing Requested Step Count How To Do It

Use an optdiag input file to write the statistics directly Always get an optdiag output file before writing or changing the statistics - as insurance Useful in general if you want to go back to a previous set of statistics -o output_file_name, -i input_file_name Save a clean copy of the output file Rename and edit output file for changes to the statistics Writing The Statistics Directly

Use optdiag input files to maintain statistics changes written directly All changes to the column level statistics, other than the default selectivity values, will be over written by update statistics Traceon 302 output will display message when edited statistics are used in costing Statistics for this column have been edited. If you change non-persistent values you will need set them back after updating statistics Maintaining Directly Written Statistics

A few values occupy many rows while many values occupy a few rows - spikes in the distribution Data skew will have an effect on the Total Density value and thus on the costing of joins. Possible that the estimated number of rows qualifying for a join from an inner table will be pessimistic - check traceon 310 Weighted averaging used to obtain the total density Change Total Density or leave it alone? Changing Statistics - Data Skew and The Statistics

Change the Total Density to what? Test it first!! - the Total density value will be used for all queries that join the column One approach is to set Total Density equal to Range Cell density Be careful of a 0 density value, use something higher than 0 Maybe change it to the arithmetic average Changes to density values are not persistent - update statistics will over write them Changing Statistics - Data Skew and The Statistics cont.

Example of data skew in the histogram - Range cell density: 0.0000502421670203 Total density: 0.2697381850000000 Step Weight Value 1 0.00000000 <= 0 2 0.05263000 <= 5222 3 0.05264000 <= 10564 4 0.05265000 <= 15779 5 0.05263000 <= 20998 6 0.04125000 <= 24999 7 0.00000000 < 40000 8 0.31933998 = 40000 9 0.05263000 <= 96995 Changing Statistics - Data Skew and The Statistics cont.

sp_modifystats - New system stored procedure Will set the Total Density to equal the Range Cell Density Available in 11.9.2.2 and 12.0.1- not yet documented Caution - remember that the Total Density will be used for all joins on the column. If the Range cell density is very low you may want to modify Total density using optdiag sp_modifystats tab_name, col_name, ”REMOVE_SKEW_FROM_DENSITY” Changing Statistics - Data Skew and The Statistics cont.

Total density is used for the selectivity in the case of an unknown equi_SARG value declare @var int, select @var = 10, where col = @var Traceon 302 output sample - Selecting best index for the SEARCH CLAUSE: tabA.col1 = unknown-value If you see an unknown value message in 302 and the default selectivity is inefficient Try to eliminate the unknown value, if not possible Change the Total Density value Changing Statistics - Unknown Equi-SARG Values

The Default selectivity values are used for the selectivity of unknown range and between SARGs Use optdiag to change the default selectivity values Range (<,<=,>,>=) default value is 0.33 Between (between) default value is 0.25 Range selectivity:default used (0.33) In between selectivity:default used (0.25) Edited - Range selectivity: default used 0.0033 In between selectivity:default used 0.0025 The default selectivity values are persistent Changing Statistics - The Default Selectivity Values

If a value is greater than the largest or less than the smallest histogram value Traceon 302 sample - Estimated selectivity for colA, selectivity = 0.000000, upper limit = 0.000000. Lower bound search value 10000 is greater than the largest value in sysstatistics for this column. Special costing is done Not always the most accurate value: Selectivity of 0.00 or 1.00 depending on the value and the operator Changing Statistics - SARG Values That Are Out Of Range

Two ways to effect the histogram for this - Add a dummy row to the data Not always practical or allowed - but is persistent Add a dummy boundary value to the histogram via optdiag Easier to do in some cases, but not persistent It’s July 8 and update statistics hasn’t been run - step 20 is last 18 0.05301946 <= "May 1 2000 12:00AM" 19 0.05290456 <= "Jun 1 2000 12:00AM" 20 0.04818739 <= "Jul 1 2000 12:00AM" Change last boundary value to Aug 1 Changing Statistics - SARG Values That Are Out Of Range

Histogram cells can be added or changed by directly writing them Usually done to add FCs or a dummy boundary value A few rules apply when adding cells to the histogram: The step numbers must increase monotonically. The weight of a cell must be between 0 and 1.0. The sum of the cell weights must be close to 1.0 (0.99 to 1.01) Always test it before implementing it!! Save a copy (optdiag output) of the original stats Manipulating Histogram Cells

Two ways to use optdiag instead of update statistics The dump and load method - no editing of statistics required Dump dataset and load somewhere else Run update stats on the loaded dataset Get an optdiag output file for those tables you want new stats on Load the optdiag file into the original dataset May take as much time as running update statistics on the original dataset - but no interference with users Writing Statistics In Place Or Running Update Statistics

The optdiag method - requires editing of optdiag output files Get an optdiag output file of the tables you want to update statistics on (may also do this for individual columns) Edit the files to reflect changes in your dataset - you’ll need to understand what changes have occurred Read the file in via optdiag Very fast, care should be taken to insure that statistics are correct Test it before implementing it Writing Statistics In Place Or Running Update Statistics cont.

ASE 11.9.2 and above allows you add or write statistics Adding and writing statistics in not absolutely necessary Adding column level statistics is highly recommended in most cases Writing statistics is recommended only when necessary Conclusion

The latest Performance and Tuning Guide Don’t be put off by the “ASE 12.0” in the title, it covers the 11.9.2 features and functionality too http://sybooks.sybase.com/onlinebooks/group-as/asg1200e Any “What’s New” docs for a new ASE release Tech Docs at Sybase Support http://techinfo.sybase.com/css/techinfo.nsf/Home More Optimizer Resources and Help

The Sybase Customer newsgroups http://support.sybase.com/newsgroups The Sybase list server SYBASE-L@LISTSERV.UCSB.EDU The external Sybase FAQ http://www.isug.com/Sybase_FAQ/ Join the ISUG (ISUG Technical Journal, feature requests and more) http://www.isug.com More Optimizer Resources and Help cont.

Session No. DM214 Why, When and How Should I Change the Optimizer Statistics?

Session No. DM214 Why, When and How Should I Change the Optimizer Statistics?

Presentation Transcript

Chapter 6

Session Management Security and Applied Reverse Benchmarking - Tom Stracener, Sr. Security Analyst, Cenzic Inc.

Planned Change

Descriptive Statistics Univariate Statistics Chi Square ANOVA

SGI-USA Statistics Program Roll-Out

Statistics

Change Management

5 th Annual Meeting of the Washington Group on Disability Statistics

Session 2 Extending Principal Stratification for Characterising Trajectories of Change Richard Emsley

Bible on Money

Low Hanging Fruit Strategies to Rapidly Increase Sales with Business Optimizer Coach Stacey Hylen

National Conference on Health Statistics Session: How measurement and modeling of social determinants of health can info

Chapter 1

SESSION 10

Chapter 3

Nuts and bolts of biostatistics

Statistics Review – Part I

Logo Change

Statistics

Session 1 Orientation Session 2 User Interface Session 3 Database Administration