Statistics Profile For Query Optimization

1 / 21

# Statistics Profile For Query Optimization - PowerPoint PPT Presentation

Statistics Profile For Query Optimization. WENYI NI. Introduction. What is statistics profile?. Every object has its own status. In order to know its status, we need statistics. The relation between Statistics profile and statistics. Cost Model. From M.Tamer Oszu.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Statistics Profile For Query Optimization' - noe

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Statistics Profile For Query Optimization

WENYI NI

Spring 2004, CSE8330 Presentition

Introduction

What is statistics profile?

• Every object has its own status.
• In order to know its status, we need statistics.
• The relation between Statistics profile and statistics.

Spring 2004, CSE8330 Presentition

Cost Model

From M.Tamer Oszu

When DBMS use statistics profile?

Spring 2004, CSE8330 Presentition

What does statistics profile collect?
• The central tendency of the data
• The range of the data
• The size of the data
• The distribution of the data

Spring 2004, CSE8330 Presentition

Common types of statistics profile
• Table profile
• Attribute profile
• Index profile

Spring 2004, CSE8330 Presentition

Typical profiles

Spring 2004, CSE8330 Presentition

Three ways to collect statistics
• Exhaustive accumulation
• Sampling
• Piggyback

Spring 2004, CSE8330 Presentition

Exhaustive accumulation
• Calculate every statistics describer through scanning the related object exhaustively

Most Accurate

Spring 2004, CSE8330 Presentition

Sampling
• Scan part of the related object. Estimate statistics through sample data

Still have overhead. Statistics is not 100% accurate.

Spring 2004, CSE8330 Presentition

Piggyback
• Collect statistics through data in memory. Slightly change SQL statement to make full use of these data.
• Types of piggyback
• Vertical piggyback
• Horizontal piggyback
• Mixed piggyback

Spring 2004, CSE8330 Presentition

Vertical piggyback
• Include extra columns during query processing

Example:

Select student.name

from student;

rewrite to:

Select student.name,student.age

from student;

Spring 2004, CSE8330 Presentition

No extra I/O, but extra cpu load.

Solution: set piggyback level

• AC1 = { x| x is a column in Table Ri referenced by Query Q}
• AC2 = { x| x is an index column in Table Ri } – AC1
• AC3 = { x| x is a column in Table Ri and x is a part of the primary key or foreign key or referenced by a foreign key}-AC2
• AC4 = { x| x is a column in Table Ri }-AC3

Spring 2004, CSE8330 Presentition

Horizontal piggyback
• Include extra rows during query process

Example:

Select student.name, student.score

From student where score >60;

Rewrite to:

Select student.name, student.score

From student where score >60 or student.pid In

(Select student.pid for student

Where score>60);

Spring 2004, CSE8330 Presentition

Mixed piggyback
• Use both vertical and horizontal piggyback method

Spring 2004, CSE8330 Presentition

Value distribution
• Why we need it?

Example:

Select * from Student

Where score>60;

Size??

Spring 2004, CSE8330 Presentition

Size = 500*0.81*30 = 121.5

Where 500 is the cardinality of the student table. 30 is the size of each record

Spring 2004, CSE8330 Presentition

How to get distribution table?
• Histogram
• Equal width
• Equal height

Spring 2004, CSE8330 Presentition

Bucket number
• 1+ logn [rule of sturge 1927]

Example: student table ( 500 records)

1+log500 = 10

• For equal width, put each value into the proper buckets
• For equal height, make an order to the value, if the sampling size is m, decide the height k = m/(bucket number), and put the value in bucket in order

Spring 2004, CSE8330 Presentition

Sampling
• How many sample do we need?

A sample size of 1064 can give a less than 10% error rate with 99% probability (mannino1988)

To gain same error rate for varies size of table,

Sample rate drops when size of table grows.

Drop rate: log(n)/n

Example:

20 sample with 2%error rate on table with 100 records

We need 1000*0.2*(1-log(1000)/1000) samples to reach 2% error rate on table with 1000 records

Spring 2004, CSE8330 Presentition

Summery & Future work