statistics profile for query optimization n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Statistics Profile For Query Optimization PowerPoint Presentation
Download Presentation
Statistics Profile For Query Optimization

Loading in 2 Seconds...

play fullscreen
1 / 21

Statistics Profile For Query Optimization - PowerPoint PPT Presentation


  • 79 Views
  • Uploaded on

Statistics Profile For Query Optimization. WENYI NI. Introduction. What is statistics profile?. Every object has its own status. In order to know its status, we need statistics. The relation between Statistics profile and statistics. Cost Model. From M.Tamer Oszu.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Statistics Profile For Query Optimization' - noe


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
statistics profile for query optimization

Statistics Profile For Query Optimization

WENYI NI

Spring 2004, CSE8330 Presentition

introduction
Introduction

What is statistics profile?

  • Every object has its own status.
  • In order to know its status, we need statistics.
  • The relation between Statistics profile and statistics.

Spring 2004, CSE8330 Presentition

slide3

Cost Model

From M.Tamer Oszu

When DBMS use statistics profile?

Spring 2004, CSE8330 Presentition

what does statistics profile collect
What does statistics profile collect?
  • The central tendency of the data
  • The range of the data
  • The size of the data
  • The distribution of the data

Spring 2004, CSE8330 Presentition

common types of statistics profile
Common types of statistics profile
  • Table profile
  • Attribute profile
  • Index profile

Spring 2004, CSE8330 Presentition

typical profiles
Typical profiles

Spring 2004, CSE8330 Presentition

three ways to collect statistics
Three ways to collect statistics
  • Exhaustive accumulation
  • Sampling
  • Piggyback

Spring 2004, CSE8330 Presentition

exhaustive accumulation
Exhaustive accumulation
  • Calculate every statistics describer through scanning the related object exhaustively
  • Advantage

Most Accurate

  • Disadvantage

Heavy system load

Spring 2004, CSE8330 Presentition

sampling
Sampling
  • Scan part of the related object. Estimate statistics through sample data
  • Advantage

Low system overhead

  • Disadvantage

Still have overhead. Statistics is not 100% accurate.

Spring 2004, CSE8330 Presentition

piggyback
Piggyback
  • Collect statistics through data in memory. Slightly change SQL statement to make full use of these data.
  • Types of piggyback
  • Vertical piggyback
  • Horizontal piggyback
  • Mixed piggyback

Spring 2004, CSE8330 Presentition

vertical piggyback
Vertical piggyback
  • Include extra columns during query processing

Example:

Select student.name

from student;

rewrite to:

Select student.name,student.age

from student;

Spring 2004, CSE8330 Presentition

slide12
No extra I/O, but extra cpu load.

Solution: set piggyback level

  • AC1 = { x| x is a column in Table Ri referenced by Query Q}
  • AC2 = { x| x is an index column in Table Ri } – AC1
  • AC3 = { x| x is a column in Table Ri and x is a part of the primary key or foreign key or referenced by a foreign key}-AC2
  • AC4 = { x| x is a column in Table Ri }-AC3

Advantage: Choose your piggyback level according to the CPU load

Spring 2004, CSE8330 Presentition

horizontal piggyback
Horizontal piggyback
  • Include extra rows during query process

Example:

Select student.name, student.score

From student where score >60;

Rewrite to:

Select student.name, student.score

From student where score >60 or student.pid In

(Select student.pid for student

Where score>60);

  • Advantage

Spring 2004, CSE8330 Presentition

mixed piggyback
Mixed piggyback
  • Use both vertical and horizontal piggyback method
  • Advantage

Spring 2004, CSE8330 Presentition

value distribution
Value distribution
  • Why we need it?

Example:

Select * from Student

Where score>60;

Size??

Spring 2004, CSE8330 Presentition

answer
Answer:

Size = 500*0.81*30 = 121.5

Where 500 is the cardinality of the student table. 30 is the size of each record

Spring 2004, CSE8330 Presentition

how to get distribution table
How to get distribution table?
  • Histogram
  • Equal width
  • Equal height

Spring 2004, CSE8330 Presentition

bucket number
Bucket number
  • 1+ logn [rule of sturge 1927]

Example: student table ( 500 records)

1+log500 = 10

  • For equal width, put each value into the proper buckets
  • For equal height, make an order to the value, if the sampling size is m, decide the height k = m/(bucket number), and put the value in bucket in order

Spring 2004, CSE8330 Presentition

sampling1
Sampling
  • How many sample do we need?

A sample size of 1064 can give a less than 10% error rate with 99% probability (mannino1988)

To gain same error rate for varies size of table,

Sample rate drops when size of table grows.

Drop rate: log(n)/n

Example:

20 sample with 2%error rate on table with 100 records

We need 1000*0.2*(1-log(1000)/1000) samples to reach 2% error rate on table with 1000 records

Spring 2004, CSE8330 Presentition

summery future work
Summery & Future work
  • Low overhead
  • Low error rate, still have room to improve
  • The way to estimate the size of project and join operations with statistics still need be improved.

Spring 2004, CSE8330 Presentition

the end
The end

Spring 2004, CSE8330 Presentition