1 / 21

Statistics Profile For Query Optimization

Statistics Profile For Query Optimization. WENYI NI. Introduction. What is statistics profile?. Every object has its own status. In order to know its status, we need statistics. The relation between Statistics profile and statistics. Cost Model. From M.Tamer Oszu.

noe
Download Presentation

Statistics Profile For Query Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Statistics Profile For Query Optimization WENYI NI Spring 2004, CSE8330 Presentition

  2. Introduction What is statistics profile? • Every object has its own status. • In order to know its status, we need statistics. • The relation between Statistics profile and statistics. Spring 2004, CSE8330 Presentition

  3. Cost Model From M.Tamer Oszu When DBMS use statistics profile? Spring 2004, CSE8330 Presentition

  4. What does statistics profile collect? • The central tendency of the data • The range of the data • The size of the data • The distribution of the data Spring 2004, CSE8330 Presentition

  5. Common types of statistics profile • Table profile • Attribute profile • Index profile Spring 2004, CSE8330 Presentition

  6. Typical profiles Spring 2004, CSE8330 Presentition

  7. Three ways to collect statistics • Exhaustive accumulation • Sampling • Piggyback Spring 2004, CSE8330 Presentition

  8. Exhaustive accumulation • Calculate every statistics describer through scanning the related object exhaustively • Advantage Most Accurate • Disadvantage Heavy system load Spring 2004, CSE8330 Presentition

  9. Sampling • Scan part of the related object. Estimate statistics through sample data • Advantage Low system overhead • Disadvantage Still have overhead. Statistics is not 100% accurate. Spring 2004, CSE8330 Presentition

  10. Piggyback • Collect statistics through data in memory. Slightly change SQL statement to make full use of these data. • Types of piggyback • Vertical piggyback • Horizontal piggyback • Mixed piggyback Spring 2004, CSE8330 Presentition

  11. Vertical piggyback • Include extra columns during query processing Example: Select student.name from student; rewrite to: Select student.name,student.age from student; Spring 2004, CSE8330 Presentition

  12. No extra I/O, but extra cpu load. Solution: set piggyback level • AC1 = { x| x is a column in Table Ri referenced by Query Q} • AC2 = { x| x is an index column in Table Ri } – AC1 • AC3 = { x| x is a column in Table Ri and x is a part of the primary key or foreign key or referenced by a foreign key}-AC2 • AC4 = { x| x is a column in Table Ri }-AC3 Advantage: Choose your piggyback level according to the CPU load Spring 2004, CSE8330 Presentition

  13. Horizontal piggyback • Include extra rows during query process Example: Select student.name, student.score From student where score >60; Rewrite to: Select student.name, student.score From student where score >60 or student.pid In (Select student.pid for student Where score>60); • Advantage Spring 2004, CSE8330 Presentition

  14. Mixed piggyback • Use both vertical and horizontal piggyback method • Advantage Spring 2004, CSE8330 Presentition

  15. Value distribution • Why we need it? Example: Select * from Student Where score>60; Size?? Spring 2004, CSE8330 Presentition

  16. Answer: Size = 500*0.81*30 = 121.5 Where 500 is the cardinality of the student table. 30 is the size of each record Spring 2004, CSE8330 Presentition

  17. How to get distribution table? • Histogram • Equal width • Equal height Spring 2004, CSE8330 Presentition

  18. Bucket number • 1+ logn [rule of sturge 1927] Example: student table ( 500 records) 1+log500 = 10 • For equal width, put each value into the proper buckets • For equal height, make an order to the value, if the sampling size is m, decide the height k = m/(bucket number), and put the value in bucket in order Spring 2004, CSE8330 Presentition

  19. Sampling • How many sample do we need? A sample size of 1064 can give a less than 10% error rate with 99% probability (mannino1988) To gain same error rate for varies size of table, Sample rate drops when size of table grows. Drop rate: log(n)/n Example: 20 sample with 2%error rate on table with 100 records We need 1000*0.2*(1-log(1000)/1000) samples to reach 2% error rate on table with 1000 records Spring 2004, CSE8330 Presentition

  20. Summery & Future work • Low overhead • Low error rate, still have room to improve • The way to estimate the size of project and join operations with statistics still need be improved. Spring 2004, CSE8330 Presentition

  21. The end Spring 2004, CSE8330 Presentition

More Related