1 / 26

Indexes

Indexes. Rose-Hulman Institute of Technology Curt Clifton. Overview. Introduction to Indexes Index Architecture How SQL Server Retrieves Stored Data How SQL Server Maintains Index and Heap Structures Deciding Which Columns to Index. Con. Rudd. Akhtar. Smith. Martin. Ganio. Funk.

ehrhardt
Download Presentation

Indexes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Indexes Rose-Hulman Institute of Technology Curt Clifton

  2. Overview • Introduction to Indexes • Index Architecture • How SQL Server Retrieves Stored Data • How SQL Server Maintains Index and Heap Structures • Deciding Which Columns to Index

  3. Con ... ... ... ... ... Rudd Akhtar Smith Martin Ganio ... Funk ... ... ... ... ... White Funk Ota Phua Jones ... White ... ... ... ... ... Barr Smith Jones Jones Hall ... ... ... ... ... ... ... ... Martin ... Smith ... ... ... ... ... ... ... ... ... ... ... ... ... ... Storing and Accessing Data • How Data Is Stored • Rows are stored in data pages • Heaps are a collection of data pages for a table Data Pages Page 8 Page 9 Page 4 Page 5 Page 6 Page 7

  4. Whether to Create Indexes • Why to Create an Index • Speeds up data access • Enforces uniqueness of rows • Why Not to Create an Index • Consumes disk space • Incurs overhead

  5. Using Heaps • SQL Server: • Uses Index Allocation Map Pages That: • Contain information on where the extents of a heap are stored • Navigate through the heap and find available space for new rows being inserted • Connect data pages • Reclaims Space for New Rows in the Heap When a Row Is Deleted

  6. Using Clustered Indexes • Each Table Can Have Only One Clustered Index • The Physical Row Order of the Table and the Order of Rows in the Index Are the Same • Key Value Uniqueness Is Maintained Explicitly or Implicitly

  7. Using Nonclustered Indexes • Nonclustered Indexes Are the SQL Server Default • Existing Nonclustered Indexes Are Automatically Rebuilt When: • An existing clustered index is dropped • A new clustered index is created • The DROP_EXISTING option is used to change which columns define the clustered index

  8. Maintaining Index and Heap Structures

  9. Ganio … Hall … … … … … Leaf Level(Key Value) Hart … Jones … Jones … Jackson … Jackson … Akhtar Akhtar … … Ganio Jones … … Lang Lang … … Smith Smith … … … … … … Barr Barr … … Hall Jones … … Martin Martin … … Smith Smith … … Barr Barr … … Hart … Martin Martin … … Smith Smith Akhtar Lang … … Borm Borm … … Martin Martin … … Smith Smith … Smith … … Buhl Buhl … … Moris Moris … … Smith Smith Martin … … … Page Splits in an Index INSERT member (last name) VALUES lastname = ‘Jackson' Index Pages Non-Leaf Level Akhtar Ganio … Jackson Leaf Level(Key Value)

  10. id indid = 2 root Non clustered Index Non Clustered Index Non-Leaf Level Non-Leaf Level Akhtar Akhtar ... ... Martin Martin Martin Akhtar Akhtar Page 37 Page 37 Page 28 Page 28 Page 12 - Root Page 12 - Root Ganio Ganio Martin Martin Martin ... ... Smith Smith ... ... Leaf Level(Key Value) Leaf Level(Key Value) Page 41 Page 41 Page 51 Page 51 Page 61 Page 61 Page 71 Page 71 Akhtar Akhtar 4:706:01 4:706:01 Smith Ganio Ganio Smith 4:706:03 4:706:03 4:709:01 4:709:01 Barr Barr 4:705:03 4:705:03 Hall Hall Smith Smith 4:709:04 4:709:04 4:708:04 4:708:04 Martin Martin 4:708:01 4:708:01 Ota 4:707:02 Con Con 4:704:01 4:704:01 Jones Smith Smith Jones 4:709:02 4:707:01 4:709:02 4:707:01 Martin Martin 4:706:04 4:706:04 Funk Funk 4:706:02 4:706:02 Jones Jones White White 4:704:03 4:708:03 4:708:03 4:704:03 Ota Ota 4:707:02 4:707:02 Funk Funk 4:704:02 4:704:02 White Jones White Jones 4:707:03 4:705:02 4:705:02 4:707:03 Phua Phua 4:708:02 4:708:02 Rudd Rudd 4:705:01 4:705:01 Heap Heap Page 704 Page 704 Page 705 Page 705 Page 706 Page 706 Page 707 Page 707 Page 708 Page 708 Page 709 Page 709 01 01 ... ... Akhtar Akhtar 01 01 ... ... Conn Conn 01 01 ... ... Rudd Rudd 01 01 ... ... Smith Smith 01 02 02 02 01 04 ... ... ... ... ... Martin Ota Martin Ota Ota Ota 01 01 ... ... Ganio Ganio 02 02 ... ... Funk Funk 02 02 ... ... Funk Funk 02 02 ... ... White White 02 02 ... ... Ota Ota 02 02 ... ... Phua Phua 02 02 ... ... Jones Jones 03 03 ... ... Smith Smith 03 03 ... ... White White 03 03 ... ... Barr Barr 03 03 ... ... Jones Jones 03 03 ... ... Jones Jones 03 03 ... ... Hall Hall 04 04 ... ... Martin Martin ... ... ... ... ... ... ... ... ... ... ... ... 04 04 ... ... Corets Corets 04 04 ... ... Smith Smith ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 05 05 ... ... Nash Nash ... ... ... ... ... ... ... ... ... ... ... ... File ID #4 File ID #4 Forwarding Pointer in a Heap sysindexes UPDATE member SET Address = <really long> WHERE lastname = 'Ota'

  11. Row Updates • Generally do not cause rows to move • Like a delete followed by an insert • Logically • Sometimes practically • Batch updates touch each index once

  12. Deletion • Deletion creates “ghost records” • Reclaiming space • Free pages when empty • For indexed table: • Can overwrite ghost records immediately • For non-indexed table: • Compact records when more space is needed for insert

  13. Deciding What to Index

  14. What You Need to Know • Logical and Physical Database Design • Data Characteristics • How Data Is Used • The types of queries performed • The frequency of queries that are typically performed

  15. Indexing Guidelines • Columns to Index • Primary and foreign keys • Those frequently searched in ranges • Those frequently accessed in sorted order • Those frequently grouped together during aggregation • Columns Not to Index • Those seldom referenced in queries • Those that contain few unique values • Those defined with text, ntext, or image data types

  16. Choosing the Clustered Index • Heavily Updated Tables • A clustered index with an identity column keeps updated pages in memory • Sorting • A clustered index keeps the data pre-sorted • Column Length and Data Type • Limit the number of columns • Reduce the number of characters • Use the smallest data type possible

  17. last_name first_name Randall Joshua . . . Randall Cynthia Randall Tristan . . . Ota Lani . . . Data Characteristics – Density High Density SELECT *FROM memberWHERE last_name = ‘Randall’ Low Density SELECT *FROM memberWHERE last_name = ‘Ota’

  18. Data Characteristics – Selectivity • How effective is a column at selecting a subset of the data • A property of a given query: • Rows matching property / Total number of rows

  19. Number of rows meeting criteriaTotal number of rows in table 100010000 = 10% = Number of rows meeting criteriaTotal number of rows in table 900010000 = 90% = Determining Selectivity High selectivity member_no last_name first_name 1 Randall Joshua 2 Flood Kathie . SELECT *FROM memberWHERE member_no > 8999 . . 10000 Anderson Bill Low selectivity member_no last_name first_name 1 Randall Joshua 2 Flood Kathie . SELECT *FROM memberWHERE member_no < 9001 . . 10000 Anderson Bill

  20. Indexing to Support Queries • Writing Good Search Arguments • Specify a WHERE clause in the query • Verify that the WHERE clause limits the number of rows • Avoid using leading wildcards

  21. Introduction to Statistics

  22. Standard Distribution of Values Number ofLast Names A - E F - J K - O P - U V - Z Last Name Even Distribution of Values Number ofLast Names A - B C - F G - K L - N O - Z Last Name

  23. How Statistics Are Gathered • DMBS reads/samples column values • Produces an evenly distributed sorted list of values • Performs a full scan or sampling of rows • Depending on size of table and granularity wanted • Selects samplings if necessary • Picks rows to be sampled • Includes all rows on the page of selected rows

  24. Creating Statistics • Automatically Creating Statistics • Indexed columns that contain data • Non-indexed columns that are used in a join predicate or a WHERE clause • Manually Creating Statistics • Columns that are not indexed • All columns other than the first column of a composite index

  25. Viewing Statistics • The DBCC SHOW_STATISTICS Statement Returns Statistical Information in the Distribution Page for an Index or Column • Statistical Information Includes: • The time when the statistics were last updated • The number of rows sampled to produce the histogram • Density information • Average key length • Histogram step information

  26. Performance Considerations • Create Indexes on Foreign Keys • Create the Clustered Index Before Nonclustered Indexes • Consider Creating Composite Indexes • Create Multiple Indexes for a Table That Is Read Frequently

More Related