Indexes

Indexes Rose-Hulman Institute of Technology Curt Clifton

Overview • Introduction to Indexes • Index Architecture • How SQL Server Retrieves Stored Data • How SQL Server Maintains Index and Heap Structures • Deciding Which Columns to Index

Con ... ... ... ... ... Rudd Akhtar Smith Martin Ganio ... Funk ... ... ... ... ... White Funk Ota Phua Jones ... White ... ... ... ... ... Barr Smith Jones Jones Hall ... ... ... ... ... ... ... ... Martin ... Smith ... ... ... ... ... ... ... ... ... ... ... ... ... ... Storing and Accessing Data • How Data Is Stored • Rows are stored in data pages • Heaps are a collection of data pages for a table Data Pages Page 8 Page 9 Page 4 Page 5 Page 6 Page 7

Whether to Create Indexes • Why to Create an Index • Speeds up data access • Enforces uniqueness of rows • Why Not to Create an Index • Consumes disk space • Incurs overhead

Using Heaps • SQL Server: • Uses Index Allocation Map Pages That: • Contain information on where the extents of a heap are stored • Navigate through the heap and find available space for new rows being inserted • Connect data pages • Reclaims Space for New Rows in the Heap When a Row Is Deleted

Using Clustered Indexes • Each Table Can Have Only One Clustered Index • The Physical Row Order of the Table and the Order of Rows in the Index Are the Same • Key Value Uniqueness Is Maintained Explicitly or Implicitly

Using Nonclustered Indexes • Nonclustered Indexes Are the SQL Server Default • Existing Nonclustered Indexes Are Automatically Rebuilt When: • An existing clustered index is dropped • A new clustered index is created • The DROP_EXISTING option is used to change which columns define the clustered index

Maintaining Index and Heap Structures

Ganio … Hall … … … … … Leaf Level(Key Value) Hart … Jones … Jones … Jackson … Jackson … Akhtar Akhtar … … Ganio Jones … … Lang Lang … … Smith Smith … … … … … … Barr Barr … … Hall Jones … … Martin Martin … … Smith Smith … … Barr Barr … … Hart … Martin Martin … … Smith Smith Akhtar Lang … … Borm Borm … … Martin Martin … … Smith Smith … Smith … … Buhl Buhl … … Moris Moris … … Smith Smith Martin … … … Page Splits in an Index INSERT member (last name) VALUES lastname = ‘Jackson' Index Pages Non-Leaf Level Akhtar Ganio … Jackson Leaf Level(Key Value)

id indid = 2 root Non clustered Index Non Clustered Index Non-Leaf Level Non-Leaf Level Akhtar Akhtar ... ... Martin Martin Martin Akhtar Akhtar Page 37 Page 37 Page 28 Page 28 Page 12 - Root Page 12 - Root Ganio Ganio Martin Martin Martin ... ... Smith Smith ... ... Leaf Level(Key Value) Leaf Level(Key Value) Page 41 Page 41 Page 51 Page 51 Page 61 Page 61 Page 71 Page 71 Akhtar Akhtar 4:706:01 4:706:01 Smith Ganio Ganio Smith 4:706:03 4:706:03 4:709:01 4:709:01 Barr Barr 4:705:03 4:705:03 Hall Hall Smith Smith 4:709:04 4:709:04 4:708:04 4:708:04 Martin Martin 4:708:01 4:708:01 Ota 4:707:02 Con Con 4:704:01 4:704:01 Jones Smith Smith Jones 4:709:02 4:707:01 4:709:02 4:707:01 Martin Martin 4:706:04 4:706:04 Funk Funk 4:706:02 4:706:02 Jones Jones White White 4:704:03 4:708:03 4:708:03 4:704:03 Ota Ota 4:707:02 4:707:02 Funk Funk 4:704:02 4:704:02 White Jones White Jones 4:707:03 4:705:02 4:705:02 4:707:03 Phua Phua 4:708:02 4:708:02 Rudd Rudd 4:705:01 4:705:01 Heap Heap Page 704 Page 704 Page 705 Page 705 Page 706 Page 706 Page 707 Page 707 Page 708 Page 708 Page 709 Page 709 01 01 ... ... Akhtar Akhtar 01 01 ... ... Conn Conn 01 01 ... ... Rudd Rudd 01 01 ... ... Smith Smith 01 02 02 02 01 04 ... ... ... ... ... Martin Ota Martin Ota Ota Ota 01 01 ... ... Ganio Ganio 02 02 ... ... Funk Funk 02 02 ... ... Funk Funk 02 02 ... ... White White 02 02 ... ... Ota Ota 02 02 ... ... Phua Phua 02 02 ... ... Jones Jones 03 03 ... ... Smith Smith 03 03 ... ... White White 03 03 ... ... Barr Barr 03 03 ... ... Jones Jones 03 03 ... ... Jones Jones 03 03 ... ... Hall Hall 04 04 ... ... Martin Martin ... ... ... ... ... ... ... ... ... ... ... ... 04 04 ... ... Corets Corets 04 04 ... ... Smith Smith ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 05 05 ... ... Nash Nash ... ... ... ... ... ... ... ... ... ... ... ... File ID #4 File ID #4 Forwarding Pointer in a Heap sysindexes UPDATE member SET Address = <really long> WHERE lastname = 'Ota'

Row Updates • Generally do not cause rows to move • Like a delete followed by an insert • Logically • Sometimes practically • Batch updates touch each index once

Deletion • Deletion creates “ghost records” • Reclaiming space • Free pages when empty • For indexed table: • Can overwrite ghost records immediately • For non-indexed table: • Compact records when more space is needed for insert

Deciding What to Index

What You Need to Know • Logical and Physical Database Design • Data Characteristics • How Data Is Used • The types of queries performed • The frequency of queries that are typically performed

Indexing Guidelines • Columns to Index • Primary and foreign keys • Those frequently searched in ranges • Those frequently accessed in sorted order • Those frequently grouped together during aggregation • Columns Not to Index • Those seldom referenced in queries • Those that contain few unique values • Those defined with text, ntext, or image data types

Choosing the Clustered Index • Heavily Updated Tables • A clustered index with an identity column keeps updated pages in memory • Sorting • A clustered index keeps the data pre-sorted • Column Length and Data Type • Limit the number of columns • Reduce the number of characters • Use the smallest data type possible

last_name first_name Randall Joshua . . . Randall Cynthia Randall Tristan . . . Ota Lani . . . Data Characteristics – Density High Density SELECT *FROM memberWHERE last_name = ‘Randall’ Low Density SELECT *FROM memberWHERE last_name = ‘Ota’

Data Characteristics – Selectivity • How effective is a column at selecting a subset of the data • A property of a given query: • Rows matching property / Total number of rows

Number of rows meeting criteriaTotal number of rows in table 100010000 = 10% = Number of rows meeting criteriaTotal number of rows in table 900010000 = 90% = Determining Selectivity High selectivity member_no last_name first_name 1 Randall Joshua 2 Flood Kathie . SELECT *FROM memberWHERE member_no > 8999 . . 10000 Anderson Bill Low selectivity member_no last_name first_name 1 Randall Joshua 2 Flood Kathie . SELECT *FROM memberWHERE member_no < 9001 . . 10000 Anderson Bill

Indexing to Support Queries • Writing Good Search Arguments • Specify a WHERE clause in the query • Verify that the WHERE clause limits the number of rows • Avoid using leading wildcards

Introduction to Statistics

Standard Distribution of Values Number ofLast Names A - E F - J K - O P - U V - Z Last Name Even Distribution of Values Number ofLast Names A - B C - F G - K L - N O - Z Last Name

How Statistics Are Gathered • DMBS reads/samples column values • Produces an evenly distributed sorted list of values • Performs a full scan or sampling of rows • Depending on size of table and granularity wanted • Selects samplings if necessary • Picks rows to be sampled • Includes all rows on the page of selected rows

Creating Statistics • Automatically Creating Statistics • Indexed columns that contain data • Non-indexed columns that are used in a join predicate or a WHERE clause • Manually Creating Statistics • Columns that are not indexed • All columns other than the first column of a composite index

Viewing Statistics • The DBCC SHOW_STATISTICS Statement Returns Statistical Information in the Distribution Page for an Index or Column • Statistical Information Includes: • The time when the statistics were last updated • The number of rows sampled to produce the histogram • Density information • Average key length • Histogram step information

Performance Considerations • Create Indexes on Foreign Keys • Create the Clustered Index Before Nonclustered Indexes • Consider Creating Composite Indexes • Create Multiple Indexes for a Table That Is Read Frequently

Indexes

Indexes

Presentation Transcript

Indexes

Indexes

Indexes

Indexes

Variant Indexes

Indexes

Indexes

Managing Indexes

Indexes

MARKET INDEXES

Indexes

BITMAP INDEXES

Secure Indexes

BITMAP INDEXES

Indexes

Primary Indexes

Indexes

Indexes:

Indexes