Module 5: Planning Indexes

Module 5: Planning Indexes

Overview • Introducing Indexes • Understanding Index Architecture • Retrieving Stored Data with SQL Server • Maintaining Index and Heap Structures in SQL Server • Deciding Which Columns to Index

Lesson: Introducing Indexes • How SQL Server Stores and Accesses Data • Considerations For Creating Indexes

How SQL Server Stores and Accesses Data ... ... ... ... ... Russo Akers Martin Smith Ganio Con ... ... ... ... ... ... Woods Funk Pica Jones Owen Funk ... ... ... ... ... ... Barr Smith Jones Woods Hall Jones ... ... ... ... ... ... ... Martin Smith ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... • How Data Is Stored • Rows are stored in data pages • Heaps are a collection of data pages for a table without a clustered index • In a clustered index, the leaf level is the actual data page • How Data Is Accessed • Scanning all data pages in a table • Using an index that points to data on a page Data Pages Page 4 Page 5 Page 6 Page 7 Page 8 Page 9

Considerations for Creating Indexes • Why Create an Index • Speeds up data access • Enforces uniqueness of rows • Why Not Create an Index • Consumes disk space and cache memory • Incurs overhead

Lesson: Understanding Index Architecture • SQL Server Index Architecture • What Are Heaps? • What Are Clustered Indexes? • What Are Nonclustered Indexes?

Multimedia Presentation: SQL Server Index Architecture • Clustered Indexes • Nonclustered Indexes Built on Top of a Heap • Nonclustered Indexes Built on Top of a Clustered Index

What Are Heaps? SQL Server: • Uses Index Allocation Map Pages That: • Contain information on where the extents of a heap are stored • Navigate through the heap and find available space for new rows being inserted • Associate data pages with the table • Reclaims Space for New Rows in the Heap When a Row Is Deleted

What Are Clustered Indexes? • Each Table Can Have Only One Clustered Index • The Physical Row Order of the Table and the Order of Rows in the Index Are the Same • Key Value Uniqueness Is Maintained Explicitly or Implicitly

What Are Nonclustered Indexes? • Nonclustered Indexes Are the SQL Server Default • Existing Nonclustered Indexes Are Automatically Rebuilt When: • An existing clustered index is dropped • A clustered index is created • The DROP_EXISTING option is used to change which columns define the clustered index

Lesson: Retrieving Stored Data with SQL Server • How SQL Server Uses the sysindexes Table • Process of Finding Rows Without Indexes • Process of Finding Rows in a Heap with a Nonclustered Index • Process of Finding Rows in a Clustered Index • Process of Finding Rows in a Clustered Index with a Nonclustered Index

How SQL Server Uses the sysindexes Table • Describes the Indexes • Location of IAM, First, and Root of Indexes • Number of Pages and Rows • Distribution of Data indid Object Type 0 Heap 1 Clustered Index 2 to 250 Nonclustered Index 255 text, ntext, or image

Process of Finding Rows Without Indexes sysindexes id indid = 0 First IAM Extent Bit Map … 127 1 01 01 01 01 01 01 01 Graff Con Con Con Con Con Dunn … … … … … … … 128 1 02 02 02 02 02 02 02 Randall Funk Funk Funk Funk Funk Bacon … … … … … … … 129 0 03 03 03 03 03 03 03 Woods Woods Woods Koch Woods Woods Owen … … … … … … … 130 1 … 04 … … 04 … … Durrer ... ... ... ... Sleppy ... ... ... ... ... ... ... ... … 05 05 … … … … … ... Lang ... ... ... LaMee ... ... ... ... ... ... ... ... 01 Seattle … 02 Paris … 03 Tokyo … 01 01 01 01 01 01 01 01 Russo Russo Russo Russo Russo Russo Russo Russo … … … … … … … … 01 01 01 01 01 01 01 01 Akers Akers Akers Akers Akers Akers Akers Akers … … … … … … … … 01 01 01 01 01 01 01 01 Smith Smith Smith Smith Smith Smith Smith Smith … … … … … … … … 04 Atlanta ... 02 02 02 02 02 02 02 02 Woods Woods Woods Woods Woods Woods Woods Woods … … … … … … … … 02 02 02 02 02 02 02 02 Funk Funk Funk Funk Funk Funk Funk Funk … … … … … … … … 02 02 02 02 02 02 02 02 Owen Owen Owen Owen Owen Owen Owen Owen … … … … … … … … … ... ... 03 03 03 03 03 03 03 03 Barr Barr Barr Barr Barr Barr Barr Barr … … … … … … … … 03 03 03 03 03 03 03 03 Smith Smith Smith Smith Smith Smith Smith Smith … … … … … … … … 03 03 03 03 03 03 03 03 Jones Jones Jones Jones Jones Jones Jones Jones … … … … … … … … … … … … … … … … ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 04 04 04 04 04 04 04 04 Martin Martin Martin Martin Martin Martin Martin Martin ... ... ... ... ... ... ... ... … … … … … … … … … … … … … … … … ... ... ... ... ... ... ... ... … … … … … … … … ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... … … … … … … … … ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... … … … … … … … … ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... IAM Heap Extent 128 Extent 129 Extent 130 Extent 127

Process of Finding Rows in a Heap with a Nonclustered Index id id indid = 2 indid = 2 root root sysindexes Non Clustered Index Page 12 - Root Non-Leaf Level Akers Akers Akers Akers Ganio … Ganio … ... Martin ... Martin Page 37 Page 28 Martin Martin Martin Smith Akers Akers 4:706:01 4:706:01 Smith Smith Ganio Ganio 4:709:01 4:706:03 4:709:01 4:706:03 ... Barr Barr 4:705:03 4:705:03 Hall Smith Smith Hall 4:708:04 4:709:03 4:709:03 4:708:04 Leaf Level(Key Value) Page 41 Page 51 Page 61 Page 71 Con Con 4:704:01 4:704:01 Smith Smith Jones Jones 4:709:02 4:707:01 4:709:02 4:707:01 Funk Funk 4:706:02 4:706:02 Woods Jones Jones Woods 4:708:03 4:708:03 4:704:03 4:704:03 Martin 4:708:01 Funk Funk 4:704:02 4:704:02 Jones Jones Woods Woods 4:707:03 4:707:03 4:705:02 4:705:02 Pica Owen Owen Mather Pica Mather 4:707:02 4:708:02 4:708:02 4:707:02 4:706:04 4:706:04 Mather 4:706:04 Russo Russo 4:705:01 4:705:01 Owen 4:707:02 Pica 4:708:02 Russo 4:705:01 Heap Page 704 Page 705 Page 706 Page 707 Page 708 Page 709 01 01 ... ... Con Con 01 01 01 01 ... ... ... ... Russo Russo Russo Russo 01 01 ... ... Smith Smith 01 01 ... ... Martin Martin 01 01 ... ... Ganio Ganio 01 ... Akers 02 02 ... ... Funk Funk 02 02 ... ... Woods Woods 02 02 ... ... Owen Owen 02 02 02 02 02 02 ... ... ... ... ... ... Pica Pica Pica Pica Owen Owen 02 02 ... ... Jones Jones 02 ... Funk 03 03 ... ... Woods Woods 03 03 ... ... Barr Barr 03 03 ... ... Jones Jones 03 03 ... ... Jones Jones 03 03 ... ... Hall Hall 03 ... Smith ... ... ... ... ... ... ... ... ... ... ... ... 04 04 ... ... Mather Mather ... ... ... ... ... ... 04 04 ... ... Smith Smith ... ... ... ... ... ... 04 ... Mather ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... sysindexes Non Clustered Index Page 12 - Root Non-Leaf Level SELECT lastname, firstname FROM member WHERE lastname BETWEEN 'Masters' AND 'Russo' Page 37 Page 28 Martin Martin Martin Smith ... Leaf Level(Key Value) Page 41 Page 51 Page 61 Page 71 Martin 4:708:01 Mather 4:706:04 Owen 4:707:02 Pica 4:708:02 Russo 4:705:01 Heap Page 704 Page 705 Page 706 Page 707 Page 708 Page 709 01 ... Akers 02 ... Funk 03 ... Smith 04 ... Mather ... ... ...

Process of Finding Rows in a Clustered Index id indid = 1 root Akers Akers … … Martin Martin Akers Akers Ganio Ganio … … Akers Akers 2334 2334 ... ... Ganio Ganio 7678 7678 ... ... Martin Martin 1234 1234 ... ... Smith Smith 1434 1434 ... ... Barr Barr 5678 5678 ... ... Hall Hall 8078 8078 ... ... Martin Martin 7778 7778 ... ... Smith Smith 5778 5778 ... ... Con Con 2534 2534 ... ... Jones Jones 2434 2434 ... ... Owen Owen 5878 5878 ... ... Smith Smith 7978 7978 ... ... Funk Funk 1334 1334 ... ... Jones Jones 5978 5978 ... ... Pica Pica 7878 7878 ... ... Woods Woods 2234 2234 ... ... ... ... ... ... ... ... ... ... Funk Funk 1534 1534 Jones Jones 2634 2634 Russo Russo 6078 6078 Woods Woods 1634 1634 ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... Clustered Index Page 140 - Root Owen Owen 5878 5878 ... ... Page 141 Page 145 Martin Martin Martin Smith … Page 100 Page 110 Page 120 Page 130 sysindexes Clustered Index Page 140 - Root SELECT lastname, firstname FROM member WHERE lastname = 'Owen' Page 141 Page 145 Martin Martin Martin Smith … Page 100 Page 110 Page 120 Page 130

Process of Finding Rows in a Clustered Index with a Nonclustered Index id indid = 2 root Aaron Aaron Nonclustered Index on First Name Non-Leaf Level ... ... Jose Jose Aaron Aaron Jose Jose Diane Diane Nina Nina … … … … Aaron Aaron Nicholls Nicholls Jose Jose Lugo Lugo Diane Diane Glimp Glimp Adam Adam Barr Barr Judy Judy Lew Lew Leaf Level(Clustered Key Value) Don Don Hall Hall Amy Amy Strande Strande Mike Mike Mike Mike Nash Nash Nash Nash Douglas Douglas Groncki Groncki … … … … … … … … … … … … Barr Barr Kim Kim Clustered IndexOn Last Name Narp Narp O’Donnell O’Donnell Narp Barr Barr Adam Adam … … Kim Kim Shane Shane … … Narp Narp Sylvie Sylvie … … Cox Cox Brian Brian … … Kocak Kocak Haluk Haluk … … Nash Nash Nash Nash Mike Mike Mike Mike … … … … Dawson Dawson Matt Matt … … LaMee LaMee Brian Brian … … Nicholls Nicholls Aaron Aaron … … … … … … … … … … … … … … … … … … … … sysindexes Nonclustered Index on First Name Non-Leaf Level SELECT lastname, firstname, phone FROM member WHERE firstname = 'Mike' Leaf Level(Clustered Key Value) Clustered IndexOn Last Name Narp

Lesson: Maintaining Index and Heap Structures in SQL Server • What Are Page Splits in an Index? • What Is the Forwarding Pointer in a Heap? • How SQL Server Updates Rows • How SQL Server Deletes Rows

What Are Page Splits in an Index? Ganio Ganio … … Hall Hall … … … … … … Hartono Hartono … … Leaf Level(Key Value) Jones … Jones … Jacob … Jacob … Akers Akers … … Jones … Lang Lang … … Smith Smith … … Barr Barr … … Jones … Martin Martin … … Smith Smith … … Leaf Level(Key Value) Barr Barr … … Martin Martin … … Smith Smith … … Borba Borba … … Martin Martin … … Smith Smith … … Burke Burke … … Morris Morris … … Smith Smith … … … … … … INSERT member (last name) VALUES lastname = 'Jacob' Index Pages Non-Leaf Level Akers … Martin Akers Lang Ganio Smith … Jacob …

What Is the Forwarding Pointer in a Heap? id id indid = 2 indid = 2 root root NonClusteredIndex Non-Leaf Level Akers Akers Akers ... Ganio Ganio Martin Martin Page 37 Page 28 Page 12 - Root ... ... Martin Martin Smith ... Akers Akers 4:706:01 4:706:01 Smith Ganio Ganio Smith 4:709:01 4:706:03 4:709:01 4:706:03 Leaf Level(Key Value) Barr Barr 4:705:03 4:705:03 Hall Smith Hall Smith 4:709:03 4:708:04 4:709:03 4:708:04 Page 41 Page 51 Page 61 Page 71 Owen Owen 4:707:02 4:707:02 Con Con 4:704:01 4:704:01 Jones Smith Jones Smith 4:707:01 4:707:01 4:709:02 4:709:02 Martin 4:708:01 Funk Funk 4:706:02 4:706:02 Woods Jones Jones Woods 4:708:03 4:704:03 4:704:03 4:708:03 Martin 4:706:04 Funk Funk 4:704:02 4:704:02 Jones Woods Woods Jones 4:707:03 4:707:03 4:705:02 4:705:02 Owen 4:707:02 Pica 4:708:02 Russo 4:705:01 Heap Page 704 Page 705 Page 706 Page 707 Page 808 Page 709 01 ... Akers 01 01 ... ... Con Con 01 01 ... ... Russo Russo 01 01 ... ... Smith Smith 04 02 01 02 02 04 01 … … ... … ... ... ... Martin Owen Martin Owen Owen Owen Owen 01 01 ... ... Ganio Ganio 02 ... Funk 02 02 ... ... Funk Funk 02 02 ... ... Woods Woods 02 02 ... ... Owen Owen 02 02 ... ... Pica Pica 02 02 ... ... Jones Jones 03 ... Smith 03 03 ... ... Woods Woods 03 03 ... ... Barr Barr 03 03 ... ... Jones Jones 03 03 ... ... Jones Jones 03 03 ... ... Hall Hall 04 ... Martin ... ... ... ... ... ... ... ... ... ... ... ... 04 04 ... ... Corets Corets 04 04 ... ... Smith Smith ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 05 05 ... ... Nash Nash ... ... ... ... ... ... ... ... ... ... ... ... sysindexes NonClusteredIndex Non-Leaf Level Akers ... UPDATE member SET Address = <something long> WHERE lastname = 'Owen' Martin Martin Page 37 Page 28 Page 12 - Root Martin Martin Smith ... Leaf Level(Key Value) Page 41 Page 51 Page 61 Page 71 Martin 4:708:01 Martin 4:706:04 Owen 4:707:02 Pica 4:708:02 Russo 4:705:01 Heap Page 704 Page 705 Page 706 Page 707 Page 808 Page 709 01 ... Akers 02 ... Funk 03 ... Smith 04 ... Martin ... ... ...

How SQL Server Updates Rows • Generally Does Not Cause a Row to Move • Can Be a Delete Followed by an Insert if: • The update does not fit on a page of a heap • Table has an update trigger • Table is marked for replication • Clustered index key requires the row to be relocated • Batch Updates Touch Each Index Only Once

How SQL Server Deletes Rows • Deletes Causes Ghost Records • SQL Server Reclaims Space • Files Can Shrink Clustered index pages move as a unit Heap records move individually

Lesson: Deciding Which Columns to Index • Data Analysis • Guidelines for Indexing • Guidelines For Choosing the Appropriate Clustered Index • Guidelines For Designing Indexes to Support Queries • Process of Determining Selectivity • Process of Determining Density • Process of Determining Distribution of Data • Best Practices

Data Analysis • Logical and Physical Design • Data Characteristics • How Data Is Used • OLAP vs. OLTP – is the database read or write intensive? • The types of queries performed • The frequency of queries that are typically performed

Guidelines for Indexing • Columns to Index • Primary and foreign keys • Those frequently searched for ranges of key values • Those frequently accessed in sorted order • Those frequently grouped together during aggregation • Columns Not to Index • Those seldom referenced in queries • Those that contain few unique values • Those defined with text, ntext, or image data types

Guidelines For Choosing the Appropriate Clustered Index • Heavily Updated Tables • Create on primary key identity column • Sorting • A clustered index keeps the data pre-sorted • Column Length and Data Type • Limit the number of columns • Reduce the average number of characters • Use the smallest data type possible

Guidelines For Designing Indexes to Support Queries • Using Search Arguments • Writing Good Search Arguments • Specify a WHERE clause in the query • Verify that the WHERE clause limits the number of rows • Verify that an expression exists for every table referenced in the query • Use = , >, <, >=, <=, BETWEEN • Avoid <>, !=, !>,!<, NOT EXISTS, NOT IN • Avoid using leading wildcards

Process of Determining Selectivity Number of rows meeting criteriaTotal number of rows in table 100010000 = 10% = Number of rows meeting criteriaTotal number of rows in table 900010000 = 90% = High selectivity member_no last_name first_name 1 Randall Linda 2 Flood Kathie . SELECT *FROM memberWHERE member_no > 9000 . . 10000 Anderson Thomas Low selectivity member_no last_name first_name 1 Randall Linda 2 Flood Kathie . SELECT *FROM memberWHERE member_no < 9001 . . 10000 Anderson Thomas

Process of Determining Density last_name first_name Smith Ben . . . Smith Ronaldo Smith Samantha . . . Owen Laura . . . High Density SELECT *FROM memberWHERE last_name = ‘Smith’ Low Density SELECT *FROM memberWHERE last_name = 'Owen'

Process of Determining Distribution of Data Standard Distribution of Values Number ofLast Names A - E F - J K - O P - U V - Z Last Name Even Distribution of Values Number ofLast Names A - B C - F G - K L - N O - Z Last Name

Best Practices Create Indexes on Columns That Join Tables ü Use Indexes to Enforce Uniqueness ü Drop Unused Indexes ü Avoid Long Clustering Keys ü Consider Using a Clustered Index to Support Sorting and Range Searches ü Create Indexes That Support Search Arguments ü

Lab A: Determining the Indexes of a Table • Exercise 1: Identifying Indexes Using sp_helpindex • Exercise 2: Viewing Entries in the sysindexes Table

Module 5: Planning Indexes