How SQL Server Indexes Work

How SQL Server Indexes Work Sharon F. Dooley sharond@voicenet.com

SQL Server Indexes • SQL Server indexes are based on B-trees • Special records called nodes that allow keyed access to data • Two kinds of nodes are special • Root • Leaf Root node Intermediate node Leafnode Datapages

SQL Server B-Tree Rules • Root and intermediate nodes point only to other nodes • Only leaf nodes point to data • The number of nodes between the root and any leaf is the same for all leaves • A node always contains between K and K/2 branches, where K is the branching factor • Branching factor is the number of keys in the node • B-trees are always sorted • The tree will be maintained during insertion, deletion, and updating so that these rules are met • When records are inserted or updated, nodes may split • When records are deleted, nodes may be collapsed

What Is a Node? • A page that contains key and pointer pairs

Splitting a B-Tree Node Root (Level 0) Node (Level 1) Leaf (Level 2) DB

Let’s Add Alice • Step 1: Split the leaf node

Adding Alice • Step 2: Split the next level up Leaf DB

Adding Alice(continued) • Split the root Leaf DB

Adding Alice(continued) • When the root splits, the tree grows another level Root (Level 0) Node(Level 1) Node(Level 2) Leaf(Level 3) DB

Adams, Carol Ally, Kent Baccus, Mary David, Sue Dulles, Kelly Edom, Mike Farly, Lee Frank, Joe Ollen, Carol Oppus, Larry ... Page splits cause fragmentation • Two types of fragmentation • Data pages in a clustered table • Index pages in all indexes • Fragmentation happens because these pages must be kept in order • Data page fragmentation happens when a new record must be added to a page that is full • Consider an Employee table with a clustered index on LastName, FirstName • A new employee, Peter Dent, is hired Extent

Data Page Fragmentation Extent Adams, Carol Ally, Kent Baccus, Mary David, Sue Dent, Peter Farly, Lee Frank, Joe Ollen, Carol Oppus, Larry ... Extent Dulles, KellyEdom, Mike ...

036-11-9987, pointer 036-33-9874, pointer 038-87-8373, pointer 046-11-9987, pointer 048-33-9874, pointer 052-87-8373, pointer 116-11-9987, pointer 116-33-9874, pointer 124-11-9987, pointer 124-33-9874, pointer 125-87-8373, pointer ... Index Fragmentation • Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full • Consider an Employee table with a nonclustered index on Social Security Number • Employee 048-12-9875 is added Extent

Index Fragmentation (continued) 036-11-9987, pointer 036-33-9874, pointer 038-87-8373, pointer 046-11-9987, pointer 048-12-9875, pointer 116-11-9987, pointer 116-33-9874, pointer 124-11-9987, pointer 124-33-9874, pointer 125-87-8373, pointer Extent ... 048-33-9874, pointer 052-87-8373, pointer Extent ...

Studying Fragmentation in SQL Server 2000 • To determine if there is fragmentation • In a clustered table or a nonclustered indexDBCC SHOWCONTIG[([table_id | table_name | view_id | view_name [, index_id | index_name])] DBCC SHOWCONTIG scanning 'Employees' table... Table: 'Employees' (1977058079); index ID: 1, database ID: 7 TABLE level scan performed. - Pages Scanned................................: 707 - Extents Scanned..............................: 90 - Extent Switches..............................: 572 - Avg. Pages per Extent........................: 7.9 - Scan Density [Best Count:Actual Count].......: 15.53% [89:573] - Logical Scan Fragmentation ..................: 39.18% - Extent Scan Fragmentation ...................: 58.89% - Avg. Bytes Free per Page.....................: 4338.9 - Avg. Page Density (full).....................: 46.39% DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Studying Fragmentation in SQL Server 2000(continued) • Unless the table or index spans multiple files • Extent Switches and Extents Scanned should be approximately equal • Scan Density should be close to 100 percent • Avg. Page Density should be high and Avg. Bytes Free Per Page should be low • Logical Scan Fragmentation and Extent Scan Fragmentation should be as close to 0 as possible • Clearly the Employees table is terribly fragmented!

Studying Fragmentation in SQL Server 2005 and 2008 SELECT object_name(s.object_id), name, avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats (db_id('databasename'), object_id('tablename'), {indexid | NULL}, {partitionnumber | NULL}, {'LIMITED'| 'SAMPLED' | 'DETAILED' | NULL}) as s INNER JOIN sys.indexes as i ON s.object_id = i.object_id AND s.index_id = i.index_id • If NULL is supplied for the last argument, LIMITED is assumed • The avg_fragmentation_in_percent should be as close to 0 as possible

Studying Fragmentation in SQL Server 2005 and 2008 (continued) SELECT object_name(s.object_id), name, avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats (db_id('bigwind'), object_id('Employees'), null, null, null) as s INNER JOIN sys.indexes as i ON s.object_id = i.object_id AND s.index_id = i.index_id WHERE alloc_unit_type_desc = 'IN_ROW_DATA' • Results

Repairing Fragmentation • Repair index fragmentation by rebuilding index • Rebuilding clustered index repairs table fragmentation • DBCC DBREINDEX DBCC DBREINDEX (tablename [, indexname [, fillfactor]]) • Can reorganize indexes that implement primary key and unique constraints • CREATE INDEX • DROP_EXISTINGcauses SQL Server to create and drop the index in a single step • Faster than dropping with the DROP INDEX command and then re-creating • ALTER TABLE … ADD CONSTRAINT … PRIMARY KEY or UNIQUE

Repairing Fragmentation(continued) • DBCC INDEXDEFRAG does not • Lock the index • Do as thorough job as the other methods • Allow specification of a fill factor • Uses the fill factor from the last CREATE INDEX for this index DBCC INDEXDEFRAG( { database_name | database_id | 0 } , { table_name | table_id} , { index_name | index_id } ) • ALTER INDEX index_name ON table_name REORGANIZE • Same as DBCC INDEXDEFRAG • ALTER INDEX index_name ON table_name REBUILD • Allows concurrent access if you add WITH (ONLINE = ON) to the ALTER INDEX command • Uses the version store in tempdb • Same as DBCC DBREINDEX 2005 2008

SQL Server Indexes • SQL Server indexes come in two flavors • Clustered indexes • Database rows are in order on the index key • The data pages are the leaf nodes of the index • Nonclustered indexes • Leaf level is in index order but the data is not • Leaf nodes contain pointers to rows • One clustered index per table • Choose wisely • Should always have a clustered index • Allows reorganization of the data pages • 249 nonclustered indexes per table

Clustered Index Root Database and leaf node

Nonclustered Index Root Leaf node Database

Clustered and Nonclustered Indexes Interact • Clustered indexes are always unique • If you don’t specify unique when creating them, SQL Server may add a “uniqueifier” to the index key • Only used when there actually is a duplicate • Adds 4 bytes to the key • The clustering key is used in nonclustered indexes • This allows SQL Server to go directly to the record from the nonclustered index • If there is no clustered index, a record identifier will be used instead Leaf node of a clustered index on EmployeeID Leaf node of a nonclustered index on LastName

Clustered and Nonclustered Indexes Interact(continued) • Another reason to keep the clustering key small! • Consider the following query: SELECT LastName, FirstNameFROM EmployeeWHERE LastName = 'Douglas' • When SQL Server uses the nonclustered index, it • Traverses the nonclustered index until it finds the desired key • Picks up the associated clustering key • Traverses the clustered index to find the data

Heaps and Chains • When you place a clustered index on a table, the pages are chained together in a doubly linked list • SQL Server can follow the pointers to move from page to page • When there is no clustered index, the table is called a heap • Data is located • Through nonclustered indexes • By scanning all the pages in the table

Indexes and Inserts • When there is no clustered index on a table, SQL Server uses the Page Free Space page to find a page with space for the new record • Inserts into tables with a clustered index can cause page splits or hotspots • When a particular part of the database is particularly popular, it is called a hotspot • Hotspots create contention problems • Clustered indexes on keys that arrive in random order cause page splits • Clustered indexes on keys that arrive in index order create hotspots • All inserts are again at the end of the table • Identities • Dates • However, there will be no page splits or collapses

Indexes and Updates • When data is modified, indexes may have to be modified as well • Changing data in a table with no indexes • Data will be changed in place unless the update means that the row will no longer fit on the page • If the row won’t fit, it will simply be moved to a new page • Changing a clustering key column • The row will be deleted from its original location • It will be inserted into the new location • All nonclustered indexes must be maintained • Exception: • A change that doesn’t affect the index order • Changing “Thompson” to “Thompsen” will be done in place • Avoid clustered indexes on columns that frequently change Thomas Thompson Tyne

Indexes and Updates(continued) • Changing any non-key column in a table with only nonclustered indexes • Data will be changed in place unless the row will no longer fit • If the row will no longer fit • It will be moved to another page • A forwarding pointer will be left on the original page that points to the new location • Index pointers don’t need to be updated • An additional I/O will now be required • There will never be more than one forwarding pointer, no matter how many times a row moves • To see whether updates have produced forwarding pointers, use DBCC SHOWCONTIG('tablename') WITH TABLERESULTS SELECT forwarded_record_countFROMsys.dm_db_index_physical_stats (db_id('databasename'),object_id('tablename'), NULL, NULL, NULL) 2000 2005 2008

Indexes and Deletes • Deleting from a heap • Row is physically deleted • Remaining rows on page below the deleted record are not moved up at the time the delete happens • Page will be compressed when space is needed for another row on the page • Deleting from a leaf node • Data pages of clustered index • Leaf nodes of nonclustered index • Records may not be physically deleted at the time the delete is issued • Records may be marked as ghost records • Used by lock manager • Not retrievable by users • Special SQL Server process cleans up the ghost records • Won’t clean up records that are part of an active transaction • Doesn’t compress page

Indexes and Deletes(continued) • Deleting from a nonleaf node • No ghost records • Page is not compressed • When rows are deleted, both nonclustered and clustered indexes must be maintained • When the last row is deleted from a page (index or data), the page is deallocated and returned to the free space pool • Unless it is the only page in the table • A table always has at least one page, even if it is empty

What Are Index Statistics? • Metric used by the optimizer in determining whether or not an index is useful for a particular query • Stored in • An image column named statblob in the sysindexestable • An internal and invisible table • Essentially a histogram • Statistics are kept for 200 steps 2000 2005 2008

Selectivity • The statistics allow the optimizer to determine the selectivity of an index • A unique, single-column index always has a selectivity of 1 • One index entry points to exactly one row • Another term for this is density • Density is the inverse of selectivity • Density values range from 0 to 1 • A selective index has a density of 0.10 or less • A unique, single-column index always has a density of 0.0 • When the index is composite, it becomes a little more complicated • SQL Server maintains detailed statistics only on the leftmost column • It does compute density for each column • Assume there is an index on (col1, col2, col3) • Density is computed for • Col1 • Col1 + Col2 • Col1 + Col2 + Col3

Exploring Statistics • To see the index statistics, use • DBCC SHOW_STATISTICS ('tablename', {'indexname' | 'statisticsname'}) • DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx') • Interpreting the step output

Exploring Index Statistics(continued) Statistics for INDEX 'EmployeeName_IDX'. Updated Rows Rows Sampled Steps Density Avg key length -------------------- ------- ------------- ------ ------------ ------------ Jan 27 2002 7:00PM 10009 10009 200 1.3958309E-4 28.475372 All density Average Length Columns ------------ -------------- -------- 1.4271443E-4 13.252672 LastName 9.9940036E-5 24.475372 LastName, FirstName 9.9910081E-5 28.475372 LastName, FirstName, EmployeeID RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS ------------ ---------- ------- -------------------- -------------- Aaby 0.0 2.0 0 0.0 Abrahamson 59.0 2.0 43 1.3409091 . . . Zuran 13.0 1.0 10 1.3 Zvonek 0.0 1.0 0 0.0 The report for SQL Server 2005 and later is slightly different at the beginning Distribution steps

Statistics Maintenance • By default, SQL Server will automatically maintain the statistics • Index statistics are computed (or recomputed) when the index is created or rebuilt • SQL Server keeps track of the updates to a table • Each INSERT, UPDATE, or DELETE statement updates a counter in sysindexes named rowmodctr • Note that TRUNCATE TABLE does not modify this counter • Whenever the statistics are recomputed, the counter is set back to zero • When you issue a query, the optimizer checks rowmodctr to see whether the statistics are up to date • If they are not, the statistics will be updated

Statistics Maintenance(continued) • Note that this may not always happen at the best time in a production system • Can turn off automatic update • Can manually update • In SQL Server 2005 and later, can set the AUTO_UPDATE_STATISTICS_ASYNC database option • Example: • Assume that a table has 1,000 rows • The threshold would be 500 + (.20 * 1000) • You would expect to see the statistics automatically updated after about 700 modifications

Estimating Page Accesses • No index • Number of data pages in the table • Equality query using a unique index • Nonclustered index • Number of index levels + 1 (if there is no clustered index) • Number of nonclustered index levels + the number of levels in the clustered index • Clustered index • Number of index levels

Estimating Page Accesses for a Clustered Index • Number of levels + number of data pages • Number of data pages = number of qualifying rows / rows per page Root Database and leaf node

Estimating Page Accesses for a Nonclustered Index • Number of levels + number of qualifying leaf pages + number of rows ornumber of levels + number of qualifying leaf pages + (number of rows * number of clustered index levels) • Number of qualifying leaf pages= number of qualifying rows / rows per page • Assumes every row is on a different page Root Leaf node Database

Covering Indexes • When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index • With a covering index, there is no need to access the actual data pages • Only the leaf nodes of the nonclustered index are accessed • Because the leaf node of a clustered index is the data itself, a clustered index covers all queries Leaf node of a nonclustered index on LastName, FirstName, Birthdate The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.

Covering Indexes(continued) • This index covers the following queries: SELECT EmployeeID, LastName, FirstName, Birthdate FROM Employees WHERE Birthdate >= '1/12/1941' SELECT LastName FROM Employees WHERE EmployeeID = 7 SELECT LastName, FirstName FROM Employees WHERE LastName BETWEEN 'A' AND 'C' • Remember that the number of accesses for a nonclustered index is the number of levels + the number of qualifying leaf pages + either the number of rows or (the number of rows * the number of levels in the clustered index) • A covering index eliminates the “number of rows” term from the equation • The optimizer is highly likely to use a nonclustered index because of this

Non-Key Index Columns • SQL Server 2005 and later allow you to include columns in a non-clusteredindex that are not part of the key • Allows the index to cover more queries • Included columns only appear in the leaf level of the index • Up to 1,023 additional columns • Can include data types that cannot be key columns • Except text, ntext, and image data types • Syntax CREATE [ UNIQUE ] NONCLUSTERED INDEX index_nameON <object> (column [ ASC | DESC ] [ ,...n ] )[ INCLUDE (column_name [ ,...n ] ) ] • Example CREATE NONCLUSTERED INDEX NameRegion_IDXON Employees(LastName)INCLUDE (Region)

How SQL Server Indexes Work

How SQL Server Indexes Work

Presentation Transcript

SQL Server

How to Recover SQL Server Password

How to repair SQL Server database?

SQL Server 2008 SQL Server Management Studio

SQL Server Indexes

SQL Server

Introducing SQL Server 2012 Columnstore Indexes

SQL Server 2012: Columnstore Indexes

SQL SERVER

SQL/Server

SQL SERVER

SQL Server

SQL Server

Sea Ice

Sea Ice