Andras Belokosztolszki Red Gate Software

Andras BelokosztolszkiRed Gate Software SQL Server Storage Engine

Andras.Belokosztolszki@red-gate.com • Software architect at Red Gate Software • Responsible for SQL tools: • SQL Compare, SQL Data Compare, SQL Packager • SQL Log Rescue • SQL Refactor • … many others • Events (NxtGenUG, VBUG, SQL Bits, PASS, many other user groups) • SQL Server Central • Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx • Articles: http://www.simple-talk.com/author/andr%c3%a1s-belokosztolszki/

Agenda • Physical storage • Pages, rows, data types, index structure • Data and schema modifications • What happens when you change the schema • What happens when a row is inserted, delted, etc • SQL Server 2008 features • Compression and file streams

Database files • Primary database file (*.mdf) • Secondary database files (*.ndf) • Optional, can be more than one • Log files (not covered) Database Log Primary Secondary Log Secondary

Pages • Data files are dividied up into 8KB pages • All information is stored in pages (data, schema, database information, space allocation(GAM, SGAM, IAM), dlls) • Identified by fileId:PageId (2+4 bytes) • 8 pages = 1 extent • Most important for us is the data page 1:0 1:1 1:2 1:3 1:4 1:5 1:6 1:7 1:8 1:9 1:A 1:B 1:C 1:D 1:E 1:F

Structure of a data page • Page header (96 bytes) • Data rows • Offset array • DBCC PAGE • (db,file,page,options) • 2 – raw, 3 – row details • Trace flag 3604 Page header Demo

Data row format • Fixed length data will always use its allocated space (even when it is null) • Must fit a page (max 8060 bytes) • Some items can overflow: Overflow space StatA (1) StatB (1) Null offset (2) Fixed Length Data Column Count (2) Null bitmap Ceiling(ColCnt/8) Var-Len Column Count (2) Var. Offsets Var-Len Data Demo

Data types • See sys.types • Fixed length (some can be adjusted (time, decimal, char(),…) • Always consumes this space • Variable length (varchar, varbinary, …) • Bit (packed) • SqlVariant • Binary large objects (ntext, varchar(max), …) • After a certain size stored on other pages

From tables to pages Heap/Index Partition Allocation Unit 1 N 1 3 sys.indexes sys.partitions sys.allocation_units • In row data • LOB • Row overflow sys.partitions sp JOIN sys.allocation_units au ON sp.partition_id = au.container_id

Clustered index Root level Interior levels Level 0 Leaf level Row Data

Clustered index • The full row record is at the leaf level • Consequently there can be only one clustered index • In the intermediary and root levels a clustered key is stored, for the first entries of the next level pages • If the key row length is e.g. 15 bytes, an intermediary page can store up to (8096/15 =) 539 rows (reference 539 pages) • Exact space usage in sys.allocation_units • Pages are double linked

Nonclustered index Root level/ Interior levels Leaf level Row Data

Index space usage • See sys.allocation_units • Max 900 bytes per entry! • Index entry contains the key columns, and • Index key columns • Record locator (nonclusered) • Row ID or clustering key (not stored redundantly) • Down pointer (for non leaf pages) Fixed Length Data StatA (1) Column Count (2) Null bitmap Ceiling(ColCnt/8) Var-Len Column Count (2) Var. Offsets Var-Len Data

Included columns • Motivation: • When using a clustered index on heap, an item is looked up, then one more page read to retrieve extra data • When using a clustered index on a B-tree, the clustered index structure is also traversed • You can include extra columns in a non-clustered index • These will not be used to look up rows in the table • Increases the coverage of an index • Increases the size of an index record -> the total size • Extra maintenance

Summary of static data storage • Everything is stored on pages • Rows have fixed and variable length portions • Differences between certain data types and their limitations • Index structures • Size estimates for indexes, page estimates for queries • The fewer pages we load into memory, the better?

Modifications to the stored information

Schema modification • What can happen: • No rows are modified, only meta information • All rows are examined • E.g. changing nullability • Int to smallint (wasted space!) • All rows are rebuilt • We may end up wasting a lot of valueable space! How can we reclaim the space? Demo

Modifications on heaps • Insert: added where there is space • Delete: removed or marked as ghost • Update: Since indexes refer to file:page:slot if a row no longer fits on a page, it cannot easily be moved -> it is moved, but a reference to it is left (forwarded record)

Modifications on clustered tables • Insert: Since the rows are ordered, if there is not enough space on a table, the table is split into two (can happen many times) • Update: • like inserts, if the new row is too big to fit • Changes to clustering columns = delete+insert • Delete: the row is marked as ghost or is deleted

Phil Factor and Pad Index • Pad Index • Intermediary pages only • Specified as percentage • Fill Factor • Leaf pages only • Specified as percentage Only when index is created or rebuilt. The free space is NOT maintained. (see later index reorganization and rebuilding)

Fragmentation • sys.dm_db_index_physical_stats() • Logical fragmentation: next leaf page for index page is not the next page that is allocated to the index • Extent fragmentation: extents are not contiguous • Page fill

Handling fragmentation • Drop and create the clustered index • Index is offline • ALTER INDEX REORGANIZE • This is the replacement for DBCC INDEXDEFRAG • Reorganizes index pages(and compacts pages and LOBs) (NO new pages) • ALTER INDEX REBUILD • This is the replacement for DBCC DBREINDEX • Basically drops and recreates the index

Row compression • Introduced in SQL Server 2008 • Stores fixed length data as variable length • E.g. Integer – can use 1,2,3,4 bytes + bits instead of 4 bytes + bit • Available in Enterprise edition CREATE TABLE RowCompressedTable (…) WITH (DATA_COMPRESSION = Row);

Compressed row • CD Array: 0 = null, 1 – 9 number of bytes, 10 – long • Self contained Column Count (1/2) CD Array (4b/col) StatA (1) Short data Null bitmap Ceiling(ColCnt/8) Var-Len Column Count (2) Var. Offsets Var-Len Data WITH (data_compression = row)

Page compression • Row compression • Prefix compression • Dictionary compression • When table created, there is no compression • Row compression kicks in when otherwise a page split would occur • When table with data converted it is rebuilt • sp_estimate_data_compression_savings

Prefix compression Page header Page header aaabcc aaaacc abcd aaabb aaaab abcd 4b 4b [] aaabcc bbbb abcd [] 0bbbb [] aaaccc aaaacc bbbb 3ccc [] 0bbbb

Dictionary compression Page header Page header aaabcc aaaacc abcd aaabcc aaaacc abcd 4b 0bbbb 4b 4b [] 0 0 [] [] 0bbbb [] [] 1 [] 3ccc [] 0bbbb 3ccc [] 1

BLOB Structure • B-tree structure • Many pages need to be looked up • Smaller BLOBs can be inlined sp_tableoption <tablename>, ‘text in row’, <length> Data row Text Pointer Root entry Intermediate node Intermediate node Data fragment Data fragment Data fragment Data fragment

Filestreams • When BLOBs are not enough: • Large items (over 1Mb) • Very fast read is needed • 2GB++ • Can use T-SQL to access • File stream access vie Win32 API

Summary • Static data storage • Table and index rows • The way these are linked together • What happens during schema and data modifications • Lessons to take away • Minimize the number of pages you need to read or write • Rebuild your tables and use fill factor, and rebuild indexes durng off peak hours! • Use the specialized data types and storage options

Questions • Thanks to SQL Bits & Sponsors • Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx • Email: Andras.Belokosztolszki (at) red-gate.com

Andras Belokosztolszki Red Gate Software