1 / 31

Andras Belokosztolszki Red Gate Software

Andras Belokosztolszki Red Gate Software. SQL Server Storage Engine. Andras.Belokosztolszki@red-gate.com. Software architect at Red Gate Software Responsible for SQL tools: SQL Compare, SQL Data Compare, SQL Packager SQL Log Rescue SQL Refactor … many others

vinnie
Download Presentation

Andras Belokosztolszki Red Gate Software

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Andras BelokosztolszkiRed Gate Software SQL Server Storage Engine

  2. Andras.Belokosztolszki@red-gate.com • Software architect at Red Gate Software • Responsible for SQL tools: • SQL Compare, SQL Data Compare, SQL Packager • SQL Log Rescue • SQL Refactor • … many others • Events (NxtGenUG, VBUG, SQL Bits, PASS, many other user groups) • SQL Server Central • Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx • Articles: http://www.simple-talk.com/author/andr%c3%a1s-belokosztolszki/

  3. Agenda • Physical storage • Pages, rows, data types, index structure • Data and schema modifications • What happens when you change the schema • What happens when a row is inserted, delted, etc • SQL Server 2008 features • Compression and file streams

  4. Database files • Primary database file (*.mdf) • Secondary database files (*.ndf) • Optional, can be more than one • Log files (not covered) Database Log Primary Secondary Log Secondary

  5. Pages • Data files are dividied up into 8KB pages • All information is stored in pages (data, schema, database information, space allocation(GAM, SGAM, IAM), dlls) • Identified by fileId:PageId (2+4 bytes) • 8 pages = 1 extent • Most important for us is the data page 1:0 1:1 1:2 1:3 1:4 1:5 1:6 1:7 1:8 1:9 1:A 1:B 1:C 1:D 1:E 1:F

  6. Structure of a data page • Page header (96 bytes) • Data rows • Offset array • DBCC PAGE • (db,file,page,options) • 2 – raw, 3 – row details • Trace flag 3604 Page header Demo

  7. Data row format • Fixed length data will always use its allocated space (even when it is null) • Must fit a page (max 8060 bytes) • Some items can overflow: Overflow space StatA (1) StatB (1) Null offset (2) Fixed Length Data Column Count (2) Null bitmap Ceiling(ColCnt/8) Var-Len Column Count (2) Var. Offsets Var-Len Data Demo

  8. Data types • See sys.types • Fixed length (some can be adjusted (time, decimal, char(),…) • Always consumes this space • Variable length (varchar, varbinary, …) • Bit (packed) • SqlVariant • Binary large objects (ntext, varchar(max), …) • After a certain size stored on other pages

  9. From tables to pages Heap/Index Partition Allocation Unit 1 N 1 3 sys.indexes sys.partitions sys.allocation_units • In row data • LOB • Row overflow sys.partitions sp JOIN sys.allocation_units au ON sp.partition_id = au.container_id

  10. Clustered index Root level Interior levels Level 0 Leaf level Row Data

  11. Clustered index • The full row record is at the leaf level • Consequently there can be only one clustered index • In the intermediary and root levels a clustered key is stored, for the first entries of the next level pages • If the key row length is e.g. 15 bytes, an intermediary page can store up to (8096/15 =) 539 rows (reference 539 pages) • Exact space usage in sys.allocation_units • Pages are double linked

  12. Nonclustered index Root level/ Interior levels Leaf level Row Data

  13. Index space usage • See sys.allocation_units • Max 900 bytes per entry! • Index entry contains the key columns, and • Index key columns • Record locator (nonclusered) • Row ID or clustering key (not stored redundantly) • Down pointer (for non leaf pages) Fixed Length Data StatA (1) Column Count (2) Null bitmap Ceiling(ColCnt/8) Var-Len Column Count (2) Var. Offsets Var-Len Data

  14. Included columns • Motivation: • When using a clustered index on heap, an item is looked up, then one more page read to retrieve extra data • When using a clustered index on a B-tree, the clustered index structure is also traversed • You can include extra columns in a non-clustered index • These will not be used to look up rows in the table • Increases the coverage of an index • Increases the size of an index record -> the total size • Extra maintenance

  15. Summary of static data storage • Everything is stored on pages • Rows have fixed and variable length portions • Differences between certain data types and their limitations • Index structures • Size estimates for indexes, page estimates for queries • The fewer pages we load into memory, the better?

  16. Modifications to the stored information

  17. Schema modification • What can happen: • No rows are modified, only meta information • All rows are examined • E.g. changing nullability • Int to smallint (wasted space!) • All rows are rebuilt • We may end up wasting a lot of valueable space! How can we reclaim the space? Demo

  18. Modifications on heaps • Insert: added where there is space • Delete: removed or marked as ghost • Update: Since indexes refer to file:page:slot if a row no longer fits on a page, it cannot easily be moved -> it is moved, but a reference to it is left (forwarded record)

  19. Modifications on clustered tables • Insert: Since the rows are ordered, if there is not enough space on a table, the table is split into two (can happen many times) • Update: • like inserts, if the new row is too big to fit • Changes to clustering columns = delete+insert • Delete: the row is marked as ghost or is deleted

  20. Phil Factor and Pad Index • Pad Index • Intermediary pages only • Specified as percentage • Fill Factor • Leaf pages only • Specified as percentage Only when index is created or rebuilt. The free space is NOT maintained. (see later index reorganization and rebuilding)

  21. Fragmentation • sys.dm_db_index_physical_stats() • Logical fragmentation: next leaf page for index page is not the next page that is allocated to the index • Extent fragmentation: extents are not contiguous • Page fill

  22. Handling fragmentation • Drop and create the clustered index • Index is offline • ALTER INDEX REORGANIZE • This is the replacement for DBCC INDEXDEFRAG • Reorganizes index pages(and compacts pages and LOBs) (NO new pages) • ALTER INDEX REBUILD • This is the replacement for DBCC DBREINDEX • Basically drops and recreates the index

  23. Row compression • Introduced in SQL Server 2008 • Stores fixed length data as variable length • E.g. Integer – can use 1,2,3,4 bytes + bits instead of 4 bytes + bit • Available in Enterprise edition CREATE TABLE RowCompressedTable (…) WITH (DATA_COMPRESSION = Row);

  24. Compressed row • CD Array: 0 = null, 1 – 9 number of bytes, 10 – long • Self contained Column Count (1/2) CD Array (4b/col) StatA (1) Short data Null bitmap Ceiling(ColCnt/8) Var-Len Column Count (2) Var. Offsets Var-Len Data WITH (data_compression = row)

  25. Page compression • Row compression • Prefix compression • Dictionary compression • When table created, there is no compression • Row compression kicks in when otherwise a page split would occur • When table with data converted it is rebuilt • sp_estimate_data_compression_savings

  26. Prefix compression Page header Page header aaabcc aaaacc abcd aaabb aaaab abcd 4b 4b [] aaabcc bbbb abcd [] 0bbbb [] aaaccc aaaacc bbbb 3ccc [] 0bbbb

  27. Dictionary compression Page header Page header aaabcc aaaacc abcd aaabcc aaaacc abcd 4b 0bbbb 4b 4b [] 0 0 [] [] 0bbbb [] [] 1 [] 3ccc [] 0bbbb 3ccc [] 1

  28. BLOB Structure • B-tree structure • Many pages need to be looked up • Smaller BLOBs can be inlined sp_tableoption <tablename>, ‘text in row’, <length> Data row Text Pointer Root entry Intermediate node Intermediate node Data fragment Data fragment Data fragment Data fragment

  29. Filestreams • When BLOBs are not enough: • Large items (over 1Mb) • Very fast read is needed • 2GB++ • Can use T-SQL to access • File stream access vie Win32 API

  30. Summary • Static data storage • Table and index rows • The way these are linked together • What happens during schema and data modifications • Lessons to take away • Minimize the number of pages you need to read or write • Rebuild your tables and use fill factor, and rebuild indexes durng off peak hours! • Use the specialized data types and storage options

  31. Questions • Thanks to SQL Bits & Sponsors • Blog: http://www.simple-talk.com/community/blogs/andras/default.aspx • Email: Andras.Belokosztolszki (at) red-gate.com

More Related