andras belokosztolszki red gate software n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Andras Belokosztolszki Red Gate Software PowerPoint Presentation
Download Presentation
Andras Belokosztolszki Red Gate Software

Loading in 2 Seconds...

play fullscreen
1 / 31

Andras Belokosztolszki Red Gate Software - PowerPoint PPT Presentation


  • 62 Views
  • Uploaded on

Andras Belokosztolszki Red Gate Software. SQL Server Storage Engine. Andras.Belokosztolszki@red-gate.com. Software architect at Red Gate Software Responsible for SQL tools: SQL Compare, SQL Data Compare, SQL Packager SQL Log Rescue SQL Refactor … many others

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Andras Belokosztolszki Red Gate Software' - ivory-bauer


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
andras belokosztolszki@red gate com
Andras.Belokosztolszki@red-gate.com
  • Software architect at Red Gate Software
  • Responsible for SQL tools:
    • SQL Compare, SQL Data Compare, SQL Packager
    • SQL Log Rescue
    • SQL Refactor
    • … many others
  • Events (NxtGenUG, VBUG, SQL Bits, PASS, many other user groups)
  • SQL Server Central
  • Blog:

http://www.simple-talk.com/community/blogs/andras/default.aspx

  • Articles:

http://www.simple-talk.com/author/andr%c3%a1s-belokosztolszki/

agenda
Agenda
  • Physical storage
    • Pages, rows, data types, index structure
  • Data and schema modifications
    • What happens when you change the schema
    • What happens when a row is inserted, delted, etc
  • SQL Server 2008 features
    • Compression and file streams
database files
Database files
  • Primary database file (*.mdf)
  • Secondary database files (*.ndf)
    • Optional, can be more than one
  • Log files (not covered)

Database

Log

Primary

Secondary

Log

Secondary

pages
Pages
  • Data files are dividied up into 8KB pages
  • All information is stored in pages (data, schema, database information, space allocation(GAM, SGAM, IAM), dlls)
  • Identified by fileId:PageId (2+4 bytes)
  • 8 pages = 1 extent
  • Most important for us is the data page

1:0

1:1

1:2

1:3

1:4

1:5

1:6

1:7

1:8

1:9

1:A

1:B

1:C

1:D

1:E

1:F

structure of a data page
Structure of a data page
  • Page header (96 bytes)
  • Data rows
  • Offset array
  • DBCC PAGE
    • (db,file,page,options)
      • 2 – raw, 3 – row details
    • Trace flag 3604

Page header

Demo

data row format
Data row format
  • Fixed length data will always use its allocated space (even when it is null)
  • Must fit a page (max 8060 bytes)
    • Some items can overflow: Overflow space

StatA

(1)

StatB

(1)

Null offset

(2)

Fixed Length Data

Column Count

(2)

Null bitmap

Ceiling(ColCnt/8)

Var-Len Column Count

(2)

Var. Offsets

Var-Len Data

Demo

data types
Data types
  • See sys.types
    • Fixed length (some can be adjusted (time, decimal, char(),…)
      • Always consumes this space
    • Variable length (varchar, varbinary, …)
    • Bit (packed)
    • SqlVariant
    • Binary large objects (ntext, varchar(max), …)
      • After a certain size stored on other pages
from tables to pages
From tables to pages

Heap/Index

Partition

Allocation Unit

1

N

1

3

sys.indexes

sys.partitions

sys.allocation_units

  • In row data
  • LOB
  • Row overflow

sys.partitions sp

JOIN sys.allocation_units au

ON sp.partition_id = au.container_id

clustered index
Clustered index

Root level

Interior levels

Level 0

Leaf level

Row Data

clustered index1
Clustered index
  • The full row record is at the leaf level
    • Consequently there can be only one clustered index
  • In the intermediary and root levels a clustered key is stored, for the first entries of the next level pages
  • If the key row length is e.g. 15 bytes, an intermediary page can store up to (8096/15 =) 539 rows (reference 539 pages)
  • Exact space usage in sys.allocation_units
  • Pages are double linked
nonclustered index
Nonclustered index

Root level/

Interior levels

Leaf level

Row Data

index space usage
Index space usage
  • See sys.allocation_units
  • Max 900 bytes per entry!
  • Index entry contains the key columns, and
    • Index key columns
    • Record locator (nonclusered)
      • Row ID or clustering key (not stored redundantly)
    • Down pointer (for non leaf pages)

Fixed Length Data

StatA

(1)

Column Count

(2)

Null bitmap

Ceiling(ColCnt/8)

Var-Len Column Count

(2)

Var. Offsets

Var-Len Data

included columns
Included columns
  • Motivation:
    • When using a clustered index on heap, an item is looked up, then one more page read to retrieve extra data
    • When using a clustered index on a B-tree, the clustered index structure is also traversed
  • You can include extra columns in a non-clustered index
  • These will not be used to look up rows in the table
  • Increases the coverage of an index
  • Increases the size of an index record -> the total size
  • Extra maintenance
summary of static data storage
Summary of static data storage
  • Everything is stored on pages
  • Rows have fixed and variable length portions
    • Differences between certain data types and their limitations
  • Index structures
    • Size estimates for indexes, page estimates for queries
  • The fewer pages we load into memory, the better?
schema modification
Schema modification
  • What can happen:
    • No rows are modified, only meta information
    • All rows are examined
      • E.g. changing nullability
      • Int to smallint (wasted space!)
    • All rows are rebuilt
  • We may end up wasting a lot of valueable space! How can we reclaim the space?

Demo

modifications on heaps
Modifications on heaps
  • Insert: added where there is space
  • Delete: removed or marked as ghost
  • Update: Since indexes refer to file:page:slot if a row no longer fits on a page, it cannot easily be moved -> it is moved, but a reference to it is left (forwarded record)
modifications on clustered tables
Modifications on clustered tables
  • Insert: Since the rows are ordered, if there is not enough space on a table, the table is split into two (can happen many times)
  • Update:
    • like inserts, if the new row is too big to fit
    • Changes to clustering columns = delete+insert
  • Delete: the row is marked as ghost or is deleted
phil factor and pad index
Phil Factor and Pad Index
  • Pad Index
  • Intermediary pages only
  • Specified as percentage
  • Fill Factor
  • Leaf pages only
  • Specified as percentage

Only when index is created or rebuilt. The free space is NOT maintained. (see later index reorganization and rebuilding)

fragmentation
Fragmentation
  • sys.dm_db_index_physical_stats()
  • Logical fragmentation: next leaf page for index page is not the next page that is allocated to the index
  • Extent fragmentation: extents are not contiguous
  • Page fill
handling fragmentation
Handling fragmentation
  • Drop and create the clustered index
    • Index is offline
  • ALTER INDEX REORGANIZE
    • This is the replacement for DBCC INDEXDEFRAG
    • Reorganizes index pages(and compacts pages and LOBs) (NO new pages)
  • ALTER INDEX REBUILD
    • This is the replacement for DBCC DBREINDEX
    • Basically drops and recreates the index
row compression
Row compression
  • Introduced in SQL Server 2008
  • Stores fixed length data as variable length
    • E.g. Integer – can use 1,2,3,4 bytes + bits instead of 4 bytes + bit
  • Available in Enterprise edition

CREATE TABLE RowCompressedTable

(…)

WITH (DATA_COMPRESSION = Row);

compressed row
Compressed row
  • CD Array: 0 = null, 1 – 9 number of bytes, 10 – long
  • Self contained

Column Count (1/2)

CD Array

(4b/col)

StatA

(1)

Short data

Null bitmap

Ceiling(ColCnt/8)

Var-Len Column Count

(2)

Var. Offsets

Var-Len Data

WITH (data_compression = row)

page compression
Page compression
  • Row compression
  • Prefix compression
  • Dictionary compression
  • When table created, there is no compression
  • Row compression kicks in when otherwise a page split would occur
  • When table with data converted it is rebuilt
  • sp_estimate_data_compression_savings
prefix compression
Prefix compression

Page header

Page header

aaabcc

aaaacc

abcd

aaabb

aaaab

abcd

4b

4b

[]

aaabcc

bbbb

abcd

[]

0bbbb

[]

aaaccc

aaaacc

bbbb

3ccc

[]

0bbbb

dictionary compression
Dictionary compression

Page header

Page header

aaabcc

aaaacc

abcd

aaabcc

aaaacc

abcd

4b

0bbbb

4b

4b

[]

0

0

[]

[]

0bbbb

[]

[]

1

[]

3ccc

[]

0bbbb

3ccc

[]

1

blob structure
BLOB Structure
  • B-tree structure
  • Many pages need to be looked up
  • Smaller BLOBs can be inlined

sp_tableoption <tablename>, ‘text in row’, <length>

Data row

Text Pointer

Root entry

Intermediate node

Intermediate node

Data fragment

Data fragment

Data fragment

Data fragment

filestreams
Filestreams
  • When BLOBs are not enough:
    • Large items (over 1Mb)
    • Very fast read is needed
    • 2GB++
  • Can use T-SQL to access
  • File stream access vie Win32 API
summary
Summary
  • Static data storage
    • Table and index rows
    • The way these are linked together
  • What happens during schema and data modifications
  • Lessons to take away
    • Minimize the number of pages you need to read or write
    • Rebuild your tables and use fill factor, and rebuild indexes durng off peak hours!
    • Use the specialized data types and storage options
questions
Questions
  • Thanks to SQL Bits & Sponsors
  • Blog:

http://www.simple-talk.com/community/blogs/andras/default.aspx

  • Email: Andras.Belokosztolszki (at) red-gate.com